Evaluating Support for the Current Classification of Eukaryotic Diversity
PLoS Genetics, vol. 2, 2006.
Abstract
Perspectives on the classification of eukaryotic diversity have changed rapidly in recent years, as the four eukaryotic groups within the five-kingdom classification (plants, animals, fungi, and protists) have been transformed through numerous permutations into the current system of six “supergroups.” The intent of the supergroup classification system is to unite microbial and macroscopic eukaryotes based on phylogenetic inference. This supergroup approach is increasing in popularity in the literature and is appearing in introductory biology textbooks. We evaluate the stability and support for the current six-supergroup classification of eukaryotes based on molecular genealogies. We assess three aspects of each supergroup: (1) the stability of its taxonomy, (2) the support for monophyly (single evolutionary origin) in molecular analyses targeting a supergroup, and (3) the support for monophyly when a supergroup is included as an out-group in phylogenetic studies targeting other taxa. Our analysis demonstrates that supergroup taxonomies are unstable and that support for groups varies tremendously, indicating that the current classification scheme of eukaryotes is likely premature. We highlight several trends contributing to the instability and discuss the requirements for establishing robust clades within the eukaryotic tree of life.
Synopsis
Evolutionary perspectives, including the classification of living organisms, provide the unifying scaffold on which biological knowledge is assembled. Researchers in many areas of biology use evolutionary classifications (taxonomy) in many ways, including as a means for interpreting the origin of evolutionary innovations, as a framework for comparative genetics/genomics, and as the basis for drawing broad conclusions about the diversity of living organisms. Thus, it is essential that taxonomy be robust.
Here the authors evaluate the stability of and support for the current classification system of eukaryotic cells (cells with nuclei) in which eukaryotes are divided into six kingdom level categories, or supergroups. These six supergroups unite diverse microbial and macrobial eukaryotic lineages, including the well-known groups of plants, animals, and fungi. The authors assess the stability of supergroup classifications through time and reveal a rapidly changing taxonomic landscape that is difficult to navigate for the specialist and generalist alike. Additionally, the authors find variable support for each of the supergroups in published analyses based on DNA sequence variation. The support for supergroups differs according to the taxonomic area under study and the origin of the genes (e.g., nuclear, plastid) used in the analysis. Encouragingly, combining a conservative approach to taxonomy with increased sampling of microbial eukaryotes and the use of multiple types of data is likely to produce a robust scaffold for the eukaryotic tree of life.
Introduction
Biological research is based on the shared history of living things. Taxonomy, the science of classifying organismal diversity, is the scaffold on which biological knowledge is assembled and integrated into a cohesive structure. A comprehensive eukaryotic taxonomy is a powerful research tool in evolutionary genetics, medicine, and many other fields. As the foundation of much subsequent research, the framework must, however, be robust. Here we test the existing framework by evaluating the support for and stability of the classification of eukaryotic diversity into six supergroups.
Eukaryotes (organisms containing nuclei) encompass incredible morphological diversity from picoplankton of only two microns in size to the blue whale and giant sequoia that are eight orders of magnitude larger. Many evolutionary innovations are found only in eukaryotes, some of which are present in all lineages (e.g., the cytoskeleton, nucleus) and others that are restricted to a few lineages (e.g., multicellularity, photosynthetic organelles [plastids]). These and other eukaryotic features evolved within microbial eukaryotes (protists) that thrived for hundreds of millions of years before they gave rise independently to multicellular eukaryotes, the familiar plants, animals, and fungi [1]. Thus, elucidating the origins of novel eukaryotic traits requires a comprehensive phylogeny—an inference of organismal relationships—that includes the diverse microbial lineages.
Higher-level classifications have historically emphasized the visible diversity of large eukaryotes, as reflected by the establishment of the plant, animal, and fungal kingdoms. In these schemes the diverse microbial eukaryotes have generally been placed in one (Protista [2]–[3] or Protoctista [4]) or two (Protozoa and Chromista [5]) groups (Figure 1; but see also [6],[7]). However, this historic distinction between macroscopic and microscopic eukaryotes does not adequately capture their complex evolutionary relationships or the vast diversity within the microbial world.
In the past decade, the emphasis in high-level taxonomy has shifted away from the historic kingdoms and toward a new system of six supergroups that aims to portray evolutionary relationships between microbial and macrobial lineages. The supergroup concept is gaining popularity as evidenced by several reviews [8],[9] and inclusion in forthcoming editions of introductory biology textbooks. In addition, the International Society of Protozoologists recently proposed a formal reclassification of eukaryotes into six supergroups, though acknowledging uncertainty in some groups [6].
The Supergroups
Below we introduce the six supergroups in alphabetical order (Figure 2). The supergroup “Amoebozoa” was proposed in 1996 [10]. Original evidence for the group was drawn from molecular genealogies and morphological characters such as eruptive pseudopodia and branched tubular mitochondrial cristae. However, no clear synapomorphy (shared derived character) exists for “Amoebozoa.” In fact, amoeboid organisms are not restricted to the “Amoebozoa,” but are found in at least four of the six supergroups.
The “Amoebozoa” include a diversity of predominantly amoeboid members such as Dictyostelium discoideum (cellular slime mold), which is a model for understanding multicellularity [11]. Another member, Entamoeba histolytica, is an amitochondriate amoeba (Pelobiont) and is the cause of amoebic dysentery, an intestinal infection with global health consequences [12].
“Chromalveolata” was introduced as a parsimonious, albeit controversial, explanation for the presence of plastids of red algal origin in photosynthetic members of the “Alveolata” and “Chromista” [13]. Under this hypothesis, the last common ancestor of the chromalveolates was a heterotroph that acquired photosynthesis by engulfing a red alga and retaining it as a plastid [14],[15]. The “Alveolata” include ciliates, dinoflagellates, and apicomplexa, and its monophyly is well supported by morphology and molecules. “Chromista” was created as a kingdom to unite diverse microbial lineages with red algal plastids (and their nonphotosynthetic descendants) [5],[16], but no clear synapomorphy unites this clade.
The supergroup “Chromalveolata” includes microbes with critical roles in the environment and in human health. Numerous key discoveries emerged from studies of the model organism Tetrahymena (ciliate: “Alveolata”), including self-splicing RNAs and the presence of telomeres [17]. Phytophthora (stramenopile: “Chromista”), a soil-dwelling organism, is the causative agent of the Irish Potato Famine [18], whereas Plasmodium (Apicomplexa: “Alveolata”) is the causative agent of malaria [19].
“Excavata” is a supergroup composed predominately of heterotrophic flagellates whose ancestor is postulated to have had a synapomorphy of a conserved ventral feeding groove [20]. Most members of “Excavata” are free-living heterotrophs, but there are notable exceptions that are pathogens. For example, Giardia (Diplomonada) causes the intestinal infection giardiasis, and Trichomonas vaginalis (Parabasalia) is the causative agent of a sexually transmitted disease [21]. Kinetoplastids, such as Trypanosoma (Euglenozoa), have unique molecular features such as extensive RNA editing of mitochondrial genes that is templated by minicircle DNA [22].
“Opisthokonta” includes animals, fungi, and their microbial relatives. This supergroup emerged from molecular gene trees [23] and is united by the presence of a single posterior flagellum in many constituent lineages [24]. Molecular studies have expanded microbial membership of the group and revealed a potential molecular synapomorphy, an insertion in the Elongation Factor 1α gene in lineages containing this ortholog [25],[26].
“Opisthokonts” include many biological model organisms (Drosophila, Saccharomyces). Vast amounts of research have been conducted on members of this supergroup and much textbook science is based on inferences from these lineages. Other notable opisthokonts include Encephalitozoon (Microsporidia: Fungi), a causative agent of diarrhea, which has one of the smallest known nuclear genomes at 2.9 Mb [27]. Also included within the “Opisthokonta” are the choanoflagellates (e.g., Monosiga), which are the sister to animals [28].
The supergroup “Plantae” was erected as a kingdom in 1981 [29] to unite the three lineages with primary plastids: green algae (including land plants), rhodophytes, and glaucophytes. Under this hypothesis a single ancestral primary endosymbiosis of a cyanobacterium gave rise to the plastid in this supergroup [30]. The term “Plantae” has been used to describe numerous subsets of photosynthetic organisms, but in this manuscript will only be used in reference to the supergroup.
Well-known “Plantae” genera include Arabidopsis, a model angiosperm, and Porphyra (red alga), the edible seaweed nori. Within the “Plantae” there have been numerous independent origins of multicellularity including: Volvox (Chlorophyta) [31], the land plants, and red algae.
“Rhizaria” emerged from molecular data in 2002 to unite a heterogeneous group of flagellates and amoebae including: cercomonads, foraminifera, diverse testate amoebae, and former members of the polyphyletic radiolaria [32]. “Rhizaria” is an expansion of the “Cercozoa” [5] that was also recognized from molecular data [33],[34]. “Cercozoa” and foraminifera appear to share a unique insertion in ubiquitin [35], but there is a paucity of non-molecular characters uniting members of “Rhizaria.”
“Rhizaria” encompasses a diversity of forms, including a heterotrophic flagellate, Cercomonas (Cercomonada: “Cercozoa”), and a photosynthetic amoeba, Paulinella chromatophora (Silicofilosea: “Cercozoa”). The latter likely represents a recent endosymbiosis of a cyanobacterium [36],[37]. Some members of the “Rhizaria,” notably the shelled foraminifera, also have a substantial fossil record that can be used to determine the age of sediments [38].
Our Approach
To assess the robustness of the six proposed supergroups, we compare formal taxonomies and track group composition and nomenclature across time (Figures 1 and 3). We also evaluate support for the six supergroups by analyzing published molecular genealogies that either target a specific supergroup or aim to survey all supergroups. Our focus on molecular genealogies is limited. We recognize that supergroups have, in many cases, been defined by suites of characters such as flagellar apparatus in “Excavata” [32],[39] and “Opisthokonta” [24], and that groups are more robust when supported by multiple data types (see Discussion). Use of genealogies is further complicated because a genealogy is the reconstruction of the history of a gene, and may or may not be congruent with phylogenies, which depict the history of organisms [40],[41]. Despite these factors, our treatment of molecular genealogies is warranted given the prevalence of molecular analyses in the literature that seeks support for supergroups and the reliance on these gene trees in establishing taxonomy.
For each genealogy we evaluate the taxon sampling for the targeted supergroup (Membership; Figures 4–9) and the monophyly of all supergroups with at least two member taxa (Supergroup monophyly; Figures 4–9). Monophyletic clades, those that include an ancestor and all of its descendants [45], are scored (+; Figures 4–9). We assess support for supergroups when they are targeted by specific studies and when they are included as out-groups in studies targeting other supergroups. A conservative measure of out-group monophyly was used because we required only two member lineages be present. In contrast, focal supergroups had broader taxonomic sampling.
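The scoring rule described above can be illustrated with a minimal sketch. It assumes a rooted tree written as nested tuples, and it uses an invented toy topology that is not taken from any of the surveyed studies. The function names, the "-" symbol for a non-monophyletic group, and the `None` return for under-sampled groups are all illustrative assumptions; only the "+" score and the two-member minimum come from the text.

```python
def leaf_set(tree):
    """Return the taxon names below a node (a str leaf or a tuple of subtrees)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def is_monophyletic(tree, members):
    """True if some node's leaf set equals `members` exactly (rooted tree assumed)."""
    members = frozenset(members)
    if leaf_set(tree) == members:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, members) for child in tree)


def score_supergroup(tree, members):
    """Score one supergroup in one genealogy.

    Returns "+" if the sampled members form a clade, "-" if they do not
    (the "-" symbol is an assumption), and None when fewer than two
    members are present in the tree, mirroring the two-taxon minimum.
    """
    sampled = frozenset(members) & leaf_set(tree)
    if len(sampled) < 2:
        return None
    return "+" if is_monophyletic(tree, sampled) else "-"


# Invented toy topology for illustration only.
toy_tree = (
    (("Drosophila", "Monosiga"), "Saccharomyces"),
    (("Giardia", "Trichomonas"), ("Tetrahymena", "Trypanosoma")),
)

opisthokonta = {"Drosophila", "Monosiga", "Saccharomyces", "Encephalitozoon"}
excavata = {"Giardia", "Trichomonas", "Trypanosoma"}
rhizaria = {"Cercomonas"}

scores = {
    "Opisthokonta": score_supergroup(toy_tree, opisthokonta),
    "Excavata": score_supergroup(toy_tree, excavata),
    "Rhizaria": score_supergroup(toy_tree, rhizaria),
}
```

In this toy case the sampled opisthokonts form a clade, the excavates do not because Trypanosoma groups with Tetrahymena, and "Rhizaria" is left unscored because none of its members is sampled. Published gene trees are often unrooted, so a real assessment would also need a rooting decision that this sketch does not address.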
Results
Taxonomic Instability
There is considerable instability in taxonomies of the six putative supergroups (Figure 3). Causes of the rapid revisions in eukaryotic taxonomy over short time periods include: (1) nomenclatural ambiguity, (2) ephemeral and poorly supported higher-level taxa, and (3) classification schemes erected under differing taxonomic philosophies. For example, taxonomy of the “Amoebozoa,” a term originally introduced by Lühe in 1913 [52] to encompass a very different assemblage of organisms, has changed considerably in ten years (Figure 3A). “Variosea” was created as a subclade within the “Amoebozoa” in 2004 to group taxonomically unplaced genera of amoebae with “exceptionally varied phenotype” [42]. Rarely supported by morphology or molecular evidence [53]–[46], this taxon was excluded from subsequent classifications [6],[43] but is still discussed in the literature [53]. Similarly, the excavate taxon “Loukozoa” [5] has been continually redefined to include a variety of taxa bearing a ventral groove (Figure 3B) and finally abandoned [39]. The taxonomy of “Rhizaria” has emerged largely from molecular genealogies and has varied partly in response to shifting topology of gene trees that change with taxon sampling and the method of tree construction [5],[32],[54],[55] (Figure 3D).
The taxonomy of “Plantae” is destabilized by the complex history of the term. Used since Haeckel's time [56], “Plantae” has been redefined numerous times to describe various collections of photosynthetic organisms, leading to major discrepancies between taxonomic schemes (Figure 3C; e.g., [2],[4]). The term “Archaeplastida” was recently introduced to alleviate confusion over “Plantae,” but this synonym is not widely used.
The stability of two supergroups, “Chromalveolata” and “Opisthokonta,” cannot be assessed at this time because only a single formal taxonomy exists [6]. Other classification schemes of eukaryotes segregate animals and fungi as separate kingdoms and place microbial opisthokonts in the kingdom Protozoa (Figure 1) [5],[32]. Similarly, chromalveolate members are often divided between the polyphyletic kingdoms “Chromista” and “Protozoa” (Figure 1) [32],[46].
Varying Support for Membership within and Monophyly of Targeted Supergroups
Several supergroups are generally well supported when targeted in molecular systematic studies. Strikingly, the monophyly of both the original and expanded “Opisthokonta” members is strongly supported in all investigations targeting the group (ten of ten, Figure 7). Two other supergroups are also well supported: “Rhizaria” monophyly is recovered in 11 of 14 studies focusing on this supergroup (Figure 9) and “Amoebozoa” is retained in five of seven topologies (Figure 4). However, support for these groups is expected, given that they were recognized from molecular gene trees [10],[32].
“Excavata” rarely form a monophyletic group in molecular systematic studies targeting this supergroup (two of nine; Figure 6). Moreover, the positions of the putative members jakobids, Malawimonas, parabasalids, and Diphylleia vary by analysis (Figure 6). Three distinct subclades, all of which are supported by ultrastructural characters [39], are generally recovered (Fornicata [six of six], Preaxostyla [six of six], and Discicristata [five of eight]; Figure 6).
Support for two supergroups varies depending on the type of character used: plastid or nuclear. The monophyly of “Plantae” and of “Chromalveolata” is well supported by plastid characters: four of four plastid analyses (Figure 8) and six of nine (Figure 5), respectively. The “Plantae” clade is monophyletic in only three of six analyses using nuclear genes, including Elongation Factor 2 [64] and a 100+ gene analysis that included a very limited taxon sample [65]. Nuclear loci never support “Chromalveolata” (zero of six; Figure 5), though alveolates and stramenopiles often form a clade to the exclusion of haptophytes and cryptophytes (e.g., [23],[48]; Figures 4 and 7).
Decreased Support for Monophyly of Supergroups as Out-Groups in Other Studies
For each genealogy we also assessed the monophyly of the supergroups when included as out-groups. Overall, we find that support for the monophyly of a given supergroup is stronger when targeted and support decreases when the same supergroup is included as an out-group in other studies.
This trend is particularly unexpected given our less stringent requirements for monophyly of out-groups: a minimum of only two members need be included, while targeted groups had broader taxon sampling (see Methods). A priori, it would seem that the lower stringency could allow a limited sample of supergroup members to substitute for overall supergroup monophyly, thereby increasing the occurrence of supergroup monophyly for out-group taxa. However, this scenario is realized only in the groups that receive poor support: “Excavata” and “Chromalveolata” as assessed by nuclear genes. “Excavata” is monophyletic more frequently when members are included as out-groups (seven of 30, Figures 4, 5, and 7–9, versus two of nine, Figure 6). Taxonomic sampling of these lineages is often considerably lower in non-targeted analyses, and monophyly reflects that of the subclades “Discicristata” or “Fornicata” (such as in [49],[76],[71], but see [44],[63] for two exceptions, Figures 4, 6, and 9). “Chromalveolata” is monophyletic in ten of 45 nuclear gene trees targeting other taxonomic areas (Figures 4 and 6–9). Intriguingly, in all ten of the cases where nuclear genes support monophyletic “Chromalveolata,” only alveolates and stramenopiles are included (Figures 4–9).
In contrast, the remaining supergroups are monophyletic less often when included as out-groups. For example, “Opisthokonta” was recovered in all studies targeting this supergroup, but in only 33 of 41 studies that target other groups (Figures 5–9). Similarly, both the “Amoebozoa” and “Rhizaria” are monophyletic less often when their members are included as out-groups in studies targeting the remaining five supergroups (15 of 35 and eight of 15, respectively: Figures 5–9 and 4–8). When included as an out-group, “Plantae” plastids usually form a monophyletic clade (eight of nine analyses, Figure 5) but support is much lower in nuclear gene trees (11 of 42, Figures 4–7 and 9).
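The comparison between targeted and out-group support can be checked directly from the counts reported in the Results. The sketch below only restates those counts as exact fractions. The dictionary layout and variable names are illustrative assumptions, and the plastid result for “Chromalveolata” is omitted because no out-group count is given for it.

```python
from fractions import Fraction

# (monophyletic, total) when targeted and when included as an out-group,
# copied from the counts reported in the Results.
counts = {
    "Opisthokonta": ((10, 10), (33, 41)),
    "Rhizaria": ((11, 14), (8, 15)),
    "Amoebozoa": ((5, 7), (15, 35)),
    "Excavata": ((2, 9), (7, 30)),
    "Plantae (plastid)": ((4, 4), (8, 9)),
    "Plantae (nuclear)": ((3, 6), (11, 42)),
    "Chromalveolata (nuclear)": ((0, 6), (10, 45)),
}


def rate(pair):
    """Exact proportion of trees in which the group is monophyletic."""
    recovered, total = pair
    return Fraction(recovered, total)


# Groups recovered more often as out-groups than when targeted.
higher_as_outgroup = sorted(
    name
    for name, (targeted, outgroup) in counts.items()
    if rate(outgroup) > rate(targeted)
)

# Pooled support for "Opisthokonta" across targeted and out-group trees.
(t_hits, t_total), (o_hits, o_total) = counts["Opisthokonta"]
opisthokonta_pooled = (t_hits + o_hits, t_total + o_total)

for name, (targeted, outgroup) in counts.items():
    print(f"{name}: targeted {float(rate(targeted)):.2f}, "
          f"out-group {float(rate(outgroup)):.2f}")
```

Only “Excavata” and nuclear “Chromalveolata” come out higher as out-groups, and the margin for “Excavata” is slim (seven of 30 against two of nine). These are raw proportions with small denominators, so the sketch says nothing about statistical significance.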
Discussion
Our analysis reveals varying levels of stability and support for the six supergroups (Figure 2). Below, we assess the status of each supergroup, describe factors that contribute to the instability, and propose measures to improve reconstruction of an accurate eukaryotic phylogeny.
Supergroup Robustness
Robust taxa (those consistently supported by multiple datasets) are emerging and include the supergroup “Opisthokonta.” This group of animals, fungi, and their microbial relatives receives consistent support in molecular genealogies. This supergroup was monophyletic in 43 of 51 trees we examined (Figures 4–9). “Opisthokonta” is also united by additional types of data: most members share a single posterior flagellum, contain plate-like cristae in mitochondria, and have an insertion within the Elongation Factor 1α gene [7],[24]–[26].
The remaining five supergroups receive varying degrees of support from molecular genealogies. “Amoebozoa” and “Rhizaria” received high support in analyses that targeted them (Figures 4 and 9, respectively) but formed monophyletic clades less often when included as out-groups. The two photosynthetic clades “Chromalveolata” and “Plantae” receive differential support depending on the origin of the gene: high support in plastid genealogies but low in nuclear gene trees (Figures 5 and 8, see Results). Molecular support for the “Excavata” as a whole is lacking from well-sampled gene trees (Figure 6).
Although the six supergroups are not consistently supported by molecular genealogies, some nested clades are emerging as robust groups. For example, a sister relationship between Alveolata and Stramenopila is often recovered. It is this relationship that makes “Chromalveolata” appear monophyletic in nuclear genealogies when only these clades are included as out-groups (e.g., [23],[48], and Figures 4 and 7). There is also growing support for several subgroups within the poorly supported “Excavata” (i.e., “Fornicata” and “Preaxostyla”; Figure 6).
Alternative Hypotheses
Although it is clear from our analysis that eukaryotic supergroups are not well supported, no alternative high-level groupings emerge from molecular genealogies. Rather, there is support for lower-level groups, such as the “Excavata” subgroups discussed above and perhaps also alveolates plus stramenopiles. This suggests that either there are no higher-level groupings to be found, or there are as yet inadequate data to resolve these clades. We believe that increased taxon sampling is the key to resolution.
Further evidence against the six-supergroup view of eukaryotic diversity is the existence of “nomadic” taxa: lineages that do not have a consistent sister group, but instead wander between various weakly supported positions. Some nomadic taxa are acknowledged incertae sedis (of unknown taxonomic position) such as Ancyromonas, Breviata, and Apusomonadidae [6],[7]. Other taxa that have been assigned to supergroups also appear to be nomadic, including Haptophyta (putative member of “Chromalveolata”) and Malawimonas (putative member of “Excavata”). For example, the haptophytes variously branch with Centrohelida and red algae [42], sister to a clade of “Rhizaria” and Heterolobosea [49], sister to cryptophytes [58], and in a basal polytomy [72]. These nomadic taxa may either represent independent, early diverging lineages or their phylogenetic position cannot yet be resolved with the data available. Again, we feel that taxon sampling is the key to distinguishing between these possibilities.
Why Is Eukaryotic Taxonomy So Difficult?
The variable support for relationships is in part attributable to the inherent difficulty of deep phylogeny, the chimeric nature of eukaryotes, misidentified organisms, and conflicting approaches to taxonomy. Here we elaborate on these destabilizing trends and provide illustrative examples.
Challenges of deep phylogeny.
Reconstructing the history of eukaryotic lineages requires extraction of phylogenetic signal from the noise that has accumulated over many hundreds of millions of years of divergent evolutionary histories. There is doubt whether divergences this deep can be resolved with molecular data [77]. Additionally, the nature of the relationships may also pose a significant challenge. For example, a rapid radiation of major eukaryotic lineages has been proposed [78] and is the most difficult scenario to resolve because of the lack of time to accumulate synapomorphies at deep nodes.
Further, phylogenetic relationships can be obscured by heterogeneous rates of evolution and divergent selection pressures. For example, genes in many parasitic lineages of eukaryotes experience elevated rates of evolution. If not properly accounted for, these fast lineages will group together due to long-branch attraction [79],[80]. This was the case for Microsporidia, intracellular parasites of animals; early small subunit rDNA (SSU) genealogies placed the Microsporidia at the base of the tree with other amitochondriate taxa, including Giardia and Entamoeba [81]. These parasites were united under the “Archezoa” hypothesis [82]. More recent analyses with appropriate models of evolution [83] and those using protein-coding genes [84] place the Microsporidia within fungi and falsify “Archezoa.” This example demonstrates the importance of phylogenetic methods in the interpretation of eukaryotic diversity. In our analysis we find no clear correlation between method of tree building and group stability. Arguments about phylogenetic inference have been discussed extensively [77],[85]–[86], and increasingly sophisticated algorithms are being developed to compensate for the difficulties [87]–[88].
The chimeric nature of eukaryotes.
Reconstructing the history of eukaryotic lineages is complicated by the horizontal transfer of genes and organelles [89],[90]–[91]. For example, “Chromalveolata” plastid genes tell one story, consistent with a single transfer from red algae, which is not currently supported by available nuclear genes (Figure 5). There is also a growing body of evidence for aberrant lateral gene transfers in eukaryotes (reviewed in [90],[92]).
Instability due to misidentification.
Misidentification destabilizes taxonomy because all efforts to classify a misidentified organism reach erroneous conclusions. Cases of misidentification lead to inaccurate conclusions and require considerable effort to remedy. There is a rigorous standard for identifying microbial eukaryotes, but this standard is not always upheld. For example, the putative “Amoebozoa” species “Mastigamoeba invertens” that always branched outside the “Amoebozoa” clade [42],[46],[50] was misidentified [93]; it has now been properly described as Breviata anathema and is not yet placed within any of the supergroups [93].
Inaccurate conclusions about organismal relationships can also result from contamination (e.g., from symbionts and parasites). The results of subsequent molecular genealogies are therefore wrong and misleading. For example, opalinids, multinucleated flagellates that inhabit the lower digestive tract of anurans, were placed in the stramenopiles (Slopalinida: “Chromalveolata”) based on ultrastructural data [94]. However, the first molecular sequences for this group placed them within fungi (Opalina ranarum and Cepedea virguloidea [95],[96]). These sequences were later shown to belong to zygomycete fungal contaminants, not to the opalinids. Subsequent isolates (Protoopalina intestinalis) yielded genealogies congruent with the ultrastructural data, placing P. intestinalis within the stramenopiles [97]. To avoid setbacks and confusion due to misidentification, we propose that all analyses of eukaryotic diversity include a vouchering system for strains, images, and DNAs.
Conflicting approaches to taxonomy.
Our evaluation of the stability of taxonomy for supergroups reveals a rapidly changing landscape (Figures 1 and 3). The instability in higher-level classifications of eukaryotes reflects the diversity of philosophical approaches, the exploratory state of eukaryotic taxonomy, and premature taxon naming. Many researchers seek schemes based on monophyletic groupings so that their taxonomies reflect evolutionary relationships [6],[7],[98],[99]. In contrast, others employ a taxonomic philosophy in which evolutionary relatedness and monophyly are just one criterion from a set of group characteristics [32]. Paraphyly (a taxon defined without all descendants) is tolerated in these systems, and paraphyletic taxa are designated as such (see [5] p. 210–215 for explanation of such a philosophy).
In many cases, classification schemes that are separated by two years or less vary substantially from one another (e.g., Figure 3A and 3B). New groups and fluctuating group composition result in numerous cases of homonymy (two concepts linked to one name), synonymy (one concept linked to two names), and redefinition of existing terms. For example, at the highest level the terms “Amoebozoa,” “Opisthokonta,” and “Plantae” were all introduced under different definitions [3],[52],[56] before being applied to supergroups. The term “Plantae” is an extreme case of homonymy, having referred to numerous groups of photosynthetic organisms over the past century and a half (Figure 3C). The rapidly changing taxonomic landscape makes it difficult for non-specialists as well as specialists to follow the current debate over supergroups.
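The two kinds of naming conflict defined above can be made concrete with a small sketch that flags them in a list of (name, concept) usages. The concept labels are illustrative placeholders rather than formal circumscriptions, and the function name and data layout are assumptions introduced here. The “Plantae” and “Archaeplastida” pairing and the earlier Lühe usage of “Amoebozoa” follow the text.

```python
from collections import defaultdict


def find_conflicts(usages):
    """Return (homonyms, synonyms) from an iterable of (name, concept) pairs.

    homonyms: names linked to more than one concept
    synonyms: concepts linked to more than one name
    """
    concepts_by_name = defaultdict(set)
    names_by_concept = defaultdict(set)
    for name, concept in usages:
        concepts_by_name[name].add(concept)
        names_by_concept[concept].add(name)
    homonyms = {n: c for n, c in concepts_by_name.items() if len(c) > 1}
    synonyms = {c: n for c, n in names_by_concept.items() if len(n) > 1}
    return homonyms, synonyms


# Placeholder concept labels, for illustration only.
usages = [
    ("Plantae", "primary-plastid supergroup"),
    ("Archaeplastida", "primary-plastid supergroup"),
    ("Plantae", "earlier assemblage of photosynthetic organisms"),
    ("Amoebozoa", "amoeboid supergroup"),
    ("Amoebozoa", "assemblage of Luhe 1913"),
    ("Opisthokonta", "animals, fungi, and microbial relatives"),
]

homonyms, synonyms = find_conflicts(usages)
```

Here “Plantae” and “Amoebozoa” are flagged as homonyms, and the primary-plastid concept is flagged as carrying two synonymous names. A one-term, one-concept system would leave both dictionaries empty.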
Toward a Robust Scaffold to the Eukaryotic Tree of Life
Taxonomic sampling.
Perhaps the most critical aspect of the current state of eukaryotic systematics is the very limited taxonomic sampling to date. This is particularly problematic as the supergroup literature is often derived from a resampled pool of genes and taxa. More than 60 lineages of microbial eukaryotes have been identified by ultrastructure [7], yet only about one-half of these have been included in molecular analyses. Furthermore, even when these lineages are included, they are generally represented by a single species. Such sparse sampling increases the risk of long-branch attraction as discussed above, such as occurred for Giardia, and may cause artifactual relationships [100]. Further, analyses of sequences from newly sampled lineages have altered or expanded supergroup definitions (e.g., nucleariids in “Opisthokonta” [101] and Phaeodarea in “Rhizaria” [102]). Thus, statements of monophyly may be premature when taxonomic sampling is low.
There is tension between increasing the number of taxa and increasing the number of genes. Several theoretical works have demonstrated the diminishing returns of increased number of genes relative to increased taxon sampling [103]–[48] (but see [104]). In addition, increasing taxon sampling can lead to shifts in molecular tree topology [105]–[106]. These results provide incentive to concentrate sequencing efforts on obtaining more taxa and a moderate number of genes. We recommend increasing the lineages sampled and the number of diverse taxa within lineages. We are optimistic that as data become available from a greater diversity of taxa, eukaryotic phylogeny will become increasingly resolved.
Multiple character sets.
We further anticipate that support for clades will increase as additional character sets are incorporated. Phylogenies based on single characters, whether genes, morphology, or ultrastructure, are subject to biases in the data and are not reliable by themselves. Hence, multiple character sets should be used to corroborate results. Ultrastructural apomorphies combined with molecular genealogies have proven to be good indicators of phylogeny at the level below supergroups [39],[107]. This approach has bolstered support for “Fornicata” and “Preaxostyla,” which are consistently recovered in molecular genealogies and have defining ultrastructural characters. As we move forward with multiple character sets, we must shift from searching for characters to support hypotheses to evaluating hypotheses in light of all available data.
Well-sampled multigene and genome scale molecular systematics provide another powerful tool for resolving ancient splits in the tree of life. The National Science Foundation initiative “Assembling the Tree of Life” provides evidence of this shift in systematics research, whereby all proposals involve multigene or genome (organellar) sequencing to establish robust phylogenetic hypotheses (see http://www.nsf.gov/pubs/2005/nsf05523/nsf05523.htm; [65],[48]). The EuTree consortium (http://www.eutree.org) aims to increase substantially the sampled diversity of eukaryotes by focusing on understudied lineages in our multigene project to assemble the tree of life.
One example of a multigene study is the analysis of genes involved in clade-specific functions. This approach has been employed in testing “Plantae” and “Chromalveolata” (e.g., [108]). A single endosymbiosis (of a cyanobacterium in “Plantae” and a red alga in “Chromalveolata”) predicts that the systems that facilitate controlled exchange of metabolic intermediates between the symbiotic partners are shared by putative members of each of these two supergroups [109]. This prediction has been supported by analyses of the plastid import machinery [110] and of antiporters that transport fixed carbon across the plastid membranes [111]. However, taxon sampling has been limited in these studies. Currently, increased sampling of genomes from diverse photosynthetic eukaryotes is yielding additional genes for clade-specific predictions [59],[60].
A conservative approach to taxonomy.
Because taxonomy is the foundation for much of the dialog and research in evolutionary biology, there must be an unambiguous taxonomic system in which one term is linked to one concept. In contrast to this ideal, homonymy and redefinition are prevalent in the taxonomy of eukaryotes, often as the result of premature introduction or redefinition of taxa (see above; Figure 3). Emerging hypotheses benefit the community by sparking new research to test them, but they also introduce ambiguity. To alleviate the confusion, we suggest introducing hypotheses as informal groups and using inverted commas to indicate the existence of a caveat, as done with the uncertain groups in this manuscript. These steps will inform the community that group composition is likely to change, reduce rapid taxon turnover, and promote stable taxa that are more resistant to compositional change.
As increasing amounts of data become available, well-supported nodes emerge and classifications tend to stabilize, as is occurring for the ordinal framework for angiosperms [112],[113]. Similarly, we expect that this conservative approach, combined with increased sampling of taxa and genes, will promote the future stabilization of eukaryotic classification.
Conclusion
Although the level of support varies among groups, the current classification of eukaryotes into six supergroups is being adopted broadly by the biological community (as evidenced by its appearance in biology textbooks). The supergroup “Opisthokonta” and a number of nested clades within supergroups are supported by most studies. However, support for “Amoebozoa,” “Chromalveolata,” “Excavata,” “Plantae,” and “Rhizaria” is less consistent. The supergroups, and eukaryotic taxonomy in general, are further destabilized by considerable fluidity of taxa and taxon membership and by ambiguous nomenclature, as revealed by comparison of classification schemes.
The accurate reconstruction of the eukaryotic tree of life requires: (1) a more inclusive sample of microbial eukaryotes; (2) distinguishing emerging hypotheses from taxa corroborated by multiple datasets; and (3) a conservative, mutually agreed upon approach to establishing taxonomies. Analyses of these types of data from a broad, inclusive sampling of eukaryotes are likely to lead to a robust scaffold for the eukaryotic tree of life.
Methods
Stability of taxonomy.
To assess the stability of supergroup taxonomies over time, we selected three classification schemes for each supergroup and tracked both the stability of taxon membership (solid and dashed lines; Figures 1 and 3) and the fate of newly created taxon names (asterisk; Figure 3). In sampling representative taxonomies, we aimed to capture a diversity of authors and opinions. In the case of “Opisthokonta” and “Chromalveolata” we are aware of only one formal, peer-reviewed classification scheme [6]. Given the lack of equivalency in ranks between taxonomies, we have chosen to display three levels with the intention of listing equivalent levels clearly.
Membership support.
Within each supergroup, we assess the support for each member taxon by documenting its inclusion in molecular genealogies (Figures 4–9). Member taxa were chosen because they are historically well-supported groups, usually with an ultrastructural identity. The haptophytes are such a group, sharing a haptonema [7]. We included members that represent a broad interpretation of the supergroup. For example, “Rhizaria” member taxa include groups (e.g., apusomonads) originally placed in “Rhizaria” but later removed. We considered a taxon to be a supported member of its supergroup (filled circles; Figures 4–9) when it falls within a monophyletic clade containing a majority of the supergroup members. A taxon is considered unsupported in a genealogy (open circles; Figures 4–9) when it falls outside of its supergroup clade or when a majority of members do not form a monophyletic clade.
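The filled/open circle rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to score the figures: the genealogy is assumed to be rooted and written as nested tuples, "majority" is read as more than half of the member taxa sampled in that tree, and the taxon names (`A1`, `B1`, and so on) and the function name `membership_support` are hypothetical.

```python
def leaves(node):
    """Return the set of leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def clades(node):
    """Yield the leaf set of every clade (subtree) of a rooted tree."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def membership_support(tree, members):
    """Map each sampled member taxon to True (filled) or False (open)."""
    sampled = set(members) & leaves(tree)
    best = set()
    for clade in clades(tree):
        # Assumption: the clade must contain only member taxa and must hold
        # more than half of the members sampled in this genealogy.
        only_members = clade <= sampled
        is_majority = 2 * len(clade) > len(sampled)
        if only_members and is_majority and len(clade) > len(best):
            best = clade
    return {taxon: taxon in best for taxon in sorted(sampled)}


# Hypothetical genealogy: A4 groups with supergroup B instead of A1-A3.
tree = ((("A1", "A2"), "A3"), ("B1", ("B2", "A4")))
support = membership_support(tree, {"A1", "A2", "A3", "A4"})
print(support)
```

In this toy tree, three of the four sampled members form a clade, so they are scored as supported and `A4` is scored as unsupported. If no member-only clade held a majority, every member would be scored as unsupported.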
A genealogy was included only if it was found in a paper that specifically addresses one of the supergroups or analyzes broad eukaryotic diversity. The genealogies must also include adequate sampling (two member taxa per supergroup) from at least two of the six supergroups to allow for the comparison of supergroup monophyly. In cases where multiple gene trees are presented, we display the authors' findings as multiple entries when the trees are not congruent or as a single entry when the trees are concordant. Because “Excavata” lacks monophyly in virtually all analyses, we have also evaluated the support for several hypothesized subgroups within the “Excavata” (geometric shapes; Figure 6).
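The sampling requirement lends itself to a simple check. The sketch below is an assumption-laden illustration rather than the authors' tooling: the function name `adequately_sampled`, its parameters, and the small lookup table are all hypothetical, and each leaf of a genealogy is treated as one member taxon.

```python
from collections import Counter


def adequately_sampled(tree_taxa, supergroup_of, min_taxa=2, min_groups=2):
    """Return True when at least `min_groups` supergroups each contribute
    at least `min_taxa` member taxa to the genealogy."""
    counts = Counter(supergroup_of[t] for t in tree_taxa if t in supergroup_of)
    well_sampled = [group for group, n in counts.items() if n >= min_taxa]
    return len(well_sampled) >= min_groups


# Illustrative lookup table only; a real table would list every member taxon.
supergroup_of = {
    "animals": "Opisthokonta",
    "fungi": "Opisthokonta",
    "red algae": "Plantae",
    "green algae": "Plantae",
    "haptophytes": "Chromalveolata",
}

ok = adequately_sampled(["animals", "fungi", "red algae", "green algae"], supergroup_of)
too_sparse = adequately_sampled(["animals", "fungi", "red algae", "haptophytes"], supergroup_of)
print(ok, too_sparse)
```

The second call fails the check because only one supergroup is represented by two member taxa, so the tree could not be used to compare the monophyly of two supergroups.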
Supergroup monophyly.
To assess monophyly of supergroups, we used the set of genealogies described above to evaluate the molecular support for the supergroups as interpreted by Adl et al. 2005 ([6]; Figures 4–9). We analyzed the monophyly [45] of each supergroup in trees having at least two member taxa present (+/−; Figures 4–9). We do not indicate the method of tree construction: although the algorithm used is important, we did not find a clear correlation between supported groups and algorithm used. We were also liberal in accepting any level of support (e.g., bootstrap values and posterior probabilities ranged from 4% to 100%) when determining monophyly, in part because there is debate over acceptable cutoff values [114]–[115].
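A strict reading of the +/− scoring can be sketched as follows. The assumptions here are ours and not a statement of how the figures were compiled: trees are rooted nested tuples, a supergroup is scored "+" only when all of its sampled member taxa form a clade that excludes every other taxon, support values are ignored (matching the liberal acceptance of any support level), and the names `leaf_sets`, `score_monophyly`, and the taxon labels are hypothetical.

```python
def leaf_sets(node):
    """Return (leaf set of node, leaf sets of every clade at or below it)."""
    if isinstance(node, str):
        single = frozenset([node])
        return single, [single]
    all_leaves = frozenset()
    collected = []
    for child in node:
        child_leaves, child_clades = leaf_sets(child)
        all_leaves |= child_leaves
        collected.extend(child_clades)
    collected.append(all_leaves)
    return all_leaves, collected


def score_monophyly(tree, members):
    """Return '+', '-', or None when fewer than two members are sampled."""
    all_leaves, clade_list = leaf_sets(tree)
    sampled = frozenset(members) & all_leaves
    if len(sampled) < 2:
        return None  # not scored: too few member taxa in this genealogy
    # Monophyletic when the sampled members are exactly the leaves of a clade.
    return "+" if sampled in clade_list else "-"


# Hypothetical genealogy with two supergroups, O and P.
tree = (("O1", "O2"), ("P1", ("P2", "P3")))
print(score_monophyly(tree, {"O1", "O2"}))        # members form a clade
print(score_monophyly(tree, {"O1", "P1"}))        # members are scattered
print(score_monophyly(tree, {"O1", "absent"}))    # only one member sampled
```

Because the check ignores branch support, a clade recovered with very low bootstrap support still scores "+", which mirrors the liberal criterion described above.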
Supporting Information
Accession Numbers
Information about commonly used genes for phylogenesis of microbial eukaryotes discussed in this paper can be found in the Homologene database at NCBI (http://www.ncbi.nlm.nih.gov/Genbank): actin (88645), α-tubulin (81745), β-tubulin (69099), Elongation Factor 1α gene (68181), small subunit rDNA (6629), and ubiquitin (39626). Accession numbers for genes from misidentified organisms can be found at NCBI in GenBank (http://www.ncbi.nlm.nih.gov/Genbank). Misidentified opalinids: Opalina ranarum (AF141969) and Cepedea virguloidea (AF141970); correctly identified Protoopalina intestinalis (AY576544–AY576546) and Breviata anathema (AF153206). Sequences for Encephalitozoon cuniculi can be found at NCBI under genome project number 9545.
Many thanks to Giselle Walker for discussions about supergroups and for graciously sharing a manuscript of her own. The authors also thank John Logsdon and Toby Kiers for comments and Jan Pawlowski for helpful discussions on nomadic lineages.
Competing interests. The authors have declared that no competing interests exist.
A previous version of this article appeared as an Early Online Release on November 13, 2006 (doi:10.1371/journal.pgen.0020220.eor).
Author contributions. LWP, EB, DJP, and LAK conceived and designed the experiments. LWP, EB, EL, and LAK analyzed the data. LWP, EB, MD, DB, DJP, and LAK wrote the paper.
Funding. This work is supported by the National Science Foundation Assembling the Tree of Life grant (043115) to DB, DJP, and LAK.