Difficult phylogenetic questions: more data, maybe; better methods, certainly
© Philippe and Roure; licensee BioMed Central Ltd. 2011
- Received: 20 December 2011
- Accepted: 29 December 2011
- Published: 29 December 2011
Contradicting the prejudice that endosymbiosis is a rare phenomenon, Husník and co-workers show in BMC Biology that bacterial endosymbiosis has occurred several times independently during insect evolution. Rigorous phylogenetic analyses, in particular using complex models of sequence evolution and an original site removal procedure, allow this conclusion to be established after eschewing inference artefacts that usually plague the positioning of highly divergent endosymbiont genomic sequences.
See research article http://www.biomedcentral.com/1741-7007/9/87
Husník and co-workers address the question of the evolution of endosymbiosis in insects by applying a phylogenomic approach - the use of complete genomes to infer phylogenetic relationships. A naïve opinion is that phylogenomics will end incongruence in phylogeny, and therefore that gathering more data will suffice to resolve outstanding phylogenetic questions. However, while the use of many genes does reduce stochastic errors (due to improved sample size), it simultaneously makes systematic errors more apparent. Systematic errors are due to the limitations of tree inference methods, which do not sufficiently account for the complexity of the genomic data. As such, systematic errors will lead to more and more biased results as the amount of data increases, thus producing highly supported, yet erroneous, phylogenomic trees.
Three main approaches reduce systematic errors by improving the detection of multiple substitutions:
- the use of a large number of species, which naturally eases the detection of multiple substitutions;
- the use of complex models of sequence evolution (especially by accounting for heterogeneity across sites and over time), which allows more accurate detection of multiple substitutions;
- the removal of the fastest evolving sites, which are the most prone to exhibit multiple substitutions.
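The removal of the fastest evolving sites can be illustrated with a minimal sketch. It assumes a crude variability proxy (the Shannon entropy of each alignment column) in place of the likelihood-based site rates that real analyses typically rely on; the function names and the toy alignment are hypothetical and are not taken from Husník et al.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the states in one alignment column.

    Gaps and missing data ('-' and '?') are ignored.
    """
    states = [c for c in column if c not in "-?"]
    if not states:
        return 0.0
    total = len(states)
    counts = Counter(states)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def remove_most_variable_sites(alignment, n_remove):
    """Drop the n_remove columns with the highest entropy.

    alignment maps a species name to its aligned sequence (equal lengths).
    Returns the trimmed alignment and the indices of the columns kept.
    """
    sequences = list(alignment.values())
    n_sites = len(sequences[0])
    entropies = [
        column_entropy([seq[i] for seq in sequences]) for i in range(n_sites)
    ]
    # Rank columns from most to least variable; ties are broken by position.
    ranked = sorted(range(n_sites), key=lambda i: (-entropies[i], i))
    removed = set(ranked[:n_remove])
    kept = [i for i in range(n_sites) if i not in removed]
    trimmed = {
        name: "".join(seq[i] for i in kept) for name, seq in alignment.items()
    }
    return trimmed, kept


# Hypothetical toy alignment: columns 2 and 3 are the only variable ones.
toy = {
    "sp1": "ACGTAC",
    "sp2": "ACGAAC",
    "sp3": "ACTCAC",
    "sp4": "ACTGAC",
}
trimmed, kept = remove_most_variable_sites(toy, 2)
```

Entropy is only a stand-in here: it does not use the tree, so it cannot distinguish a site that changed many times from one that changed once deep in the phylogeny.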
Inferring the origin of endosymbionts is typically difficult for phylogenomics. Their intracellular lifestyle introduces similar biases in independent endosymbiotic organisms, with such convergences leading to potentially erroneous grouping of unrelated species. More precisely, because of their small effective population size, endosymbionts are subject to an irreversible accumulation of deleterious mutations, known as Muller's ratchet, thereby evolving at an accelerated rate. These accelerations may lead to the well-known long branch attraction artefact, in which the longest branches of a phylogenetic tree are clustered together irrespective of their true relationships. Moreover, due to this inefficient purifying selection, endosymbionts are more sensitive to mutational bias, with their genomes becoming more A+T rich. The erroneous grouping of species with similar nucleotide composition is also a frequent artefact.
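As a small illustration of the compositional bias described above, the A+T content of each sequence can be compared before any tree is built. This is a minimal sketch; the function name and the toy sequences are hypothetical and are not real endosymbiont data.

```python
def at_content(seq):
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "AT" for b in bases) / len(bases)


# Hypothetical toy sequences; gaps and ambiguous bases are ignored.
toy_sequences = {
    "free_living": "GCGCATGC",
    "endosymbiont_like": "ATATATGC",
}
for name, seq in toy_sequences.items():
    print(name, at_content(seq))
```

Large differences in A+T content between taxa are a warning sign that a model assuming compositional homogeneity may group sequences by composition rather than by ancestry.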
Husník et al. took many precautions to reduce the effect of these two biases that favour, potentially erroneously, the clustering of endosymbionts. First, they selected all the genes that are single copy in the 50 complete genome sequences of γ-Proteobacteria, hence avoiding identification problems caused by multi-copy gene families. Second, they used as many enterobacterial species as are currently available, although they could have used more outgroup species. These trivial, sometimes neglected, steps lead to a large dataset of 69 genes (63,462 nucleotide sites, or 21,154 amino acid sites). Not surprisingly, a naïve phylogeny based on nucleotides and assuming compositional homogeneity over time leads to the grouping of the fast-evolving, A+T-rich endosymbionts (named hereafter the FEAT group), with high statistical support. Although this topology is certainly partly incorrect (for example, the inclusion of two species with the highest AT content, Riesia and Wigglesworthia, within the genus Buchnera), the monophyly of most endosymbionts might be correct, since it is possible for a bias to reinforce a true (but unknown) phylogenetic signal.
Given the impossibility of experimental validation in what is fundamentally an historical science, corroboration is the most efficient support of an inference. In general, phylogenomicists look for congruence among independent sets of characters (for example, between primary sequences and gene content, gene order or intron positions). Alternatively, as done by Husník et al., congruence on the same dataset among independent methods is also relevant, especially in the case of bacterial endosymbionts, for which other character types are non-existent or inadequate; for instance, gene content is highly prone to convergence. Husník and colleagues hence applied a variety of methods known to reduce artefacts due to compositional bias and/or long branch attraction. Importantly, the more accurate the method is, the fewer endosymbionts are grouped, which strongly argues for several independent endosymbioses.
The use of amino acid sequences is an effective way to reduce the misleading effect of nucleotide compositional heterogeneity, although some information is lost. The use of a standard site-homogeneous model leads to the exclusion of Regiella from the FEAT group, while the CAT+GTR model, which simultaneously handles heterogeneity in the evolutionary process across sites and among amino acid substitutions, leads to the further exclusion of Ishikawaella. Since the CAT+GTR model fits the data better and is less sensitive to long branch attraction, this first result is in agreement with an artefactual nature of the FEAT group. As nucleotide heterogeneity may affect amino acid composition, Husník et al. applied the Dayhoff recoding. This is a recoding of amino acids into the six main Dayhoff categories, such as grouping the positively charged amino acids arginine, histidine and lysine, and is known to reduce possible biases, again at the cost of information. Interestingly, in the resulting phylogeny, the insect endosymbionts explode into four monophyletic groups dispersed over the enterobacterial tree. The disaggregation of the FEAT group is similarly observed for the analysis of nucleotide sequences after removal of third codon positions or RY-coding (purine/pyrimidine), and the use of an improved model of sequence evolution. In particular, the use of a non-homogeneous model, that is, a model that does not assume homogeneity of nucleotide composition over time, recovers a topology that is highly similar to the Dayhoff-recoded topology.
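The three recoding strategies mentioned above can be sketched in a few lines. The six-category grouping used here (AGPST, DENQ, HKR, ILMV, FWY, C) is the commonly used Dayhoff-6 scheme and is assumed to be the one meant; the function names, the category labels 0-5 and the choice to map gaps and ambiguous characters to '-' are illustrative assumptions, not the pipeline of Husník et al.

```python
# Commonly used six Dayhoff categories (assumed grouping); labels 0-5 are arbitrary.
DAYHOFF6_GROUPS = ["AGPST", "DENQ", "HKR", "ILMV", "FWY", "C"]
DAYHOFF6 = {
    aa: str(label)
    for label, group in enumerate(DAYHOFF6_GROUPS)
    for aa in group
}

# Purines (A, G) become R; pyrimidines (C, T) become Y.
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def dayhoff_recode(protein):
    """Recode an amino acid sequence into six Dayhoff categories."""
    return "".join(DAYHOFF6.get(aa, "-") for aa in protein.upper())


def ry_recode(dna):
    """Recode a nucleotide sequence into purines (R) and pyrimidines (Y)."""
    return "".join(RY.get(base, "-") for base in dna.upper())


def drop_third_positions(cds):
    """Remove every third codon position from an in-frame coding sequence."""
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)


# Arginine, histidine and lysine fall into the same category.
print(dayhoff_recode("RHK"))           # 222
print(ry_recode("ACGT"))               # RYRY
print(drop_third_positions("ATGGCC"))  # ATGC
```

Each recoding discards information: substitutions within a category (or at third positions) become invisible, which is the price paid for reducing compositional bias.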
We recently obtained a similar result in the case of an animal phylogeny based on the mitochondrial genome. The removal of fast evolving sites had no effect, whereas the removal of heteropecillous sites, that is, sites that change their substitution pattern over time, led to the correct topology. These two failures of fast site removal can be easily explained. Models of sequence evolution already handle rate heterogeneity across sites, usually through a gamma distribution, so that fast evolving sites are detected and have a limited effect on topology inference. In contrast, a site that violates a model assumption, such as the homogeneity of nucleotide composition across species, might still evolve slowly and seriously impact phylogenetic reconstruction. The study of Husník et al. and our work argue in favour of developing methods that specifically remove model-violating sites rather than fast evolving sites.
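The role of the gamma distribution can be made explicit with the standard formulation of among-site rate variation. The notation below is an assumption introduced for illustration (D_i for the data at site i, T for the tree, θ for the substitution parameters, α for the gamma shape, K for the number of rate categories) and is not taken from Husník et al.

```latex
% Likelihood of site i, averaged over a gamma-distributed rate r with mean 1
L_i = \int_0^\infty P(D_i \mid r, T, \theta)\, g(r;\alpha)\, \mathrm{d}r
    \;\approx\; \frac{1}{K} \sum_{k=1}^{K} P(D_i \mid r_k, T, \theta)
```

Here g(r; α) is the gamma density with shape α and mean 1, and r_1, ..., r_K are the representative rates of K equiprobable categories. A fast evolving site is simply accommodated by the high-rate categories, whereas a site whose composition differs across species is not described by any value of r, which is consistent with the argument above.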
Corroboration is key to solving difficult phylogenetic questions. Instead of using independent markers (for instance, from mitochondrion, plastid and nucleus), Husník et al. successfully used three independent approaches to demonstrate that at least four endosymbioses of Enterobacteria have occurred in the insect lineage. More generally, this study demonstrates that, in spite of overwhelming genomic data, more effort should be put into refining data analysis. Unfortunately, the two approaches that are the most beneficial to phylogenetic accuracy - more species and better models - both imply a drastic increase in computation time. In a time of global warming and biodiversity loss, it is also urgent that scientists strive to decrease the environmental footprint of their research activities. Individualism is one cause of current environmental problems. An increase in our knowledge about the commonness of symbiosis and its evolutionary advantages (through low-consumption experiments) could be a way to change our societal paradigms and solve the environmental crisis. The evolutionary advantages of endosymbioses should not be ignored.
- Nowack EC, Melkonian M: Endosymbiotic associations within protists. Philos Trans R Soc Lond B Biol Sci. 2010, 365: 699-712. 10.1098/rstb.2009.0188.
- Husník F, Chrudimský T, Hypša V: Multiple origin of endosymbiosis within the Enterobacteriaceae (gamma-Proteobacteria): convergency of complex phylogenetic approaches. BMC Biology. 2011, 9: 87. 10.1186/1741-7007-9-87.
- Philippe H, Brinkmann H, Lavrov DV, Littlewood DT, Manuel M, Worheide G, Baurain D: Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 2011, 9: e1000602. 10.1371/journal.pbio.1000602.
- Herbeck JT, Degnan PH, Wernegreen JJ: Nonhomogeneous model of sequence evolution indicates independent origins of primary endosymbionts within the enterobacteriales (gamma-Proteobacteria). Mol Biol Evol. 2005, 22 (3): 520-532.
- Miyamoto MM, Fitch WM: Testing species phylogenies and phylogenetic methods with congruence. Syst Biol. 1995, 44: 64-76.
- Lartillot N, Philippe H: A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol. 2004, 21: 1095-1109. 10.1093/molbev/msh112.
- Philippe H, Brinkmann H, Copley RR, Moroz LL, Nakano H, Poustka AJ, Wallberg A, Peterson KJ, Telford MJ: Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature. 2011, 470: 255-258. 10.1038/nature09676.
- Hrdy I, Hirt RP, Dolezal P, Bardonova L, Foster PG, Tachezy J, Embley TM: Trichomonas hydrogenosomes contain the NADH dehydrogenase module of mitochondrial complex I. Nature. 2004, 432: 618-622. 10.1038/nature03149.
- Galtier N, Gouy M: Inferring phylogenies from DNA sequences of unequal base compositions. Proc Natl Acad Sci USA. 1995, 92: 11317-11321. 10.1073/pnas.92.24.11317.
- Roure B, Philippe H: Site-specific time heterogeneity of the substitution process and its impact on phylogenetic inference. BMC Evol Biol. 2011, 11: 17. 10.1186/1471-2148-11-17.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.