
Maximum-frequency gene tree a simplified genome-scale approach to overcoming incongruence i


ORIGINAL RESEARCH

Correspondence: Xiu-Qing Li, Ph.D., Research Scientist, Molecular Genetics Laboratory, Potato Research Centre, Agriculture and Agri-Food Canada, 850 Lincoln Road; P.O. Box 20280 Fredericton, NB E3B 4Z7, Canada. Tel: 506-452-4829; Fax: 506-452-3316; Email: lixq@agr.gc.ca or lixiuqing2008@…

Copyright in this article, its metadata, and any supplementary data is held by its author or authors. It is published under the Creative Commons Attribution By licence. For further information go to: http://creativecommons.org/licenses/by/3.0/.

Maximum Gene-Support Tree

Yunfeng Shan 1,2 and Xiu-Qing Li 1

1 Molecular Genetics Laboratory, Potato Research Centre, Agriculture and Agri-Food Canada, 850 Lincoln Rd, P.O. Box 20280, Fredericton, New Brunswick, E3B 4Z7, Canada.
2 Department of Natural History, Royal Ontario Museum, Toronto, Ontario M5S 2C6, Canada.

Abstract: Genomes and genes diversify during evolution; however, it is unclear to what extent genes still retain the relationships among species. Model species for molecular phylogenetic studies include yeasts and viruses, whose genomes have been sequenced, and plants, for which fossil-supported true phylogenetic trees are available. In this study, we generated single-gene trees for seven yeast species and for nine baculovirus species using all orthologous genes shared among the species compared. Homologous genes from seven well-studied plants were used to validate the finding. Four algorithms were used: maximum parsimony (MP), minimum evolution (ME), maximum likelihood (ML), and neighbor-joining (NJ). Trees were reconstructed before and after weighting by DNA and protein sequence length. Rarely could a single gene generate the “true tree” with all four algorithms. However, the most frequent gene tree, termed the “maximum gene-support tree” (MGS tree, or WMGS tree for the weighted version), was consistently found to be the “true tree” in yeasts, baculoviruses, and plants. The results provide insights into the overall degree of divergence of orthologous genes in the genomes analyzed and suggest that: 1) the true tree relationship among the species studied is still maintained by the largest group of orthologous genes; 2) there are usually more orthologous genes with higher similarities between genetically closer species than between genetically more distant ones; and 3) the maximum gene-support tree reflects the phylogenetic relationship among the species compared.

Keywords: genome, gene evolution, molecular phylogeny, true tree

Introduction

Living organisms adapt to their environment through genetic variation arising from transposition (McClintock, 1984), gene conversion (Archibald and Roger, 2002), horizontal gene transfer (Doolittle, 1999), adaptive selection (Logares et al. 2007), mutation, or recombination (Vulić et al. 1999). This increasing genetic divergence among species makes it challenging to reconstruct true trees and to evaluate to what degree genes still retain the relationships among taxa.

Some taxonomic groups, such as certain well-studied plants, have a well-corroborated phylogeny, or true tree, based on combined support from fossil records and morphological characteristics (Russo et al. 1996). Such a set of organisms provides a reference for evaluating the reliability of alternative, molecular data-based methods for determining phylogenetic relationships. Historically, determining the phylogeny of microbes was difficult because of the lack of discernible morphological characters (Fitz-Gibbon and House, 1999). Molecular phylogenetics has made great progress in studying evolutionary relationships among taxa, although incongruence in phylogenetic tree reconstruction arises from the methods used and the genes studied (Russo et al. 1996; Doolittle, 1999; Baldauf et al. 2000; Rokas et al. 2003; Philippe et al. 2005; Simpson et al. 2006). Recently, whole-genome sequences of a number of species have become available, making it more feasible to reconstruct a true tree through genome-scale phylogeny (Rokas et al. 2003; Philippe et al. 2005).

There are two main approaches for reconstructing genome-scale phylogenetic trees. The first is to concatenate many sequences head-to-tail into one and then reconstruct a tree (Kluge, 1989; Huelsenbeck et al. 1996; Yang, 1996; Rokas et al. 2003; Soltis et al. 2004). The second is to reconstruct many single-gene trees and then use the resulting trees to infer a majority-rule consensus tree (Herniou et al. 2001; Gadagkar et al. 2005).

Yeasts and viruses are two important groups of model organisms for studying evolution and phylogenetics. The most widely accepted tree representing the true tree of the seven yeast species came from phylogenetic analysis of the concatenated sequence of 106 orthologous genes (Rokas et al. 2003). Similarly, the “true tree” of nine baculoviruses has been established from 63 shared gene sequences (Herniou et al. 2001).

Although fossil-based and molecular data-based phylogenetic analyses have been documented in various organisms, it is unknown to what extent orthologous genes have diverged in terms of tracing relationships among species. In this study, gene-by-gene phylogenetic analysis of yeasts, baculoviruses, and plants confirmed that the most frequent gene tree among the species compared is the true tree. This method, which we call the “maximum gene-support tree” approach, may provide a tree reconstruction method that overcomes incongruence in molecular phylogenies.

Methods and Datasets

Source of sequence data sets

Three data sets were used. The first contained 106 gene sequences from seven yeast species, which had previously been analyzed with the genome-scale concatenated-alignment approach (Rokas et al. 2003). The yeast data set was retrieved from the Saccharomyces Genome Database (http://www.yeastgenome.org) and included S. bayanus, S. castellii, S. cerevisiae, S. kluyveri, S. kudriavzevii, S. mikatae, and S. paradoxus. The fungus Candida albicans was included as the outgroup species. The second data set included 63 shared gene sequences from nine completely sequenced baculovirus genomes, as described by Herniou et al. (2001). The third data set contained 36 common homologous gene sequences from seven higher green plants. These were identified by using each available Ginkgo biloba gene, one at a time, as a BLASTN (v2.2.6) query against the NCBI nr/nt database and keeping the highest-scoring hit (e-value ≤ 0.0009); the sequences were then retrieved from GenBank. The plant species included two gymnosperms, Picea glauca and Pinus taeda; two monocots, Oryza sativa and Triticum aestivum; and two dicots, Populus tremula and Arabidopsis thaliana. Ginkgo biloba was specified as the outgroup species. These species were selected because their phylogeny is well corroborated by the fossil record and morphological characters (Cronquist, 1981; Panchen, 1992; Campbell, 1993).

Phylogenetic analysis

For the yeast and plant data sets, individual gene sequences were aligned using ClustalX with default settings (Thompson et al. 1997). All gene alignments were manually edited to exclude insertions, deletions, and uncertain positions from further analysis. The phylogenetic analysis software PAUP* (version 4.0b10) (Swofford, 2002) was used for tree inference with four methods: MP, ME, NJ, and ML. Each nucleotide data set was analyzed under the optimality criteria of maximum parsimony for MP, distance for ME and NJ, and maximum likelihood for ML. MP analyses used unweighted parsimony. ME, NJ, and ML analyses assumed the HKY85 model of nucleotide substitution. For NJ analysis of amino acids, the absolute difference was used. Bootstrap consensus trees (50% majority rule) were searched using the branch-and-bound algorithm for MP and ML on nucleotides and a full heuristic search for ME and NJ. One thousand bootstrap replicates were used for all analyses except ML, for which 100 replicates were run. Random sampling of genes was performed using a random number generator. For the baculovirus data set, only the MP trees obtained by Herniou et al. (2001) were used.
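The 50% majority-rule step can be illustrated with a minimal Python sketch. It assumes each bootstrap replicate has already been reduced to its set of clusters (the taxa below each internal node of a rooted tree); PAUP* performs this internally, and the function name `majority_rule_clusters` and the toy replicates are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set

Cluster = FrozenSet[str]  # taxa below one internal node of a rooted tree


def majority_rule_clusters(trees: Iterable[Set[Cluster]], threshold: float = 0.5) -> Set[Cluster]:
    """Keep the clusters that occur in more than `threshold` of the trees."""
    tree_list: List[Set[Cluster]] = list(trees)
    if not tree_list:
        return set()
    counts = Counter(c for tree in tree_list for c in tree)
    n = len(tree_list)
    # Strict majority: a cluster must appear in more than half of the replicates.
    return {c for c, k in counts.items() if k / n > threshold}


# Three hypothetical bootstrap replicates for taxa A-D (D as outgroup).
replicates = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
]
consensus = majority_rule_clusters(replicates)
print(sorted("".join(sorted(c)) for c in consensus))  # ['AB', 'ABC']
```

Because any two clusters that each occur in more than half of the replicates must co-occur in at least one replicate, the retained clusters are mutually compatible and therefore define a single consensus tree.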

The maximum gene-support tree approach

For the yeast data set, bootstrap consensus trees were recovered for each of the 106 genes under seven method/data combinations: four methods (ME, ML, MP, and NJ) on nucleotides and three methods (ME, MP, and NJ) on deduced amino acids. Tree distances for all pairwise comparisons were calculated with the symmetric difference metric in PAUP* (Swofford, 2002) and PHYLIP (Felsenstein, 1989). This metric is the number of steps required to convert one tree into the other, that is, the number of branches that differ between the two trees (Robinson and Foulds, 1981). Two trees with identical topology have a tree distance of zero. For the baculoviruses, topology comparisons between MP trees using the Shimodaira–Hasegawa (SH) test were cited directly from Herniou et al. (2001). For the plant data set, trees were compared manually.
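The symmetric difference (Robinson–Foulds) metric can be sketched in Python for small rooted trees. This is an illustration only, assuming plain Newick strings with leaf names and no branch lengths or support values; `clusters` and `symmetric_difference` are hypothetical helpers, not the PAUP* or PHYLIP implementations. Conventions differ between tools (rooted clusters versus unrooted bipartitions, halved or unhalved counts), so absolute values may not match their output exactly.

```python
from typing import FrozenSet, List, Set


def clusters(newick: str) -> Set[FrozenSet[str]]:
    """Non-trivial clusters of a rooted Newick tree (leaf names only).

    Assumption: no branch lengths, support values, internal labels or quotes.
    The root cluster (all taxa) and single-leaf clusters are dropped.
    """
    stack: List[Set[str]] = []
    found: List[FrozenSet[str]] = []
    leaves: Set[str] = set()
    name = ""
    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            stack.append(set())
        elif ch in ",)":
            if name:  # finish the current leaf name
                stack[-1].add(name)
                leaves.add(name)
                name = ""
            if ch == ")":
                group = frozenset(stack.pop())
                found.append(group)
                if stack:  # the closed group belongs to its parent node
                    stack[-1] |= group
        elif not ch.isspace():
            name += ch
    return {c for c in found if 1 < len(c) < len(leaves)}


def symmetric_difference(tree_a: str, tree_b: str) -> int:
    """Robinson-Foulds distance: clusters present in exactly one of the two trees."""
    return len(clusters(tree_a) ^ clusters(tree_b))


t1 = "(((A,B),C),D);"
t2 = "(((A,C),B),D);"
print(symmetric_difference(t1, t1))  # 0
print(symmetric_difference(t1, t2))  # 2
```

Because clusters are compared as sets, the order in which children are written does not matter: `((C,(B,A)),D);` and `t1` are at distance 0.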

The gene-support index is the number of genes that support a given topology. Gene-support values were calculated for all unique trees recovered by each method. A maximum gene-support tree was defined as the unique tree recovered by the highest number of genes among all the trees generated. Statistical analyses were performed using the SAS system for Windows V8.
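Counting gene-support from tree distances can be sketched in Python. The sketch assumes a symmetric matrix of pairwise symmetric-difference distances between single-gene trees, where a distance of 0 means identical topology; it is an illustration, not the authors' C program, and `unique_tree_supports` and the example matrix are hypothetical.

```python
from typing import List, Sequence, Tuple


def unique_tree_supports(dist: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Group gene trees at distance 0 and count the genes in each group.

    Returns (representative gene index, gene-support) pairs sorted by
    decreasing gene-support; ties keep the order of first appearance.
    """
    reps: List[int] = []     # first gene seen with each unique topology
    support: List[int] = []  # number of genes sharing that topology
    for i in range(len(dist)):
        for k, r in enumerate(reps):
            if dist[i][r] == 0:  # identical topology to an existing group
                support[k] += 1
                break
        else:  # no match: gene i starts a new unique tree
            reps.append(i)
            support.append(1)
    return sorted(zip(reps, support), key=lambda pair: -pair[1])


# Hypothetical distances for five genes: genes 0, 2, 3 share one tree; genes 1, 4 another.
D = [
    [0, 2, 0, 0, 2],
    [2, 0, 2, 2, 0],
    [0, 2, 0, 0, 2],
    [0, 2, 0, 0, 2],
    [2, 0, 2, 2, 0],
]
ranked = unique_tree_supports(D)
(_, mgs), (_, second) = ranked[0], ranked[1]
print(ranked)                           # [(0, 3), (1, 2)]
print(mgs, second, 100 * mgs / len(D))  # 3 2 60.0
```

The first entry is the maximum gene-support tree; keeping the second entry as well makes the gap between the highest and second-highest gene-support directly available.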

Re-sampling for subsets of genes

Subsets of genes were randomly re-sampled using a random number generator. Ten replicates were used for each sample size (number of re-sampled genes). Precision was defined as the number of congruent trees divided by the total number of trees, expressed as a percentage. A precision of 100% was used as the criterion for determining the minimum number of genes required to overcome incongruence.
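The re-sampling procedure can be sketched in Python under stated assumptions: each gene's tree is represented by a hashable topology key, genes are drawn without replacement, a replicate counts as congruent only if its maximum gene-support tree is unique and equals the reference tree, and ties count as incongruent. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def mgs_tree(topologies: Sequence[Hashable]) -> Optional[Hashable]:
    """Most frequent topology, or None when the top two gene-supports tie."""
    top = Counter(topologies).most_common(2)
    if not top or (len(top) == 2 and top[0][1] == top[1][1]):
        return None
    return top[0][0]


def resampling_precision(topologies: Sequence[Hashable], reference: Hashable,
                         n_genes: int, replicates: int = 10, seed: int = 0) -> float:
    """Percentage of replicates whose MGS tree equals the reference tree."""
    rng = random.Random(seed)
    pool: List[Hashable] = list(topologies)
    hits = sum(mgs_tree(rng.sample(pool, n_genes)) == reference
               for _ in range(replicates))
    return 100.0 * hits / replicates


def threshold_gene_number(topologies: Sequence[Hashable], reference: Hashable,
                          sizes: Sequence[int], replicates: int = 10,
                          seed: int = 0) -> Optional[int]:
    """Smallest sample size in `sizes` that reaches 100% precision, else None."""
    for n in sizes:
        if resampling_precision(topologies, reference, n, replicates, seed) == 100.0:
            return n
    return None


# Hypothetical data: 8 genes recover topology "T", 2 recover "U".
genes = ["T"] * 8 + ["U"] * 2
print(resampling_precision(genes, "T", n_genes=10))  # 100.0
```

Treating ties as incongruent reflects the situation, discussed for small gene numbers, in which the maximum and second-highest gene-supports are equal and no single tree can be chosen.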

An executable program in C for calculating frequencies of unique trees from tree distance data is available from the authors upon request (shan@cs.dal.ca; lixq@agr.gc.ca).

Results

Incongruence among different individual-gene phylogenies

Wide incongruence was observed among individual gene trees. The 106 individual genes inferred 20 to 51 unique trees for the seven yeast species across the seven method/data combinations (four methods on nucleotides and three on deduced amino acids) (Table 1). Nucleotides inferred fewer unique trees (20–38) than amino acids (40–51) (Table 1). For example, the maximum gene-support tree for MP was recovered by 37 of the 106 genes using nucleotides but by only 14 using amino acids. Trees recovered from amino acids showed more incongruence and less gene-support than those from nucleotides.

The maximum gene-support tree

The maximum gene-support trees for the seven yeast species from the different methods (MP, ME, ML, NJ), based on both nucleotide and amino acid sequences, were all identical (Fig. 1). Gene-support is defined as the number of genes that infer the same unique tree. The maximum gene-supports of the unique trees recovered by the 106 genes were 37, 33, 25, and 28 for MP, ME, NJ, and ML on nucleotides, respectively, and 14, 14, and 17 for MP, ME, and NJ on amino acids, respectively (Table 1). The corresponding maximum gene-support percentages were 35%, 31%, 24%, and 26% on nucleotides, and 13%, 14%, and 16% on amino acids. The second-highest gene-support percentages were smaller, in most cases considerably so: 9%, 15%, 22%, and 9% for MP, ME, NJ, and ML on nucleotides, respectively, and 10%, 9%, and 8% for MP, ME, and NJ on amino acids, respectively (Table 1). Maximum gene-support values were consistently greater for nucleotides than for amino acids.

Table 1. Maximum gene-support (MGS), weighted maximum gene-support (WMGS), the second highest gene-support (2nd HGS), weighted second highest gene-support (2nd WHGS), number of unique trees (NUT), and threshold gene number (TGN) required to overcome incongruence based on a data set of 106 genes from seven yeast species*.

Method  MGS       WMGS      2nd HGS   2nd WHGS  NUT  TGN
Nucleotides
MP      37 (35%)  42 (40%)  10 (9%)   13 (12%)  31   15
ME      33 (31%)  32 (30%)  16 (15%)  17 (16%)  23   26
NJ      25 (24%)  25 (23%)  23 (22%)  23 (22%)  20   106
ML      28 (26%)  34 (32%)  9 (9%)    12 (11%)  38   25
Amino acids
MP      14 (13%)  18 (17%)  11 (10%)  9 (9%)    51   55
ME      14 (14%)  14 (14%)  10 (9%)   10 (9%)   40   50
NJ      17 (16%)  17 (16%)  8 (8%)    10 (9%)   40   50

*Gene-support: number of genes that infer a unique tree. Gene-support percentage (in parentheses): gene-support divided by the total number of genes. Number of unique trees: number of unique trees inferred from the 106 genes. Threshold gene number: the minimum number of genes required to overcome incongruence.


Gene length and tree distance

A significant negative correlation was observed between gene length and the symmetric distance of a gene's tree from the maximum gene-support tree (Fig. 3): the longer the gene, the closer its tree was to the maximum gene-support tree.
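The paper does not state which correlation coefficient SAS computed; assuming Pearson's r, a minimal Python sketch of the statistic, its t value, and the degrees of freedom (n − 2, the quantity reported as df in Table 2) looks like this. The helper `pearson_r` and the gene lengths and distances below are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, int]:
    """Pearson correlation r, its t statistic, and degrees of freedom (n - 2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    # t = r * sqrt(df / (1 - r^2)); infinite when the fit is perfect.
    t = r * math.sqrt(df / (1 - r * r)) if abs(r) < 1 else math.copysign(math.inf, r)
    return r, t, df


# Hypothetical gene lengths (bp) and symmetric distances to the MGS tree.
lengths = [400, 800, 1200, 1600, 2000]
distances = [4, 3, 2, 2, 0]
r, t, df = pearson_r(lengths, distances)
print(round(r, 2), df)  # negative r: longer genes lie closer to the MGS tree
```

The p-value would then come from the t distribution with `df` degrees of freedom (for example via a statistics package); it is omitted here to keep the sketch dependency-free.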

The weighted maximum gene-support tree

Because sequence length is an important factor affecting single-gene tree inference, gene-support was adjusted with a weight factor equal to the gene's actual length divided by the average length of all genes. The average sequence length of the 106 genes was 1,198 bp, and the weight factors of the 106 genes ranged from 0.33 to 2.50 (Fig. 2). For example, a gene with a weight factor of 0.33 contributes 0.33, rather than 1, to the weighted gene-support of the tree it recovers. No evident differences between the weighted and unweighted maximum gene-supports were observed in any of the seven combinations in this study (Table 1). The weighted maximum gene-support tree was also consistent with the maximum gene-support tree (Fig. 1).
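The weighting rule can be written as a short Python sketch, assuming each gene's tree is given as a hashable topology key alongside its aligned length; `weighted_supports` and the example values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def weighted_supports(topologies: Sequence[Hashable], lengths: Sequence[float]) -> Dict[Hashable, float]:
    """Weighted gene-support: each gene adds length / mean length to its tree."""
    if not topologies or len(topologies) != len(lengths):
        raise ValueError("need one length per gene tree")
    mean_len = sum(lengths) / len(lengths)
    totals: Dict[Hashable, float] = defaultdict(float)
    for topo, length in zip(topologies, lengths):
        totals[topo] += length / mean_len  # weight factor of this gene
    return dict(totals)


# Hypothetical example: three genes, mean length 1200 bp -> weights 0.5, 1.0, 1.5.
print(weighted_supports(["T1", "T1", "T2"], [600, 1200, 1800]))  # {'T1': 1.5, 'T2': 1.5}
```

In this toy case the unweighted supports are 2 versus 1, but weighting produces a tie, which shows how the weighted ranking can diverge from the unweighted one when short genes favor one topology.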

Gene-support and tree distance

There was a significant correlation between gene-support and the symmetric distance of a tree from the maximum gene-support tree (Fig. 4): the greater the gene-support for a tree, the closer the tree was to the maximum gene-support tree. The topologies of the trees with the second-highest gene-support were very similar to the maximum gene-support tree; generally, only one or two steps were required to convert one into the other.

The minimum number of genes required to overcome incongruence

The precision of MP trees based on nucleotide sequences inferred from 5, 10, 15, or 20 genes was 80%, 90%, 100%, and 100%, respectively. Therefore, at least 15 genes were required to overcome incongruence for the seven yeast species with this method. For the other methods, the minimum number of genes was 26, 106, and 25 for ME, NJ, and ML, respectively, on nucleotides, and 55, 50, and 50 for MP, ME, and NJ, respectively, on amino acids (Table 1). Rokas et al. (2003) found that 20 genes were sufficient to support all branches of the species tree, based on concatenated alignments of the 106 genes from the same seven yeast species. The required number varies with method and taxa.

The minimum number of genes in the data set required to generate an MGS tree generally decreased as the maximum gene-support percentage increased (Table 1). This number depended not only on the maximum gene-support but also on the second-highest gene-support: the closer the two values, the more difficult it was to identify the congruent tree. This is illustrated by the NJ method on nucleotides, where the maximum gene-support was 25 and the second-highest gene-support was 23 (Table 1). In this case, the minimum required number of genes was 106, because the two trees were very similar and differed by only one branch. In contrast, for the NJ method on amino acids, the minimum number of genes was only 30 when the maximum gene-support was 18 and the second-highest gene-support percentage was 8%.

As more genes were included, the maximum gene-support, the second-highest gene-support, and the gap between them all increased, although the maximum and second-highest gene-support percentages did not (Table 2). At the same time, precision increased. Therefore, higher confidence is obtained when more genes are included.

Validation using data sets of other taxa

Using 63 shared genes from nine complete baculovirus genomes, the maximum gene-support

Figure 1. The rooted tree with the maximum gene-support inferred from 106 genes of seven yeast species. The outgroup in the analysis was C. albicans. The single-gene trees were recovered using bootstrap consensus with a 50% majority rule.
[Figure 1 tree: S. kluyveri, S. castellii, S. bayanus, S. kudriavzevii, S. mikatae, S. paradoxus, S. cerevisiae; outgroup C. albicans]


sweep the signal (Doyle, 1992; De Queiroz, 1993; Miyamoto and Fitch, 1995). The maximum gene-support tree approach does not suffer from this problem or from accumulated systematic error, because each gene tree is reconstructed separately and contributes equally.

It is well known that longer single-gene sequences tend to reconstruct better trees than shorter ones. We showed a significant negative correlation between gene length and the symmetric distance of a tree from the maximum gene-support tree (Fig. 3). To remove the effect of gene length, each gene's support was adjusted by its length relative to the average length of all sampled genes. In the data sets analyzed, the weighted maximum gene-support (WMGS) trees did not differ from the maximum gene-support (MGS) trees. It is unclear whether this is specific to the three data sets used or reflects each gene being a functional entity regardless of length differences. Since the effect of sequence length is well known, the weighted maximum gene-support tree approach is recommended at this stage. Further research is required to determine whether the WMGS tree approach is biologically more sound than the MGS tree approach.

The maximum gene-support tree approach avoids repeating intensive computation on large, genome-scale concatenated alignments. It took 19 days to complete the ML phylogenetic analysis of the concatenated alignment of just 36 genes from seven plant species on a PowerPC Macintosh computer with a 1.2 GHz CPU. Other authors have previously commented on the computational limitations of ML with concatenated alignments (Wolf et al. 2002). The computation time for thousands of genes from higher eukaryotes would be prohibitive. In addition, when more gene sequences are added, the concatenation approach requires the entire computation to be repeated, whereas the maximum gene-support tree approach simply adds the single-gene trees of the new genes. Moreover, ML can remain an effective and efficient method within the maximum gene-support tree approach, because each tree is inferred independently from a single gene and the computing tasks can therefore be distributed across available PCs. For the concatenation approach, a parallel version of ML is necessary, but this is not available in most laboratories.
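The two practical points above, incremental addition of new genes and independent per-gene inference, can be sketched in Python. `GeneSupportTally` is a hypothetical helper, and `infer` is a placeholder for a real single-gene tree inference call (for example a separate PAUP* job); threads are used only to keep the sketch self-contained, and CPU-bound inference would in practice be sent to separate processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, Iterable, List, Tuple


class GeneSupportTally:
    """Running count of single-gene tree topologies (hypothetical helper)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def add(self, topologies: Iterable[Hashable]) -> None:
        # New genes only add their own trees; earlier trees are never recomputed.
        self.counts.update(topologies)

    def add_genes(self, genes: Iterable[str], infer: Callable[[str], Hashable],
                  workers: int = 4) -> None:
        # Each gene tree is inferred independently, so the calls can run concurrently.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            self.add(pool.map(infer, genes))

    def top(self, k: int = 2) -> List[Tuple[Hashable, int]]:
        return self.counts.most_common(k)


tally = GeneSupportTally()
tally.add(["T1", "T1", "T2"])                        # trees from an earlier run
tally.add_genes(["g4", "g5"], infer=lambda g: "T2")  # stand-in for real inference
print(tally.top())  # [('T2', 3), ('T1', 2)]
```

Adding the two new genes changes the ranking without touching the three earlier trees, in contrast to a concatenated analysis, which would have to be rerun from scratch.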

When recovered trees include polytomies, a more logical approach would be to add all equally parsimonious trees recovered from a single gene to the total tree data set, rather than first calculating a consensus tree for each gene. One gene may then contribute two or more trees while another contributes only one. This can be adjusted with a contribution factor: each gene's contribution is divided by its number of equally parsimonious trees.
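The contribution factor described above can be sketched in Python, assuming each gene supplies a list of its distinct equally parsimonious topologies as hashable keys; `fractional_supports` and the example genes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def fractional_supports(gene_trees: Sequence[Sequence[Hashable]]) -> Dict[Hashable, float]:
    """Each gene contributes a total of 1, split equally among its equally parsimonious trees."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for trees in gene_trees:
        if not trees:
            continue  # a gene with no recovered tree contributes nothing
        share = 1.0 / len(trees)  # contribution factor for this gene
        for t in trees:
            totals[t] += share
    return dict(totals)


# Hypothetical genes: one resolved tree, then two genes with two equally parsimonious trees each.
print(fractional_supports([["T1"], ["T1", "T2"], ["T2", "T3"]]))  # {'T1': 1.5, 'T2': 1.0, 'T3': 0.5}
```

Because every gene contributes exactly 1 in total, the supports still sum to the number of genes, so gene-support percentages remain comparable with the unadjusted counts.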

Table 2. The number of sampled genes, the maximum gene-supports (MGS), the second highest gene-supports (2nd_HGS), the differences between MGS and 2nd_HGS (DMGS), the maximum gene-support percentages (MGSP), the second highest gene-support percentages (2nd_HGSP), the differences of MGSP and 2nd_HGSP (DMGSP), and precisions*.

Genes  MGS        2nd_HGS    DMGS       MGSP (%)     2nd_HGSP (%)  DMGSP (%)    Precision (%)
5      1.6 (0.5)  1.0 (0)    0.6 (0.5)  32.0 (11.0)  20.0 (0)      12.0 (11.0)  60
10     3.2 (1.4)  1.4 (0.5)  1.8 (1.3)  32.0 (14.0)  14.0 (5.2)    18.0 (13.2)  80
15     4.0 (1.6)  2.4 (0.7)  1.6 (2.0)  26.7 (10.4)  16.0 (4.7)    10.7 (13.0)  60
20     5.3 (2.3)  2.5 (0.5)  2.8 (2.6)  26.5 (11.6)  12.5 (2.6)    14.0 (13.0)  90
24     6.6 (1.8)  2.7 (0.7)  3.9 (2.2)  27.5 (7.7)   11.3 (2.8)    16.3 (9.1)   90
25     6.5 (1.5)  2.9 (0.6)  3.6 (1.8)  26.0 (6.0)   11.6 (2.3)    14.4 (7.4)   100
30     8.3 (1.6)  3.1 (0.7)  5.2 (2.1)  27.7 (5.2)   10.3 (2.5)    17.3 (7.2)   100
106    28         9          19         26.4         8.5           17.9         100
r      0.91***    0.86***    0.78***    −0.11        −0.44         0.07         0.55
df     64         64         64         64           64            64           6

*ML trees inferred from nucleotides (data not shown for other methods). Sample replicates: 10. Precision: the number of congruent trees divided by the total number of trees, expressed as a percentage.

Values in parentheses are standard deviations. ***: significant correlation at P ≤ 0.001. r: correlation coefficient. df: degrees of freedom.


When the number of genes is small, the gene-support of the maximum gene-support tree may equal that of the second-highest gene-support tree. In addition, gene-support confidence can be very low, for example when only 2 genes return the same tree. In such cases, the number of genes is evidently too small to meet the minimum requirement for widely incongruent single-gene trees. The solution is to include more genes in the analysis (analogous to increasing sample size in other investigations). As shown in Table 2, when only 5 genes were used, the difference between the maximum and second-highest gene-supports was only 0.6, and precision was 60%. Precision increased to 100% when 25 genes were used, and the gap between the maximum and second-highest gene-supports grew to 3.6 (Table 2). The larger gap and gene-support increase confidence in the reconstructed phylogenetic tree. Notably, gene-support percentages did not increase when more genes were included (Table 2). The jackknife method is suitable for re-sampling individual genes to determine precision and to judge whether the required gene-number threshold has been reached. If the maximum gene-support is very close to the second-highest gene-support, it is difficult to identify the maximum gene-support tree even when gene-support is rather large, as shown by NJ on nucleotides (25 versus 23 genes; Table 1). One solution is again to include more genes. Because the maximum gene-support tree and the second-highest gene-support tree usually differ by only one or two branches, cross-validation against maximum gene-support trees inferred by other methods may be a feasible alternative.

Obtaining a sufficient number of shared genes may become difficult, or even unrealistic, if too many taxa are involved, and more orthologous genes are likely required as more species are tested. However, when many small trees are recovered from a minimum number of shared genes using the maximum gene-support tree approach, a larger picture of evolutionary relationships can gradually be reconstructed through a divide-and-conquer strategy of overlapping and connecting many smaller trees (Sanderson et al. 1998; Semple and Steel 2000).

As shown in Table 1, the gene-supports of the maximum gene-support trees, and their percentages, were greater on nucleotides than on amino acids. With nucleotide sequences, more genes reconstructed the same tree, suggesting that nucleotide sequences may be more suitable for inferring species phylogeny. This result supports the hypothesis (Ayala et al. 1996) that evolution is more regular at the nucleotide level than at the protein level and, thus, more dependable as a molecular clock.

The maximum gene-support tree approach is likely an appropriate method for assessing phylogenetic relationships across a certain range of taxa, as evident from the analysis of the three data sets in this study (nine baculoviruses, seven yeast species plus a fungal outgroup, and seven botanically distant plants). The phylogenetic relationships of these plants, previously identified using fossil records and morphological characteristics (Cronquist, 1981; Panchen, 1992; Campbell, 1993), now have further congruent support from the maximum gene-support tree. More studies are still needed to establish the generality of the maximum gene-support tree model for phylogenies of very large numbers of species once their genome sequences are available.

Figure 5. The rooted maximum gene-support tree based on 36 genes from seven plant species. G. biloba was specified as the outgroup.
[Figure 5 tree: P. taeda, P. glauca, T. aestivum, O. sativa, P. tremula, A. thaliana; outgroup G. biloba]

Figure 6. Phylogenetic analyses of the concatenated alignments of 106 genes from seven yeast species. Numbers above branches are bootstrap values (ME on amino acids/NJ on amino acids/NJ on nucleotides).
[Figure 6 tree: S. kluyveri, S. castellii, S. bayanus, S. kudriavzevii, S. mikatae, S. paradoxus, S. cerevisiae; outgroup C. albicans; branch values 100/100/100, 100/58/100, 100/96/76, 100/100/100, 100/100/100]

In a hypothetical scenario in which a million or more species are compared at the same time, a single gene's polymorphism, particularly in short genes, may not be sufficient to distinguish all the species, regardless of how polymorphic the gene is in the population. In this scenario, it is unclear whether the maximum gene-support tree would still be a good representation of the true tree. No phylogenetic method is likely to be perfect; each has its advantages and disadvantages. The maximum gene-support tree approach is strongest for comparing relatively close species, because it is based on a biological phenomenon observed in the present study: there are usually more orthologous genes with higher similarities between genetically closer species than between genetically more distant ones.

Two conclusions can be drawn from the present study: 1) the true tree relationship among the species within each data set studied is still maintained by the largest group of orthologous genes, even though genes have diverged greatly among organisms; and 2) the maximum gene-support tree, at least when the taxonomic range and the number of species are not too large, is likely an effective novel approach for phylogenetic analysis, with several advantages over existing approaches to genome-scale or large-gene-number-scale phylogenetic analysis.

Acknowledgements

The authors are most grateful to Dr. Antonis Rokas, who kindly provided suggestions and the aligned yeast data sets and their trees. The authors also sincerely thank Dr. Richard Winterbottom and Dr. David De Koeyer for critical reading, comments, and useful discussion. The authors greatly appreciate the helpful suggestions and comments of the two reviewers.

References

Archibald, J.M. and Roger, A.J. 2002. Gene conversion and the evolution of euryarchaeal chaperonins: A maximum likelihood-based method for detecting conflicting phylogenetic signals. J. Mol. Evol., 55:232–45.

Ayala, F.J., Barrio, E. and Kwiatowski, J. 1996. Molecular clock or erratic evolution? A tale of two genes. Proc. Natl. Acad. Sci. U.S.A., 93:11729–34.

Baldauf, S.L., Roger, A.J., Wenk-Siefert, I. and Doolittle, W.F. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science, 290:972–7.

Campbell, N.A. 1993. Biology. The Benjamin/Cummings Publishing Company Inc., Redwood City, California.

Cronquist, A. 1981. An Integrated System of Classification of Flowering Plants. Columbia University Press, New York, NY.

De Queiroz, A. 1993. For consensus (sometimes). System. Biol., 42:368–72.

Doolittle, W.F. 1999. Phylogenetic classification and the universal tree. Science, 284:2124–8.

Doyle, J.J. 1992. Gene trees and species trees: molecular systematics as one-character taxonomy. System. Bot., 17:144–63.

Felsenstein, J. 1989. PHYLIP: Phylogeny Inference Package (Version 3.2). Cladistics, 5:164–6.

Fitz-Gibbon, S.T. and House, C.H. 1999. Whole genome-based phylogenetic analysis of free-living microorganisms. Nucl. Acids Res., 27:4218–22.

Gadagkar, S.R., Rosenberg, M.S. and Kumar, S. 2005. Inferring species phylogenies from multiple genes: Concatenated sequence tree versus consensus gene tree. J. Exp. Zool. Part B: Mol. Dev. Evol., 304:64–74.

Herniou, E.A., Luque, T., Chen, X., Vlak, J.M., Winstanley, D., Cory, J.S. and O’Reilly, D.R. 2001. Use of whole genome sequence data to infer baculovirus phylogeny. J. Virol., 75:8117–26.

Huelsenbeck, J.P., Bull, J.J. and Cunningham, C.W. 1996. Combining data in phylogenetic analysis. Trends Ecol. Evol., 11:152–8.

Kluge, A.G. 1989. A concern for evidence, and a phylogenetic hypothesis of relationships among Epicrates (Boidae, Serpentes). System. Zool., 38:7–25.

Logares, R., Rengefors, K., Kremp, A., Shalchian-Tabrizi, K., Boltovskoy, A., Tengs, T., Shurtleff, A. and Klaveness, D. 2007. Phenotypically different microalgal morphospecies with identical ribosomal DNA: A case of rapid adaptive evolution? Microbial Ecol., 53:549–61.

McClintock, B. 1984. The significance of responses of the genome to challenge. Science, 226:792–801.

Miyamoto, M.M. and Fitch, W.M. 1995. Testing species phylogenies and phylogenetic methods with congruence. System. Biol., 44:64–76.

Panchen, A.L. 1992. Classification, Evolution, and the Nature of Biology. Cambridge Univ. Press, Cambridge.

Philippe, H., Delsuc, F., Brinkmann, H. and Lartillot, N. 2005. Phylogenomics. Ann. Rev. Ecol. Evol. System., 36:541–62.

Phillips, M.J., Delsuc, F. and Penny, D. 2004. Genome-scale phylogeny and the detection of systematic biases. Mol. Biol. Evol., 21:1455–8.

Robinson, D.F. and Foulds, L.R. 1981. Comparison of phylogenetic trees. Math. Biosci., 53:131–47.

Rokas, A., Williams, B.L., King, N. and Carroll, S.B. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature, 425:798–804.

Russo, C.A.M., Takezaki, N. and Nei, M. 1996. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Mol. Biol. Evol., 13:525–36.

Sanderson, M.J., Purvis, A. and Henze, C. 1998. Phylogenetic supertrees: Assembling the trees of life. Trends Ecol. Evol., 13:105–9.

Semple, C. and Steel, M. 2000. A supertree method for rooted trees. Discr. Appl. Math., 105:147–58.

Simpson, A.G.B., Inagaki, Y. and Roger, A.J. 2006. Comprehensive multigene phylogenies of excavate protists reveal the evolutionary positions of “primitive” eukaryotes. Mol. Biol. Evol., 23:615–25.

Soltis, D.E., Albert, V.A., Savolainen, V., Hilu, K., Qiu, Y.L., Chase, M.W., Farris, J.S., Stefanovic, S., Rice, D.W., Palmer, J.D. and Soltis, P.S. 2004. Genome-scale data, angiosperm relationships, and ‘ending incongruence’: A cautionary tale in phylogenetics. Trends Plant Sci., 9:477–83.

Swofford, D.L. 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and other methods). Version 4.0b10. Sinauer Associates, Sunderland, Massachusetts.

Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. and Higgins, D.G. 1997. The CLUSTAL X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucl. Acids Res., 25:4876–82.

Vulić, M., Lenski, R.E. and Radman, M. 1999. Mutation, recombination, and incipient speciation of bacteria in the laboratory. Proc. Natl. Acad. Sci. U.S.A., 96:7348–51.

Wiley, E.O., Siegel-Causey, D., Brooks, D.R. and Funk, V.A. 1991. The Compleat cladist: A primer of phylogenetic procedures. Univ. Kansas Press, Museum Nat. Hist., Special Publication no. 19:1–158.

Wolf, Y.I., Rogozin, I.B., Grishin, N.V. and Koonin, E.V. 2002. Genome trees and the tree of life. Trends Genet., 18:472–9.

Yang, Z. 1996. Maximum-likelihood models for combined analyses of multiple sequence data. J. Mol. Evol., 42:587–96.
