ATGC: Phyml 3.0 benchmarks

PhyML 3.0 Benchmarks

CAT methods

Comparison between programs (RAxML and FastTree) using the CAT approximation (Stamatakis, 2006) of the Γ4 mixture model (Yang, 1993), and PhyML 3.0 using the full standard Γ4 model.

See DNA benchmark - See protein benchmark - Download DNA medium-size data sets - Download protein medium-size data sets

Distribution of relative computing times: for each of the two sets of alignments (50 DNA and 50 protein medium-size alignments), we measured the base-2 logarithm of the ratio between the computing time of the given method and that of the fastest approach on the same alignment. Thus, a log-ratio equal to X corresponds to a method being 2^X times slower than the fastest approach.
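
The log-ratio described above can be illustrated with a minimal Python sketch. The function name and the timings are hypothetical and are not taken from the benchmark data.

```python
import math

def log2_time_ratios(times):
    """Return log2(time / fastest time) for each method on one alignment.

    `times` maps a method name to its computing time (any positive unit).
    """
    fastest = min(times.values())
    return {method: math.log2(t / fastest) for method, t in times.items()}

# Hypothetical timings in seconds for a single alignment (illustration only).
example_times = {"method_A": 120.0, "method_B": 30.0, "method_C": 60.0}
ratios = log2_time_ratios(example_times)
# method_B is the fastest (log-ratio 0); method_A is 2^2 = 4 times slower.
```

A log-ratio of 0 identifies the fastest method for that alignment, and each additional unit corresponds to a doubling of the computing time.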

DNA	Av. LogLk rank	Delta>5	P-value<0.05	Av. RF distance
PhyML 3.0 NNI	3.21	33	6	0.2711
PhyML 3.0 SPR	1.36	1	0	0.0692
RAxML CAT	2.19	7	0	0.2058
FastTree reoptimized	3.24	35	5	0.2585

PROTEIN	Av. LogLk rank	Delta>5	P-value<0.05	Av. RF distance
PhyML 3.0 NNI	2.82	19	1	0.2144
PhyML 3.0 SPR	1.8	3	0	0.0641
RAxML CAT	2.06	2	1	0.108
FastTree reoptimized	3.31	26	2	0.2762

Comparison of log-likelihoods on 50 DNA and 50 protein medium-size data sets. The column ‘Av. LogLk rank’ gives the average log-likelihood rank of each method. These ranks are corrected by taking into account information on tree topologies. ‘Delta>5’ gives the number of cases (among 50) for which the drop in log-likelihood between the highest log-likelihood for the corresponding data set and that of the method of interest is greater than 5. The column ‘P-value<0.05’ displays the number of cases for which the difference in log-likelihood between the method of interest and the corresponding highest log-likelihood is statistically significant (SH test). The ‘Av. RF distance’ values are the average Robinson and Foulds topological distances between the trees estimated by the method of interest and the corresponding most likely trees (0 corresponds to identical trees, while 1 means that the two trees do not have any clade in common).

Programs

Four combinations of programs and options were compared.

PhyML NNI
PhyML 3.0, optimizing the topology with both simultaneous NNIs (as in the original PhyML algorithm) and refined NNIs with 5-edge-length optimization, and using a BioNJ starting tree, configured with the GTR model for DNA sequences, with WAG for proteins, and with 4 discrete gamma rate categories (alpha estimated from the data).
PhyML SPR
PhyML 3.0, optimizing the topology with SPR (and NNI) operations, and using a BioNJ starting tree, configured with the GTR model for DNA sequences, with WAG for proteins, and with 4 discrete gamma rate categories (alpha estimated from the data).
RAxML CAT
RAxML, using the GTRMIX model for DNA sequences and the PROTMIXWAG model for protein sequences. With this option, RAxML performs the topology search under CAT and then evaluates the final tree under the full Γ4 model (shape parameter estimated from the data), so that it yields stable likelihood values and branch lengths.
FastTree reoptimized
FastTree, using the GTR model for DNA sequences and the WAG model for protein sequences. FastTree outputs a tree with approximate branch lengths and parameter estimates, and no usable tree likelihood value. Thus, we used PhyML 3.0 to optimize these numerical parameters and obtain the likelihood of the FastTree tree topology. The computing time required by PhyML to achieve this task was added to that of FastTree. Otherwise, comparisons would be unfair between programs inferring a tree topology only and those inferring both the topology and reliable numerical parameter values. Moreover, we believe that in most analyses users are interested in the values of these parameters (e.g. transition/transversion ratio, alpha, etc.).

Results

The resulting trees are compared with respect to topology, log-likelihood and computing time.

Computing time ranks
The four methods are ranked for each of the alignments, based on the computing time. The first rank contains the methods with a computing time ranging from the best computing time (B) to 1.25 × B (i.e. nearly the best computing time). The remaining methods are ranked in the same way, until all methods are ranked. Ties are accounted for; e.g. if the first and second groups each contain 2 methods, the ranks will be 1.5 ( (1+2)/2 ) and 3.5 ( (3+4)/2 ). To summarize these results, we provide the median and average ranks for all DNA and protein alignments.
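
The grouping and tie-averaging rule can be sketched in Python as follows. This is one reading of the description, under the assumption that after each group is formed the fastest remaining method becomes the new reference B; the function name and timings are hypothetical.

```python
def grouped_time_ranks(times, tolerance=1.25):
    """Rank methods by computing time, grouping near-ties.

    Methods within `tolerance` times the best remaining time share a group,
    and every member of a group receives the average of the positions it spans.
    Assumption: the reference time is reset to the fastest remaining method
    after each group is formed.
    """
    ordered = sorted(times.items(), key=lambda item: item[1])
    ranks = {}
    start = 0
    while start < len(ordered):
        best = ordered[start][1]
        end = start
        while end < len(ordered) and ordered[end][1] <= tolerance * best:
            end += 1
        # The group occupies 1-based positions start+1 .. end.
        average_rank = (start + 1 + end) / 2
        for method, _ in ordered[start:end]:
            ranks[method] = average_rank
        start = end
    return ranks

# Hypothetical timings in seconds (illustration only).
ranks = grouped_time_ranks({"A": 10.0, "B": 12.0, "C": 20.0, "D": 24.0})
# A and B fall within 1.25 x 10 and share rank 1.5; C and D share rank 3.5.
```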
Topology ranks
The four methods are ranked for each of the alignments using a similar principle, based on the tree likelihood. The first rank contains all methods that find the same best topology, the second rank contains all methods that find the next-best topology, and so on. As for computing times, we provide the median and average ranks for all DNA and protein alignments.
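
A possible reading of this ranking is sketched below in Python. It assumes that each result carries a topology identifier (here an arbitrary hypothetical label) and that groups of methods sharing a topology are ordered by their best log-likelihood; the exact correction used for the benchmark is not specified here, so this is only an illustration.

```python
def topology_ranks(results):
    """Rank methods by likelihood, sharing ranks among identical topologies.

    `results` maps a method name to a (topology_label, log_likelihood) pair.
    Methods with the same topology label form one group; groups are ordered by
    their highest log-likelihood, and each member gets the group's average rank.
    """
    groups = {}
    for method, (topology, loglk) in results.items():
        groups.setdefault(topology, []).append((method, loglk))
    ordered = sorted(
        groups.values(),
        key=lambda group: max(loglk for _, loglk in group),
        reverse=True,
    )
    ranks = {}
    position = 0
    for group in ordered:
        # The group occupies 1-based positions position+1 .. position+len(group).
        average_rank = (2 * position + 1 + len(group)) / 2
        for method, _ in group:
            ranks[method] = average_rank
        position += len(group)
    return ranks

# Hypothetical results for one alignment (illustration only).
topo_ranks = topology_ranks({
    "m1": ("T1", -1000.0),
    "m2": ("T1", -1000.2),
    "m3": ("T2", -1004.0),
    "m4": ("T3", -1010.0),
})
```

In this example, m1 and m2 find the same best topology and share rank 1.5, while m3 and m4 receive ranks 3 and 4.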
Robinson and Foulds distances
RF is the Robinson and Foulds (bipartition) distance between the best topology and the given topology.
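
As an illustration, the bipartition distance can be computed from the sets of internal splits of two trees. The Python sketch below assumes that each tree is already given as a collection of splits, and that the distance is normalized by the total number of splits so that it ranges from 0 to 1; both the helper names and this particular normalization are assumptions.

```python
def canonical_split(side, taxa):
    """Represent a bipartition by the side that excludes the reference taxon."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side

def normalized_rf(splits_a, splits_b):
    """Fraction of internal bipartitions found in only one of the two trees."""
    total = len(splits_a) + len(splits_b)
    if total == 0:
        return 0.0
    return len(splits_a ^ splits_b) / total

taxa = {"A", "B", "C", "D", "E"}
# Tree 1: ((A,B),C,(D,E)); tree 2: ((A,C),B,(D,E)). Hypothetical topologies.
tree1 = {canonical_split({"A", "B"}, taxa), canonical_split({"D", "E"}, taxa)}
tree2 = {canonical_split({"A", "C"}, taxa), canonical_split({"D", "E"}, taxa)}
rf = normalized_rf(tree1, tree2)
# The trees share the split DE|ABC and differ on one split each, so rf is 0.5.
```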
Delta>5
Another variable of interest is the number of times a method fails to find a phylogeny whose log-likelihood is close to the highest log-likelihood found by any of the methods being compared. We thus counted the number of data sets for which the log-likelihood returned by a given method was smaller than the highest log-likelihood found on the corresponding alignment minus 5.0. While this threshold of 5.0 log-likelihood units is arbitrary, we believe that it provides a simple and practical way to tell the methods apart at first sight.
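
This count is straightforward to express in code. The following Python sketch uses hypothetical method names and log-likelihood values, not benchmark results.

```python
def count_delta_exceeding(loglk_by_dataset, method, threshold=5.0):
    """Count data sets where `method` falls more than `threshold` below the best.

    `loglk_by_dataset` is a list with one dict per data set, each mapping a
    method name to the log-likelihood it returned on that data set.
    """
    count = 0
    for loglks in loglk_by_dataset:
        best = max(loglks.values())
        if loglks[method] < best - threshold:
            count += 1
    return count

# Hypothetical log-likelihoods for four data sets (illustration only).
datasets = [
    {"m1": -1000.0, "m2": -1003.0},
    {"m1": -2010.0, "m2": -2000.0},
    {"m1": -500.0, "m2": -506.5},
    {"m1": -300.0, "m2": -320.0},
]
delta_m1 = count_delta_exceeding(datasets, "m1")
delta_m2 = count_delta_exceeding(datasets, "m2")
```

Here m1 exceeds the threshold on one data set (the second) and m2 on two (the third and fourth); a drop of 3.0 units, as on the first data set, is not counted.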
SH tests
We used the Shimodaira-Hasegawa (SH) test to assess the statistical significance of the likelihood differences. Every result displays the P-value of the comparison between its log-likelihood and that of the best result for the same data set. As a summary, we provide the number of times each method is significantly worse than the best one.