PhyML 3.0 Benchmarks

Simulated data sets

Comparison of PhyML 3.0 tree search options and RAxML, using 100 simulated DNA data sets.

See benchmark - Download simulated DNA data sets - Download simulated true trees

Distribution of relative computing times: for each of the 100 simulated DNA alignments, we measured the base-2 logarithm of the ratio between the computing time of a given method and that of the fastest method on the same alignment. A log-ratio equal to X thus means that the method was 2^X times slower than the fastest approach.
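The log-ratio described above can be sketched in a few lines of Python. This is only an illustration: the function name, the method labels, and the timings are hypothetical and are not taken from the benchmark.

```python
import math

def log2_time_ratios(times):
    """Map each method to log2(time / fastest time) for one alignment."""
    fastest = min(times.values())
    return {method: math.log2(t / fastest) for method, t in times.items()}

# Hypothetical CPU times in seconds for a single alignment (not benchmark data).
example_times = {"method_A": 12.0, "method_B": 48.0, "method_C": 24.0}
ratios = log2_time_ratios(example_times)
```

Here the fastest method gets a log-ratio of 0, and a method four times slower gets a log-ratio of 2.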

Method           Av. LogLk rank   Delta>5   P-value<0.05   Av. RF distance
PhyML 3.0 NNI    3.59             3         0              0.100
PhyML 3.0 SPR    3.705            0         0              0.100
PhyML 3.0 BEST   3.075            0         0              0.097
PhyML 3.0 RAND   2.8              0         0              0.097

Performance of tree searching algorithms on 100 simulated nucleotide alignments. The column ‘Av. LogLk rank’ gives the average log-likelihood rank of each method. These ranks are corrected by taking into account information on tree topologies. ‘Delta>5’ gives the number of cases (among 100) in which the log-likelihood of the method of interest is more than 5 units below the highest log-likelihood obtained for the corresponding data set. The column ‘P-value<0.05’ gives the number of cases in which this difference in log-likelihood is statistically significant (SH test). Note that in this table the Robinson-Foulds (RF) distance measures the topological difference between true and inferred trees (instead of the difference between inferred and most likely trees, as in the other tables).
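As a rough sketch of how the ‘Delta>5’ and ‘Av. LogLk rank’ columns could be computed, the Python below takes one dictionary of log-likelihoods per data set. The function names and the input values are hypothetical. The ranking shown only averages ranks over exactly tied log-likelihoods; it is an assumption made for illustration and does not reproduce the topology-based correction mentioned in the caption, whose details are not given here.

```python
def delta_counts(loglks, threshold=5.0):
    """Per method, count data sets where (best LogLk - method LogLk) > threshold."""
    counts = {}
    for per_method in loglks:                 # one dict per data set
        best = max(per_method.values())
        for method, lk in per_method.items():
            counts[method] = counts.get(method, 0) + (best - lk > threshold)
    return counts

def average_ranks(loglks):
    """Average rank per method (1 = highest LogLk); tied values share the mean rank."""
    totals = {}
    for per_method in loglks:
        ordered = sorted(per_method.values(), reverse=True)
        for method, lk in per_method.items():
            first = ordered.index(lk) + 1           # first position holding this value
            last = first + ordered.count(lk) - 1    # last position holding this value
            totals[method] = totals.get(method, 0.0) + (first + last) / 2
    return {m: total / len(loglks) for m, total in totals.items()}

# Hypothetical log-likelihoods for two data sets (not benchmark data).
example = [
    {"A": -1000.0, "B": -1000.0, "C": -1007.5},
    {"A": -2003.0, "B": -2000.0, "C": -2001.0},
]
counts = delta_counts(example)
ranks = average_ranks(example)
```

In this toy input, only method C falls more than 5 log-likelihood units below the best result, and only on the first data set.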

Data sets

The benchmark contains 100 simulated data sets of 40 sequences and 500 sites each. The data sets were generated with Seq-Gen along random trees under the GTR model, with parameters estimated from HIV data (Posada and Crandall, 2001): nucleotide frequencies fA = 0.40, fC = 0.20, fG = 0.22, fT = 0.18; four gamma rate categories with shape parameter 0.969; and rates of nucleotide change r(AC) = 1.72, r(AG) = 5.03, r(AT) = 0.84, r(CG) = 0.91, r(CT) = 7.70, r(GT) = 1 (M. Anisimova and O. Gascuel, 2006).
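To make the GTR parameters above concrete, the following Python sketch assembles the instantaneous rate matrix from the listed frequencies and relative rates, using the usual textbook convention: off-diagonal entries are the relative rate times the target frequency, each row sums to zero, and the matrix is scaled to one expected substitution per unit time. The function name and the normalization choice are assumptions for illustration; this is not the code used by Seq-Gen or PhyML.

```python
BASES = "ACGT"
FREQS = {"A": 0.40, "C": 0.20, "G": 0.22, "T": 0.18}
RATES = {("A", "C"): 1.72, ("A", "G"): 5.03, ("A", "T"): 0.84,
         ("C", "G"): 0.91, ("C", "T"): 7.70, ("G", "T"): 1.00}

def gtr_rate_matrix(freqs, rates):
    """Build a normalized GTR rate matrix Q as a nested dict Q[from][to]."""
    def r(x, y):
        # Relative rates are symmetric: r(x, y) == r(y, x).
        return rates[(x, y)] if (x, y) in rates else rates[(y, x)]

    # Off-diagonal entries: relative rate times frequency of the target base.
    q = {x: {y: (r(x, y) * freqs[y] if x != y else 0.0) for y in BASES}
         for x in BASES}
    # Diagonal entries make every row sum to zero.
    for x in BASES:
        q[x][x] = -sum(q[x][y] for y in BASES if y != x)
    # Scale so that the expected substitution rate at equilibrium equals 1.
    scale = -sum(freqs[x] * q[x][x] for x in BASES)
    return {x: {y: q[x][y] / scale for y in BASES} for x in BASES}

Q = gtr_rate_matrix(FREQS, RATES)
```

The resulting matrix satisfies detailed balance, that is, fX * Q[X][Y] equals fY * Q[Y][X] for every pair of bases, which is what makes the model time-reversible.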


All programs were run on a cluster of 24 computing nodes with Intel(R) Xeon(R) 5140 CPUs @ 2.33GHz and 8GB of RAM per unit of two dual-core processors. Computing times are comparable across programs because only effective CPU time was counted.


Six programs and option settings were compared. All programs were configured with the GTR model for DNA sequences (WAG for proteins) and with 4 discrete gamma rate categories, the shape parameter alpha being estimated from the data.


The resulting trees are compared with respect to topology, log-likelihood, and computing time.