PhyD*: Fast NJ-like algorithms to deal with incomplete distance matrices.
Criscuolo A., Gascuel O.
BMC Bioinformatics. 2008, Mar 26;9:166.
Please cite this paper if you use PhyD*.
Running PhyD*
PhyD* uses a PHYLIP-like interface (Felsenstein, 1993). The user is asked for the name of the input file. This file contains one or several distance matrices in PHYLIP format (lower-triangular or square). A missing entry must be written -99.0. Binary values (Subreplicate option) may be placed just after each matrix entry; the value 0 indicates a missing entry and the value 1 a non-missing entry. Comments can be written inside the input file on lines that begin with the '%' character.
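As a rough illustration of the input conventions described above, the following Python sketch writes one matrix in square, lower-triangular or subreplicate layout. The function name `format_matrix` and the constant `MISSING` are hypothetical and not part of PhyD*. The sketch assumes the usual PHYLIP layout (taxon count on the first line, names padded to 10 characters, one taxon per line) and a missing-entry code of -99.0.

```python
MISSING = -99.0  # assumed missing-entry code


def format_matrix(names, dist, lower=False, subreplicate=False):
    """Return one PHYLIP-style distance matrix as text.

    dist[i][j] is a float, or None when the distance is unknown.
    lower=True writes only the entries below the diagonal.
    subreplicate=True appends 1 (known) or 0 (missing) after each entry.
    """
    n = len(names)
    lines = [f"{n:5d}"]  # first line: number of taxa
    for i, name in enumerate(names):
        cols = range(i) if lower else range(n)
        fields = []
        for j in cols:
            d = dist[i][j]
            fields.append(f"{MISSING if d is None else d:.4f}")
            if subreplicate:
                fields.append("0" if d is None else "1")
        # name padded to 10 characters, then the row entries
        lines.append((f"{name:<10s} " + " ".join(fields)).rstrip())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    names = ["A", "B", "C"]
    dist = [[0.0, 0.1, None], [0.1, 0.0, 0.3], [None, 0.3, 0.0]]
    print(format_matrix(names, dist, lower=True))
```

Several matrices can be written one after the other into the same file by concatenating the returned strings.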
Here is an example of an input.d file containing square distance matrices:
The same three distance matrices with both lower triangular and subreplicate formats:
PhyD* outputs the phylogenetic tree(s) (in NEWICK format) to a file called output.t.
To run PhyD* with LINUX, use the command:
java -jar PhyDstar.jar
To run PhyD* with WINDOWS, double-click on win_PhyDstar.bat.
PHYLIP-like interface
A PHYLIP-like menu displays the various options:
Options

D Method (NJ*, UNJ*, BioNJ*, MVR*)?
The four available algorithms NJ*, UNJ*, BioNJ* and MVR* are adaptations of NJ, UNJ, BioNJ and MVR, respectively. They all correspond to the initial algorithm when the input distance matrix is complete and when the P option is set to 1 (see below).
When MVR* is selected, the program uses the input distance matrix to compute the variance of the pairwise evolutionary distance estimates. However, the V option allows the user to select a file containing the variance matrix, in the same format as the distance matrix file. The default algorithm is BioNJ*, which is both simple and fairly accurate with standard evolutionary distances. In a supertree context (i.e. when dealing with multiple-gene datasets), we recommend (1) using SDM to compute the distances and their variances, and (2) analysing the resulting distance and variance matrices with MVR*. With unusual distance matrices, e.g. based on DNA-DNA hybridization or on morphological characters, UNJ* should be preferred.
BioNJ*, MVR* and UNJ* differ in the variance model they use for the distance estimates. BioNJ* uses a model corresponding to one-gene analysis. MVR* uses the SDM variances, which are estimated by accounting for (among other factors) the length and the number of sequences. UNJ* is based on the ordinary least-squares model, which can be seen as the null average model. NJ* is provided as well, but performed worst in our simulation studies.

P Taxon pairs selected by NJ-like filtering?
These four algorithms rely on several criteria to select the best taxon pair to be agglomerated at each step. Most of these criteria are time consuming. Thus, we first select a few pairs using an NJ-like criterion, which is fast but moderately accurate, and then apply the other criteria to the selected pairs to find the best one. Our experiments showed that selecting 10-20 taxon pairs is usually enough to obtain very good performance, so the default is 15. Increasing this value should not change the output much, but will be time consuming. Decreasing this value will speed up the computations at the expense of accuracy.
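The two-stage selection described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the classical NJ criterion on a complete matrix, Q(i,j) = (n-2) d(i,j) - sum_k d(i,k) - sum_k d(j,k), not the criteria PhyD* actually applies to incomplete matrices. The names `nj_candidate_pairs`, `select_pair` and the `refine` callback are hypothetical.

```python
from itertools import combinations


def nj_candidate_pairs(dist, p=15):
    """Return the p taxon pairs with the smallest classical NJ criterion.

    dist is assumed to be a complete, symmetric matrix (list of lists).
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    scored = []
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        scored.append((q, i, j))
    scored.sort()  # smallest Q first; ties broken by taxon indices
    return [(i, j) for _, i, j in scored[:p]]


def select_pair(dist, refine, p=15):
    """Apply a slower scoring function only to the p filtered pairs.

    refine(dist, i, j) is a placeholder for the more expensive criteria;
    the pair with the lowest refined score is returned.
    """
    candidates = nj_candidate_pairs(dist, p)
    return min(candidates, key=lambda pair: refine(dist, *pair))


if __name__ == "__main__":
    # Additive distances for the tree ((A,B),(C,D)).
    dist = [[0, 2, 3, 3], [2, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]
    print(nj_candidate_pairs(dist, p=2))
```

With p set to 1, only the pair that is best under the fast criterion survives the filter, so the second stage has nothing left to choose from.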

N Negative branch lengths allowed?
This option sets all negative branch lengths to 0.
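For trees that have already been written with negative branch lengths, the same effect can be approximated by post-processing the NEWICK string. The sketch below is not how PhyD* does it internally; the function name `clamp_negative_lengths` is hypothetical, and the code assumes unquoted taxon labels that do not themselves contain a colon followed by a minus sign.

```python
import re

# A colon followed by a negative decimal number, optionally with an exponent.
_NEGATIVE_LENGTH = re.compile(r":\s*-(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def clamp_negative_lengths(newick):
    """Replace every negative branch length in a NEWICK string by 0.0."""
    return _NEGATIVE_LENGTH.sub(":0.0", newick)


if __name__ == "__main__":
    print(clamp_negative_lengths("((A:0.1,B:-0.02):0.3,C:-1e-3,D:0.2);"))
```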

B Binary tree?
If this option is set to No, all zero-length branches are transformed into multifurcations in the output tree. This option should be combined with the previous one to collapse all negative and zero-length branches.
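To make the effect of collapsing concrete, here is a minimal Python sketch that turns zero-length internal branches of a NEWICK tree into multifurcations. It is an illustration only, not PhyD*'s code: the names `parse_newick`, `collapse_zero` and `to_newick` are hypothetical, and the small parser assumes a single tree ending in ';' with no whitespace, quoted labels or comments.

```python
def parse_newick(text):
    """Parse a NEWICK string into nested nodes [label, length, children]."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        label = text[start:pos]
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return [label, length, children]

    return node()


def collapse_zero(node):
    """Attach the children of zero-length internal branches to the parent."""
    label, length, children = node
    merged = []
    for child in children:
        child = collapse_zero(child)
        if child[2] and child[1] == 0.0:
            merged.extend(child[2])       # promote the grandchildren
        else:
            merged.append(child)
    return [label, length, merged]


def to_newick(node):
    """Serialise a node back to NEWICK text (without the final ';')."""
    label, length, children = node
    out = "(" + ",".join(to_newick(c) for c in children) + ")" if children else ""
    out += label
    if length is not None:
        out += f":{length}"
    return out


if __name__ == "__main__":
    tree = parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4):0.0);")
    print(to_newick(collapse_zero(tree)) + ";")
```

Zero-length branches leading to a single taxon are kept, since removing them would delete the taxon itself.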

O Outgroup root?
By default, PhyD* outputs unrooted trees. When the O option is turned on, PhyD* prompts for the name of the species to be used to root the tree. It then returns a tree with a trifurcation at its base, the root species being one of the elements of this trifurcation. When the O option is turned off, PhyD* uses the first taxon in the distance matrix to root the tree, just as all PHYLIP programs do.

L Lower-triangular data matrix?
This option indicates whether the distance matrix is input in Square or Lower-triangular form (the lower-left half of the distance matrix only, without the zero diagonal elements). The default option is Lower-triangular.

S Subreplicate?
This option is to be used if the input distance matrix is in subreplicate format (see above).

M Analyse multiple matrices?
This option allows several matrices, given one after the other in the input file, to be processed in a single run. The output file then provides the corresponding trees in the same order.
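Because the trees come back in the same order as the matrices, they can be matched to their matrices by position. The following Python sketch assumes that each tree in output.t is terminated by a ';' and that taxon labels contain no ';'. The function name `split_trees` is hypothetical.

```python
def split_trees(text):
    """Split the content of a tree file into one NEWICK string per tree."""
    return [chunk.strip() + ";" for chunk in text.split(";") if chunk.strip()]


if __name__ == "__main__":
    content = "(A:1,B:1,C:1);\n(A:2,B:2,C:2);\n"
    # Index k refers to the k-th matrix of the input file.
    for k, tree in enumerate(split_trees(content), start=1):
        print(f"matrix {k}: {tree}")
```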