SDM: a Fast Distance-based Approach for (Super)Tree Building in Phylogenomics.
Criscuolo A., Berry V., Douzery E.J.P., Gascuel O. Systematic Biology, 2006, 55(5):740-755.
Please cite THIS paper if you use SDM.
Running SDM
SDM uses a PHYLIP-like interface (Felsenstein, 1993). The user is asked for the name of the input file. This input file contains either a collection of k distance matrices in PHYLIP format (lower-triangular or square), or a collection of k trees with branch lengths (rooted or unrooted, binary or non-binary, but not bootstrapped) in NEWICK format. The value of k must be given before the collection. Comments can be written inside the input file on lines that begin with the '%' character.
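To make the layout of a matrix collection concrete, here is a minimal reader sketch. It is not part of SDM and rests on assumptions not stated in this manual: k sits alone on the first non-comment line, matrices are square, taxon names contain no spaces, each matrix row fits on one line, and an optional weight (option W) follows the taxon number. The function name parse_matrix_collection is hypothetical.

```python
def parse_matrix_collection(text):
    """Parse k square PHYLIP distance matrices (hypothetical helper, not SDM code).

    Assumed layout: first non-comment line = k; each matrix starts with a line
    'n [weight]', followed by n lines 'name d1 ... dn' (no spaces in names).
    """
    # Drop blank lines and comment lines beginning with '%'.
    lines = [ln.split() for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith('%')]
    pos = 0
    k = int(lines[pos][0])
    pos += 1
    matrices = []
    for _ in range(k):
        header = lines[pos]
        pos += 1
        n = int(header[0])
        # Optional confidence value written after the taxon number (option W).
        weight = float(header[1]) if len(header) > 1 else 1.0
        names, rows = [], []
        for _ in range(n):
            fields = lines[pos]
            pos += 1
            names.append(fields[0])
            rows.append([float(x) for x in fields[1:1 + n]])
        matrices.append({"names": names, "dist": rows, "weight": weight})
    return matrices


EXAMPLE = """% two toy matrices (made-up values)
2
3
A 0 1 2
B 1 0 3
C 2 3 0
2 500
A 0 4
D 4 0
"""
mats = parse_matrix_collection(EXAMPLE)
```

Note that the two toy matrices share only taxon A, which is exactly the situation in which the supermatrix ends up with missing entries.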
Here is an example of an input file containing square distance matrices:
Here is another input file, containing trees:
SDM outputs several files:

The distance supermatrix, called sdm_output, in which missing entries (if any) are denoted 99.0

The list of gene rates (the (1/α_{p}) values) as estimated by SDM, called sdm_rates

A table indicating the taxa covered by each gene, called sdm_tab; this file also indicates whether there is at least one distance measurement for every taxon pair, in which case the distance supermatrix is complete and has no missing entries. In that case any tree-building algorithm can be used to infer the supertree (e.g. FastME, recommended). Otherwise, we recommend using MVR* (or BioNJ*) from our PhyD* package.
SDM also provides the deformed source matrices when option 4 is checked (see below); the corresponding file is called sdm_deformed_matrices.
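The completeness check reported in sdm_tab can also be reproduced directly on the supermatrix once it has been loaded. The sketch below assumes the matrix is already available as a list of rows and uses the 99.0 marker stated above; the helper names missing_pairs and suggest_program are hypothetical and simply encode the recommendation given for sdm_tab.

```python
MISSING = 99.0  # missing-entry marker used in sdm_output, as stated above


def missing_pairs(names, dist, missing=MISSING):
    """Return the taxon pairs (i < j) whose supermatrix entry is missing."""
    n = len(names)
    return [(names[i], names[j])
            for i in range(n) for j in range(i + 1, n)
            if dist[i][j] == missing]


def suggest_program(names, dist):
    # Complete supermatrix: any distance method works (FastME recommended).
    # Incomplete supermatrix: use MVR* (or BioNJ*) from the PhyD* package.
    return "FastME" if not missing_pairs(names, dist) else "MVR* (or BioNJ*)"
```

For example, a three-taxon supermatrix where only the A-C entry equals 99.0 yields [("A", "C")] and the MVR* recommendation.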
To run SDM on LINUX, use the command:
java -jar SDM.jar
To run SDM on WINDOWS, double-click on win_SDM.bat
PHYLIP-like interface
A PHYLIP-like menu displays the various options:
Options

D Method (SDM, SDM*, ACS97)?
The default option is full SDM. SDM* is a restricted version that is faster than SDM but does not use all of its flexibility (the a_{ip} variables are forced to zero). ACS97 implements the Average Consensus Supertree method, as described in (Lapointe and Cucumel, 1997).
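To give an intuition for what the restricted SDM* variant does, here is a hedged sketch of a scale-only fit: each source matrix p is multiplied by a factor α_p (all a_{ip} set to zero), and the supermatrix is the weighted entry-wise average of the scaled sources. This is NOT SDM's actual solver: it swaps in a simple alternating least-squares heuristic, and the normalisation (mean α_p equal to 1) is an arbitrary assumption. Fixing every α_p to 1 reduces it to plain entry-wise averaging, in the spirit of ACS97. Matrices are assumed to be dicts keyed by sorted (taxon, taxon) tuples, with at least one non-zero distance each; all function names are hypothetical.

```python
from collections import defaultdict


def average_supermatrix(sources, alphas, weights):
    """Weighted entry-wise average of alpha_p * D^p over the sources containing each pair.

    Pairs absent from every source are simply not in the result (missing entries).
    """
    num, den = defaultdict(float), defaultdict(float)
    for src, a, w in zip(sources, alphas, weights):
        for pair, d in src.items():
            num[pair] += w * a * d
            den[pair] += w
    return {pair: num[pair] / den[pair] for pair in num}


def scale_only_fit(sources, weights=None, iterations=50):
    """Alternating least squares for scale factors alpha_p (heuristic, not SDM's method)."""
    k = len(sources)
    weights = weights or [1.0] * k
    alphas = [1.0] * k
    for _ in range(iterations):
        sup = average_supermatrix(sources, alphas, weights)
        new = []
        for src in sources:
            # alpha_p minimising sum over p's pairs of (alpha_p * d^p - D)^2.
            num = sum(d * sup[pair] for pair, d in src.items())
            den = sum(d * d for d in src.values())
            new.append(num / den)
        # Fix the overall scale (otherwise alpha_p could drift freely).
        mean = sum(new) / k
        alphas = [a / mean for a in new]
    return alphas, average_supermatrix(sources, alphas, weights)
```

If gene 2 evolves twice as fast as gene 1 (every distance doubled), the fitted α_2 is half of α_1, so the corresponding rates 1/α_p differ by a factor of two.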

T Input (Matrices, Trees)?
Option T indicates whether the source data are distance matrices or trees with branch lengths.
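When the sources are trees, a natural way to obtain distances is the path-length (patristic) matrix, as in the average consensus of Lapointe and Cucumel (1997). Whether SDM converts trees exactly this way is an assumption here. The sketch below handles only plain Newick: no quoted labels, no bracket comments, whitespace ignored, and a missing branch length read as 0. The helper names parse_newick and path_length_matrix are hypothetical.

```python
from collections import defaultdict


def parse_newick(newick):
    """Return (edges, leaves): edges as (parent, child, length), leaves as {node_id: name}."""
    s = "".join(newick.split()).rstrip(";")
    edges, leaves = [], {}
    counter = [0]
    pos = 0

    def new_node():
        counter[0] += 1
        return counter[0] - 1

    def read_token():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",():;":
            pos += 1
        return s[start:pos]

    def parse_subtree():
        nonlocal pos
        node = new_node()
        if s[pos] == "(":
            pos += 1  # skip '('
            while True:
                child, length = parse_subtree()
                edges.append((node, child, length))
                if s[pos] == ",":
                    pos += 1
                else:
                    break
            pos += 1  # skip ')'
            read_token()  # ignore an internal node label, if any
        else:
            leaves[node] = read_token()
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            length = float(read_token())
        return node, length

    parse_subtree()
    return edges, leaves


def path_length_matrix(newick):
    """Sum of branch lengths between every ordered pair of leaves."""
    edges, leaves = parse_newick(newick)
    adj = defaultdict(list)
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))
    dist = {}
    for src in leaves:
        # Depth-first traversal accumulating branch lengths from leaf `src`.
        stack, seen = [(src, 0.0)], {src}
        while stack:
            node, d = stack.pop()
            if node in leaves and node != src:
                dist[(leaves[src], leaves[node])] = d
            for nxt, w in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, d + w))
    return dist
```

For instance, in the rooted tree ((A:1,B:2):0.5,C:3); the A-C distance is 1 + 0.5 + 3 = 4.5; rooting does not affect path lengths.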

L Lower-triangular data matrix?
When the source data are distance matrices, option L indicates whether they are in lower-triangular or square format.

W Matrix weight? or Tree weight?
SDM (and SDM*) allows a confidence value (weight) to be associated with each source matrix (tree). This value must be written inside the input file, just after and on the same line as the taxon number (for matrices), or on a separate line before each tree. For example:
The length of the sequences from which the data were inferred is a relevant and statistically well-founded weight. By default, every matrix (or tree) receives the same weight.

S Weight matrices (or trees) using their size?
Option S weights matrices (or trees) by the inverse of their taxon number, or by the inverse of the square of their taxon number. This weight is multiplied by the confidence value described above (option W). This option can be used to compensate for the (too) low influence of matrices (or trees) with few taxa.
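The combination of options W and S amounts to a simple product, sketched below with a hypothetical helper. The confidence values are, for instance, sequence lengths; size_power selects no correction (0), 1/n (1) or 1/n² (2), where n is the taxon number of each matrix (or tree).

```python
def effective_weights(confidences, taxon_counts, size_power=0):
    """Confidence value (option W) times the size correction 1/n**size_power (option S)."""
    if size_power not in (0, 1, 2):
        raise ValueError("size_power must be 0, 1 or 2")
    return [c / n ** size_power for c, n in zip(confidences, taxon_counts)]


# Two hypothetical genes: 1200 sites over 40 taxa, 300 sites over 10 taxa.
w_inverse = effective_weights([1200, 300], [40, 10], size_power=1)
w_inverse_square = effective_weights([1200, 300], [40, 10], size_power=2)
```

With 1/n the two genes end up with equal weights (30 each); with 1/n² the smaller matrix gets four times the weight of the larger one (3.0 against 0.75).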

M Analyse multiple collections?
Option M allows multiple collections of matrices (or trees), given one after the other in the input file, to be processed.

0 Output format (Phylip, T-REX)?
Only a few programs are able to build trees from incomplete distance matrices: FITCH (Felsenstein, 1997) from the PHYLIP package, T-REX (Makarenkov, 2001), and all PhyD* algorithms. FITCH requires the subreplicate (Phylip) format. PhyD* also uses a Phylip format, but subreplicates are not mandatory since missing entries are written as 99.0. The T-REX format is special: missing entries are indicated by 99.0, the taxa are implicitly numbered and their names are removed. SDM then outputs an extra file called taxa.
With complete matrices, a number of other programs can be used, e.g. FastME (Desper and Gascuel, 2002), which uses the Phylip square format (without subreplicates).
This option selects either the Phylip (standard) format or the T-REX format.

1 Output supermatrix in subreplicate format?
Option 1 writes the output file in PHYLIP subreplicate format. This format associates a weight of 0 with missing entries and a weight of 1 with existing entries. This is the format FITCH requires to deal with incomplete distance matrices.

2 Output supermatrix (Lower-triangular, Square)?
Option 2 defines the output format: lower-triangular or square.
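Options 1 and 2 can be pictured with the small writer below. It is a sketch rather than SDM's own output code: the 10-character name padding follows the classic PHYLIP convention, the six decimals are an arbitrary choice, and the diagonal is given a replicate count of 1. The function name format_phylip is hypothetical.

```python
def format_phylip(names, dist, lower=False, subreplicates=False, missing=99.0):
    """Render a distance matrix as PHYLIP text, square or lower-triangular.

    With subreplicates=True each distance is followed by a replicate count:
    0 for a missing entry, 1 otherwise (as described for option 1).
    """
    n = len(names)
    out = [str(n)]
    for i, name in enumerate(names):
        # Lower-triangular rows hold only the entries left of the diagonal.
        cols = range(i) if lower else range(n)
        fields = [name.ljust(10)]
        for j in cols:
            d = dist[i][j]
            fields.append(f"{d:.6f}")
            if subreplicates:
                fields.append("0" if (i != j and d == missing) else "1")
        out.append(" ".join(fields).rstrip())
    return "\n".join(out)
```

Calling format_phylip(["A", "B"], [[0, 99.0], [99.0, 0]], subreplicates=True) marks the A-B entry with a replicate count of 0, which is how FITCH learns to ignore it.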

3 Write out rates onto file?
Option 3 writes the list of gene rates (the (1/α_{p}) values), as estimated by SDM (or SDM*), to file sdm_rates.

4 Write out deformed matrices onto file?
Option 4 writes the deformed source matrices to file sdm_deformed_matrices.

5 Write out variances onto file?
Option 5 computes the variance of each entry of the distance supermatrix and writes it to file sdm_output_variance. This variance matrix is useful when running MVR*.