ATGC: SDM

SDM: a Fast Distance-based Approach for (Super)Tree Building in Phylogenomics.

Criscuolo A., Berry V., Douzery E.J.P., Gascuel O. Systematic Biology. 2006 55(5):740-755.

Please cite THIS paper if you use SDM.

Phylogenomic datasets

All the simulated datasets we have used to assess SDM can be downloaded from this web page. These datasets mimic phylogenomic, multi-gene data. They are composed of trees and gene collections. Trees are randomly generated. Each gene collection contains k genes. Gene sequences have been generated by evolving random ancestral sequences along the trees.
All files are compressed. On Windows, use any archive utility that handles .tar.gz files to decompress them. On UNIX, run the following command:

gunzip directoryXXX.tar.gz ; tar -xvvf directoryXXX.tar ;

and the following directory will be created: directoryXXX/.
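For scripted or cross-platform use, the same extraction can be sketched in Python with the standard tarfile module. The archive name directoryXXX.tar.gz is the placeholder used above, and the helper name extract_archive is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Decompress and unpack a .tar.gz archive into dest_dir.

    Returns the sorted top-level names found in the archive
    (hypothetical helper, not part of the SDM distribution).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" handles the gunzip and tar steps in one pass.
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return sorted({name.split("/")[0] for name in names})
```

Calling extract_archive("directoryXXX.tar.gz") should leave a directoryXXX/ directory in the current working directory, as with the command above.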

SDM datasets

See also PhyD* datasets.

Model trees
Ten files, one for each number of genes k (2, 4, 6, ..., 20).
Each file is called ModelTree_k, where k is the number of genes, and contains 500 randomly generated 48-taxon trees.
Gene collections (nexus format) - 25% taxon deletion
Gene collections (nexus format) - 50% taxon deletion
Gene collections (nexus format) - 75% taxon deletion
Gene collections (phylip format) - 25% taxon deletion
Gene collections (phylip format) - 50% taxon deletion
Gene collections (phylip format) - 75% taxon deletion
Ten files each, corresponding to sequences generated using the model trees.
Each file contains 500 gene collections, and each collection contains k genes. Each gene contains a variable number of sequences (up to 48), corresponding to the taxa that have not been removed by random deletion.
Concatenated genes (phylip format) - 25% taxon deletion
Concatenated genes (phylip format) - 50% taxon deletion
Concatenated genes (phylip format) - 75% taxon deletion
Same as above, but genes are concatenated into a supermatrix of characters.
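To illustrate what concatenating genes into a supermatrix involves, here is a minimal Python sketch. It assumes each gene is an alignment stored as a dictionary from taxon name to sequence, and that taxa missing from a gene are padded with "?". The padding character and the function name concatenate_genes are assumptions, not details taken from the downloadable files.

```python
def concatenate_genes(genes, missing="?"):
    """Concatenate per-gene alignments into one supermatrix.

    genes: list of dicts mapping taxon name -> aligned sequence.
    Taxa absent from a gene are padded with the `missing` character
    (an assumed convention for this sketch).
    """
    taxa = sorted({taxon for gene in genes for taxon in gene})
    supermatrix = {taxon: [] for taxon in taxa}
    for gene in genes:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene must have equal length")
        gene_length = lengths.pop()
        for taxon in taxa:
            # Deleted taxa contribute a block of missing characters.
            supermatrix[taxon].append(gene.get(taxon, missing * gene_length))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}
```

With 48 taxa and k genes, every row of the result has the same length (the sum of the k gene lengths), whatever the proportion of deleted taxa.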
Distance matrix collections (SDM format) - 25% taxon deletion
Distance matrix collections (SDM format) - 50% taxon deletion
Distance matrix collections (SDM format) - 75% taxon deletion
Ten files each, corresponding to the K2P (Kimura two-parameter) distance matrices estimated from each gene separately.
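As a reminder of how a single K2P entry is obtained, the sketch below computes the distance d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q) between two aligned sequences, where P and Q are the proportions of transitions and transversions. Skipping sites with gaps or ambiguous characters is an assumption of this sketch, and the function name k2p_distance is hypothetical; the software actually used to produce the files may treat such sites differently.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumed handling: ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    arg1 = 1.0 - 2.0 * p - q
    arg2 = 1.0 - 2.0 * q
    if arg1 <= 0.0 or arg2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)
```

Applying such a function to every pair of sequences in one gene gives one distance matrix; a gene with deleted taxa therefore yields a matrix on fewer than 48 taxa.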
These matrix collections form the standard input of SDM. Here is an example of SDM format with one collection of k=4 distance matrices:
Tree collections (nexus format) - 25% taxon deletion
Tree collections (nexus format) - 50% taxon deletion
Tree collections (nexus format) - 75% taxon deletion
Tree collections (phylip format) - 25% taxon deletion
Tree collections (phylip format) - 50% taxon deletion
Tree collections (phylip format) - 75% taxon deletion
Tree collections (SDM format) - 25% taxon deletion
Tree collections (SDM format) - 50% taxon deletion
Tree collections (SDM format) - 75% taxon deletion
Ten files each.
The trees are inferred with PhyML from each gene separately. These tree collections form the second type of input of SDM: they are first transformed into distance matrices using the path lengths between leaves, and then processed by SDM.
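The tree-to-matrix transformation mentioned above can be sketched as follows. This is only an illustration under simplifying assumptions: the tree is given as a list of weighted edges rather than parsed from a tree file, and the function name path_length_distances is hypothetical.

```python
from collections import defaultdict


def path_length_distances(edges):
    """Pairwise path-length distances between the leaves of a tree.

    edges: iterable of (node_u, node_v, branch_length) tuples describing
    an unrooted tree; leaves are the nodes with exactly one neighbor.
    Returns a dict mapping (leaf_i, leaf_j) -> sum of branch lengths
    along the path between them, for every ordered pair of leaves.
    """
    adjacency = defaultdict(list)
    for u, v, length in edges:
        adjacency[u].append((v, length))
        adjacency[v].append((u, length))
    leaves = sorted(node for node, nbrs in adjacency.items() if len(nbrs) == 1)
    distances = {}
    for source in leaves:
        # Depth-first traversal accumulating branch lengths from the source leaf.
        stack = [(source, None, 0.0)]
        while stack:
            node, parent, dist = stack.pop()
            if node != source and len(adjacency[node]) == 1:
                distances[(source, node)] = dist
            for neighbor, length in adjacency[node]:
                if neighbor != parent:
                    stack.append((neighbor, node, dist + length))
    return distances
```

For a four-leaf tree with edges A-x (1), B-x (2), x-y (3), C-y (4) and D-y (5), the distance between A and C is 1 + 3 + 4 = 8.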
Here is an example of SDM format with one collection of k=4 trees:
MRP binary matrices - 25% taxon deletion
MRP binary matrices - 50% taxon deletion
MRP binary matrices - 75% taxon deletion
Ten files each.
MRP binary matrices computed from the above collections of trees.
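For readers unfamiliar with MRP (matrix representation with parsimony), the usual coding can be sketched as follows: every clade of every source tree becomes one column, with 1 for taxa inside the clade, 0 for other taxa of that tree, and ? for taxa absent from the tree. The sketch assumes rooted trees written as nested tuples of taxon names, and the function names are hypothetical; it is not necessarily the exact procedure used to produce the files.

```python
def leaf_set(tree):
    """Return the set of taxon names below a node (a leaf is a string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree, is_root=True):
    """Yield the leaf set of every internal node except the root."""
    if isinstance(tree, str):
        return
    if not is_root:
        yield leaf_set(tree)
    for child in tree:
        yield from clades(child, is_root=False)


def mrp_matrix(trees):
    """Build an MRP matrix as a dict mapping taxon -> string over 0, 1, ?."""
    all_taxa = sorted(set().union(*(leaf_set(t) for t in trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in trees:
        present = leaf_set(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon].append("?")  # taxon missing from this tree
                elif taxon in clade:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(codes) for taxon, codes in rows.items()}
```

With higher taxon deletion rates, each source tree covers fewer of the 48 taxa, so the matrix contains a larger share of ? entries.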
SDM medium level supermatrices - 25% taxon deletion
SDM medium level supermatrices - 50% taxon deletion
SDM medium level supermatrices - 75% taxon deletion
Ten files each.
SDM distance and variance supermatrices computed from the above collections of distance matrices.
SDM high level supermatrices - 25% taxon deletion
SDM high level supermatrices - 50% taxon deletion
SDM high level supermatrices - 75% taxon deletion
Ten files each.
SDM distance and variance supermatrices computed from the above collections of trees.