ATGC: PhyD*

PhyD*: Fast NJ-like algorithms to deal with incomplete distance matrices.

Criscuolo A., Gascuel O.
BMC Bioinformatics. 2008, Mar 26;9:166.

Please cite THIS paper if you use PhyD*.

Phylogenomic datasets

All the simulated datasets we used to assess PhyD* can be downloaded from this web page. These datasets mimic phylogenomic, multi-gene data and consist of model trees and gene collections. The model trees are randomly generated, and each gene collection contains k genes, whose sequences were generated by evolving random ancestral sequences along the model trees.
All files are compressed. On Windows, simply use your favorite compression software to decompress them. On UNIX, use the following commands:

gunzip directoryXXX.tar.gz ; tar -xvvf directoryXXX.tar ;

and the following directory will be created: directoryXXX/.
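If you prefer to script the download step, the same extraction can be done from Python with the standard `tarfile` module. This is a minimal sketch: the function name `extract_dataset` is hypothetical, and unlike the gunzip command it leaves the original .tar.gz file in place.

```python
import tarfile


def extract_dataset(archive: str, dest: str = ".") -> list:
    """Extract a gzip-compressed tar archive (e.g. directoryXXX.tar.gz) into dest.

    Does the same job as 'gunzip directoryXXX.tar.gz ; tar -xvvf directoryXXX.tar'
    but keeps the compressed archive. Returns the member names, similar to the
    verbose listing printed by tar -v.
    """
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        # Use the safe "data" extraction filter when this Python version supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=dest, **kwargs)
    return names
```

For example, `extract_dataset("directoryXXX.tar.gz")` creates directoryXXX/ in the current working directory.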

PhyD* simulated datasets generated using Driskell et al. (2004) green plant data

These datasets were generated using an approach very similar to the one we used to generate the SDM datasets. The main difference is that, instead of deleting taxa at random, we used the presence/absence pattern of taxa in the Driskell et al. (2004) green plant dataset. This study involved 69 taxa and k = 254 genes.
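The deletion step described above can be sketched as follows: sequences are first simulated for all taxa along the model tree, and each gene then keeps only the taxa marked as present in the pattern. This is an illustration under assumptions, not the authors' pipeline: the function name `apply_presence_pattern` and the input layout (one dict per gene, a 0/1 taxa-by-genes matrix) are hypothetical.

```python
def apply_presence_pattern(alignments, taxa, pattern):
    """Turn complete simulated alignments into incomplete ones.

    alignments: list of k dicts (one per gene), each mapping taxon -> sequence,
                simulated for ALL taxa along the model tree.
    taxa:       list of taxon names, in the row order of `pattern`.
    pattern:    len(taxa) x k matrix of 0/1 values; pattern[i][g] == 1 when
                taxon taxa[i] is present in gene g (hypothetical layout).
    Returns the k alignments restricted to the taxa present in each gene.
    """
    kept = []
    for g, aln in enumerate(alignments):
        # Keep a taxon's sequence only if the pattern marks it present for gene g.
        kept.append({t: aln[t] for i, t in enumerate(taxa) if pattern[i][g] == 1})
    return kept
```

With 69 taxa and k = 254 genes, `pattern` would be a 69 x 254 matrix.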

Model trees
A single file containing 100 randomly generated 69-taxon model trees.
Taxon per gene presence/absence pattern
See also Figure 2B in Driskell et al. (2004), Science.
Gene collections (phylip format)
One hundred collections of 254 sequence alignments each, generated using the model trees and the presence/absence pattern.
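To load a gene collection for your own analyses, a small PHYLIP reader is enough. This minimal sketch assumes that a collection file stores its alignments one after another, each as a header line "ntax nchar" followed by ntax lines "name sequence" with the name separated from the sequence by whitespace (relaxed sequential PHYLIP); check the actual files, since strict PHYLIP uses fixed 10-character names instead. The function name `read_phylip_collection` is hypothetical.

```python
def read_phylip_collection(text):
    """Parse consecutive sequential PHYLIP alignments from a string.

    Assumes relaxed PHYLIP: each sequence line is 'name sequence', the name
    contains no spaces, and each sequence fits on one line (blocks separated by
    spaces are joined). Blank lines between alignments are skipped.
    Returns a list of dicts mapping taxon name -> sequence.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    alignments, i = [], 0
    while i < len(lines):
        ntax, nchar = map(int, lines[i].split()[:2])
        aln = {}
        for ln in lines[i + 1 : i + 1 + ntax]:
            name, *chunks = ln.split()
            seq = "".join(chunks)
            if len(seq) != nchar:
                raise ValueError(f"{name}: expected {nchar} sites, got {len(seq)}")
            aln[name] = seq
        if len(aln) != ntax:
            raise ValueError(f"alignment starting at line {i + 1} is truncated")
        alignments.append(aln)
        i += 1 + ntax
    return alignments
```

For a collection generated from the Driskell et al. pattern, the returned list would contain 254 alignments with varying numbers of taxa.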
Distance matrix collections (SDM format)
One hundred collections of 254 K2P distance matrices, each matrix computed from one gene alignment.
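For reference, the K2P (Kimura two-parameter) distance corrects the observed proportions of transitions P and transversions Q between two aligned sequences: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The sketch below computes it for one pair of sequences and for a whole gene alignment. Its handling of gaps and ambiguous characters (such sites are simply skipped) and of saturated pairs is an assumption; the software used to build the distributed matrices may behave differently. The function names are hypothetical.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(s1, s2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous character are
    skipped (assumption). Returns None if no site is comparable and math.inf
    if the K2P logarithms are undefined (saturation).
    """
    valid = PURINES | PYRIMIDINES
    n = transitions = transversions = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a not in valid or b not in valid:
            continue
        n += 1
        if a != b:
            if (a in PURINES) == (b in PURINES):
                transitions += 1  # A<->G or C<->T
            else:
                transversions += 1  # purine <-> pyrimidine
    if n == 0:
        return None
    p, q = transitions / n, transversions / n
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def k2p_matrix(alignment):
    """Full K2P matrix for one gene; alignment maps taxon -> sequence."""
    taxa = sorted(alignment)
    return taxa, [[k2p_distance(alignment[x], alignment[y]) for y in taxa] for x in taxa]
```

Because each gene contains only the taxa present in the pattern, each of these matrices covers a different subset of the 69 taxa.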
Tree collections (phylip format)
Tree collections (SDM format)
One hundred collections of 254 trees, inferred by PHYML under the K2P model.
MRP binary matrices
MRP binary matrices built from the 100 tree collections above.
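MRP (Matrix Representation with Parsimony) recodes a collection of trees as a single binary matrix. In the standard coding of Baum and Ragan, each non-trivial bipartition of each source tree becomes one character: taxa on one side get 1, taxa on the other side get 0, and taxa missing from that tree get "?". The page does not say which coding variant was used, so this minimal sketch only illustrates the standard one. Trees are given as hypothetical pre-parsed `(leaves, clades)` pairs rather than Newick strings, and the function name `mrp_matrix` is hypothetical.

```python
def mrp_matrix(all_taxa, trees):
    """Standard MRP coding of source trees.

    all_taxa: list of every taxon in the study (69 here).
    trees:    list of (leaves, clades) pairs, where leaves is the set of taxa in
              a source tree and clades is a list of sets, each one side of a
              non-trivial bipartition of that tree (hypothetical input format).
    Returns a dict mapping each taxon to its row of '0', '1' and '?' characters.
    """
    rows = {t: [] for t in all_taxa}
    for leaves, clades in trees:
        for clade in clades:  # one binary character per bipartition
            for t in all_taxa:
                if t not in leaves:
                    rows[t].append("?")  # taxon absent from this source tree
                elif t in clade:
                    rows[t].append("1")
                else:
                    rows[t].append("0")
    return {t: "".join(chars) for t, chars in rows.items()}
```

For a 4-taxon tree ((A,B),(C,D)) and a second tree ((B,C),(D,E)), this yields rows A: 1?, B: 11, C: 01, D: 00, E: ?0.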
SDM medium level supermatrices
SDM distance and variance supermatrices computed from the 100 distance matrix collections above.
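To see why these supermatrices can be incomplete, consider the simplest way of combining partial matrices: averaging, for each pair of taxa, the distances from the genes that contain both taxa. This is plain entry-wise averaging, NOT the SDM method (which additionally rescales the source matrices before combining them), and is shown only to illustrate the missing-entry issue. A pair of taxa that never co-occurs in any gene has no estimate at all, and this is the kind of incomplete distance matrix PhyD* is designed to handle. The function name and the input format (one dict per gene keyed by `frozenset` pairs) are hypothetical.

```python
def average_supermatrix(all_taxa, matrices):
    """Entry-wise average of partial distance matrices (NOT the SDM method).

    matrices: list of dicts, one per gene, mapping frozenset({x, y}) -> distance
              for the taxa present in that gene (hypothetical format).
    Returns (dist, counts): dist maps every pair of taxa to its mean distance,
    or None when no gene contains both taxa; counts gives the number of genes
    averaged for each observed pair.
    """
    sums, counts = {}, {}
    for m in matrices:
        for pair, d in m.items():
            sums[pair] = sums.get(pair, 0.0) + d
            counts[pair] = counts.get(pair, 0) + 1
    dist = {}
    for i, x in enumerate(all_taxa):
        for y in all_taxa[i + 1:]:
            pair = frozenset((x, y))
            # Missing entry: x and y never appear together in the same gene.
            dist[pair] = sums[pair] / counts[pair] if pair in counts else None
    return dist, counts
```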
SDM high level supermatrices
SDM distance and variance supermatrices computed from the 100 tree collections above.
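Building a distance supermatrix from trees first requires turning each tree into a distance matrix. Presumably this uses path-length (patristic) distances, i.e. the sum of branch lengths on the path between two leaves, but the page does not say so. The minimal sketch below computes such distances by breadth-first search from each leaf. The edge-list input format (which would be obtained by parsing the Newick trees) and the function name are hypothetical.

```python
from collections import deque


def path_length_matrix(edges, leaves):
    """Path-length (patristic) distances between the leaves of a weighted tree.

    edges:  list of (u, v, length) triples describing the tree (hypothetical
            format, e.g. obtained by parsing a Newick tree).
    leaves: list of leaf labels.
    Returns a dict mapping (x, y) -> sum of branch lengths on the x-y path.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {}
    for src in leaves:
        # In a tree the path between two nodes is unique, so the first visit
        # during a breadth-first search already gives the path length.
        seen = {src: 0.0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt, w in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + w
                    queue.append(nxt)
        for dst in leaves:
            dist[(src, dst)] = seen[dst]
    return dist
```

For the tree with edges A-u (1.0), B-u (2.0), u-v (0.5), C-v (1.5) and D-v (1.0), the A-C distance is 1.0 + 0.5 + 1.5 = 3.0.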