PhyD*: Fast NJ-like algorithms to deal with incomplete distance matrices.
Criscuolo A., Gascuel O.
BMC Bioinformatics. 2008, Mar 26;9:166.
Please cite this paper if you use PhyD*.
All the simulated datasets we used to assess PhyD* can be downloaded from this web page. These datasets mimic phylogenomic, multi-gene data. They are composed of trees and gene collections. Trees are randomly generated. Each gene collection contains k genes. Gene sequences were generated by evolving random ancestral sequences along the trees.
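The simulation procedure described above can be sketched as follows. This is a minimal illustration, not the generator used to build the PhyD* datasets. It assumes the Jukes-Cantor (JC69) substitution model, a random-joining tree topology with exponentially distributed branch lengths, and one tree shared by all k genes of a collection; all function names and parameters (`random_tree`, `mean_branch`, `length`, `seed`, ...) are hypothetical.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def random_tree(taxa, rng, mean_branch=0.1):
    """Random rooted binary tree built by repeatedly joining two random subtrees.

    A node is a tuple (name, children, branch_length_to_parent); branch lengths
    are drawn from an exponential distribution (an assumption of this sketch).
    """
    nodes = [(name, [], rng.expovariate(1.0 / mean_branch)) for name in taxa]
    counter = 0
    while len(nodes) > 1:
        i, j = rng.sample(range(len(nodes)), 2)
        left, right = nodes[i], nodes[j]
        nodes = [n for idx, n in enumerate(nodes) if idx not in (i, j)]
        counter += 1
        nodes.append((f"internal{counter}", [left, right],
                      rng.expovariate(1.0 / mean_branch)))
    name, children, _ = nodes[0]
    return (name, children, 0.0)  # the root has no parent branch


def jc69_evolve(seq, t, rng):
    """Mutate seq along a branch of length t (expected substitutions/site) under JC69."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    out = []
    for base in seq:
        if rng.random() < p_same:
            out.append(base)
        else:
            # Under JC69 a changed site moves to one of the other 3 bases uniformly.
            out.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
    return "".join(out)


def evolve_gene(tree, length, rng):
    """Evolve a random ancestral sequence down the tree; return {leaf name: sequence}."""
    root_seq = "".join(rng.choice(NUCLEOTIDES) for _ in range(length))
    leaves = {}

    def visit(node, parent_seq):
        name, children, t = node
        seq = jc69_evolve(parent_seq, t, rng)
        if not children:
            leaves[name] = seq
        for child in children:
            visit(child, seq)

    visit(tree, root_seq)
    return leaves


def simulate_gene_collection(taxa, k, length, seed=0):
    """One random tree plus a collection of k genes evolved along it."""
    rng = random.Random(seed)
    tree = random_tree(taxa, rng)
    genes = [evolve_gene(tree, length, rng) for _ in range(k)]
    return tree, genes


if __name__ == "__main__":
    tree, genes = simulate_gene_collection(["t1", "t2", "t3", "t4"], k=3, length=50, seed=1)
    print(len(genes), sorted(genes[0]))  # 3 ['t1', 't2', 't3', 't4']
```

The sketch keeps every gene at the same length and rate for simplicity; realistic multi-gene simulations usually vary both, which would only require passing per-gene parameters to `evolve_gene`.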
All files are compressed. On Windows, any archive tool that handles .tar.gz files can decompress them. On UNIX, use the following commands:
gunzip directoryXXX.tar.gz ; tar -xvvf directoryXXX.tar ;
and the following directory will be created: directoryXXX/
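As a portable alternative to the gunzip/tar commands (for example on a system where those tools are unavailable), Python's standard `tarfile` module can decompress and unpack an archive in one step. This is a minimal sketch; `extract_dataset` is a hypothetical helper name and `directoryXXX.tar.gz` is the placeholder archive name used above.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive, dest="."):
    """Decompress and unpack a .tar.gz archive into dest; return its top-level names."""
    with tarfile.open(archive, "r:gz") as tar:
        top_level = sorted({Path(m.name).parts[0] for m in tar.getmembers()})
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions offer a safer extraction filter.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return top_level
```

Calling `extract_dataset("directoryXXX.tar.gz")` should create `directoryXXX/` in the current directory and return `['directoryXXX']`.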
PhyD* simulated datasets generated using Driskell et al. (2004) green plant data
The datasets were generated using an approach very similar to the one we used to generate the SDM datasets. The main difference is that, instead of randomly deleting taxa, we used the presence/absence pattern of taxa in the Driskell et al. (2004) green plant dataset. Moreover, this study involved 69 taxa and k