SSIMUL: Speciation SIgnal extraction from MULtigene families.
Gene trees are leaf-labeled trees inferred from molecular sequences. Because gene duplication events occur in genomes, some species host several copies of the same gene, so individual gene trees often have several leaves labeled with the same species name. Dealing with such multi-labeled gene trees (MUL trees) is a substantial problem in phylogenomics: current supertree methods, for example, do not handle MUL trees, which restricts studies aimed at building the Tree of Life to a very small core of single-copy genes. We propose to tackle this problem mainly by transforming a collection of MUL trees into a collection of trees in which each label appears only once. To that end, we provide several fast algorithmic building blocks. First, we preprocess each MUL tree separately to remove the parts that are redundant with respect to speciation events. For this purpose, we provide a linear-time tree isomorphism algorithm for MUL trees (this step is performed by the isomorphism program). Second, we test whether the speciation signal contained in a MUL tree is topologically coherent (this step is performed by the autoCoherence program); when it is, we produce a single-copy gene tree that replaces the MUL tree while preserving the information it contains on speciation events (this step is performed by the PhySIC_on_MUL_trees program). As an alternative approach, we extract from each MUL tree a set of subtrees that are both coherent and free of duplication events (this step is performed by the pruning program).
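The preprocessing idea above (detect isomorphic subtrees in a MUL tree and keep a single copy) can be illustrated with a minimal Python sketch. It is not the algorithm implemented in the isomorphism program: it relies on sorted canonical strings, which is simple but not linear-time. The nested-tuple tree encoding and the function names `canonical`, `isomorphic`, and `collapse_redundant` are assumptions made for this illustration, as is the restriction to rooted trees whose species names contain no commas or parentheses.

```python
from typing import Tuple, Union

# Assumed encoding: a leaf is a species name (str); an internal node is a
# tuple of child subtrees. Several leaves may carry the same name (MUL tree).
Tree = Union[str, Tuple["Tree", ...]]


def canonical(tree: Tree) -> str:
    """Return a string that is identical for isomorphic leaf-labeled trees."""
    if isinstance(tree, str):
        return tree
    # Sorting the children's canonical forms makes the result independent
    # of the order in which the children are listed.
    return "(" + ",".join(sorted(canonical(child) for child in tree)) + ")"


def isomorphic(a: Tree, b: Tree) -> bool:
    """Two trees are isomorphic when their canonical forms coincide."""
    return canonical(a) == canonical(b)


def collapse_redundant(tree: Tree) -> Tree:
    """Bottom-up pass: when all child subtrees of a node are isomorphic,
    replace the node by a single copy of that subtree."""
    if isinstance(tree, str):
        return tree
    children = [collapse_redundant(child) for child in tree]
    if len({canonical(child) for child in children}) == 1:
        return children[0]
    return tuple(children)


# Example: the two copies of the (a, b) clade are isomorphic, so one is kept.
mul_tree = ((("a", "b"), ("b", "a")), "c")
simplified = collapse_redundant(mul_tree)  # (("a", "b"), "c")
```

In this sketch, a node is treated as redundant only when every child subtree is an exact isomorphic copy of the others. Nodes whose child subtrees share species but differ in topology are left untouched, since deciding what to do with them is the role of the coherence and pruning steps described above.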
Download SSIMUL program
Click here to download an archive containing:
- Binaries for Mac OS X 10.5/10.6 and 64-bit Linux
- A README file
Related papers
- From gene trees to species trees through a supertree approach.
Scornavacca C., Berry V., and Ranwez V.
In Adrian Horia Dediu, Armand-Mihai Ionescu, and Carlos Martín-Vide, editors,
LATA, volume 5457 of Lecture Notes in Computer Science, pages 702–714. Springer, 2009.
- Building species trees from larger parts of phylogenomic databases.
Scornavacca C., Berry V., and Ranwez V.
Information and Computation (accepted), 2010.