Bayesian phylogenetic software based on mixture models.
Lartillot N., Philippe H. Molecular Biology and Evolution. 2004 21(6):1095-1109.
Please cite these papers if you use PhyloBayes.
PhyloBayes is a Bayesian Markov chain Monte Carlo (MCMC) sampler for phylogenetic reconstruction using protein alignments. Compared to other phylogenetic MCMC samplers (e.g. MrBayes), the main distinguishing feature of PhyloBayes is the underlying probabilistic model, CAT. It is particularly well suited for large multigene alignments, such as those used in phylogenomics.
Version 2.3 of PhyloBayes supports divergence time estimation, posterior predictive analyses (including tests of compositional homogeneity and saturation), data recoding (analogous to R/Y coding, but for amino acids), and cross-validation. It also implements a more efficient tree-searching MCMC algorithm. It runs on Mac OS X and on 32-bit and 64-bit Linux.
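To make the idea of amino-acid recoding concrete, here is a minimal Python sketch that collapses the 20 amino acids into the six Dayhoff categories, a commonly used grouping. It is an illustration only, not PhyloBayes code: the names DAYHOFF6, RECODE and recode are hypothetical, and the choice of Dayhoff categories is an assumption made for the example.

```python
# Illustrative sketch only; not taken from PhyloBayes.
# Dayhoff six-category grouping of the 20 amino acids (assumed scheme for this example).
DAYHOFF6 = {
    "AGPST": "1",
    "DENQ": "2",
    "HKR": "3",
    "ILMV": "4",
    "FWY": "5",
    "C": "6",
}

# Flatten the grouping into a per-residue lookup table.
RECODE = {aa: code for group, code in DAYHOFF6.items() for aa in group}


def recode(seq: str) -> str:
    """Map each amino acid to its category; gaps and unknown symbols are kept as-is."""
    return "".join(RECODE.get(ch, ch) for ch in seq.upper())
```

Just as R/Y coding reduces nucleotides to purines and pyrimidines, the recoded alignment uses a smaller alphabet in which substitutions within a category become invisible.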
Bayesian MCMC sampling
The number of components, the associated profiles, and the allocation of each site to one of the available components are all free variables of the model, and are sampled from their joint posterior distribution by MCMC, together with all the other usual parameters of a phylogenetic model (topology, branch-lengths, alpha-parameter, etc.). In this way, the model accounts for the fact that distinct sites of a protein may evolve under different biochemical constraints, while averaging over all sources of uncertainty.
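The following Python sketch illustrates how the number of components and the allocation of sites can both be random variables. It draws allocations from a Chinese restaurant process, the allocation scheme induced by a Dirichlet process prior. It samples from the prior only, with no data or tree involved, and it is not PhyloBayes code: the function name sample_allocations and the concentration parameter are assumptions made for the illustration.

```python
import random


def sample_allocations(n_sites, concentration, rng):
    """Draw site-to-component allocations from a Chinese restaurant process.

    Illustrative prior-only sketch: each site joins an existing component with
    probability proportional to its current size, or opens a new component
    with probability proportional to `concentration`.
    """
    alloc = []   # component index assigned to each site
    counts = []  # number of sites currently in each component
    for _ in range(n_sites):
        weights = counts + [concentration]  # last entry = "open a new component"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        alloc.append(k)
    return alloc, counts


alloc, counts = sample_allocations(100, 1.0, random.Random(0))
print(len(counts), "components for", len(alloc), "sites")
```

In a full sampler, each allocation would also be weighted by the likelihood of the site under the profile of each component, so that sites with similar biochemical constraints tend to share a component.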
Once a sample is obtained, it can be marginalized over the parameters of interest. Most often, the parameter of interest is the phylogenetic tree itself: here, PhyloBayes works much like other Bayesian software and outputs a majority-rule posterior consensus tree. Conversely, you might be interested in the site-specific biochemical specificities captured by CAT. In that case, you can read out the posterior mean site-specific rates and profiles, which are also directly available from the MCMC output. Note that, in contrast to usual sequence profiles, these site-specific features are corrected for the phylogenetic correlations between the sequences.
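As a rough illustration of how a majority-rule consensus summarizes an MCMC sample, the Python sketch below counts how often each clade appears across sampled trees and keeps those found in more than half of them. It assumes each sampled tree has already been reduced to a set of clades (frozensets of taxon names); this representation and the function name majority_rule_splits are hypothetical and do not reflect the PhyloBayes output format.

```python
from collections import Counter


def majority_rule_splits(sampled_trees, threshold=0.5):
    """Return clades whose posterior frequency exceeds `threshold`.

    `sampled_trees` is a list of trees, each given as a set of clades
    (frozensets of taxon names). This is an assumed toy representation.
    """
    counts = Counter(split for tree in sampled_trees for split in tree)
    n = len(sampled_trees)
    return {split: c / n for split, c in counts.items() if c / n > threshold}


# Toy sample of three trees over taxa A, B, C, D.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
]
support = majority_rule_splits(trees)
```

The retained frequencies play the role of posterior support values on the consensus tree. Posterior mean site-specific rates and profiles are obtained in the same spirit, by averaging the sampled values for each site over the MCMC sample.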
: Empirical profile mixture models for phylogenetic reconstruction.