ATGC: FastME

FastME 2.0: a comprehensive, accurate and fast distance-based phylogeny inference program.

Lefort V., Desper R., Gascuel O.

Molecular Biology and Evolution 32(10), 2798-800, 2015.

FastME provides distance algorithms to infer phylogenies. FastME is based on balanced minimum evolution, which is the very principle of NJ. FastME improves over NJ by performing topological moves using fast, sophisticated algorithms. The first version of FastME only included Nearest Neighbor Interchange (NNI). The new 2.0 version also includes Subtree Pruning and Regrafting (SPR), while remaining as fast as NJ and providing a number of facilities: distance estimation for DNA and proteins with various models and options, bootstrapping, and parallel computations.

FastME is now available on GitLab.

Running FastME

FastME is a program whose main task is to estimate phylogenies using distance methods from nucleotide or amino acid multiple sequence alignments (MSA). It provides a wide range of options designed to ease standard phylogenetic analyses. The main strength of FastME lies in the availability of several distance algorithms and optimization principles (OLS and Balanced Minimum Evolution, iterative taxon addition, NJ, UNJ, BioNJ) for tree estimation, coupled with various options to search the space of phylogenetic tree topologies (NNIs, SPRs). It also provides a parallelized implementation of the non-parametric bootstrap method to evaluate branch supports.

You can use FastME through its PHYLIP-like interface or from the command line.

More explanations are given in the FastME manual.

PHYLIP-like interface options

The interface always suggests the most relevant parameters.

I : Input data type (distance matrix or sequence alignment)
Type of data in the input file. It can be either a DNA or amino-acid MSA in PHYLIP format, or a distance matrix in PHYLIP format. If the input data type is an MSA, the menu will update to display the evolutionary model options required to compute a distance matrix. Type I to change settings.
E : DNA evolutionary model
FastME implements a wide range of substitution models for DNA: p-distance, RY symmetric, RY, JC69, K2P, F81, F84, TN93, LogDet. F84 is the default option and is recommended in most cases. Select the model by typing E.
E : Protein evolutionary model
FastME implements a wide range of substitution models for proteins: p-distance, F81-like, LG, WAG, JTT, Dayhoff, DCMut, CpREV, MtREV, RtREV, HIVb, HIVw and FLU. LG is the default option and is recommended in most cases. Select the model by typing E.
G : Gamma distributed rates across sites
Rates of evolution often vary from site to site. This heterogeneity is modelled using a gamma distribution. Type G to switch this option on or off. If switched on, this option will add the gamma shape parameter option to the menu.
A : Gamma rate variation parameter (alpha)
The shape of the gamma distribution determines the range of rate variation across sites. Small values, typically in the [0.1, 1.0] range, correspond to large variability. Larger values correspond to moderate to low rate heterogeneity. With distance methods, it is often preferable to use relatively large (biased upward) values of this parameter, and 1.0 is default. Type A to set this option.
R : Remove sites with gaps
By default, FastME does pairwise deletion of gaps when computing the distance matrix. Switching on this option removes every site that contains a gap. It must be used with caution, especially if the input MSA contains many gaps, as then very few sites are retained and used to estimate pairwise distances. In such situations, trimming the input alignment first with Gblocks or BMGE is preferable. Type R to switch this option on.
O : Output calculated distance matrix
By default, the distance matrix computed from MSA is not displayed. It can be written into an output file by switching on this option. Type O to switch this option on.
D : Number of datasets
If the input file contains several data sets, FastME can analyze each of them in a single run of the program. Type D to change settings.
M : Initial tree: build method
Algorithm used to compute a tree from a distance matrix. It can be iterative taxon addition (optimizing the BalME criterion for 'TaxAdd_BalME' or the OLSME criterion for 'TaxAdd_OLSME'), Neighbor Joining (NJ), the unweighted version of NJ (UNJ) or an improved version of NJ based on a simple model of sequence data (BioNJ). BioNJ is recommended with sequence alignments, and UNJ with non-sequence data (e.g. expression data, to obtain clusters rather than a phylogeny). The user can also supply their own tree, which should be in Newick format. Type M to select among these initial tree methods.
N : NNI postprocessing
By default, FastME does not improve the initial tree topology. Select this option to use nearest-neighbor interchange (NNI) to explore the space of topologies. It optimizes the balanced version of minimum evolution (BalME). Type N to set this option.
S : SPR postprocessing
FastME can also perform subtree pruning and regrafting (SPR) with BalME. It generally finds better tree topologies compared to NNI but tends to be slower. Type S to switch this option on.
B : Bootstrap: number of replicates
The support of the data for each internal branch of the phylogeny can be estimated using non-parametric bootstrap. By default, this option is switched off. Typing B switches on the bootstrap analysis. The user is then prompted for a number of bootstrap replicates. The larger this number, the more precise the bootstrap supports are. However, a phylogeny is estimated for each bootstrap replicate. Hence, the time needed to analyze N bootstrap replicates is N times the time spent on the analysis of the original data set. N = 100 is generally considered a minimum number of replicates; as FastME is fast, we recommend using N = 1,000, except for very large data sets.
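
To make the resampling step concrete, here is a minimal Python sketch of how one bootstrap pseudo-alignment can be drawn: alignment columns are sampled with replacement until the pseudo-alignment has as many sites as the original. The function name `bootstrap_alignment`, the toy sequences and the seed value are illustrative assumptions. This is not FastME's own code, and tree inference on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-alignment built by resampling columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw as many column indices as there are sites; repeats are allowed.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Toy alignment (hypothetical data): three taxa, eight sites.
alignment = {
    "taxonA": "ACGTACGT",
    "taxonB": "ACGTTCGA",
    "taxonC": "ACCTACGA",
}

# A fixed seed makes the resampling reproducible.
rng = random.Random(42)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(len(replicates), "pseudo-alignments of", len(replicates[0]["taxonA"]), "sites")
```

Each pseudo-alignment would then go through the same distance estimation and tree building as the original data, which is why the running time grows linearly with the number of replicates.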

Command line options

-i input_data_file, --input_data=input_data_file
The input_data_file contains an MSA or one or more distance matrices.
-u input_user_tree_file, --user_tree=input_user_tree_file
FastME may use an existing tree topology available in the input_user_tree_file which corresponds to the input dataset.
Multiple input trees in input_user_tree_file may be used provided there are as many datasets in input_data_file (see '-i' option) as input trees.
This tree should be in Newick format.
-o output_tree_file, --output_tree=output_tree_file
FastME will write the inferred tree into the output_tree_file.
-O output_matrix_file, --output_matrix=output_matrix_file
Use this option if you want FastME to write the distance matrix computed from the input MSA in the output_matrix_file.
-I output_info_file, --output_info=output_info_file
Use this option if you want FastME to write information about its execution in the output_info_file.
-B output_boot_file, --output_boot=output_boot_file
Use this option if you want FastME to write bootstrap pseudo-trees in the output_boot_file.
-a, --append
Use this option to append results to existing output files (if any).
By default output files will be overwritten.
-D datasets, --datasets=datasets
If the input file contains several data sets, FastME can analyze each of them in a single run of the program.
Default value is 1.
-m method, --method=method
Algorithm used to compute a tree from a distance matrix.
It can be iterative taxon addition (optimizing the BalME criterion for 'TaxAdd_BalME' or the OLSME criterion for 'TaxAdd_OLSME'), Neighbor Joining (NJ), the unweighted version of NJ (UNJ) or an improved version of NJ based on a simple model of sequence data (BioNJ).
BioNJ is recommended with sequence alignments, and UNJ with non-sequence data (e.g. expression data, to obtain clusters rather than a phylogeny).
-d[model], --dna=[model]
FastME implements a wide range of substitution models for DNA: p-distance, RY symmetric, RY, JC69, K2P, F81, F84, TN93, LogDet.
F84 is the default option and is recommended in most cases.
-p[model], --protein=[model]
FastME implements a wide range of substitution models for proteins: p-distance, F81-like, LG, WAG, JTT, Dayhoff, DCMut, CpREV, MtREV, RtREV, HIVb, HIVw and FLU.
LG is the default option and is recommended in most cases.
-g[alpha], --gamma=[alpha]
Rates of evolution often vary from site to site. This heterogeneity is modelled using a gamma distribution.
The shape of the gamma distribution ([alpha]) determines the range of rate variation across sites.
Small values, typically in the [0.1, 1.0] range, correspond to large variability. Larger values correspond to moderate to low rate heterogeneity.
With distance methods, it is often preferable to use relatively large (biased upward) values of [alpha], and 1.0 is default.
-e, --equilibrium
The equilibrium frequencies for DNA are always estimated using the nucleotide frequencies in the MSA.
For amino-acid sequences, the equilibrium frequencies are estimated using the frequencies defined by the substitution model.
Use this option if you wish to estimate the amino-acid equilibrium distribution using their frequencies in the MSA.
-r, --remove_gap
By default, FastME does pairwise deletion of gaps when computing the distance matrix.
Switching on this option removes every site that contains a gap.
It must be used with caution, especially if the input MSA contains many gaps, as then very few sites are retained and used to estimate pairwise distances.
In such situations, trimming the input alignment first with Gblocks or BMGE is preferable.
-n[NNI], --nni=[NNI]
By default, FastME does not improve the initial tree topology.
Select this option to use nearest-neighbor interchange (NNI) to explore the space of topologies.
The user can choose to optimize the balanced or ordinary least-square versions of minimum evolution (BalME/OLSME).
-s, --spr
FastME can also perform subtree pruning and regrafting (SPR) with BalME. It generally finds better tree topologies compared to NNI but tends to be slower.
-w branch, --branch_length=branch
The Minimum Evolution algorithms (balanced and OLS) implemented in FastME compute the tree topology and the branch lengths separately. It is therefore necessary to define how FastME computes the branch lengths. By default, FastME computes the tree topology and the branch lengths within the same framework (i.e. balanced ME for the topology with balanced branch lengths, or OLSME with OLS branch lengths). Although we do not recommend it, FastME can also compute a balanced minimum evolution tree topology and assign OLS branch lengths to that tree (and conversely). If the tree is computed by an NJ-like algorithm (NJ, UNJ or BioNJ), FastME can keep the inferred branch lengths ('none' value of the option) or assign balanced or OLS branch lengths. Note that this option is only available when no tree swapping improvement (NNI or SPR) is performed. Moreover, this option can be used to assign branch lengths to any user-defined input tree. The user may choose the branch value from: BalLS (default), OLS or none.
-b replicates, --bootstrap=replicates
The support of the data for each internal branch of the phylogeny can be estimated using non-parametric bootstrap. The larger the number of replicates, the more precise the bootstrap supports are. However, a phylogeny is estimated for each bootstrap replicate. Hence, the time needed to analyze N bootstrap replicates is N times the time spent on the analysis of the original data set.
N = 100 is generally considered a minimum number of replicates; as FastME is fast, we recommend using N = 1,000, except for very large data sets.
-z seed, --seed=seed
Use this option to initialize randomization with the given seed value.
Only helpful when bootstrapping.
-c
Use this option if you want FastME to compute only the distance matrix.
Only helpful when the input data file contains an MSA.
-f number_of_digits
Use this option to set the number of digits after the decimal point in the output.
Default precision is 12.
-T number_of_threads, --nb_threads=number_of_threads
Use this option to set the number_of_threads to use.
This option is only available if FastME was compiled with the parallel flag.
Default number_of_threads is the number of available CPU cores.
-v value, --verbose=value
Sets the verbose level to value [0-3]. Default value is 0.
-V, --version
Prints the FastME version.
-h, --help
Displays this usage information.
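
The difference between the default pairwise deletion of gaps and the complete deletion requested with -r can be illustrated on a toy alignment. The following Python sketch computes a p-distance (proportion of differing sites) both ways. The helper names `p_distance_pairwise` and `remove_gapped_sites`, and the sequences themselves, are assumptions made for illustration and do not reflect FastME's internal implementation.

```python
GAP = "-"


def p_distance_pairwise(a, b):
    """p-distance over the sites where neither of the two sequences has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        raise ValueError("no gap-free site shared by the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)


def remove_gapped_sites(seqs):
    """Drop every column in which at least one sequence has a gap."""
    keep = [i for i, col in enumerate(zip(*seqs)) if GAP not in col]
    return ["".join(s[i] for i in keep) for s in seqs]


# Toy alignment (hypothetical data): three sequences, eight sites.
seqs = ["ACGTACGT",
        "ACGAAC-T",
        "A-GTTCGT"]

# Pairwise deletion: each pair keeps its own gap-free sites (7 sites here).
d01_pairwise = p_distance_pairwise(seqs[0], seqs[1])

# Complete deletion: columns with a gap in any sequence are dropped first,
# so every pair is compared on the same 6 remaining sites.
trimmed = remove_gapped_sites(seqs)
d01_complete = p_distance_pairwise(trimmed[0], trimmed[1])

print(d01_pairwise, d01_complete)
```

Here the gap in the third sequence removes a site that the first two sequences could otherwise have compared, which shows how complete deletion shrinks the data available to every pair when the alignment is gappy.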

For example, assuming that the matrix datatest file (downloaded from this web page, see above) is in the /home/ directory, the command line:

fastme -i /home/datatest.txt -D 3

will construct three trees and write them (Newick format) into the output tree file '/home/datatest.txt_fastme_tree.nwk'. A second file, '/home/datatest.txt_fastme_stat.txt', is created, containing the options selected by the user and some statistics (estimated tree length and number of NNIs performed).
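
When FastME is driven from a script, the command line can be assembled programmatically. The Python sketch below builds an argument list using only options documented in the list above (-i, -D, -s, -b, -z). The helper name `fastme_command` and its keyword arguments are hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex


def fastme_command(input_file, datasets=1, spr=False, bootstrap=None, seed=None):
    """Assemble a fastme argument list (hypothetical helper, not part of FastME)."""
    cmd = ["fastme", "-i", input_file]
    if datasets != 1:
        cmd += ["-D", str(datasets)]   # number of data sets in the input file
    if spr:
        cmd.append("-s")               # SPR postprocessing
    if bootstrap is not None:
        cmd += ["-b", str(bootstrap)]  # number of bootstrap replicates
    if seed is not None:
        cmd += ["-z", str(seed)]       # seed for randomization
    return cmd


cmd = fastme_command("/home/datatest.txt", datasets=3)
print(shlex.join(cmd))
```

To actually run the analysis, the list could be passed to `subprocess.run(cmd, check=True)`, assuming a `fastme` executable is installed and available on the PATH.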