Accounting for Solvent Accessibility and Secondary Structure in Protein Phylogenetics is Highly Beneficial.

Le S.Q., Gascuel O.
Systematic Biology 2010 (in press)

Input data

Alignments are provided in PHYLIP sequential format, followed by DSSP secondary-structure and solvent-exposure annotations in Stockholm format. PhyML-structure simplifies these annotations as follows:
  1. Secondary structure is reduced to three classes: E (extended, E in DSSP), H (helix, H in DSSP), and O (other). PhyML-structure regards every DSSP code differing from E and H as other (O): S, T, B, G, I, C, ".", "X", or "?". X and ? correspond to unknown values and are dealt with using a mixture (with CONF/MIX) or LG (with CONF/LG and PART).
  2. Sites are classified into 10 relative surface accessibility categories, [0-9X], where 0=0%-10%; ...; 9=90%-100%. PhyML-structure treats 0 as buried and [1-9] as exposed.
  3. Following Stockholm format, secondary structure and surface accessibility annotations are coded as:
         #=GR  SS        Secondary Structure    For protein [HGIEBTSCX]
         #=GR  SA        Surface Accessibility  [0-9X]  (0=0%-10%; ...; 9=90%-100%)
    
    In the provided alignments, we include secondary structure information, surface accessibility, and original solvent exposure values.
See example.
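
The simplification rules above can be sketched in a few lines of Python. This is only an illustration of the mapping described in items 1 and 2, not the code used by PhyML-structure: the function names, the "buried"/"exposed" labels, and the choice to return "?" for unknown values (X or ?) are assumptions made for this sketch.

```python
def simplify_ss(dssp_code):
    """Reduce one DSSP secondary-structure code to E, H, O, or '?'."""
    if dssp_code in ("E", "H"):
        return dssp_code
    if dssp_code in ("X", "?"):
        # Assumption: unknown values are kept apart so that they can be
        # handled separately (mixture or LG, depending on the options).
        return "?"
    # Every other DSSP code (S, T, B, G, I, C, ".") is treated as "other".
    return "O"


def simplify_sa(sa_code):
    """Reduce one surface-accessibility code in [0-9X] to a coarse label."""
    if sa_code == "X":
        return "?"  # assumption: X marks an unknown accessibility
    if len(sa_code) == 1 and sa_code.isdigit():
        # Category 0 (0%-10%) is buried; categories 1-9 are exposed.
        return "buried" if sa_code == "0" else "exposed"
    raise ValueError("unexpected surface accessibility code: %r" % sa_code)


if __name__ == "__main__":
    ss = "HHHGGTEEEX"
    sa = "001359720X"
    print("".join(simplify_ss(c) for c in ss))
    print([simplify_sa(c) for c in sa])
```

Keeping unknown values as a separate symbol, rather than folding them into O, makes it easy to route those sites to whichever treatment the chosen options apply.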

Model

Amino-acid based models: EX2 (default) | EX3 | EHO | EX_EHO | UL2 | UL3 | LG | WAG | JTT

(*) The three rate categories of EX3 can be obtained by cutting relative solvent exposure values into [0-0.08], [0.08-0.36] and [0.36-1] for S (slow), M (medium) and F (fast).
See Le S.Q., Lartillot N., Gascuel O. (2008).
Phylogenetic Mixture Models for Proteins.
Philosophical Transactions of the Royal Society B, Vol. 363 (1512), 3965-3976.
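
As a small illustration of the EX3 cut-offs, the sketch below assigns a relative solvent exposure value (between 0 and 1) to one of the three categories. The function name is hypothetical, and the treatment of values that fall exactly on a boundary (0.08 or 0.36) is an assumption: here they go to the upper category, which the text above does not specify.

```python
def ex3_rate_category(rel_exposure):
    """Map a relative solvent exposure in [0, 1] to S, M, or F."""
    if not 0.0 <= rel_exposure <= 1.0:
        raise ValueError("relative exposure must lie in [0, 1]")
    if rel_exposure < 0.08:
        return "S"  # slow
    if rel_exposure < 0.36:
        # Assumption: boundary values belong to the upper interval.
        return "M"  # medium
    return "F"      # fast


if __name__ == "__main__":
    for value in (0.02, 0.20, 0.75):
        print(value, ex3_rate_category(value))
```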

Mode

Running PhyML-structure


phyml-structure [command args]

Command options:

PHYLIP-Like interface

You can run phyml with no arguments; in that case, change the value of a parameter by typing its corresponding character as shown on screen.

Examples

    ./PhyML-SS -i Ord0300_2hhi.STR -m EX2 -M PART -c 4 -a e