Accounting for Solvent Accessibility and Secondary Structure in Protein Phylogenetics is Highly Beneficial.

Le S.Q., Gascuel O.
Systematic Biology, 59(3): 277-287, 2010

Input data

Alignments are provided in PHYLIP sequential format, followed by DSSP secondary-structure and solvent-exposure annotations in Stockholm format. PhyML-structure simplifies these annotations as follows:
  1. Secondary structure: sites are classified as E (extended; E in DSSP), H (helix; H in DSSP), or other (O), which covers every DSSP code differing from E and H (S, T, B, G, I, C, "."). The codes X and ? correspond to unknown values and are dealt with using a mixture (with CONF/MIX) or LG (with CONF/LG and PART).
  2. Surface accessibility: sites are classified into 10 relative surface accessibility categories, [0-9X], where 0=0%-10%; ...; 9=90%-100%. PhyML-structure treats 0 as buried and [1-9] as exposed.
  3. Following Stockholm format, Secondary Structure and Surface Accessibility annotations are coded as:
         #=GR  SS        Secondary Structure    For protein [HGIEBTSCX]
         #=GR  SA        Surface Accessibility  [0-9X]  (0=0%-10%; ...; 9=90%-100%)
    In the provided alignments, we include secondary structure information, surface accessibility, and original solvent exposure values.
See example.
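
The simplification rules above can be sketched in a few lines of Python. This is only an illustration, not code taken from PhyML-structure: the function names are hypothetical, and returning a separate "X" / "unknown" value for the unknown codes (X and ?) is an assumption made here so that they stay distinguishable from O.

```python
# Illustrative sketch only; names are hypothetical, not part of PhyML-structure.

UNKNOWN_SS = {"X", "?"}          # unknown secondary-structure codes
EXPOSED_SA = set("123456789")    # accessibility categories treated as exposed


def simplify_ss(code: str) -> str:
    """Map one DSSP secondary-structure code to E, H, O, or X (unknown)."""
    if code in ("E", "H"):
        return code
    if code in UNKNOWN_SS:
        return "X"               # assumption: keep unknowns separate from O
    return "O"                   # S, T, B, G, I, C, "." and anything else


def simplify_sa(code: str) -> str:
    """Map one accessibility category [0-9X] to buried, exposed, or unknown."""
    if code == "0":
        return "buried"
    if code in EXPOSED_SA:
        return "exposed"
    return "unknown"             # assumption: X (or any other symbol) is unknown


def simplify_ss_string(annotation: str) -> str:
    """Apply simplify_ss to every site of a per-residue annotation string."""
    return "".join(simplify_ss(c) for c in annotation)
```

For instance, simplify_ss_string could be applied to the annotation string of a "#=GR ... SS" line once that line has been split into its fields.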


Amino-acid based models: EX2 (default) | EX3 | EHO | EX_EHO | UL2 | UL3 | LG | WAG | JTT

(*) The three rate categories of EX3 can be extracted by cutting relative solvent exposure values into [0-0.08], [0.08-0.36], and [0.36-1] for S (slow), M (medium), and F (fast).
See Le S.Q., Lartillot N., Gascuel O. (2008).
Phylogenetic Mixture Models for Proteins,
Philosophical Transactions of the Royal Society B, Vol. 363 (1512), 3965-3976.
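
As a minimal sketch, the cutoffs in the note above could be applied as follows. The function name is hypothetical, and the handling of the shared boundary values is an assumption: the note does not say which category 0.08 and 0.36 belong to, so here they are assigned to the upper interval.

```python
# Illustrative sketch only; ex3_category is a hypothetical name.

def ex3_category(rel_exposure: float) -> str:
    """Return S (slow), M (medium), or F (fast) for a relative exposure in [0, 1]."""
    if not 0.0 <= rel_exposure <= 1.0:
        raise ValueError("relative solvent exposure must lie in [0, 1]")
    if rel_exposure < 0.08:      # [0, 0.08)
        return "S"
    if rel_exposure < 0.36:      # [0.08, 0.36); boundary choice is an assumption
        return "M"
    return "F"                   # [0.36, 1]


# Classify a few per-site exposure values.
categories = [ex3_category(v) for v in (0.02, 0.20, 0.75)]
```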


Running PhyML-structure

phyml-structure [command args]

Command options:

PHYLIP-Like interface

You can run phyml with no arguments; in this case, change the value of a parameter by typing its corresponding character, as shown on screen.


    ./PhyML-SS -i Ord0300_2hhi.STR -m EX2 -M PART -c 4 -a e