EuPathDomains: The Divergent Domain DataBase for Eukaryotic Pathogens
Release 2.0 (August 2013)
Original paper: EuPathDomains: The Divergent Domain DataBase for Eukaryotic Pathogens
Ghouila A., Terrapon N., Gascuel O., Guerfali F.Z., Laouini D., Maréchal É. and Bréhélin L.
Infection, genetics and evolution, 11(4):698-707 (2011).
Updates: Identification of divergent protein domains by combining HMM-HMM comparisons and co-occurrence detection
Ghouila A., Florent I., Guerfali F.Z., Terrapon N., Laouini D., Ben Yahia S., Gascuel O. and Bréhélin L.
EuPathDomains is an extended database of protein domains in completely sequenced eukaryotic pathogens from EuPathDB.
The EuPathDomains database gathers known InterPro domain occurrences and new Pfam domain occurrences found by the CODD procedure [Terrapon et al., 2009]. CODD improves the sensitivity of Pfam domain detection by exploiting the tendency of each domain to appear preferentially with a few other favorite domains in a protein. This property enables CODD to certify the presence of a divergent domain on the basis of the presence of another domain in the same protein.
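The co-occurrence idea can be illustrated with a minimal sketch. This is not the CODD implementation: the domain names, the `FAVORITE_PAIRS` set and the `certify_candidates` function are hypothetical, and the statistical step that selects significantly co-occurring pairs and estimates an FDR is left out entirely.

```python
from typing import FrozenSet, Iterable, List, Set

# Hypothetical set of domain pairs assumed to co-occur significantly often.
# The names are placeholders, not real Pfam families.
FAVORITE_PAIRS: Set[FrozenSet[str]] = {
    frozenset({"DomA", "DomB"}),
    frozenset({"DomB", "DomC"}),
}


def certify_candidates(
    confident: Iterable[str],
    candidates: Iterable[str],
    favorite_pairs: Set[FrozenSet[str]] = FAVORITE_PAIRS,
) -> List[str]:
    """Keep candidate domains supported by a confident domain of the same protein.

    `confident` holds domains detected above the usual score threshold in one
    protein; `candidates` holds weak (possibly divergent) hits in that protein.
    A candidate is kept if it forms a favorite pair with any confident domain.
    """
    confident = list(confident)
    return [
        cand
        for cand in candidates
        if any(frozenset({cand, known}) in favorite_pairs for known in confident)
    ]
```

With these placeholder pairs, `certify_candidates(["DomA"], ["DomB", "DomC"])` returns `["DomB"]`: the weak DomB hit is supported by the confident DomA, whereas DomC has no supporting partner in the protein.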
The EuPathDomains database contains domains for Giardia lamblia, Trypanosoma brucei, three Leishmania species, and five apicomplexan species, including three Plasmodium species, Toxoplasma gondii and Cryptosporidium parvum.
It can be queried by protein, domain or InterPro entry (ID or name), and by varying the confidence threshold (False Discovery Rate, FDR) on the new domains. You can also browse the entire database by organism.
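As a rough illustration of how an FDR threshold might act on a query, the sketch below filters an in-memory list of records. The `Record` type, its field names and the `query_protein` function are hypothetical and do not describe the actual database schema or web interface. It assumes that known occurrences carry no FDR and are always returned, while new occurrences are returned only if their FDR does not exceed the chosen threshold.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    protein_id: str
    domain_id: str
    fdr: Optional[float]  # None for known occurrences (assumption)


def query_protein(records: List[Record], protein_id: str, max_fdr: float) -> List[Record]:
    """Return the domain occurrences of one protein under an FDR threshold."""
    return [
        r
        for r in records
        if r.protein_id == protein_id and (r.fdr is None or r.fdr <= max_fdr)
    ]


# Illustrative records with made-up identifiers and FDR values.
EXAMPLE_RECORDS = [
    Record("prot1", "known_dom", None),
    Record("prot1", "new_dom_a", 0.05),
    Record("prot1", "new_dom_b", 0.20),
    Record("prot2", "new_dom_a", 0.01),
]
```

Under these assumptions, `query_protein(EXAMPLE_RECORDS, "prot1", 0.10)` keeps `known_dom` and `new_dom_a` and drops `new_dom_b`.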
The 2.0 release of the database is based on Pfam 26.0 and the latest HMMER version (HMMER3). As HMMER3 only allows local alignment mode, predicted domain occurrences can correspond either to complete domains or to domain fragments. A column labelled "fragmentation" now indicates the percentage of the domain length covered by each occurrence (i.e. the ratio of the occurrence length to the model length), and an additional filter allows users to restrict queries to a given fragmentation threshold.
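The fragmentation value described above is a simple ratio, which the following sketch computes and then uses as a filter. The `Occurrence` class and `filter_by_fragmentation` function are hypothetical names. The sketch also assumes that the filter keeps occurrences covering at least the given percentage of the model, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Occurrence:
    protein_id: str
    domain_id: str
    occurrence_length: int  # length of the predicted occurrence
    model_length: int       # length of the domain model

    @property
    def fragmentation(self) -> float:
        """Percentage of the model length covered by this occurrence."""
        return 100.0 * self.occurrence_length / self.model_length


def filter_by_fragmentation(
    occurrences: List[Occurrence], min_percent: float
) -> List[Occurrence]:
    """Keep occurrences covering at least `min_percent` of their model."""
    return [o for o in occurrences if o.fragmentation >= min_percent]
```

For example, an occurrence of length 50 against a model of length 200 has a fragmentation of 25.0 and would be excluded by a 50% threshold.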
For L. major and P. falciparum, additional predictions were obtained with a newly developed approach that uses HHsearch instead of HMMER to get the initial protein domain predictions, which are then processed and filtered by the CODD procedure. HHsearch is based on HMM-HMM comparisons and is often more sensitive than HMMER for identifying divergent domain occurrences. This allows CODD to discover more domains, with better FDRs.