Motif is software that exhaustively searches whole genome sequences for DNA-binding patterns, also called motifs. These motifs are given as Position Weight Matrices (PWMs), and this server incorporates all motifs extracted from the latest version of the widely used JASPAR database. Using the server is easy: select the patterns you want to search for, select the genome, select a score threshold as a percentage, and run the search. Motif will then send you all the DNA sequences matching each pattern, their scores, and all their genomic locations.
Instead of selecting known patterns from JASPAR, you may also want to search for your own pattern confidentially on our server. You may then enter your own matrix in the interactive box below, column by column. When launched, Motif searches for your pattern only and returns the corresponding result. Your pattern is neither stored in our pattern database, nor accessible in any way to other users.
Our goal is to provide a user-friendly service for efficiently searching for such patterns on complete genomes. Motif has several advantages over competing methods:
The PWM representation of a pattern is useful because it lets you score the similarity (i.e. the resemblance) between the pattern and any DNA sequence that has the same length as the matrix. Given a matrix of, say, 10 columns, one can score the similarity of any DNA sequence of length 10, also termed a 10-mer, to the pattern. The higher the score, the better the similarity. These scores are meant for comparing words with one another; a score value is difficult to interpret by itself.
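The scoring idea above can be sketched in a few lines. This is only an illustration, not Motif's actual implementation: the text does not say which scoring scheme Motif uses, so the sketch assumes a common one (log-odds weights against a uniform background, with a pseudocount). The names `log_odds_matrix` and `score_word` and the 3-column count matrix are hypothetical.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-column base counts into log-odds weights.

    Assumed scheme: log2(frequency / background), with a pseudocount
    so that a base never seen in a column still gets a finite weight.
    """
    weights = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        weights.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return weights

def score_word(weights, word):
    """Score a word by summing the weight of its base in each column."""
    if len(word) != len(weights):
        raise ValueError("word and matrix must have the same length")
    return sum(column[base] for column, base in zip(weights, word))

# Hypothetical 3-column count matrix (not a JASPAR entry).
counts = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 1, "G": 7, "T": 1},
]
weights = log_odds_matrix(counts)

# The word built from the preferred bases scores higher than a poor match.
print(score_word(weights, "ACG"), score_word(weights, "TTT"))
```

As the text notes, only the comparison between the two printed scores is meaningful; neither value says much on its own.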
When using Motif, you choose how similar a word must be to the pattern by giving a threshold as a percentage. For instance, the default threshold of 90% means: search for the words whose score is greater than or equal to 90% of the best possible score for the pattern. Choosing a threshold this way is more intuitive than selecting a minimum score value.
When a matrix is searched on a genome, a similarity percentage is given: the higher it is, the more the matching sequences contain the tall bases shown in the picture.
Remark: in general, JASPAR recommends a threshold of 85%. The default here is 90%. Values below 70% are generally not meaningful and are disabled on this web server. If you need a special search with a lower threshold, please contact us.
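The threshold rule described above can be illustrated with a small scan. This is a sketch under stated assumptions, not Motif's algorithm: it reads "90% of the best possible score" literally (cutoff = 0.9 × the sum of the highest weight in each column), which only makes sense when the best score is positive, and it uses a naive sliding window on both strands. The weight matrix, the function names, and the 1-based forward-strand position convention are all hypothetical.

```python
def best_score(weights):
    """Best possible score: the highest weight in every column, summed."""
    return sum(max(column.values()) for column in weights)

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def scan(sequence, weights, percent=90.0):
    """Return (position, strand, word, score) for every window whose
    score reaches `percent` of the best possible score.

    Positions are 1-based window starts on the forward strand (an
    assumption; the text does not define the coordinate convention).
    """
    cutoff = best_score(weights) * percent / 100.0
    k = len(weights)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other symbols
        for strand, word in (("+", window), ("-", reverse_complement(window))):
            score = sum(column[b] for column, b in zip(weights, word))
            if score >= cutoff:
                hits.append((start + 1, strand, word, score))
    return hits

# Hypothetical 3-column weight matrix whose best word is ACG (score 3.0).
weights = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": 0.5},
]

for hit in scan("TTACGTTCGTAA", weights, percent=90.0):
    print(hit)
```

Lowering `percent` admits more words per pattern, which is why very low thresholds quickly stop being informative.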
A matrix represents a set of sequences that share similarity. Starting from a gap-free multiple alignment, a matrix records, for each column of the alignment, the number of occurrences of each base in that column. The proportionally highest counts indicate the preferred, or most conserved, nucleotides at that position.
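As a minimal sketch of the construction just described, the following counts the bases in each column of a gap-free alignment. The function name `count_matrix` and the four aligned sequences are hypothetical, chosen only for illustration.

```python
def count_matrix(alignment):
    """Count the occurrences of each base in every alignment column.

    `alignment` is a list of equal-length DNA strings without gaps.
    Returns one dict of counts per column.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all sequences must have the same length")

    columns = [{"A": 0, "C": 0, "G": 0, "T": 0} for _ in range(width)]
    for seq in alignment:
        for i, base in enumerate(seq.upper()):
            if base not in columns[i]:
                raise ValueError(f"unexpected symbol {base!r}")
            columns[i][base] += 1
    return columns

# Hypothetical gap-free alignment of four binding sites.
sites = ["ACGT", "ACGA", "TCGT", "ACCT"]
for position, column in enumerate(count_matrix(sites), start=1):
    print(position, column)
```

In this toy alignment the second column contains only C, so that position is fully conserved.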
A sequence logo is a graphical representation of the sequence conservation of nucleotides in a pattern/matrix. In each column, each nucleotide is drawn with a size proportional to its relative frequency, so the largest nucleotides are the most frequent ones. For example, if one nucleotide takes up the whole column, conservation is maximal and only that nucleotide is "allowed" at that position.
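The letter sizes of such a logo can be derived directly from the counts. The sketch below follows the description above and scales each letter by its relative frequency; note that many logo tools instead scale column heights by information content, so this is one possible convention rather than the one this server necessarily uses. The function name and the example counts are hypothetical.

```python
def logo_columns(counts):
    """For each column, return (base, relative frequency) pairs,
    ordered from the most frequent base to the least frequent."""
    columns = []
    for column in counts:
        total = sum(column.values())
        if total == 0:
            raise ValueError("a column has no observations")
        freqs = [(base, n / total) for base, n in column.items()]
        freqs.sort(key=lambda pair: pair[1], reverse=True)
        columns.append(freqs)
    return columns

# Hypothetical counts: column 1 is fully conserved, column 2 is not.
counts = [
    {"A": 4, "C": 0, "G": 0, "T": 0},
    {"A": 1, "C": 1, "G": 2, "T": 0},
]
for position, column in enumerate(logo_columns(counts), start=1):
    # Crude text rendering: one character per 10% of relative frequency.
    bars = " ".join(base * round(freq * 10) for base, freq in column if freq > 0)
    print(position, bars)
```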
Learning how to use the Motif web server? Take a look at this tutorial (90 seconds).
Motif can search for multiple patterns on a given genome. Thus the output is organised first by pattern, then by strand, and for each pattern and strand by matching words.
The output format contains:
> followed by the matrix name

>Tcf12
ACAGCTGCTG 4.89141 chromosom-3:303 chromosom-3:1269 chromosom-4:123
CAGCAGCTGT 3.448715 (reverse) chromosom-2:7584
ACAGCTGTTG 5.66843 chromosom-1:528 chromosom-1:39871 chromosom-2:5814 chromosom-4:1552
>ZNF711
AGGCCTAG 4.82415 chromosom-2:1788 chromosom-2:25451 chromosom-3:44584
CTAGGCCT 3.448715 (reverse) chromosom-2:7584
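For downstream processing, output of this kind can be read with a small parser. The sketch below is an assumption-laden illustration, not an official client: it works token by token so that it does not depend on where line breaks fall, it treats any word not marked "(reverse)" as a forward-strand match, and it assumes locations have the form name:position. The function name `parse_motif_output` and the returned structure are hypothetical.

```python
import re

def parse_motif_output(text):
    """Parse Motif-style output into {matrix name: [hit, ...]}.

    Each hit is a dict with the matching word, its score, its strand
    and a list of (sequence name, position) locations.
    """
    results = {}
    hits = None   # hits of the pattern currently being read
    hit = None    # word currently being read
    for token in text.split():
        if token.startswith(">"):
            hits = results.setdefault(token[1:], [])
            hit = None
        elif hits is None:
            continue  # ignore anything before the first ">" header
        elif re.fullmatch(r"[ACGTN]+", token):
            hit = {"word": token, "score": None,
                   "strand": "forward", "locations": []}
            hits.append(hit)
        elif hit is None:
            continue  # stray token before the first word of a pattern
        elif token == "(reverse)":
            hit["strand"] = "reverse"
        elif ":" in token:
            name, position = token.rsplit(":", 1)
            hit["locations"].append((name, int(position)))
        else:
            hit["score"] = float(token)
    return results

# Excerpt of the example output shown above.
sample = """>Tcf12
ACAGCTGCTG 4.89141 chromosom-3:303 chromosom-3:1269 chromosom-4:123
CAGCAGCTGT 3.448715 (reverse) chromosom-2:7584
>ZNF711
AGGCCTAG 4.82415 chromosom-2:1788 chromosom-2:25451 chromosom-3:44584
"""
parsed = parse_motif_output(sample)
for name, pattern_hits in parsed.items():
    for h in pattern_hits:
        print(name, h["word"], h["strand"], h["score"], len(h["locations"]))
```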