ProbCons

From HandWiki

ProbCons is an open-source program for probabilistic consistency-based multiple alignment of amino acid sequences. It is among the most accurate protein multiple sequence alignment programs: it has repeatedly demonstrated a statistically significant accuracy advantage over similar tools, including Clustal and MAFFT.[1][2]

Algorithm

The following describes the basic outline of the ProbCons algorithm.[3]

Step 1: Reliability of an alignment edge

For every pair of sequences $x, y$, compute the probability that letters $x_i$ and $y_j$ are paired (written $x_i \sim y_j$) in an alignment $a$ generated by the model (in ProbCons, a pair hidden Markov model):

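$$P(x_i \sim y_j \mid x, y) \stackrel{\text{def}}{=} \Pr[x_i \sim y_j \text{ in some } a \mid x, y] = \sum_{\text{alignment } a \text{ with } x_i \sim y_j} \Pr[a \mid x, y] = \sum_{\text{alignment } a} \mathbf{1}\{x_i \sim y_j \in a\} \Pr[a \mid x, y]$$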

(Here $\mathbf{1}\{x_i \sim y_j \in a\}$ equals 1 if $x_i$ and $y_j$ are aligned to each other in $a$, and 0 otherwise.)
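The definition can be checked directly on very short sequences by enumerating every alignment and summing the probabilities of those that contain $x_i \sim y_j$. The Python sketch below does exactly that under a loudly hypothetical model: instead of the ProbCons pair HMM, it assumes $\Pr[a \mid x, y]$ is proportional to $\exp(\text{score}(a))$ with toy scores (match +1, mismatch -1, gap -1). The names `enumerate_alignments`, `toy_score` and `posterior_matrix` are illustrative only.

```python
import math


def enumerate_alignments(n, m, i=0, j=0):
    """Yield every global alignment of sequences of lengths n and m.

    An alignment is a tuple of columns: ("M", i, j) pairs x_i with y_j,
    ("X", i) puts x_i against a gap, ("Y", j) puts y_j against a gap.
    """
    if i == n and j == m:
        yield ()
        return
    if i < n and j < m:
        for rest in enumerate_alignments(n, m, i + 1, j + 1):
            yield (("M", i, j),) + rest
    if i < n:
        for rest in enumerate_alignments(n, m, i + 1, j):
            yield (("X", i),) + rest
    if j < m:
        for rest in enumerate_alignments(n, m, i, j + 1):
            yield (("Y", j),) + rest


def toy_score(alignment, x, y, match=1.0, mismatch=-1.0, gap=-1.0):
    """Hypothetical additive score; NOT the ProbCons pair-HMM parameters."""
    total = 0.0
    for column in alignment:
        if column[0] == "M":
            total += match if x[column[1]] == y[column[2]] else mismatch
        else:
            total += gap
    return total


def posterior_matrix(x, y):
    """P[i][j] = sum over alignments a containing x_i ~ y_j of Pr[a | x, y],
    assuming (toy model) Pr[a | x, y] is proportional to exp(toy_score(a))."""
    alignments = list(enumerate_alignments(len(x), len(y)))
    weights = [math.exp(toy_score(a, x, y)) for a in alignments]
    partition = sum(weights)  # normalizing constant
    P = [[0.0] * len(y) for _ in x]
    for a, w in zip(alignments, weights):
        for column in a:
            if column[0] == "M":  # indicator 1{x_i ~ y_j in a}
                P[column[1]][column[2]] += w / partition
    return P


P = posterior_matrix("ACG", "AG")
for row in P:
    print(" ".join(f"{p:.3f}" for p in row))
```

Because the number of alignments grows exponentially with sequence length, this brute-force sum is only useful for checking the definition; ProbCons obtains the same kind of quantities with the forward and backward algorithms on its pair HMM.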

Step 2: Maximum expected accuracy

The accuracy $\mathrm{acc}(a^*, a)$ of an alignment $a^*$ with respect to another alignment $a$ is defined as the number of aligned pairs the two alignments have in common, divided by the length of the shorter sequence.

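The expected accuracy of a candidate alignment $a^*$ under the posterior distribution $\Pr[a \mid x, y]$ then reduces to a sum of the match probabilities from Step 1: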

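$$\mathbb{E}_{\Pr[a \mid x, y]}\big(\mathrm{acc}(a^*, a)\big) = \sum_a \Pr[a \mid x, y]\,\mathrm{acc}(a^*, a) = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a^*} \sum_a \mathbf{1}\{x_i \sim y_j \in a\} \Pr[a \mid x, y] = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a^*} P(x_i \sim y_j \mid x, y)$$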
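The last equality only swaps the order of the two sums. A small Python check makes this concrete: it assumes a hypothetical posterior distribution over just three alignments (each reduced to its set of aligned index pairs $(i, j)$) and uses exact fractions to compute the expected accuracy of a candidate $a^*$ both directly and from the match probabilities. All names are illustrative.

```python
from fractions import Fraction

LEN_X, LEN_Y = 3, 2  # hypothetical |x| and |y|

# Hypothetical posterior over three alignments of x and y; each alignment is
# reduced to its set of aligned pairs (i, j), meaning x_i ~ y_j.
DISTRIBUTION = [
    (frozenset({(0, 0), (2, 1)}), Fraction(1, 2)),
    (frozenset({(0, 0), (1, 1)}), Fraction(3, 10)),
    (frozenset({(1, 0), (2, 1)}), Fraction(1, 5)),
]


def accuracy(a_star, a):
    """acc(a*, a): shared aligned pairs / length of the shorter sequence."""
    return Fraction(len(a_star & a), min(LEN_X, LEN_Y))


def expected_accuracy_direct(a_star):
    """Sum over alignments a of Pr[a | x, y] * acc(a*, a)."""
    return sum(p * accuracy(a_star, a) for a, p in DISTRIBUTION)


def match_probability(i, j):
    """P(x_i ~ y_j | x, y): total probability of alignments containing (i, j)."""
    return sum(p for a, p in DISTRIBUTION if (i, j) in a)


def expected_accuracy_from_posteriors(a_star):
    """(1 / min(|x|, |y|)) * sum of P(x_i ~ y_j | x, y) over the pairs of a*."""
    return Fraction(sum(match_probability(i, j) for i, j in a_star), min(LEN_X, LEN_Y))


candidate = frozenset({(0, 0), (2, 1)})
print(expected_accuracy_direct(candidate))           # 3/4
print(expected_accuracy_from_posteriors(candidate))  # 3/4
```

The second form needs only the matrix of match probabilities, never the full distribution over alignments, which is what makes the MEA objective tractable.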

This yields a maximum expected accuracy (MEA) alignment:

$$E(x, y) = \arg\max_{a^*} \mathbb{E}_{\Pr[a \mid x, y]}\big(\mathrm{acc}(a^*, a)\big)$$
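Because the expected accuracy is a sum of $P(x_i \sim y_j \mid x, y)$ over the pairs of $a^*$, the MEA alignment can be found with a Needleman-Wunsch-style dynamic program in which aligning $x_i$ with $y_j$ scores $P(x_i \sim y_j \mid x, y)$ and gaps score 0. The sketch below assumes the match probabilities arrive as a list of lists `P[i][j]`; `mea_alignment` is a hypothetical name and the example matrix is made up.

```python
def mea_alignment(P):
    """Return (total, pairs) maximizing the sum of P[i][j] over aligned pairs.

    Match score P[i][j], gap score 0; dividing total by min(|x|, |y|)
    gives the expected accuracy of the returned alignment.
    """
    n = len(P)
    m = len(P[0]) if n else 0
    # F[i][j] = best total posterior for aligning x[:i] with y[:j]
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + P[i - 1][j - 1],  # x_i ~ y_j
                          F[i - 1][j],                        # x_i against a gap
                          F[i][j - 1])                        # y_j against a gap
    # Traceback; zero-probability matches are skipped in favour of gaps.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if P[i - 1][j - 1] > 0 and F[i][j] == F[i - 1][j - 1] + P[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs


P = [
    [0.90, 0.05],
    [0.10, 0.20],
    [0.05, 0.80],
]
best, pairs = mea_alignment(P)
print(pairs)  # [(0, 0), (2, 1)]
```

Here the chosen pairs sum to about 1.7, so the expected accuracy of this alignment is roughly 1.7 / min(3, 2) = 0.85.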

Step 3: Probabilistic Consistency Transformation

The match probabilities $P(x_i \sim y_j \mid x, y)$ for every pair of sequences $x, y$ in the set $\mathcal{S}$ of all input sequences are then re-estimated using every intermediate sequence $z \in \mathcal{S}$:

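$$P(x_i \sim y_j \mid x, y) \leftarrow \frac{1}{|\mathcal{S}|} \sum_{z \in \mathcal{S}} \sum_{1 \le k \le |z|} P(x_i \sim z_k \mid x, z)\, P(z_k \sim y_j \mid z, y)$$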

This step can be iterated.
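Written with matrices, one pass is an average of matrix products: if $P_{xy}$ is the $|x| \times |y|$ matrix of match probabilities, the update is $\frac{1}{|\mathcal{S}|}\sum_z P_{xz} P_{zy}$. The sketch below assumes that $z$ also ranges over $x$ and $y$ themselves, with $P_{xx}$ taken as the identity matrix (our reading of the convention behind the $1/|\mathcal{S}|$ normalization); the dictionary layout and function names are hypothetical.

```python
def identity(n):
    """P(x_i ~ x_k | x, x): 1 if i == k, else 0 (assumed convention)."""
    return [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain-Python matrix product; entry (i, j) sums A[i][k] * B[k][j] over k."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def consistency_transform(P, lengths):
    """One pass of the probabilistic consistency transformation.

    P[(x, y)] is the |x|-by-|y| matrix of P(x_i ~ y_j | x, y) for every ordered
    pair of distinct names; lengths[x] is |x|. The sum over z includes x and y.
    """
    names = list(lengths)

    def pair(a, b):
        return identity(lengths[a]) if a == b else P[(a, b)]

    updated = {}
    for x in names:
        for y in names:
            if x == y:
                continue
            total = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in names:
                # sum over k of P(x_i ~ z_k | x, z) * P(z_k ~ y_j | z, y)
                product = matmul(pair(x, z), pair(z, y))
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        total[i][j] += product[i][j] / len(names)
            updated[(x, y)] = total
    return updated


lengths = {"x": 1, "y": 1, "z": 1}
P = {
    ("x", "y"): [[0.6]], ("y", "x"): [[0.6]],
    ("x", "z"): [[0.9]], ("z", "x"): [[0.9]],
    ("z", "y"): [[0.9]], ("y", "z"): [[0.9]],
}
new_P = consistency_transform(P, lengths)
print(round(new_P[("x", "y")][0][0], 4))  # 0.67
```

In this toy case the direct estimate 0.6 for $x$ versus $y$ rises to 0.67 because the path through $z$ supports the pairing with probability 0.9 · 0.9. Iterating the step means feeding the returned dictionary back into `consistency_transform`.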

Step 4: Computation of guide tree

Construct a guide tree by hierarchical clustering, using the expected accuracy of the MEA alignment of each pair of sequences as their similarity score. The similarity between two clusters is defined as a weighted average of the pairwise similarities of their sequences.
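A minimal agglomerative clustering sketch: it repeatedly merges the two most similar clusters, starting from the pairwise MEA scores. The text does not pin down the weights, so this version assumes a size-weighted average (as in UPGMA); the tree is returned as nested tuples of sequence names, and `build_guide_tree` is a hypothetical name with made-up example scores.

```python
def build_guide_tree(names, similarity):
    """Agglomerative clustering on pairwise similarities (higher = more similar).

    similarity maps frozenset({a, b}) to the MEA score of a and b. After merging
    clusters a and b, the similarity of the new cluster to any other cluster c
    is the size-weighted average of sim(a, c) and sim(b, c) (assumed weighting).
    """
    sizes = {name: 1 for name in names}  # cluster (name or nested tuple) -> size
    sim = dict(similarity)
    while len(sizes) > 1:
        # Pick the most similar pair of current clusters (first one wins ties).
        a, b = max(
            ((p, q) for p in sizes for q in sizes if p != q),
            key=lambda pq: sim[frozenset(pq)],
        )
        merged = (a, b)
        size = sizes[a] + sizes[b]
        for c in sizes:
            if c != a and c != b:
                sim[frozenset((merged, c))] = (
                    sizes[a] * sim[frozenset((a, c))] + sizes[b] * sim[frozenset((b, c))]
                ) / size
        del sizes[a], sizes[b]
        sizes[merged] = size
    return next(iter(sizes))


names = ["A", "B", "C", "D"]
scores = {("A", "B"): 0.9, ("C", "D"): 0.8, ("A", "C"): 0.2,
          ("A", "D"): 0.3, ("B", "C"): 0.1, ("B", "D"): 0.2}
tree = build_guide_tree(names, {frozenset(k): v for k, v in scores.items()})
print(tree)  # (('A', 'B'), ('C', 'D'))
```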

Step 5: Compute MSA

Finally, compute the multiple sequence alignment (MSA) by progressive alignment along the guide tree, optionally followed by iterative refinement.
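Progressive alignment can be sketched by aligning groups of already-aligned sequences pairwise, bottom-up along the guide tree. This Python version assumes a sum-of-pairs column score (the sum of $P(x_i \sim y_j \mid x, y)$ over every sequence $x$ in one group's column and $y$ in the other's) with zero-cost gaps, mirroring the pairwise MEA recursion. It does not implement iterative refinement, and all function names and probabilities are hypothetical.

```python
def align_groups(A, B, P):
    """Align two group alignments. Each is a list of columns mapping a sequence
    name to the residue index it holds (gapped sequences are absent)."""
    def column_score(ca, cb):  # assumed sum-of-pairs score over posteriors
        return sum(P[(x, y)][i][j] for x, i in ca.items() for y, j in cb.items())

    n, m = len(A), len(B)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + column_score(A[i - 1], B[j - 1]),
                          F[i - 1][j], F[i][j - 1])
    # Traceback, merging matched columns and copying gapped ones.
    columns, i, j = [], n, m
    while i > 0 or j > 0:
        s = column_score(A[i - 1], B[j - 1]) if i > 0 and j > 0 else 0.0
        if i > 0 and j > 0 and s > 0 and F[i][j] == F[i - 1][j - 1] + s:
            columns.append({**A[i - 1], **B[j - 1]})
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or F[i][j] == F[i - 1][j]):
            columns.append(dict(A[i - 1]))
            i -= 1
        else:
            columns.append(dict(B[j - 1]))
            j -= 1
    columns.reverse()
    return columns


def progressive_align(tree, sequences, P):
    """Align along a guide tree of nested tuples whose leaves are sequence names."""
    if isinstance(tree, str):
        return [{tree: k} for k in range(len(sequences[tree]))]
    left, right = tree
    return align_groups(progressive_align(left, sequences, P),
                        progressive_align(right, sequences, P), P)


def render(columns, sequences):
    """Turn columns back into gapped strings, using '-' for gaps."""
    return {name: "".join(seq[col[name]] if name in col else "-" for col in columns)
            for name, seq in sequences.items()}


def with_transposes(P):
    """Add P[(y, x)] as the transpose of each given P[(x, y)]."""
    full = dict(P)
    for (x, y), M in P.items():
        full[(y, x)] = [list(row) for row in zip(*M)]
    return full


sequences = {"x": "ACG", "y": "AG", "z": "CG"}
# Hypothetical (already consistency-transformed) match probabilities.
P = with_transposes({
    ("x", "y"): [[0.9, 0.0], [0.05, 0.1], [0.0, 0.85]],
    ("x", "z"): [[0.0, 0.0], [0.9, 0.0], [0.0, 0.9]],
    ("y", "z"): [[0.1, 0.0], [0.0, 0.9]],
})
msa = render(progressive_align((("x", "y"), "z"), sequences, P), sequences)
for name, row in msa.items():
    print(name, row)
# x ACG
# y A-G
# z -CG
```

Scoring group columns by summed posteriors, rather than by a substitution matrix, lets every pairwise probability (including the consistency-transformed ones) influence how the groups are joined.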


References

  1. "PROBCONS: Probabilistic Consistency-based Multiple Sequence Alignment". Genome Research 15 (2): 330–340. 2005. doi:10.1101/gr.2821705. PMID 15687296. 
  2. Roshan, Usman (2014-01-01). "Multiple Sequence Alignment Using Probcons and Probalign". In Russell, David J. (ed.). Multiple Sequence Alignment Methods. Methods in Molecular Biology. Vol. 1079. Humana Press. pp. 147–153. doi:10.1007/978-1-62703-646-7_9. ISBN 9781627036450. 
  3. Lecture "Bioinformatics II" at University of Freiburg