AlignAlign: Software for optimally aligning alignments
Alert
The functionality of AlignAlign is now being provided by the software tool
Opal.
Because Opal is the active implementation of the algorithm for optimally aligning alignments,
AlignAlign is no longer being maintained.
Opal can optimally align two alignments, and is faster than the implementation in AlignAlign because of a number of more recent speedups to the algorithm. To align alignments with Opal, run the command:
java -jar Opal.x.jar --in alignment1.fasta --in2 alignment2.fasta
Description
The software AlignAlign
takes as input two alignments (of disjoint sequences)
and outputs one alignment (of all the sequences)
that contains both input alignments and has optimal score.
In the multiple sequence alignment that is output,
the columns of the two input alignments are optimally aligned
under the sum-of-pairs scoring function, correctly accounting for gaps.
This software is free for noncommercial use.
(The terms of use are below.)
AlignAlign implements the algorithm of Kececioglu and Starrett
from RECOMB 2004.
(The citation is below.)
This algorithm computes an optimal alignment of two multiple-sequence
alignments under the sum-of-pairs objective with linear gap-costs,
which is NP-complete.
(In the sum-of-pairs objective, the score of a multiple-sequence alignment
is the sum of the scores of the two-sequence alignments that it induces on all
pairs of sequences.
With linear gap costs, a run of x insertions or deletions
in a pairwise alignment costs ax + b,
where a and b are constants.)
Extensive experiments show that on biological data the algorithm is fast
and runs in quadratic time.
More precisely, the theoretical worst-case time for the algorithm is
O(5min(k,n) max(k,n)2)
for two alignments each with k sequences and n columns,
while in practice the observed running time is
O(k2n2).
AlignAlign implements two versions of the algorithm:
one optimized for space, the other optimized for time.
The space-optimized version computes an optimal alignment in space
linear in the number of input columns.
The time-optimized version computes an optimal alignment in quadratic
space, but runs an order of magnitude faster in practice.
Both can be invoked from AlignAlign. In addition,
AlignAlign supports non-uniform sequence-pair weights,
as well as terminal gap penalties that differ from the default penalties.
Example
Below is an example illustrating AlignAlign on a multiple alignment from the
PALI database of structural protein alignments.
The example shows part of an optimal alignment of two 10-sequence subalignments from the Kunitz STI
inhibitor dataset (accession number 409). The alignment was computed using optimal parameter
values (for both amino acid substitution scores and gap-open and -extension penalties)
obtained by the algorithm of Kececioglu and Kim (RECOMB 2006). The optimal alignment of
the two alignments (which costs 528533) improves on the score of the original PALI benchmark
(which costs 529280). Computing the alignment took 0.2 seconds on a 2.4 GHz Pentium processor
with 512 MB RAM.
... 1avwb_409 31 AA-PTGNE--R-CPLTVVQSRNELDKGIGTIISSP----Y---R-IRFIAEGHPLSLKFD-SFAVIMLCVG-I--P--TE-WSVV--E-D-LP 100 ...
... 1tie_409 31 LA-KTGEE--T-CPLTVVQSPNELSDGKPIRIESR----L---R-SAFIPDDDKVRIGFA-Y-A----PKCAP--S--PW-WTVV-------- 92 ...
... 1eyla_409 35 TA-KTGNE--P-CPLTVVRSPNEVSKGEPIRISSQ----F---L-SLFIPRGSLVALGFA-N-P----PSCAA--S--PW-WTVV----D-SP 99 ...
... SPR1_IPOBA/4 32 LA-HLDM--MSKCATDVIVSPNDLDNGDPITITPA----TADPE-STVVMASTYQTFRFN-I-ATNKL--CVN--N--VN-WGIQ--H-D-SA 104 ...
... KTI1_SOYBN/2 31 VD-STGKE--I-CPLTVVQSPNELDKGIGLVFTSP----L---H-ALFIAERYPLSIKFG-SFAVITLCAG-M--P--TE-WAIV----E-RE 99 ...
... ITRA_SOYBN/2 30 AA-PTGNE--R-CPLTVVQSRNELDKGIGTIISSP----Y---R-IRFIAEGHPLSLKFD-SFAVIMLCVG-I--P--TE-WSVV--E-D-LP 99 ...
... IT1A_PSOTE/2 31 AA-ATGTE--T-CPLTVVRSPNEVSVGEPLRISSQ----L---R-SGFIPDYSVVRIGFA-N-P----PKCAP--S--PW-WTVV--E-D-QP 96 ...
... IDE3_ERYCA/1 35 LA-KTGEE--T-CPLTVVQSPNELSDGKPIRIESR----L---R-SAFIPDDDKVRIGFA-Y-A----PKCAP--S--PW-WTVV--E-D-EQ 96 ...
... IECI_ERYVA/2 31 AA-RTGKE--T-CPLTVVQSPFEVSNGEPIRIASQ----F---L-STFIPDGSPYAIGFA-N-P----PSCAA--S--PW-WTVV----E-TS 95 ...
... ICW3_PSOTE/2 31 TA-KTGNE--P-CPLTVVRSPNEVSKGEPIRISSQ----F---L-SLFIPRGSLVALGFA-N-P----PSCAA--S--PW-WTVV----D-SP 95 ...
|| ||||| |||||||||||||||||||||||||| | | ||||||||||||||| | |||||||||| | || |||| | | ||
... 1avac_409 35 MA-PGHG--RH-CPLFVSQDPNGQHDGFPVRITPYGV--A---PSDKIIRLSTDVRISFR-A-Y---TTC-LQ--S--TE-WHID--S-ELAA 105 ...
... 1wba_409 31 VV-ATGNENPE-DPLSIVKST-RNI-MYATSISSEDKTPP---Q-PRNILENMRLKINFA-T-D---P--H-K--G--DV-WSVV----DFQP 99 ...
... WIN3_POPSP/3 25 VVNATIN--PI-CNSDVILST-GIE-GLPVTFSPV--INS---T-DGVIREGTLITVSFD-A--S---TCGMA--GVTPM-WKIG--F-NSTA 95 ...
... IAAS_ORYSA/4 31 MA-PRRV--IP-CPLLVAQETDERRKGFPVRFTPWGG--A---ASPRTLRVSTDVRIRFN-A--A---TLCVQ--S--TE-WHVG--D-EPLT 100 ...
... IAAS_HORVU/2 31 MA-PGHG--RH-CPLFVSQDPNGQHDGFPVRITPYGV--A---PSDKIIRLSTDVRISFR-A-Y---TTC-LQ--S--TE-WHID--S-ELAA 100 ...
... ASP_THECC/30 32 LGRATGQ---S-CPEIVVQRRSDLDNGTPVIFSNADS------K-DDVVRVSTDVNIEFV-P---IRDRLCST--S--TV-WRLD--NYDNSA 102 ...
... MIRA_RICDU/3 31 VS-ATTPNGTFVCPPRVVQTRKEVDHDRPLAFFPE-N--P---K-EDVVRVSTDLNINFS-A-F--MPCRWTS--S--TV-WRLDKYD-E-ST 104 ...
... IT2_PSOTE/2- 31 AA-KVGNE--D-CPLTVVKSLDENSNGEPIRIASR--L-----R-STFIPEYSLVNLGFA-D--P---PKCAP--S--PF-WTVV--K-D-QS 96 ...
... ITRY_ACACO/2 31 LA-KTGDE--S-CPLTVVQAQSETKRGLPAVIWTP----P---K-IAILTPGFYLNFEFQPR-D---LPACLQKYS--TLPWKV---E-G--- 98 ...
... ALB1_PSOTE/4 29 VV-ATGNENPE-DPLSIVKST-RNI-MYATSISSEDKTPP---Q-PRNILENMRLKINFA-T-D---P--H-K--G--DV-WSVV----DFQP 96 ...
Complete alignment available here
The input alignments and parameter values for this example are provided with the software
distribution below.
Distribution
The source code is in C, distributed as Unix
tar and
zip files.
All noteworthy uses of AlignAlign should cite the paper referenced below.
Acknowledgements
This research was supported by the US National Science Foundation
through a grant from the Biological Databases and Informatics Program,
Grant DBI-0317498, "Robust Tools for Biological Sequence Analysis."
17 May 2007 |
John Kececioglu |
Travis Wheeler |
alignalign@cs.arizona.edu |
alignalign.cs.arizona.edu
|