AlignAlign: Software for optimally aligning alignments
The functionality of AlignAlign is now being provided by the software tool Opal. Because Opal is the active implementation of the algorithm for optimally aligning alignments, AlignAlign is no longer being maintained.
Opal can optimally align two alignments, and is faster than the implementation in AlignAlign because of a number of more recent speedups to the algorithm. To align alignments with Opal, run the command:
AlignAlign implements the algorithm of Kececioglu and Starrett from RECOMB 2004. (The citation is below.) This algorithm computes an optimal alignment of two multiple-sequence alignments under the sum-of-pairs objective with linear gap-costs, which is NP-complete. (In the sum-of-pairs objective, the score of a multiple-sequence alignment is the sum of the scores of the two-sequence alignments that it induces on all pairs of sequences. With linear gap costs, a run of x insertions or deletions in a pairwise alignment costs ax + b, where a and b are constants.) Extensive experiments show that on biological data the algorithm is fast and runs in quadratic time. More precisely, the theoretical worst-case time for the algorithm is O(5min(k,n) max(k,n)2) for two alignments each with k sequences and n columns, while in practice the observed running time is O(k2n2).
AlignAlign implements two versions of the algorithm: one optimized for space, the other optimized for time. The space-optimized version computes an optimal alignment in space linear in the number of input columns. The time-optimized version computes an optimal alignment in quadratic space, but runs an order of magnitude faster in practice. Both can be invoked from AlignAlign. In addition, AlignAlign supports non-uniform sequence-pair weights, as well as terminal gap penalties that differ from the default penalties.
Below is an example illustrating AlignAlign on a multiple alignment from the PALI database of structural protein alignments. The example shows part of an optimal alignment of two 10-sequence subalignments from the Kunitz STI inhibitor dataset (accession number 409). The alignment was computed using optimal parameter values (for both amino acid substitution scores and gap-open and -extension penalties) obtained by the algorithm of Kececioglu and Kim (RECOMB 2006). The optimal alignment of the two alignments (which costs 528533) improves on the score of the original PALI benchmark (which costs 529280). Computing the alignment took 0.2 seconds on a 2.4 GHz Pentium processor with 512 MB RAM.
Complete alignment available here
AcknowledgementsThis research was supported by the US National Science Foundation through a grant from the Biological Databases and Informatics Program, Grant DBI-0317498, "Robust Tools for Biological Sequence Analysis."
17 May 2007 | John Kececioglu | Travis Wheeler | firstname.lastname@example.org | alignalign.cs.arizona.edu