Pairwise Sequence Alignment

Pairwise sequence alignment is the alignment of exactly two sequences (aligning more than two sequences is called multiple sequence alignment). It arranges the two sequences so that the regions of highest similarity line up and the differences between them are kept to a minimum.

For example, there are two sequences:

Sequence A: CGGATCAT

Sequence B: CTTAACT

An alignment of A and B:

CGGATCA--T   sequence A

C---TTAACT   sequence B
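An alignment like this can be summarised by counting its columns. The sketch below is only an illustration: the function name `summarize_alignment` is hypothetical, and it assumes that gaps are written as "-" and that both rows have the same length.

```python
def summarize_alignment(row_a, row_b, gap="-"):
    """Count matching, mismatching and gapped columns of two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            gaps += 1          # one sequence has no character in this column
        elif a == b:
            matches += 1       # identical characters
        else:
            mismatches += 1    # two different characters
    return matches, mismatches, gaps


# The alignment of sequence A and sequence B shown above
print(summarize_alignment("CGGATCA--T", "C---TTAACT"))  # prints (4, 1, 5)
```

For this example the ten columns break down into four matches, one mismatch and five gap columns.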

There are three main methods of pairwise sequence alignment:

  • Dot matrix analysis
  • Dynamic Programming
  • Word or K-tuple method

Dot Matrix Analysis

  • This method clearly shows the similarities between two closely related sequences.
  • Take two sequences, A and B. Sequence A is written across the top of the matrix and sequence B is written vertically down the left side.
  • Starting from the first character of sequence B, move across its row and place a dot in every column where the character of A matches the character of B.
  • Repeat this for every character of B.
  • Regions of similarity appear as diagonal rows of dots.
  • Isolated dots that do not lie on a diagonal represent random matches.
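The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a standard tool: the names `dot_matrix` and `render` are hypothetical, a dot is drawn as "*", and no window or threshold filtering is applied, so every single-character match is shown.

```python
def dot_matrix(seq_a, seq_b):
    """Build the matrix: seq_a runs across the top, seq_b down the left side.

    Cell [i][j] is True when seq_b[i] matches seq_a[j].
    """
    return [[a == b for a in seq_a] for b in seq_b]


def render(seq_a, seq_b):
    """Return the dot matrix as text, with '*' marking each match."""
    lines = ["  " + " ".join(seq_a)]
    for b, row in zip(seq_b, dot_matrix(seq_a, seq_b)):
        cells = " ".join("*" if hit else " " for hit in row)
        lines.append((b + " " + cells).rstrip())
    return "\n".join(lines)


# Sequence A across the top, sequence B down the side
print(render("CGGATCAT", "CTTAACT"))
```

With short sequences like these, many of the marks are random single-character matches. Real dot plot programs usually compare short windows instead of single characters to reduce this noise.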

Dot Matrix Comparison

The dot matrix shows the similarities between two closely related sequences. The diagonal marks the regions where the two sequences are similar.

Similarity here means the number of characters (nucleotides) that match between the two sequences. Similarity between sequences suggests a relationship between organisms and their ancestors, and indicates how alike the sequences are likely to be in function and structure.

  • It shows insertions and deletions, which tell us about mutations. If one sequence has an insertion or deletion relative to the other, the diagonal shifts. The plot alone cannot tell whether the shift is due to an insertion in one sequence or a deletion in the other, so these events are called “indels”.

indels between sequences

  • It also tells us about “palindromic sequences”.

palindromic sequence

A palindromic sequence is one that reads the same from left to right as from right to left. A short diagonal running perpendicular to the main diagonal indicates a palindromic sequence.

  • It can also reveal gene duplications. A duplicated region appears as diagonals running parallel to the main diagonal.

gene duplication

Advantages and Disadvantages of Dot Matrix

Advantages of Dot Matrix

The positions of the dots show the regions of alignment, and the plot displays all possible alignments (diagonals) at once. The plot is interpreted visually, so it relies on the human eye and brain to recognise the patterns.

Disadvantages of Dot Matrix

The major disadvantage of this method is that it does not give us an optimal alignment.

Dynamic Programming Method

To get the optimal alignment, we use the dynamic programming method. It introduces gaps where they are needed, and these gaps may correspond to mutations (insertions or deletions). Gaps are represented by “-”.

Dynamic programming can produce two kinds of alignment:

  1. Local Alignment

  2. Global Alignment

Local alignment uses the Smith-Waterman algorithm, while global alignment uses the Needleman-Wunsch algorithm. The two differ only slightly. In the Needleman-Wunsch algorithm, each cell of the matrix is the maximum of three values: the diagonal neighbour plus the match or mismatch score, the cell above plus the gap penalty, and the cell to the left plus the gap penalty. In the Smith-Waterman algorithm a fourth value, zero, is added to the comparison. The advantage of this zero is that it replaces any negative number in the matrix, so a poorly matching region cannot drag down a good local alignment. In the Smith-Waterman traceback we start from the maximum value found anywhere in the matrix and move towards the top left until we reach a zero.
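The difference between the two algorithms can be sketched as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the scoring values (match +1, mismatch -1, gap -2) are arbitrary choices, the gap penalty is linear, and the function names are hypothetical.

```python
def fill_matrix(seq_a, seq_b, match=1, mismatch=-1, gap=-2, local=False):
    """Fill the score matrix (seq_a across the top, seq_b down the side).

    local=False gives Needleman-Wunsch scores, local=True Smith-Waterman.
    """
    rows, cols = len(seq_b) + 1, len(seq_a) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised in row 0 and column 0.
        for j in range(1, cols):
            score[0][j] = j * gap
        for i in range(1, rows):
            score[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_a[j - 1] == seq_b[i - 1] else mismatch
            candidates = [
                score[i - 1][j - 1] + s,  # diagonal: match or mismatch
                score[i - 1][j] + gap,    # from above: gap in seq_a
                score[i][j - 1] + gap,    # from the left: gap in seq_b
            ]
            if local:
                candidates.append(0)      # fourth value: never go below zero
            score[i][j] = max(candidates)
    return score


def global_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: the value in the bottom-right cell."""
    return fill_matrix(seq_a, seq_b, match, mismatch, gap)[-1][-1]


def local_alignment(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: trace back from the maximum cell until a zero."""
    score = fill_matrix(seq_a, seq_b, match, mismatch, gap, local=True)
    best, i, j = max(
        (score[r][c], r, c)
        for r in range(len(score))
        for c in range(len(score[0]))
    )
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if seq_a[j - 1] == seq_b[i - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(seq_a[j - 1])
            out_b.append(seq_b[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append("-")
            out_b.append(seq_b[i - 1])
            i -= 1
        else:
            out_a.append(seq_a[j - 1])
            out_b.append("-")
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_score("ACGT", "AGT"))
print(local_alignment("TTACGTT", "GGACGGG"))
```

The only change between the two modes is the extra zero candidate and the zero-filled first row and column. The traceback for the local case stops as soon as it reaches a cell with score zero, which is why it returns only the best-matching region and not an end-to-end alignment.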

The word or K-tuple Method

It is a heuristic method: it is not guaranteed to give the optimal alignment, but it is much faster than dynamic programming. Dynamic programming is too slow to be practical for large databases, which is why the K-tuple method is preferred when a single query is searched against a huge database.

A K-tuple is a word (string) of k consecutive characters. Typical values are, for example, k = 11 for nucleotides and k = 3 for proteins.
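To make the idea of words concrete, here is a minimal sketch of how k-tuples can be extracted and looked up. It is not how FASTA or BLAST are actually implemented: the function names are hypothetical, only exact word matches are found, and the example sequences are made up for illustration.

```python
from collections import defaultdict


def k_tuples(seq, k):
    """Return every overlapping word of length k in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def index_words(seq, k):
    """Map each k-tuple to the list of positions where it starts in seq."""
    index = defaultdict(list)
    for pos, word in enumerate(k_tuples(seq, k)):
        index[word].append(pos)
    return index


def shared_words(query, subject, k):
    """Return (query_pos, subject_pos) pairs where the same k-tuple occurs."""
    index = index_words(subject, k)
    return [
        (q_pos, s_pos)
        for q_pos, word in enumerate(k_tuples(query, k))
        for s_pos in index.get(word, [])
    ]


print(k_tuples("CGGATCAT", 3))
print(shared_words("ACGTAC", "TTACGT", 3))  # prints [(0, 2), (1, 3), (3, 1)]
```

Looking words up in an index avoids comparing every position of the query with every position of the database sequence, which is where the speed of word-based methods comes from. Hits with the same offset between the two positions, such as (0, 2) and (1, 3) here, lie on the same diagonal of the dot matrix.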

The K-tuple method is implemented in the FASTA and BLAST families of programs. In FASTA, the word length k used to search a database is defined by the user. A low value of k makes the search slower but more sensitive. The BLAST family provides programs for specific types of query and can also match distantly related sequences.
