new local sequence alignment - home di homes.di.unimi.it · 2017. 10. 9. · global vs local...
Global Alignment
Global Alignment problem: seeks similarities between two entire strings. It is useful when the similarity between the strings extends over their entire length.
However, the score of an alignment between two substrings might be larger than the score of an alignment between the entire input strings.
Local Alignment Problem
Example: homeobox genes
How to find the conserved area and ignore the areas that show little similarity?
Motivation: Many genes are composed of domains, which are subsequences that perform a particular function.
1981: Temple Smith and Michael Waterman proposed a modification of the global sequence alignment dynamic programming algorithm that solves the Local Alignment problem
Global vs local alignment
Global and local alignments of two hypothetical genes that each have a conserved domain.
The local alignment has a much worse score according to the global scoring scheme, but it correctly locates the conserved domain.
Local Alignment problem
Local Alignment
Inefficient approach: find the longest path between every pair of vertices, and then select the longest of these computed paths
Good approach: find the longest paths from the source (0,0) to every other vertex by adding edges of weight 0 in the edit graph
The Smith-Waterman local alignment algorithm introduces edges of weight 0 (dashed lines) from the source vertex (0, 0) to every other vertex in the edit graph.
Local Alignment
The largest value of s_{i,j} over the whole edit graph represents the score of the best local alignment of v and w.
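The zero-weight edges from the source can be written directly into the recurrence. The form below is a sketch under assumed notation: \delta(x, y) is a generic scoring function (not named in the slides) giving the score of aligning symbol x with symbol y, and "-" denotes a gap.

```latex
s_{i,j} = \max
\begin{cases}
0 \\
s_{i-1,j} + \delta(v_i, -) \\
s_{i,j-1} + \delta(-, w_j) \\
s_{i-1,j-1} + \delta(v_i, w_j)
\end{cases}
```

The extra 0 term is the only change with respect to the global recurrence: it corresponds to reaching vertex (i, j) through a free edge from (0, 0), so an alignment can start anywhere at no cost.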
Recall: global alignment matrix
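As a reminder of how the global matrix is filled, here is a minimal sketch. The function name `global_alignment_matrix` and the default scores (match +1, mismatch -1, gap -1, borrowed from the exercise at the end) are assumptions made for illustration, not taken from the original slide.

```python
def global_alignment_matrix(v, w, match=1, mismatch=-1, gap=-1):
    """Fill the global alignment matrix for strings v and w (illustrative sketch)."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]

    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(
                s[i - 1][j - 1] + sub,  # align v[i-1] with w[j-1]
                s[i - 1][j] + gap,      # v[i-1] against a gap
                s[i][j - 1] + gap,      # w[j-1] against a gap
            )
    return s
```

The global score is the bottom-right entry `s[len(v)][len(w)]`. Note that the borders accumulate gap penalties and that no entry is clamped at 0.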
Local alignment
Initialize first row and first column to be 0
The score of the best local alignment is the largest value in the entire array
To find the actual local alignment: start at an entry with the maximum score
Trace back as usual, and stop when we reach an entry with a score of 0
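The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `smith_waterman`, the linear gap penalty, and the default scores (match +1, mismatch -1, gap -1) are assumptions, and when several entries share the maximum score the sketch simply keeps the first one found.

```python
def smith_waterman(v, w, match=1, mismatch=-1, gap=-1):
    """Return (best score, aligned part of v, aligned part of w)."""
    n, m = len(v), len(w)
    # First row and first column stay at 0.
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(
                0,                      # free edge from the source (0, 0)
                s[i - 1][j - 1] + sub,  # match or mismatch
                s[i - 1][j] + gap,      # gap in w
                s[i][j - 1] + gap,      # gap in v
            )
            if s[i][j] > best:
                best, best_pos = s[i][j], (i, j)

    # Trace back from the maximum entry until an entry with score 0.
    i, j = best_pos
    a, b = [], []
    while s[i][j] > 0:
        sub = match if v[i - 1] == w[j - 1] else mismatch
        if s[i][j] == s[i - 1][j - 1] + sub:
            a.append(v[i - 1]); b.append(w[j - 1])
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j] + gap:
            a.append(v[i - 1]); b.append("-")
            i -= 1
        else:
            a.append("-"); b.append(w[j - 1])
            j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))
```

Calling `smith_waterman(v, w)` returns the score together with the two aligned substrings, with "-" marking gaps. Other tie-breaking rules in the traceback may yield a different alignment with the same score.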
Example 1
Example 2
Other examples
Exercise
Given the two sequences
s: AACCTATAGCT
t: GCGATATA
and the following score values:
Gap penalty: -1
Match: +1
Mismatch: -1
compute a local sequence alignment of the input sequences.