![Page 1: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/1.jpg)
Applied Bioinformatics
Week 3
![Page 2: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/2.jpg)
Theory I
• Similarity
• Dot plot
![Page 3: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/3.jpg)
Introduction to Bioinformatics, http://www.personeel.unimaas.nl/Westra/Education/BioInf/slides_of_bioinformatics.htm
LECTURE 3: SEQUENCE ALIGNMENT
3.2 On sequence alignment
Sequence alignment is the most important task in bioinformatics!
![Page 4: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/4.jpg)
LECTURE 3: SEQUENCE ALIGNMENT
3.2 On sequence alignment
Sequence alignment is important for:
* prediction of function
* database searching
* gene finding
* sequence divergence
* sequence assembly
![Page 5: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/5.jpg)
LECTURE 3: SEQUENCE ALIGNMENT
3.3 On sequence similarity
Homology: genes that derive from a common ancestral gene are called homologs
Orthologous genes are homologous genes in different organisms (they diverged through speciation)
Paralogous genes are homologous genes in one organism that derive from gene duplication
Gene duplication: one gene is duplicated into multiple copies, which are therefore free to evolve and assume new functions
![Page 6: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/6.jpg)
HOMOLOGOUS and PARALOGOUS
![Page 7: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/7.jpg)
HOMOLOGOUS and PARALOGOUS
![Page 8: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/8.jpg)
HOMOLOGOUS and PARALOGOUS versus ANALOGOUS
![Page 9: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/9.jpg)
[Figure: globins (? globin, plant globin Ath-g) contrasted with analogs]
![Page 10: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/10.jpg)
LECTURE 3: SEQUENCE ALIGNMENT: sequence similarity
Causes for sequence (dis)similarity
mutation: a nucleotide at a certain location is replaced by another nucleotide (e.g.: ATA → AGA)
insertion: at a certain location one new nucleotide is inserted in between two existing nucleotides (e.g.: AA → AGA)
deletion: at a certain location one existing nucleotide is deleted (e.g.: ACTG → AC-G)
indel: an insertion or a deletion
![Page 11: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/11.jpg)
Similarity
• We can only measure current similarity
• We can form hypotheses
![Page 12: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/12.jpg)
Similarity Searching
• DotPlot
• Needleman-Wunsch
• Smith-Waterman
• FASTA
• BLAST
![Page 13: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/13.jpg)
Dot Plot
• Writing one sequence horizontally
• Writing the other vertically
• At each intersection with equal nucleotides make a dot in the matrix
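A minimal Python sketch of the three steps above, assuming an exact letter-for-letter comparison and text output instead of a graphic. The function names `dot_plot` and `show` are illustrative, not taken from any particular tool.

```python
def dot_plot(seq1: str, seq2: str) -> list[list[bool]]:
    """One row per letter of seq2 (vertical axis), one column per letter of
    seq1 (horizontal axis); True marks a dot, i.e. equal nucleotides."""
    return [[a == b for a in seq1] for b in seq2]


def show(seq1: str, seq2: str, matrix: list[list[bool]]) -> str:
    """Render the matrix as text: '*' for a dot, '.' for no dot."""
    lines = ["  " + " ".join(seq1)]
    for letter, row in zip(seq2, matrix):
        lines.append(letter + " " + " ".join("*" if dot else "." for dot in row))
    return "\n".join(lines)


if __name__ == "__main__":
    s1, s2 = "ACGTGTGCGTTTGAAC", "GGGTGTTCGTTTAAAC"
    print(show(s1, s2, dot_plot(s1, s2)))
```

Diagonal runs of `*` correspond to stretches where the two sequences agree.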
![Page 14: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/14.jpg)
Dot Plot
![Page 15: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/15.jpg)
Dot Plot
• Messy?
• Strong similarities can be visually enhanced
• Select a window size and a minimum number of matches within that window (e.g. window size 10, threshold 8)
• Create a new matrix with a dot wherever the window score is at least 8
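A sketch of the filtering step, under two assumptions the slide does not state: windows are compared along diagonals (the same offset in both sequences), and the dot is placed at the start of the window (some tools place it at the window's centre instead). `windowed_dot_plot` is a hypothetical name.

```python
def windowed_dot_plot(seq1: str, seq2: str, window: int, threshold: int) -> list[list[bool]]:
    """Cell [j][i] gets a dot when the diagonal window of `window` letters
    starting at seq1[i] and seq2[j] contains at least `threshold` matches."""
    rows = len(seq2) - window + 1
    cols = len(seq1) - window + 1
    matrix = []
    for j in range(rows):
        row = []
        for i in range(cols):
            # count matching positions along the diagonal inside the window
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            row.append(matches >= threshold)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Exercise sequences with a window of 3 and a threshold of 3 (all three must match)
    m = windowed_dot_plot("ACGTGTGCGTTTGAAC", "GGGTGTTCGTTTAAAC", 3, 3)
    for row in m:
        print("".join("*" if dot else "." for dot in row))
```

With `window=1` and `threshold=1` this reduces to the unfiltered dot plot; larger windows remove isolated, random matches.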
![Page 16: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/16.jpg)
Dot Plot
![Page 17: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/17.jpg)
Dot Plot Interpretation
![Page 18: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/18.jpg)
Creating a Dot Plot
![Page 19: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/19.jpg)
End Theory I
• Mindmapping
• 10 min break
![Page 20: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/20.jpg)
Practice I
• Dot plot
![Page 21: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/21.jpg)
Dot Plot
• ACGTGTGCGTTTGAAC
• GGGTGTTCGTTTAAAC
• Make a dot plot for the two sequences above
• Use a window of 3 to refine the view
• Can you use Excel?
• Get any two DNA sequences and try the tool below:
– http://www.vivo.colostate.edu/molkit/dnadot/
![Page 22: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/22.jpg)
Similarity Searching
• DotPlot
• Needleman-Wunsch
• Smith-Waterman
• FASTA
• BLAST
![Page 23: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/23.jpg)
Definitions
Optimal alignment - one that exhibits the most correspondences. It is the alignment with the highest score. It may or may not be biologically meaningful.
Global alignment - Needleman-Wunsch (1970) maximizes the number of matches between two sequences along their entire length.
Local alignment - Smith-Waterman (1981) gives the highest-scoring local match between two sequences.
![Page 24: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/24.jpg)
How can we find an optimal alignment?
• Example alignment (columns 1 to 27):
ACGTCTGATACGCCGTATAGTCTATCT
CTGAT---TCG-CATCGTC--T-ATCT
• How many possible alignments?
C(27,7) ways to place the 7 gaps among the 27 columns = 888,030, i.e. ~888,000 possibilities
• Dynamic programming: the Needleman & Wunsch algorithm
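The count above is a binomial coefficient: choose which 7 of the 27 alignment columns hold a gap in the shorter sequence. A quick check with Python's standard `math.comb` (Python 3.8+); `gap_placements` is just an illustrative wrapper.

```python
from math import comb


def gap_placements(columns: int, gaps: int) -> int:
    """Number of ways to place `gaps` gap characters among `columns` alignment columns."""
    return comb(columns, gaps)


print(gap_placements(27, 7))  # 888030
```

Even for these short sequences, scoring every placement by brute force is already expensive, which motivates dynamic programming.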
![Page 25: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/25.jpg)
Time Complexity
Consider two sequences:
AAGT
AGTC
How many possible alignments do the two sequences have?
$\binom{2n}{n} = \frac{(2n)!}{(n!)^2} \approx \frac{2^{2n}}{\sqrt{\pi n}}$
For n = 4: $\binom{8}{4} = 70$
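The exact count and its Stirling-based approximation can be compared numerically. This sketch only evaluates the formula above; `count_alignments` and `stirling_estimate` are illustrative names.

```python
from math import comb, pi, sqrt


def count_alignments(n: int) -> int:
    """Exact count from the formula: C(2n, n) = (2n)! / (n!)^2."""
    return comb(2 * n, n)


def stirling_estimate(n: int) -> float:
    """Stirling-based approximation 2^(2n) / sqrt(pi * n)."""
    return 2 ** (2 * n) / sqrt(pi * n)


for n in (4, 10, 100):
    # n = 4 gives 70 (exact) against about 72 (estimate)
    print(n, count_alignments(n), round(stirling_estimate(n)))
```

The count grows roughly like 4^n, so enumerating alignments is hopeless for real sequence lengths.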
![Page 26: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/26.jpg)
Scoring a sequence alignment
• Match/mismatch score: +1/+0
• Open/extension penalty: –2/–1
ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT
• Matches: 18 × (+1)
• Mismatches: 2 × 0
• Open: 2 × (–2)
• Extension: 5 × (–1)
Score = 18 + 0 – 4 – 5 = +9
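A sketch of the scoring rule above, assuming (as the slide's counts imply) that the first position of each gap costs the open penalty, each further position costs the extension penalty, and end gaps are scored like any other gap. `score_alignment` is a hypothetical helper.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = 0,
                    gap_open: int = -2, gap_extend: int = -1) -> int:
    """Score a finished pairwise alignment with affine gap penalties ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    open_gap_in = None  # row holding the current gap run: "top", "bottom" or None
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            # continuing a gap in the same row is an extension, otherwise a new opening
            score += gap_extend if open_gap_in == row else gap_open
            open_gap_in = row
        else:
            score += match if a == b else mismatch
            open_gap_in = None
    return score


print(score_alignment("ACGTCTGATACGCCGTATAGTCTATCT",
                      "----CTGATTCGC---ATCGTCTATCT"))  # 9
```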
![Page 27: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/27.jpg)
Pairwise Global Alignment
• Computationally:
– Given:
a pair of sequences (strings of characters)
– Output:
an alignment that maximizes the similarity
![Page 28: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/28.jpg)
Needleman-Wunsch Alg
![Page 29: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/29.jpg)
Needleman-Wunsch Alg
![Page 30: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/30.jpg)
Needleman-Wunsch Alg
![Page 31: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/31.jpg)
Needleman-Wunsch Alg
![Page 32: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/32.jpg)
Needleman-Wunsch Alg
![Page 33: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/33.jpg)
Needleman-Wunsch Alg
![Page 34: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/34.jpg)
Needleman-Wunsch Alg
![Page 35: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/35.jpg)
Needleman-Wunsch Alg
![Page 36: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/36.jpg)
Needleman-Wunsch Alg
• Which alignment is better?
• For scoring use:
– Match 1
– Mismatch 0
– Gap open -2
– Gap extension -1
• How can substitution matrices be integrated?
![Page 37: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/37.jpg)
Needleman & Wunsch
• Place each sequence along one axis
• Place score 0 at the top-left corner
• Fill in the 1st row & column with multiples of the gap penalty
• Fill in the matrix with the max value of 3 possible moves:
– Vertical move: score + gap penalty
– Horizontal move: score + gap penalty
– Diagonal move: score + match/mismatch score
• The optimal alignment score is in the lower-right corner
• To reconstruct the optimal alignment, trace back where the max at each step came from; stop when you hit the origin.
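The matrix-filling steps above can be sketched in Python. This version assumes a single (linear) gap penalty, as in the steps listed; the default scores (+1 / -1 / -1) are borrowed from the ATCG/TCG worked example, and `needleman_wunsch_matrix` is an illustrative name.

```python
def needleman_wunsch_matrix(seq1: str, seq2: str, match: int = 1,
                            mismatch: int = -1, gap: int = -1) -> list[list[int]]:
    """Global-alignment score matrix: rows follow seq1, columns follow seq2,
    and C[i][j] is the best score of seq1[:i] against seq2[:j]."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    C = [[0] * cols for _ in range(rows)]      # C[0][0] = 0 (top-left corner)
    for i in range(1, rows):
        C[i][0] = i * gap                      # first column: multiples of the gap penalty
    for j in range(1, cols):
        C[0][j] = j * gap                      # first row: multiples of the gap penalty
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + s,   # diagonal move
                          C[i - 1][j] + gap,     # vertical move
                          C[i][j - 1] + gap)     # horizontal move
    return C


matrix = needleman_wunsch_matrix("ATCG", "TCG")
print(matrix[-1][-1])  # optimal score in the lower-right corner: 2
```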
![Page 38: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/38.jpg)
Three steps in Needleman-Wunsch Algorithm
• Initialization
• Scoring
• Trace back (Alignment)
• Consider the following two DNA sequences, to be globally aligned:
ATCG (x=4, length of sequence 1)
TCG (y=3, length of sequence 2)
Pooja Anshul Saxena, University of Mississippi
![Page 39: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/39.jpg)
Scoring Scheme
• Match Score = +1
• Mismatch Score = -1
• Gap penalty = -1
• Substitution Matrix
A C G T
A 1 -1 -1 -1
C -1 1 -1 -1
G -1 -1 1 -1
T -1 -1 -1 1
![Page 40: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/40.jpg)
Initialization Step
• Create a matrix with X+1 rows and Y+1 columns
• The 1st row and the 1st column of the score matrix are filled with multiples of the gap penalty
T C G
0 -1 -2 -3
A -1
T -2
C -3
G -4
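The initialization on its own, as a short sketch. `init_global` is a hypothetical name; interior cells are simply left at 0 until the scoring step fills them.

```python
def init_global(x: int, y: int, gap: int = -1) -> list[list[int]]:
    """(x+1) x (y+1) matrix whose first row and first column hold multiples of the gap penalty."""
    C = [[0] * (y + 1) for _ in range(x + 1)]
    for i in range(x + 1):
        C[i][0] = i * gap   # first column: 0, -1, -2, ...
    for j in range(y + 1):
        C[0][j] = j * gap   # first row: 0, -1, -2, ...
    return C


for row in init_global(4, 3):   # ATCG (x = 4) against TCG (y = 3)
    print(row)
```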
![Page 41: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/41.jpg)
Scoring
• The score of any cell C(i, j) is the maximum of:
score_diag = C(i-1, j-1) + S(i, j)
score_up = C(i-1, j) + g
score_left = C(i, j-1) + g
where S(i, j) is the substitution score for letters i and j, and g is the gap penalty
T C G
0 -1 -2 -3
A -1 -1
T -2
C -3
G -4
• The maximum of the three scores goes into C(i, j); for the cell at row A, column T, the diagonal move uses S(T, A) = -1.
![Page 42: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/42.jpg)
Scoring ….
• Example: the calculation for the cell C(2, 2):
score_diag = C(i-1, j-1) + S(i, j) = 0 + -1 = -1
score_up = C(i-1, j) + g = -1 + -1 = -2
score_left = C(i, j-1) + g = -1 + -1 = -2
so C(2, 2) = max(-1, -2, -2) = -1
T C G
0 -1 -2 -3
A -1 -1
T -2
C -3
G -4
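The single-cell rule can be written as a tiny function. This sketch assumes the +1/-1 substitution scores and gap penalty of -1 from this example; `cell_score` is an illustrative name.

```python
def cell_score(diag: int, up: int, left: int, a: str, b: str,
               match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """One Needleman-Wunsch cell: diag/up/left are the already-filled neighbours,
    a and b the letters of this row and column."""
    s = match if a == b else mismatch      # S(i, j) from the +1/-1 substitution matrix
    return max(diag + s, up + gap, left + gap)


# Cell C(2, 2): row letter A, column letter T
print(cell_score(diag=0, up=-1, left=-1, a="A", b="T"))  # -1
```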
![Page 43: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/43.jpg)
Scoring ….
• Final Scoring Matrix
T C G
0 -1 -2 -3
A -1 -1 -2 -3
T -2 0 -1 -2
C -3 -1 1 0
G -4 -2 0 2
![Page 44: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/44.jpg)
Trace back
• The trace back step determines the actual alignment(s) that result in the maximum score
• There can be more than one maximal alignment
• Trace back starts from the last cell, i.e. position (X, Y) in the matrix
• It gives the alignment in reverse order
![Page 45: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/45.jpg)
Trace back ….
• There are three possible moves: diagonally (toward the top-left corner of the matrix), up, or left
• Trace back takes the current cell and looks to the neighbor cells that could be direct predecessors. This means it looks to the neighbor to the left (gap in sequence #1), the diagonal neighbor (match/mismatch), and the neighbor above it (gap in sequence #2). The algorithm for trace back chooses as the next cell in the sequence one of the possible predecessors.
![Page 46: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/46.jpg)
Trace back ….
• Starting from the last cell (score 2), the only possible predecessor is the diagonal match/mismatch neighbor. If more than one possible predecessor exists, any can be chosen. This gives us a current alignment of
Seq 1: G
       |
Seq 2: G
T C G
0 -1 -2 -3
A -1 -1 -2 -3
T -2 0 -1 -2
C -3 -1 1 0
G -4 -2 0 2
![Page 47: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/47.jpg)
Trace back ….
• Final trace back
Best alignment:
A T C G
  | | |
- T C G
T C G
0 -1 -2 -3
A -1 -1 -2 -3
T -2 0 -1 -2
C -3 -1 1 0
G -4 -2 0 2
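Putting the fill and the trace back together gives a complete, if minimal, global aligner. This is a sketch under the same +1/-1/-1 scheme; when several predecessors tie it prefers diagonal, then up, then left, which is one arbitrary but valid choice. `global_align` is a hypothetical name.

```python
def global_align(seq1: str, seq2: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch with a linear gap penalty. Returns (aligned seq1, aligned seq2, score)."""
    def s(a: str, b: str) -> int:
        return match if a == b else mismatch

    n, m = len(seq1), len(seq2)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        C[i][0] = i * gap
    for j in range(1, m + 1):
        C[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            C[i][j] = max(C[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]),
                          C[i - 1][j] + gap,
                          C[i][j - 1] + gap)

    top, bottom = [], []
    i, j = n, m                                     # start in the lower-right cell
    while i > 0 or j > 0:
        if i > 0 and j > 0 and C[i][j] == C[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]):
            top.append(seq1[i - 1])                 # diagonal: match/mismatch
            bottom.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and C[i][j] == C[i - 1][j] + gap:
            top.append(seq1[i - 1])                 # up: gap in sequence #2
            bottom.append("-")
            i -= 1
        else:
            top.append("-")                         # left: gap in sequence #1
            bottom.append(seq2[j - 1])
            j -= 1
    # columns were collected from the end, so reverse them
    return "".join(reversed(top)), "".join(reversed(bottom)), C[n][m]


print(global_align("ATCG", "TCG"))  # ('ATCG', '-TCG', 2)
```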
![Page 48: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/48.jpg)
Similarity Searching
• DotPlot
• Needleman-Wunsch
• Smith-Waterman
• FASTA
• BLAST
![Page 49: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/49.jpg)
Local Alignment
• Problem first formulated:
– Smith and Waterman (1981)
• Problem:
– Find an optimal alignment between a substring of s and a substring of t
• Algorithm:
– a variant of the basic algorithm for global alignment
![Page 50: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/50.jpg)
Motivation
• Searching for unknown domains or motifs within proteins from different families
– Proteins encoded by homeobox genes (conserved in only one region, the homeodomain, about 60 amino acids long)
– Identifying active sites of enzymes
• Comparing long stretches of anonymous DNA
• Querying databases where the query word is much smaller than the sequences in the database
• Analyzing repeated elements within a single sequence
![Page 51: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/51.jpg)
Smith-Waterman Alg
• Very similar to Needleman-Wunsch
• Determines a local instead of a global alignment
• Scores along a path can drop and increase, but no cell goes below 0
• A local alignment runs between a cell with score 0 and the highest-scoring cell
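A sketch of the Smith-Waterman fill, assuming the same linear gap penalty as the Needleman-Wunsch example: the first row and column are 0, every cell is floored at 0, and the position of the highest cell is recorded because it need not be the last one. `smith_waterman_matrix` is an illustrative name.

```python
def smith_waterman_matrix(seq1: str, seq2: str, match: int = 1,
                          mismatch: int = -1, gap: int = -1):
    """Return (matrix, best score, (row, column) of the first best cell found)."""
    n, m = len(seq1), len(seq2)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            # the extra 0 resets any negative running score
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return H, best, best_pos


H, best, pos = smith_waterman_matrix("ATCG", "TCG")
print(best, pos)  # 3 (4, 3)
```

For `"TCGAAA"` against `"TCG"` the best score 3 sits at (3, 3), while the last cell is 0.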
![Page 52: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/52.jpg)
Three steps in Smith-Waterman Algorithm
• Initialization
• Scoring
• Trace back (Alignment)
• Consider the following two DNA sequences, to be locally aligned:
ATCG (x=4, length of sequence 1)
TCG (y=3, length of sequence 2)
![Page 53: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/53.jpg)
Scoring Scheme
• Match Score = +1
• Mismatch Score = -1
• Gap penalty = -1
• Substitution Matrix
A C G T
A 1 -1 -1 -1
C -1 1 -1 -1
G -1 -1 1 -1
T -1 -1 -1 1
![Page 54: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/54.jpg)
Initialization Step
• Create a matrix with X+1 rows and Y+1 columns
• The 1st row and the 1st column of the score matrix are filled with 0s
T C G
0 0 0 0
A 0
T 0
C 0
G 0
![Page 55: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/55.jpg)
Scoring
• The score of any cell C(i, j) is the maximum of:
score_diag = C(i-1, j-1) + S(i, j)
score_up = C(i-1, j) + g
score_left = C(i, j-1) + g
and 0
(where S(i, j) is the substitution score for letters i and j, and g is the gap penalty)
![Page 56: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/56.jpg)
Scoring ….
• Example: the calculation for the cell C(2, 2):
score_diag = C(i-1, j-1) + S(i, j) = 0 + -1 = -1
score_up = C(i-1, j) + g = 0 + -1 = -1
score_left = C(i, j-1) + g = 0 + -1 = -1
so C(2, 2) = max(-1, -1, -1, 0) = 0
T C G
0 0 0 0
A 0 0
T 0
C 0
G 0
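The only change from the global cell rule is the extra 0 option; a minimal sketch with the +1/-1/-1 scheme of this example (`local_cell_score` is a hypothetical name):

```python
def local_cell_score(diag: int, up: int, left: int, a: str, b: str,
                     match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """One Smith-Waterman cell: the Needleman-Wunsch maximum, with 0 as a fourth option."""
    s = match if a == b else mismatch
    return max(diag + s, up + gap, left + gap, 0)


# Cell C(2, 2): all three moves give -1, so the cell is reset to 0
print(local_cell_score(0, 0, 0, "A", "T"))  # 0
```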
![Page 57: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/57.jpg)
Scoring ….
• Final Scoring Matrix
Note: the maximum alignment score is not necessarily in the last cell!
T C G
0 0 0 0
A 0 0 0 0
T 0 1 0 0
C 0 0 2 1
G 0 0 1 3
![Page 58: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/58.jpg)
Trace back
• The trace back step determines the actual alignment(s) that result in the maximum score
• There can be more than one maximal alignment
• Trace back starts from the cell with maximum value in the matrix
• It gives the alignment in reverse order
![Page 59: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/59.jpg)
Trace back ….
• There are three possible moves: diagonally (toward the top-left corner of the matrix), up, or left
• Trace back takes the current cell and looks to the neighbor cells that could be direct predecessors. This means it looks to the neighbor to the left (gap in sequence #1), the diagonal neighbor (match/mismatch), and the neighbor above it (gap in sequence #2). The algorithm for trace back chooses as the next cell in the sequence one of the possible predecessors. This continues until a cell with value 0 is reached.
![Page 60: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/60.jpg)
Trace back ….
• Starting from the maximum cell (score 3), the only possible predecessor is the diagonal match/mismatch neighbor. If more than one possible predecessor exists, any can be chosen. This gives us a current alignment of
Seq 1: G
       |
Seq 2: G
T C G
0 0 0 0
A 0 0 0 0
T 0 1 0 0
C 0 0 2 1
G 0 0 1 3
![Page 61: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/61.jpg)
Trace back ….
• Final trace back
Best alignment:
T C G
| | |
T C G
T C G
0 0 0 0
A 0 0 0 0
T 0 1 0 0
C 0 0 2 1
G 0 0 1 3
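A compact sketch of the whole local procedure: fill with the zero floor, start the trace back at the maximum cell, and stop at the first cell with value 0. It uses the +1/-1/-1 scheme of this example and breaks ties diagonal first, then up, then left; `local_align` is a hypothetical name.

```python
def local_align(seq1: str, seq2: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Smith-Waterman with a linear gap penalty. Returns (local seq1, local seq2, score)."""
    def s(a: str, b: str) -> int:
        return match if a == b else mismatch

    n, m = len(seq1), len(seq2)
    H = [[0] * (m + 1) for _ in range(n + 1)]      # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:                     # remember the maximum cell
                best, bi, bj = H[i][j], i, j

    top, bottom = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:         # stop at a cell with value 0
        if H[i][j] == H[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]):
            top.append(seq1[i - 1])                # diagonal: match/mismatch
            bottom.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(seq1[i - 1])                # up: gap in sequence #2
            bottom.append("-")
            i -= 1
        else:
            top.append("-")                        # left: gap in sequence #1
            bottom.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), best


print(local_align("ATCG", "TCG"))  # ('TCG', 'TCG', 3)
```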
![Page 62: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/62.jpg)
LECTURE 3: GLOBAL ALIGNMENT
![Page 63: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/63.jpg)
LECTURE 3: GLOBAL ALIGNMENT
![Page 64: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/64.jpg)
Significance of Sequence Alignment
• Consider randomly generated sequences. What distribution do you think the best local alignment score of two sequences of the same length should follow?
1. Uniform distribution
2. Normal distribution
3. Binomial distribution (n Bernoulli trials)
4. Poisson distribution (n, np = λ)
5. others
![Page 65: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/65.jpg)
Extreme Value Distribution
• $Y_{ev} = \exp(-x - e^{-x})$
[Plot: extreme value distribution density; x from -5 to 5, y from 0 to 0.4]
![Page 66: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/66.jpg)
Extreme Value Distribution vs. Normal Distribution
[Plots: extreme value distribution and normal distribution densities; x from -5 to 5, y from 0 to 0.4]
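The two curves can be compared numerically. This sketch evaluates the standard extreme value density $\exp(-x - e^{-x})$ (location 0, scale 1; real alignment-score distributions have their own fitted parameters) next to the standard normal density. The function names are illustrative.

```python
from math import exp, pi, sqrt


def extreme_value_pdf(x: float) -> float:
    """Standard extreme value (Gumbel) density: exp(-x - e^(-x))."""
    return exp(-x - exp(-x))


def normal_pdf(x: float) -> float:
    """Standard normal density, for comparison."""
    return exp(-x * x / 2) / sqrt(2 * pi)


for x in range(-5, 6):
    print(f"{x:3d}  EVD {extreme_value_pdf(x):.4f}  normal {normal_pdf(x):.4f}")
```

The extreme value curve peaks at e^(-1) ≈ 0.37 and has a much heavier right tail: at x = 4 it is roughly 0.018 against about 0.0001 for the normal curve, so a normal model would overstate the significance of high alignment scores.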
![Page 67: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/67.jpg)
“Twilight Zone”
[Plot: extreme value distribution density; x from -5 to 5, y from 0 to 0.4]
Some proteins with less than 15% similarity have the same 3-D structure, while some proteins with 20% similarity have different structures. Homology (or non-homology) is never guaranteed in the twilight zone.
![Page 68: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/68.jpg)
End of Theoretical Part 2
• Mindmapping
• 10 min break
![Page 69: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/69.jpg)
Needleman-Wunsch
• ACGTGTGCGTTTGAAC
• GGGTGTAGTCGTTTAAAC
• Apply the Needleman-Wunsch algorithm to these two sequences
• Score the alignments
![Page 70: Applied Bioinformatics Week 3. Theory I Similarity Dot plot](https://reader034.vdocuments.net/reader034/viewer/2022051018/5697bf8a1a28abf838c8a512/html5/thumbnails/70.jpg)
Alignments
• Explanation for alignment algorithms
– http://baba.sourceforge.net/
• Alignment of 2 sequences
– http://www.expasy.org/tools/sim-prot.html
• Get any two amino acid sequences and try
– http://bioinformatics.iyte.edu.tr/SmithWaterman/