Pairwise Sequence Alignment

Presented by Liu Qi

Upload: jereni

Post on 30-Jan-2016



Why align sequences?

Functional predictions based on identifying homologues.

Assumes: conservation of sequence implies conservation of function.

BUT:
• Function is carried out at the level of proteins, i.e. 3-D structure.
• Sequence conservation is carried out at the level of DNA, i.e. 1-D sequence.

Some Definitions

• An alignment is a mutual arrangement of two sequences that shows where the two sequences are similar and where they differ.

• An optimal alignment is one that exhibits the most correspondences and the fewest differences. It is the alignment with the highest score. It may or may not be biologically meaningful.

Methods

• Dot matrix
• Dynamic programming
• Word or k-tuple (heuristic based)

Brief intro of methods

• Dot matrix - all possible matches between sequence residues are found; used to compare two sequences to look for regions where they may align; very useful for finding indels and repeats in sequences; can be used as a first pass to see if there is any similarity between sequences.

• Dynamic programming - mathematically guaranteed to find the optimal alignment (global or local) between pairs of sequences; computationally expensive for database-scale searches, because the number of steps grows with the product of the two sequence lengths.

• k-tuple (word) methods - used by FASTA and BLAST (previously described); much faster than dynamic programming and ideal for database searches; use heuristics that do not guarantee an optimal alignment but are nevertheless very reliable.

Dot matrix

1 - One sequence is listed along the top of the page and the second sequence is listed along the side.

2 - Move across a row and put a dot in any column where the character is the same.

3 - Continue for each row until all possible character matches between the sequences are represented by dots.

4 - Diagonal rows of dots reveal sequence similarity (repeats and inverted repeats can also be found off the main diagonal).

5 - Isolated dots represent random similarity unrelated to the alignment.
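The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the slides: the function names `dot_matrix` and `render` and the example sequences are assumptions chosen for the demonstration.

```python
def dot_matrix(top, side):
    """Return a grid with '*' where side[i] == top[j] and ' ' elsewhere."""
    return [["*" if a == b else " " for b in top] for a in side]


def render(top, side):
    """Format the dot matrix with the top sequence as a header row."""
    lines = ["  " + " ".join(top)]
    for a, row in zip(side, dot_matrix(top, side)):
        lines.append(a + " " + " ".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example sequences, used only for illustration.
    print(render("GATTACA", "GATCA"))
```

Printing the grid for two short sequences shows a run of dots on the main diagonal where the sequences agree, plus isolated dots from chance matches.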

Dot matrix with noise reduction

To improve the visualisation of similar regions, we use a sliding window instead of writing down a dot for every character that is common to both sequences.

We compare a number of positions at a time (the window size) and write down a dot whenever at least a minimum number (the stringency) of characters in the window are identical.
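The sliding-window filter can be sketched as follows. This is a hedged illustration, not the implementation of any particular dot-plot program: the function name `windowed_dots` and the convention of recording a dot at the window's start position are assumptions.

```python
def windowed_dots(seq1, seq2, window, stringency):
    """Return (i, j) window start positions where the windows
    seq1[i:i+window] and seq2[j:j+window] have at least
    `stringency` identical characters at matching positions."""
    dots = []
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            identical = sum(
                seq1[i + k] == seq2[j + k] for k in range(window)
            )
            if identical >= stringency:
                dots.append((i, j))
    return dots


if __name__ == "__main__":
    # Hypothetical sequences; window=3, stringency=3 keeps exact 3-mers only.
    print(windowed_dots("GATTACA", "GATCACA", 3, 3))
```

With window = 1 and stringency = 1 this reduces to the plain dot matrix; raising both values removes isolated dots while keeping diagonal runs.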

Dot matrix

Caution is necessary regarding the window size and the stringency value. Generally they take different values for different problems. Well-chosen values accentuate the regions of similarity between the two sequences.

• For DNA sequences, usually: sliding window = 15, stringency = 10
• For protein sequences: sliding window = 2 or 3, stringency = 2

Things to be considered

• Scoring matrix for distance correction
• Window size
• Threshold

Uses of the dot plot

• Regions of similarity: diagonals
• Insertions/deletions: gaps
  - Can determine intron/exon structure
• Repeats: parallel diagonals
• Inverted repeats: perpendicular diagonals
  - Can be used to determine regions of base pairing in RNA molecules

Intra-sequence comparison

• Repeats
• Inverted repeats
• Low complexity

Examples

Sequence: ABRACADABRACAD

Palindrome

Sequence: ATOYOTA

Repeats

Drosophila melanogaster SLIT protein against itself

Low complexity

Inter-sequence comparison

• Conserved domains
• Insertion and deletion

Insertion and deletion

Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
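In a dot plot, an insertion or deletion shows up as a jump from one diagonal to another. The sketch below, with the hypothetical helper `longest_runs_by_diagonal`, measures the longest run of consecutive matches on each diagonal for the two sequences above; it is an illustration only, not the method of any specific program.

```python
def longest_runs_by_diagonal(seq1, seq2):
    """Map diagonal offset (i - j) to the longest run of consecutive
    matches seq1[i] == seq2[j] found on that diagonal."""
    runs = {}
    for offset in range(-(len(seq2) - 1), len(seq1)):
        best = current = 0
        for j in range(len(seq2)):
            i = j + offset
            if 0 <= i < len(seq1) and seq1[i] == seq2[j]:
                current += 1
                best = max(best, current)
            else:
                current = 0
        if best:
            runs[offset] = best
    return runs


if __name__ == "__main__":
    runs = longest_runs_by_diagonal("DOROTHYCROWFOOTHODGKIN", "DOROTHYHODGKIN")
    # Show the diagonals that carry runs of at least 5 matches.
    print({k: v for k, v in runs.items() if v >= 5})
```

DOROTHY lies on the main diagonal (offset 0) and HODGKIN on the diagonal shifted by 8, the length of the inserted CROWFOOT.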

Conserved domains

Translated DNA and protein comparison: exons and introns

Even more can be done with RNA

Comparing the reverse complement of an RNA sequence to the sequence itself can often be very informative.

• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from Baker's yeast.

• The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).

Structures of tRNA-Phe

RNA comparisons of the reverse complement of a sequence to itself
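The idea of comparing a sequence with its own reverse complement can be sketched as below. The short hairpin sequence is hypothetical (it is not tRNA-Phe), the helper names are assumptions, and only Watson-Crick pairs are considered (G-U wobble pairs are ignored).

```python
# Watson-Crick complements for RNA; G-U wobble pairing is ignored here.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def stem_candidates(rna, min_len):
    """Return (i, j) where rna[i:i+min_len] equals the reverse complement
    at [j:j+min_len]; such hits mark stretches that could base-pair."""
    rc = reverse_complement(rna)
    hits = []
    for i in range(len(rna) - min_len + 1):
        for j in range(len(rc) - min_len + 1):
            if rna[i:i + min_len] == rc[j:j + min_len]:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    hairpin = "GGGCAAAAGCCC"  # hypothetical stem-loop, for illustration only
    print(reverse_complement(hairpin))
    print(stem_candidates(hairpin, 4))
```

For this toy hairpin the two hits correspond to the two arms of the stem: the first four bases can pair with the last four.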

Programs for Dot Matrix

• Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
• SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html
• Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
• COMPARE, DOTPLOT in GCG

Conclusion

Advantages:
• Readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods.
• Lets your eyes and brain do the work: VERY EFFICIENT.

Disadvantages:
• Most dot matrix computer programs do not show an actual alignment.
• Does not return a score to indicate how 'optimal' a given alignment is.


Dynamic Programming

Answers the question: what is the optimal alignment of two sequences (the one with the best score)?

How many different alignments are there?
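One way to get a feel for the number of alignments is to count them. The sketch below assumes a particular counting convention (every column is a match/mismatch pair, a gap in the first sequence, or a gap in the second, and differently ordered gap columns count as distinct alignments); the function name `count_alignments` is hypothetical.

```python
def count_alignments(n, m):
    """Count alignments of sequences of lengths n and m, where each
    column is a pair, a gap in the first, or a gap in the second.

    N(i, j) = N(i-1, j-1) + N(i-1, j) + N(i, j-1), with N(i, 0) = N(0, j) = 1.
    """
    table = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = (
                table[i - 1][j - 1] + table[i - 1][j] + table[i][j - 1]
            )
    return table[n][m]


if __name__ == "__main__":
    for length in (1, 2, 3, 10):
        print(length, count_alignments(length, length))
```

Under this convention two sequences of length 3 already have 63 alignments, and the count grows exponentially with length, which is why enumerating all alignments is not practical.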

Alignment methods with DP

• Global alignment - Needleman-Wunsch (1970): maximizes the number of matches between the sequences along the entire length of the sequences.

• Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.

Dynamic Programming

A simple example

[Figure: a weighted graph with nodes A to F and numeric edge weights.]

Exercise

Conditions for applying dynamic programming

• Every sub-policy of an optimal policy is itself optimal.
• No aftereffect: the states of earlier stages cannot directly affect future decisions.
• Trading space for time (overlapping subproblems).

DP Algorithm for Global Alignment

Two sequences X = x_1 ... x_n and Y = y_1 ... y_m.

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 ≤ i ≤ n, 0 ≤ j ≤ m).

\[
F(0, 0) = 0, \qquad
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
\]

DP in equation form

[Figure: cell F(i, j) is reached from F(i-1, j-1) by a diagonal arrow labelled s(x_i, y_j), and from F(i-1, j) and F(i, j-1) by arrows labelled d.]

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2

Step 1: set F(0, 0) = 0.

Step 2: fill the first row and the first column with multiples of d.

          A    A    G
     0   -5  -10  -15
A   -5
G  -10
C  -15

Step 3: fill the remaining cells using the recurrence.

          A    A    G
     0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6
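The matrix fill for this example can be reproduced with a short sketch. The substitution scores and d = -5 come from the slide; the function names, the dictionary layout, and the convention that rows follow the left sequence are assumptions made for the illustration.

```python
# Substitution scores from the slide (symmetric), keyed by sorted base pair.
SCORES = {
    "AA": 2, "CC": 2, "GG": 2, "TT": 2,
    "AC": -7, "AG": -5, "AT": -7,
    "CG": -7, "CT": -5, "GT": -7,
}


def s(a, b):
    """Substitution score for aligning base a with base b."""
    return SCORES["".join(sorted(a + b))]


def fill_global(top, left, d):
    """Fill the global-alignment matrix; rows follow `left`, columns `top`."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        F[0][j] = j * d          # leading gaps in the left sequence
    for i in range(1, rows):
        F[i][0] = i * d          # leading gaps in the top sequence
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),  # diagonal
                F[i - 1][j] + d,                               # vertical
                F[i][j - 1] + d,                               # horizontal
            )
    return F


if __name__ == "__main__":
    for row in fill_global("AAG", "AGC", -5):
        print(row)
```

The bottom-right cell holds the optimal global score, here -6.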

Traceback

• Start from the lower right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells visited by the traceback:

          A    A    G
     0   -5
A         2   -3
G                  -1
C                  -6

Optimal alignments:

AAG-    AAG-
-AGC    A-GC
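The traceback rules can be sketched as a recursive walk that follows every move consistent with the recurrence, so that all co-optimal alignments are collected. This is an illustrative sketch under the same scoring as the example; the function names are hypothetical, and the block repeats the matrix fill so that it runs on its own.

```python
SCORES = {
    "AA": 2, "CC": 2, "GG": 2, "TT": 2,
    "AC": -7, "AG": -5, "AT": -7,
    "CG": -7, "CT": -5, "GT": -7,
}


def s(a, b):
    return SCORES["".join(sorted(a + b))]


def fill_global(top, left, d):
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        F[0][j] = j * d
    for i in range(1, rows):
        F[i][0] = i * d
        for j in range(1, cols):
            F[i][j] = max(F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    return F


def all_tracebacks(top, left, d):
    """Return every optimal alignment as a (top_row, left_row) pair."""
    F = fill_global(top, left, d)
    results = []

    def walk(i, j, top_aln, left_aln):
        if i == 0 and j == 0:
            results.append((top_aln, left_aln))
            return
        if i > 0 and j > 0 and \
                F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            # Diagonal move: one character from each sequence.
            walk(i - 1, j - 1, top[j - 1] + top_aln, left[i - 1] + left_aln)
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            # Vertical move: gap in the top sequence.
            walk(i - 1, j, "-" + top_aln, left[i - 1] + left_aln)
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            # Horizontal move: gap in the left sequence.
            walk(i, j - 1, top[j - 1] + top_aln, "-" + left_aln)

    walk(len(left), len(top), "", "")
    return results


if __name__ == "__main__":
    for top_row, left_row in all_tracebacks("AAG", "AGC", -5):
        print(top_row)
        print(left_row)
        print()
```

For AAG and AGC the walk branches once, at the cell holding -3, which is why two alignments share the optimal score of -6.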

Exercise

Find the global alignment of X = catgt and Y = acgctg.
Score: d = -1, mismatch = -1, match = 2.

Local alignment

• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

\[
F(0, 0) = 0, \qquad
F(i, j) = \max
\begin{cases}
0 \\
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
\]

Local DP in equation form

[Figure: cell F(i, j) is reached from F(i-1, j-1) by a diagonal arrow labelled s(x_i, y_j), and from F(i-1, j) and F(i, j-1) by arrows labelled d; the value 0 is an additional option.]

Local alignment

Two differences with respect to global alignment:
• No score is negative.
• Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2

Step 1: set the first row and the first column to 0.

Step 2: fill the remaining cells using the local recurrence.

          A    A    G
     0    0    0    0
A    0    2    2    0
G    0    0    0    4
C    0    0    0    0

Step 3: trace back from the highest score (4) until a 0 is reached.

Optimal local alignment:

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the same substitution matrix s as in the previous example.

          A    A    G
     0    0    0    0
G    0    0    0    2
A    0    2    2    0
A    0    2    4    0
G    0    0    0    6
G    0    0    0    2
C    0    0    0    0
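Both local examples can be checked with a small sketch of the local recurrence and its traceback. The scores and d = -5 are taken from the slides; the function names and the choice to return a single best alignment (the first highest-scoring cell found) are assumptions.

```python
SCORES = {
    "AA": 2, "CC": 2, "GG": 2, "TT": 2,
    "AC": -7, "AG": -5, "AT": -7,
    "CG": -7, "CT": -5, "GT": -7,
}


def s(a, b):
    return SCORES["".join(sorted(a + b))]


def fill_local(top, left, d):
    """Fill the local-alignment matrix; no cell drops below 0."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    return F


def best_local(top, left, d):
    """Return (score, top_row, left_row) for one best local alignment."""
    F = fill_local(top, left, d)
    cells = [(i, j) for i in range(len(F)) for j in range(len(F[0]))]
    i, j = max(cells, key=lambda cell: F[cell[0]][cell[1]])
    score, top_aln, left_aln = F[i][j], "", ""
    # Trace back from the highest score until a 0 is reached.
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            top_aln, left_aln = top[j - 1] + top_aln, left[i - 1] + left_aln
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            top_aln, left_aln = "-" + top_aln, left[i - 1] + left_aln
            i -= 1
        else:
            top_aln, left_aln = top[j - 1] + top_aln, "-" + left_aln
            j -= 1
    return score, top_aln, left_aln


if __name__ == "__main__":
    print(best_local("AAG", "AGC", -5))
    print(best_local("AAG", "GAAGGC", -5))
```

In the second example the whole of AAG is found inside GAAGGC, which is the situation local alignment is designed for.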

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

X = - - c a c - t g t a c
Y = g a c a c t t g - - -

Base conditions:

\[
\forall i, j: \quad F(i, 0) = 0, \quad F(0, j) = 0
\]

Recurrence relation:

\[
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(X_i, Y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
\]

Search for i* such that F(i*, m) = \max_{1 \le i \le n} F(i, m).

Search for j* such that F(n, j*) = \max_{1 \le j \le m} F(n, j).

Define the alignment score as \max\{F(n, j*), F(i*, m)\}.
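The base conditions and the final search over the last row and last column can be sketched as follows. The function name is hypothetical, and a simple match/mismatch score stands in for a general substitution matrix s; the scoring values in the usage example (match = 1, mismatch = -1, d = -1) are an assumption applied to the two sequences shown above.

```python
def end_space_free_score(x, y, match, mismatch, d):
    """Best alignment score when gaps at either end cost nothing."""
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Trailing gaps are free: take the best cell in the last column
    # (all of y used) or in the last row (all of x used).
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_row, best_last_column)


if __name__ == "__main__":
    print(end_space_free_score("cactgtac", "gacacttg", 1, -1, -1))
```

With these assumed scores, the alignment shown above has five matches and one internal gap, for a score of 4, which is also the value the sketch returns.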

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c
Y = g a c a c t t g

Questions to think about

• Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?
• Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.

Affine gap penalty

LETVGYW----L
       -5 -1 -1 -1

• Separate penalties for gap opening and gap extension.
• This requires modifying the DP algorithm.

Affine gap penalty

• A gap of length k is more probable than k gaps of length 1.
  - A gap may be due to a single mutational event that inserted/deleted a stretch of characters.
  - Separated gaps are probably due to distinct mutational events.
• A linear gap penalty function treats these cases the same.
• It is more common to use gap penalty functions involving two terms:
  - a penalty h associated with opening a gap
  - a smaller penalty g for extending the gap
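The difference between the two penalty functions can be made concrete with a small sketch. It assumes the convention suggested by the annotated example (-5 -1 -1 -1), in which the first gap position costs h and each further position costs g; other texts define the affine cost slightly differently, and the function names here are hypothetical.

```python
def linear_gap(k, d):
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap(k, h, g):
    """Affine penalty: the first gap position costs h (opening),
    each additional position costs g (extension)."""
    if k <= 0:
        return 0
    return h + (k - 1) * g


if __name__ == "__main__":
    h, g = -5, -1
    print("one gap of length 4:", affine_gap(4, h, g))
    print("four gaps of length 1:", 4 * affine_gap(1, h, g))
    print("linear, d = -5, length 4:", linear_gap(4, -5))
```

Under the affine function one gap of length 4 is penalised far less than four separate gaps of length 1, whereas a linear function gives both cases the same total.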

Gap penalty functions

Dynamic Programming for the Affine Gap Penalty Case

• Need 3 matrices instead of 1.

match = 1, mismatch = -1
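A common way to organise the three matrices for global alignment is sketched below. The slides' own formulas were not recovered, so the recurrences, matrix names (M, Ix, Iy), and the gap convention (first gap position costs h, each further position costs g) are assumptions in the style usually attributed to Gotoh, not a transcription of the original.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match, mismatch, h, g):
    """Global alignment score with affine gaps, using three matrices:
    M  - x[i] aligned to y[j]
    Ix - x[i] aligned to a gap
    Iy - y[j] aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g   # leading gap of length i
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g   # leading gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1],
                                Ix[i - 1][j - 1],
                                Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] + h,    # open a new gap
                           Iy[i - 1][j] + h,   # switch gap direction
                           Ix[i - 1][j] + g)   # extend the gap
            Iy[i][j] = max(M[i][j - 1] + h,
                           Ix[i][j - 1] + h,
                           Iy[i][j - 1] + g)
    return max(M[n][m], Ix[n][m], Iy[n][m])


if __name__ == "__main__":
    print(affine_global_score("AAAACCCCGGGG", "AAAAGGGG", 1, -1, -5, -1))
```

Keeping separate matrices lets the recurrence tell whether a gap is being opened or extended, which a single matrix cannot record.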

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tup

FASTA

BLAST

Page 2: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Why align sequences

Functional predictions based on identifying homologues

Assumesconservation of sequence conservation of

function BUT Function carried out at level of proteins ie3-D structure Sequence conservation carried out at level of DNA1-D sequence

Presented By Liu QiPresented By Liu Qi

Some DefinitionsSome Definitions

An An alignment alignment is a mutual arrangement of is a mutual arrangement of two sequences which exhibits where the two sequences which exhibits where the two sequences are similar and where they two sequences are similar and where they differdiffer

An An optimal alignment optimal alignment is one that exhibits is one that exhibits the most correspondences and the least the most correspondences and the least differences It is the alignment with the differences It is the alignment with the highest score May or may not be highest score May or may not be biologically meaningfulbiologically meaningful

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

MethodsMethods

Dot matrix Dynamic Programming Word k-tuple (heuristic based)

Presented By Liu QiPresented By Liu Qi

Brief intro of methodsBrief intro of methods

dot matrix - all possible matches between sequence residues are foundused to compare two sequences to look for regions where they may align very useful for finding indels and repeats in sequences can be used as afirst pass to see if there is any similarity between sequences

bull dynamic programming - mathematically guaranteed to find optimal alignment (global or local) between pairs of sequences very computationallyexpensive - of steps increases exponentially with sequence length

bull k-tuple (word) methods - used by FASTA and BLAST (previously described) much faster than dynamic programming and ideal for databasesearches uses heuristics that do not guarantee optimal alignment but are nevertheless very reliable

Presented By Liu QiPresented By Liu Qi

Dot matrixDot matrix

1 - one sequence listed along top of page and second sequence listed along the side

2 - move across row and put dot in any column where the character is the same

3 - continue for each row until all possible character matches between thesequences are represented by dots

4 - diagonal rows of dots reveal sequencesimilarity (can also find repeats and invertedrepeats off the main diagonal)

5 - isolated dots represent random similarity unrelated to the alignment

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Dot matrix with noise reductionDot matrix with noise reduction

Presented By Liu QiPresented By Liu Qi

Dot matrixDot matrix

To improve visualisation of identical regions among sequences we use sliding windows Instead of writing down a dot for every character that is common in both sequences

We compare a number of positions (window size) and we write down a dot whenever there is minimum number (stringency) of identical characters

Presented By Liu QiPresented By Liu Qi

Dot matrixDot matrix

Caution is necessary regarding the window size and the stringency value Generally they assume different values for different problems The optimal values will accent the regions of similarity of the two sequences

1048698 For DNA sequence usually1048698 Sliding window=15 stringency=10

1048698 For Protein sequence1048698 Sliding window=2 or 3 stringency=2

Presented By Liu QiPresented By Liu Qi

Things to be consideredThings to be considered

Scoring matrix for distance correction

Window size Threshold

Presented By Liu QiPresented By Liu Qi

The useful of Dot plot The useful of Dot plot

Regions of similarity diagonalsRegions of similarity diagonals Insertionsdeletions gapsInsertionsdeletions gaps

Can determine intronexon structureCan determine intronexon structureRepeats parallel diagonalsRepeats parallel diagonals Inverted repeats perpendicular diagonalsInverted repeats perpendicular diagonals

Inverted repeatsInverted repeatsCan be used to determine regions of base Can be used to determine regions of base

pairing of RNA moleculespairing of RNA molecules

Presented By Liu QiPresented By Liu Qi

Intra-sequence comparisonIntra-sequence comparison

RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity

Presented By Liu QiPresented By Liu Qi

ABRACADABRACADABRACADABRACAD

ExamplesExamples

Presented By Liu QiPresented By Liu Qi

palindromepalindromeSequence ATOYOTA

Presented By Liu QiPresented By Liu Qi

RepeatsRepeats

Drosophila melanogaster SLIT protein against itself

Presented By Liu QiPresented By Liu Qi

Low complexityLow complexity

Presented By Liu QiPresented By Liu Qi

Inter sequence comparisonInter sequence comparison

Conserved domainsConserved domains Insertion and deletionInsertion and deletion

Presented By Liu QiPresented By Liu Qi

Insertion and deletionInsertion and deletion

Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN

Presented By Liu QiPresented By Liu Qi

Conserved domainsConserved domains

Presented By Liu QiPresented By Liu Qi

Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Even more can be done with RNAEven more can be done with RNA

RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative

bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast

bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))

Presented By Liu QiPresented By Liu Qi

Structures of Structures of tRNA-PhetRNA-Phe

Presented By Liu QiPresented By Liu Qi

RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself

Presented By Liu QiPresented By Liu Qi

Programs for Dot MatrixPrograms for Dot Matrix

DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml

SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics

res_inf_sightmlres_inf_sightmlDotter Dotter

httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml

COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG

Presented By Liu QiPresented By Liu Qi

conclusionconclusion

Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods

letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT

DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is

Presented By Liu QiPresented By Liu Qi

ReferenceReference

Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111

Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669

Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Answer what is the optimal alignment of two sequences(the best score)

How many different alignments

Alignment methods with DPAlignment methods with DP

Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences

Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

A simple exampleA simple example

3

4

5

3

6

5 4

2

A

B

C

D

E

F

8

7

9

Presented By Liu QiPresented By Liu Qi

Exercise

Presented By Liu QiPresented By Liu Qi

动态规划的适用条件动态规划的适用条件

一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性

以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

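DP Algorithm for Global Alignment

Two sequences X = x_1 ... x_n and Y = y_1 ... y_m

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 ≤ i ≤ n, 0 ≤ j ≤ m)

$$
F(0,0) = 0, \qquad
F(i,j) = \max \begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
$$

On the first row and first column only the gap move is available, so F(i, 0) = i·d and F(0, j) = j·d, where d is the (negative) gap score.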

DP in equation form

F(i, j) is reached from one of three neighbouring cells:
from F(i-1, j-1), adding s(x_i, y_j)
from F(i-1, j), adding d
from F(i, j-1), adding d

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Step 1: set F(0, 0) = 0.

Step 2: fill the first row and first column with multiples of d.

            A    A    G
       0   -5  -10  -15
A     -5
G    -10
C    -15

Step 3: fill the remaining cells with the recurrence.

            A    A    G
       0   -5  -10  -15
A     -5    2   -3   -8
G    -10   -3   -3   -1
C    -15   -8   -8   -6
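The matrix fill for this example can be sketched in a few lines of Python. This is a minimal illustration rather than the presenter's code: the names `SCORES`, `s` and `global_matrix` are made up here, rows follow the left sequence (AGC) and columns follow the top sequence (AAG), as on the slide.

```python
# Substitution scores from the slide (symmetric), keyed by the sorted residue pair.
SCORES = {
    "AA": 2, "CC": 2, "GG": 2, "TT": 2,
    "AC": -7, "AG": -5, "AT": -7,
    "CG": -7, "CT": -5, "GT": -7,
}


def s(a: str, b: str) -> int:
    """Look up the substitution score for residues a and b."""
    return SCORES["".join(sorted(a + b))]


def global_matrix(top: str, left: str, d: int = -5) -> list:
    """Fill the global-alignment matrix F; rows follow `left`, columns follow `top`."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):           # first row: gaps only
        F[0][j] = F[0][j - 1] + d
    for i in range(1, rows):           # first column: gaps only
        F[i][0] = F[i - 1][0] + d
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),  # diagonal move
                F[i - 1][j] + d,                               # vertical move
                F[i][j - 1] + d,                               # horizontal move
            )
    return F


for row in global_matrix("AAG", "AGC"):
    print(row)
```

The printed rows reproduce the filled matrix of the example, ending with the optimal score -6 in the lower right corner.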

Traceback

Start from the lower right corner and trace back to the upper left

Each arrow introduces one character at the end of each aligned sequence

A horizontal move puts a gap in the left sequence

A vertical move puts a gap in the top sequence

A diagonal move uses one character from each sequence

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells visited by the traceback:

            A    A    G
       0   -5
A           2   -3
G                    -1
C                    -6

The cell holding -3 can be reached from either 2 or -5, so two alignments share the optimal score of -6:

AAG-      AAG-
-AGC      A-GC
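The traceback rules can be sketched as follows. This is an illustrative implementation under stated assumptions, not the presenter's code: the function name `needleman_wunsch` and the tie-breaking order (diagonal first, then vertical, then horizontal) are choices made here, so only one of the two optimal alignments is returned.

```python
# Substitution scores from the slide (symmetric), keyed by the sorted residue pair.
SCORES = {
    "AA": 2, "CC": 2, "GG": 2, "TT": 2,
    "AC": -7, "AG": -5, "AT": -7,
    "CG": -7, "CT": -5, "GT": -7,
}


def s(a: str, b: str) -> int:
    return SCORES["".join(sorted(a + b))]


def needleman_wunsch(top: str, left: str, d: int = -5):
    """Return (aligned_top, aligned_left, score) for one optimal global alignment."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        F[0][j] = j * d
    for i in range(1, rows):
        F[i][0] = i * d
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    # Trace back from the lower right corner to the upper left.
    i, j = rows - 1, cols - 1
    out_top, out_left = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            out_top.append(top[j - 1])      # diagonal: one character from each
            out_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            out_top.append("-")             # vertical: gap in the top sequence
            out_left.append(left[i - 1])
            i -= 1
        else:
            out_top.append(top[j - 1])      # horizontal: gap in the left sequence
            out_left.append("-")
            j -= 1
    # Characters were collected back to front, so reverse them.
    return "".join(reversed(out_top)), "".join(reversed(out_left)), F[-1][-1]


print(needleman_wunsch("AAG", "AGC"))
```

Reporting every optimal alignment would require following all tied moves instead of the first one that matches.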

Exercise

Find the global alignment of X = catgt and Y = acgctg
Scoring: d = -1, mismatch = -1, match = 2

Answer

[Figure: the answer was shown as an image on the original slide and is not recoverable from this transcript]

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein

Usually, an alignment that spans the complete length of both sequences is not required

Local alignment DP

Align sequences x and y
F is the DP matrix, s is the substitution matrix, d is the linear gap penalty

$$
F(0,0) = 0, \qquad
F(i,j) = \max \begin{cases}
0 \\
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
$$

Local DP in equation form

F(i, j) is reached from one of three neighbouring cells, or reset to 0:
from F(i-1, j-1), adding s(x_i, y_j)
from F(i-1, j), adding d
from F(i, j-1), adding d
0 if all three candidates are negative

Local alignment

Two differences with respect to global alignment:
No score is negative
Traceback begins at the highest score in the matrix and continues until you reach 0

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Step 1: set the first row and first column to 0.

            A    A    G
       0    0    0    0
A      0
G      0
C      0

Step 2: fill the remaining cells with the local recurrence.

            A    A    G
       0    0    0    0
A      0    2    2    0
G      0    0    0    4
C      0    0    0    0

Step 3: trace back from the highest score (4) until a 0 is reached. The optimal local alignment is:

AG
AG
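The local variant differs from the global one only in the extra 0 in the recurrence and in where the traceback starts and stops. The sketch below illustrates this under the same slide scoring. The function name `smith_waterman` is made up here, and when several cells share the highest score it simply keeps the first one found, which is an arbitrary choice.

```python
# Substitution scores from the slide (symmetric), keyed by the sorted residue pair.
SCORES = {
    "AA": 2, "CC": 2, "GG": 2, "TT": 2,
    "AC": -7, "AG": -5, "AT": -7,
    "CG": -7, "CT": -5, "GT": -7,
}


def s(a: str, b: str) -> int:
    return SCORES["".join(sorted(a + b))]


def smith_waterman(top: str, left: str, d: int = -5):
    """Return (aligned_top, aligned_left, best_score, F) for one optimal local alignment."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]    # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(0,                 # no score is negative
                          F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Trace back from the highest score until a 0 is reached.
    i, j = best_pos
    out_top, out_left = [], []
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            out_top.append(top[j - 1])
            out_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            out_top.append("-")              # vertical: gap in the top sequence
            out_left.append(left[i - 1])
            i -= 1
        else:
            out_top.append(top[j - 1])       # horizontal: gap in the left sequence
            out_left.append("-")
            j -= 1
    return "".join(reversed(out_top)), "".join(reversed(out_left)), best, F


print(smith_waterman("AAG", "AGC")[:3])
```

Because border cells are always 0, the loop condition `F[i][j] > 0` also guarantees that the traceback never steps outside the matrix.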

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the same substitution matrix s as in the previous example.

Step 1: set the first row and first column to 0.

Step 2: fill the remaining cells with the local recurrence.

            A    A    G
       0    0    0    0
G      0    0    0    2
A      0    2    2    0
A      0    2    4    0
G      0    0    0    6
G      0    0    0    2
C      0    0    0    0

The highest score is 6. Tracing back along the diagonal 6, 4, 2 aligns AAG with the AAG inside GAAGGC.

End-Space Free Alignment

Any number of indel operations at the beginning or at the end of the alignment contribute zero weight

X = - - c a c - t g t a c
Y = g a c a c t t g - - -

End-Space Free Alignment

Base conditions: for all i, j: F(i, 0) = 0 and F(0, j) = 0

Recurrence relation:

$$
F(i,j) = \max \begin{cases}
F(i-1,\,j-1) + s(X_i, Y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
$$

Search for i* such that F(i*, m) = max_{1 ≤ i ≤ n} F(i, m)

Search for j* such that F(n, j*) = max_{1 ≤ j ≤ m} F(n, j)

Define the alignment score as max{F(n, j*), F(i*, m)}
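The base conditions and the final search can be sketched as a score-only routine. This is a minimal illustration: the function name `end_space_free_score` is hypothetical, and the default scores (match = 1, mismatch = -1, d = -1) are borrowed from the exercise on the next slide rather than prescribed by the recurrence.

```python
def end_space_free_score(x: str, y: str, match: int = 1,
                         mismatch: int = -1, d: int = -1) -> int:
    """Score of the best alignment in which leading and trailing gaps are free."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0  # nothing to align; all gaps are end gaps
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Trailing gaps cost nothing: take the best cell in the last column or last row.
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_col, best_last_row)


# A short sequence contained in a longer one, and a suffix/prefix overlap.
print(end_space_free_score("ACGT", "TTACGTTT"))
print(end_space_free_score("AAACGT", "CGTTTT"))
```

Recovering the alignment itself would use the usual traceback, started from the cell that holds this maximum and stopped at the first row or column.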

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1)

X = c a c t g t a c
Y = g a c a c t t g

Questions to think about

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences
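One possible approach to the LCS question, offered as a sketch rather than as the intended answer, reuses the same table-filling idea: a match extends the subsequence by one, and anything else carries the best neighbouring value forward. The function name `lcs` is made up here.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Trace back from the lower right corner, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AAG", "AGC"))
```

This is the global alignment recurrence with a score of 1 for a match and 0 for mismatches and gaps, so the table entry counts matched positions only.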

Affine gap penalty

LETVGY
W----L

Gap scores: -5 -1 -1 -1

Separate penalties for gap opening and gap extension

This requires modifying the DP algorithm

Affine gap penalty

A gap of length k is more probable than k gaps of length 1
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same

It is more common to use gap penalty functions involving two terms
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap
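The difference between the two penalty functions can be made concrete with a few lines of Python. This sketch assumes the convention suggested by the LETVGY / W----L example, where the first gap position costs the opening score (-5) and each further position costs the extension score (-1). How exactly h and g combine in the slides' own formula is not recoverable from the transcript, and the function names are hypothetical.

```python
def linear_gap_cost(k: int, d: int = -5) -> int:
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap_cost(k: int, gap_open: int = -5, gap_extend: int = -1) -> int:
    """Affine penalty (assumed convention): the first position costs gap_open,
    each additional position costs gap_extend."""
    if k <= 0:
        return 0
    return gap_open + (k - 1) * gap_extend


# One gap of length 4 versus four separate gaps of length 1.
print(affine_gap_cost(4), 4 * affine_gap_cost(1))
print(linear_gap_cost(4), 4 * linear_gap_cost(1))
```

With these values the affine function charges -8 for one gap of length 4 but -20 for four single gaps, while the linear function charges -20 in both cases.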

Gap penalty functions

[Figure: gap penalty functions; not recoverable from this transcript]

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1
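Since the slide's own recurrences did not survive, the following is a sketch of one common way to organise the three matrices, not a reconstruction of the presenter's formulas. It assumes that `M` holds the best score ending in an aligned pair, `Ix` the best score ending with a residue of x over a gap, and `Iy` the best score ending with a gap over a residue of y. It also assumes that a gap of length k costs `gap_open + (k - 1) * gap_extend`. All names and default values are illustrative.

```python
NEG_INF = float("-inf")


def affine_global_score(x: str, y: str, match: int = 1, mismatch: int = -1,
                        gap_open: int = -5, gap_extend: int = -1):
    """Score of the best global alignment under an affine gap penalty."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):           # x aligned against leading gaps only
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):           # y aligned against leading gaps only
        Iy[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            # Aligned pair: may follow any of the three states.
            M[i][j] = sub + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # Residue of x over a gap: open a new gap or extend the current one.
            Ix[i][j] = max(M[i - 1][j] + gap_open,
                           Iy[i - 1][j] + gap_open,
                           Ix[i - 1][j] + gap_extend)
            # Gap over a residue of y: open a new gap or extend the current one.
            Iy[i][j] = max(M[i][j - 1] + gap_open,
                           Ix[i][j - 1] + gap_open,
                           Iy[i][j - 1] + gap_extend)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("ACGTACGT", "ACGT"))
```

Keeping the gap states separate is what lets the algorithm tell "extend the current gap" apart from "open a new one", which a single matrix cannot do.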

[Figure slides; the only recoverable text is the scoring scheme of the worked example: match = 1, mismatch = -1]

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case"

Word k-tuple

FASTA

BLAST


Page 4: Pairwise Sequence  Alignment

Some DefinitionsSome Definitions

An An alignment alignment is a mutual arrangement of is a mutual arrangement of two sequences which exhibits where the two sequences which exhibits where the two sequences are similar and where they two sequences are similar and where they differdiffer

An An optimal alignment optimal alignment is one that exhibits is one that exhibits the most correspondences and the least the most correspondences and the least differences It is the alignment with the differences It is the alignment with the highest score May or may not be highest score May or may not be biologically meaningfulbiologically meaningful

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Methods

• Dot matrix
• Dynamic programming
• Word k-tuple (heuristic based)

Brief intro of methods

• Dot matrix: all possible matches between sequence residues are found. Used to compare two sequences and look for regions where they may align. Very useful for finding indels and repeats in sequences, and can be used as a first pass to see whether there is any similarity between the sequences.

• Dynamic programming: mathematically guaranteed to find the optimal alignment (global or local) between a pair of sequences under a given scoring scheme. Computationally expensive: the number of steps grows with the product of the two sequence lengths, that is, quadratically for sequences of similar length.

• k-tuple (word) methods: used by FASTA and BLAST (previously described). Much faster than dynamic programming and ideal for database searches. They use heuristics that do not guarantee an optimal alignment but are nevertheless very reliable.

Dot matrix

1 - One sequence is listed along the top of the page and the second sequence along the side.

2 - Move across each row and put a dot in any column where the character is the same.

3 - Continue for each row until all possible character matches between the sequences are represented by dots.

4 - Diagonal rows of dots reveal sequence similarity (repeats and inverted repeats can also be found off the main diagonal).

5 - Isolated dots represent random similarity unrelated to the alignment.
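The five steps above can be sketched in a few lines of Python. This is only an illustration, not the code of any of the dot-matrix programs mentioned in these slides; the function names `dot_matrix` and `render` are made up for the example, and the test pair is the DOROTHY...HODGKIN pair that appears on a later slide.

```python
def dot_matrix(top, side):
    """Rows follow `side`, columns follow `top`.

    A cell is True where the two characters are identical (steps 1 to 3).
    """
    return [[s == t for t in top] for s in side]


def render(top, side, matrix):
    """Draw the matrix as text: '*' for a dot, '.' for an empty cell."""
    lines = ["  " + " ".join(top)]
    for s, row in zip(side, matrix):
        lines.append(s + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


seq1 = "DOROTHYCROWFOOTHODGKIN"   # listed along the top
seq2 = "DOROTHYHODGKIN"           # listed along the side
plot = dot_matrix(seq1, seq2)
print(render(seq1, seq2, plot))
```

In the printed plot the shared DOROTHY prefix shows up as a run on the main diagonal, and the shared HODGKIN suffix as a second diagonal shifted to the right by the length of the insertion.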

Dot matrix with noise reduction

To improve the visualisation of identical regions between sequences, we use sliding windows. Instead of writing down a dot for every character that is common to both sequences, we compare a number of positions at a time (the window size) and write down a dot whenever at least a minimum number of them (the stringency) are identical.
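A minimal sketch of the sliding-window filter described above. The function name `windowed_dot_matrix` and the choice to anchor each window at its first position are assumptions made for this example; real programs may centre the window instead.

```python
def windowed_dot_matrix(top, side, window, stringency):
    """Put a dot at (i, j) when at least `stringency` of the `window`
    positions starting at side[i] and top[j] are identical."""
    rows = len(side) - window + 1
    cols = len(top) - window + 1
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            identical = sum(side[i + k] == top[j + k] for k in range(window))
            row.append(identical >= stringency)
        matrix.append(row)
    return matrix


seq1 = "DOROTHYCROWFOOTHODGKIN"
seq2 = "DOROTHYHODGKIN"
filtered = windowed_dot_matrix(seq1, seq2, window=3, stringency=2)

# The lone 'O' versus 'O' hit at side position 1, top position 9 is a dot in
# the unfiltered plot, but its window (ORO versus OWF) has only one identity.
print(seq2[1] == seq1[9], filtered[1][9])
```

With window = 3 and stringency = 2 the two true diagonals survive, while most isolated single-character hits disappear.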

Dot matrix

Caution is necessary regarding the window size and the stringency value. Generally they take different values for different problems. The optimal values will accentuate the regions of similarity between the two sequences.

• For DNA sequences, usually: sliding window = 15, stringency = 10
• For protein sequences: sliding window = 2 or 3, stringency = 2

Things to be considered

• Scoring matrix for distance correction
• Window size
• Threshold

Uses of the dot plot

• Regions of similarity: diagonals
• Insertions/deletions: gaps (can determine intron/exon structure)
• Repeats: parallel diagonals
• Inverted repeats: perpendicular diagonals (can be used to determine regions of base pairing in RNA molecules)

Intra-sequence comparison

• Repeats
• Inverted repeats
• Low complexity

Examples

ABRACADABRACADABRACADABRACAD

Palindrome

Sequence: ATOYOTA

Repeats

Drosophila melanogaster SLIT protein against itself

Low complexity

Inter-sequence comparison

• Conserved domains
• Insertion and deletion

Insertion and deletion

Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN

Conserved domains

Translated DNA and protein comparison: exons and introns

Even more can be done with RNA

RNA comparisons of the reverse complement of a sequence to itself can often be very informative.

• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.

• The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
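The idea of comparing a sequence with its own reverse complement can be sketched as follows. The short hairpin-like sequence below is a made-up toy example, not the tRNA-Phe sequence, and the helper names `reverse_complement` and `stem_candidates` are hypothetical. Only Watson-Crick pairs are considered; G-U wobble pairs are ignored.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Complement every base and read the result backwards."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def stem_candidates(rna, window):
    """Return (i, p) pairs such that rna[i:i+window] can base-pair,
    antiparallel, with rna[p:p+window].

    A perfect window match between the sequence and its reverse complement
    at positions (i, j) corresponds to partner start p = n - j - window.
    """
    rc = reverse_complement(rna)
    n = len(rna)
    hits = []
    for i in range(n - window + 1):
        for j in range(n - window + 1):
            if rna[i:i + window] == rc[j:j + window]:
                hits.append((i, n - j - window))
    return hits


toy = "GGGAAAUCCC"                     # invented example sequence
print(reverse_complement(toy))         # GGGAUUUCCC
print(stem_candidates(toy, window=3))  # [(0, 7), (1, 6), (6, 1), (7, 0)]
```

Each hit is a dot on a diagonal of the sequence versus reverse-complement plot, and marks a stretch that could form a stem with the reported partner region.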

Structures of tRNA-Phe

RNA comparisons of the reverse complement of a sequence to itself

Programs for Dot Matrix

• Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
• SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html
• Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
• COMPARE, DOTPLOT in GCG

Conclusion

Advantages: readily reveals the presence of insertions/deletions and of direct and inverted repeats, which are more difficult to find with the other, more automated methods. It lets your eyes and brain do the work, which is VERY EFFICIENT.

Disadvantages: most dot matrix computer programs do not show an actual alignment, and they do not return a score to indicate how 'optimal' a given alignment is.


Dynamic Programming

Dynamic programming answers the question: what is the optimal alignment of two sequences (the one with the best score)?

How many different alignments are there?
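One way to get a feeling for the answer is to count alignments directly. The sketch below assumes that two alignments are different whenever their sequences of columns differ (each column is a residue pair, a gap in the first sequence, or a gap in the second); under that convention the count follows a simple three-term recurrence. The function name `count_alignments` is made up for the example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of global alignments of sequences of length n and m.

    The last column is either a residue pair, or a residue of the first
    sequence against a gap, or a residue of the second against a gap.
    """
    if n == 0 or m == 0:
        return 1  # only gaps are possible
    return (count_alignments(n - 1, m - 1)
            + count_alignments(n - 1, m)
            + count_alignments(n, m - 1))


for length in (1, 2, 3, 10, 50):
    print(length, count_alignments(length, length))
```

Two sequences of length 3 already have 63 alignments, and the count grows so quickly that enumerating every alignment is hopeless for real sequences. Dynamic programming avoids the enumeration altogether.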

Alignment methods with DP

• Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along the entire length of the sequences.

• Local alignment: Smith-Waterman (1981) is a modification of the dynamic programming algorithm giving the highest-scoring local match between two sequences.

Dynamic Programming

A simple example

[Figure: graph with nodes A to F and numerically weighted edges; the layout is not recoverable from the transcript]

Exercise

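Conditions under which dynamic programming applies

• Every sub-policy of an optimal policy is itself optimal.
• No aftereffect: the states of earlier stages cannot directly influence future decisions.
• Trading space for time (overlapping subproblems).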


DP Algorithm for Global Alignment

Two sequences X = x_1...x_n and Y = y_1...y_m.

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 ≤ i ≤ n, 0 ≤ j ≤ m).

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\]
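The recurrence for F(i, j) can be turned into a table-filling routine almost literally. The sketch below is an illustration under two assumptions: the gap score d is negative and is added, as in the worked example with d = -5, and the first row and column are filled with multiples of d. The names `score` and `global_matrix` are invented for the example; `score` reproduces the values of the substitution matrix used in the worked example of these slides.

```python
def score(a, b):
    """Substitution scores of the slides' example matrix:
    2 for identity, -5 for A/G or C/T, -7 for any other pair."""
    if a == b:
        return 2
    return -5 if a + b in ("AG", "GA", "CT", "TC") else -7


def global_matrix(left, top, d):
    """F[i][j] = best score for aligning left[:i] with top[:j]."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d          # left[:i] against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * d          # top[:j] against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    return F


for row in global_matrix("AGC", "AAG", d=-5):
    print(row)
```

With AGC down the side and AAG along the top, the printed rows match the filled matrix of the worked example, and the bottom-right cell holds the optimal global score.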

DP in equation form

[Diagram: cell F(i, j) is reached from F(i-1, j-1) by a diagonal step scoring s(x_i, y_j), and from F(i-1, j) or F(i, j-1) by a step scoring d.]

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Step 1: set F(0,0) = 0 and fill the first row and first column with multiples of d.

            A     A     G
      0    -5   -10   -15
A    -5
G   -10
C   -15

Step 2: fill the remaining cells with the recurrence.

            A     A     G
      0    -5   -10   -15
A    -5     2    -3    -8
G   -10    -3    -3    -1
C   -15    -8    -8    -6

Traceback

• Start from the lower right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the optimal traceback paths:

            A     A     G
      0    -5
A           2    -3
G                      -1
C                      -6

Two alignments reach the optimal score of -6:

AAG-    AAG-
-AGC    A-GC
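The traceback rules can be added to the matrix fill to recover one optimal alignment. This is a sketch rather than a reference implementation: the names `score` and `needleman_wunsch` are chosen for the example, and ties are broken by preferring a diagonal move, then a vertical move, then a horizontal move, so only one of the co-optimal alignments is reported.

```python
def score(a, b):
    """Example substitution scores: 2 for identity, -5 for A/G or C/T, -7 otherwise."""
    if a == b:
        return 2
    return -5 if a + b in ("AG", "GA", "CT", "TC") else -7


def needleman_wunsch(top, left, d):
    """Return (score, aligned_top, aligned_left) for a global alignment."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    # Trace back from the lower right corner to the upper left.
    i, j = n, m
    top_aln, left_aln = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(left[i - 1], top[j - 1]):
            top_aln.append(top[j - 1])      # diagonal: one character from each
            left_aln.append(left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            top_aln.append("-")             # vertical: gap in the top sequence
            left_aln.append(left[i - 1])
            i -= 1
        else:
            top_aln.append(top[j - 1])      # horizontal: gap in the left sequence
            left_aln.append("-")
            j -= 1
    return F[n][m], "".join(reversed(top_aln)), "".join(reversed(left_aln))


best, a, b = needleman_wunsch("AAG", "AGC", d=-5)
print(best)
print(a)
print(b)
```

With this tie-breaking order the call returns the first of the two alignments shown on the slide (AAG- over -AGC) with score -6. Reporting every co-optimal alignment would require following all tied moves instead of only the first.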

Exercise

Find the global alignment of
X = catgt
Y = acgctg
Score: d = -1, mismatch = -1, match = 2

Answer

Local alignment

• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\]
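A compact sketch of the local recurrence, including the traceback from the highest-scoring cell down to the first zero. As before, the function names are illustrative, d is assumed to be a negative score that is added, and `score` reproduces the example substitution matrix of these slides. When several cells share the highest score, this sketch keeps the first one it meets.

```python
def score(a, b):
    """Example substitution scores: 2 for identity, -5 for A/G or C/T, -7 otherwise."""
    if a == b:
        return 2
    return -5 if a + b in ("AG", "GA", "CT", "TC") else -7


def smith_waterman(top, left, d):
    """Return (score, aligned_top, aligned_left) for one best local alignment."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Trace back from the highest score until a cell with value 0 is reached.
    i, j = best_pos
    top_aln, left_aln = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(left[i - 1], top[j - 1]):
            top_aln.append(top[j - 1])
            left_aln.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            top_aln.append("-")
            left_aln.append(left[i - 1])
            i -= 1
        else:
            top_aln.append(top[j - 1])
            left_aln.append("-")
            j -= 1
    return best, "".join(reversed(top_aln)), "".join(reversed(left_aln))


print(smith_waterman("AAG", "AGC", d=-5))
print(smith_waterman("AAG", "GAAGGC", d=-5))
```

For AAG against AGC the best local alignment is AG over AG with score 4; for AAG against GAAGGC it is AAG over AAG with score 6.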

Local DP in equation form

[Diagram: cell F(i, j) is reached from F(i-1, j-1) by a diagonal step scoring s(x_i, y_j), or from F(i-1, j) or F(i, j-1) by a step scoring d; in addition, 0 is always a candidate value.]

Local alignment

Two differences with respect to global alignment:
• No score is negative.
• Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Step 1: fill the first row and first column with 0.

          A    A    G
     0    0    0    0
A    0
G    0
C    0

Step 2: fill the remaining cells with the recurrence; negative values are replaced by 0.

          A    A    G
     0    0    0    0
A    0    2    2    0
G    0    0    0    4
C    0    0    0    0

The highest score is 4. Tracing back from that cell gives the local alignment:

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Initialised matrix:

          A    A    G
     0    0    0    0
G    0
A    0
A    0
G    0
G    0
C    0

Filled matrix:

          A    A    G
     0    0    0    0
G    0    0    0    2
A    0    2    2    0
A    0    2    4    0
G    0    0    0    6
G    0    0    0    2
C    0    0    0    0

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

X = - - c a c - t g t a c
Y = g a c a c t t g - - -

Base conditions: for all i, j: F(i, 0) = 0 and F(0, j) = 0.

Recurrence relation:

\[
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\]

Search for i* such that F(i*, m) = max_{1 ≤ i ≤ n} F(i, m).

Search for j* such that F(n, j*) = max_{1 ≤ j ≤ m} F(n, j).

Define the alignment score as max{ F(n, j*), F(i*, m) }.
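The base conditions, recurrence and final search can be sketched as below. The function name `end_space_free_score` is made up for the example, a simple match/mismatch score stands in for s, and d is again a negative score that is added. The test pair and the scores (match = 1, mismatch = -1, gap = -1) are the ones used elsewhere in these slides.

```python
def end_space_free_score(x, y, match, mismatch, d):
    """Best alignment score when leading and trailing gaps cost nothing."""
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0 for all i, j.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Trailing gaps are free, so the alignment may end anywhere in the
    # last column or the last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg", match=1, mismatch=-1, d=-1))
```

For this pair the function returns 4, which is the score of the alignment shown on the slide: five matches and one internal gap, with the end gaps contributing nothing.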

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c

Y = g a c a c t t g

Questions for thought

• Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

• Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.

Affine gap penalty

LETVGYW----L

-5 -1 -1 -1

Separate penalties for gap opening and gap extension. This requires modifying the DP algorithm.

Affine gap penalty

• A gap of length k is more probable than k gaps of length 1:
  - a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  - separated gaps are probably due to distinct mutational events
• A linear gap penalty function treats these cases the same.
• It is more common to use gap penalty functions involving two terms:
  - a penalty h associated with opening a gap
  - a smaller penalty g for extending the gap
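Written as functions of the gap length k, the two kinds of penalty might look as follows. This assumes one common convention, in which every gap position costs g and the opening cost h is charged once per gap; the slides do not spell out the formula, so the exact form is an assumption, and w is a symbol introduced here for the total penalty of a gap.

```latex
\[
w_{\text{linear}}(k) = g\,k,
\qquad
w_{\text{affine}}(k) =
\begin{cases}
h + g\,k, & k \ge 1 \\
0, & k = 0
\end{cases}
\]
```

Under this convention the LETVGYW example (5 for the first gap position, 1 for each of the next three, 8 in total) corresponds to h = 4 and g = 1. One gap of length k costs h + gk, whereas k separate gaps of length 1 cost k(h + g), so a single long gap is penalised less.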

Gap penalty functions

Dynamic Programming for the Affine Gap Penalty Case

• need 3 matrices instead of 1
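A sketch of how the three matrices can be used for global alignment. The recurrences on the original slides are not recoverable from the transcript, so this follows the standard three-state formulation as an assumption: M holds the best score ending in a residue pair, Ix the best score ending with a residue of x against a gap, and Iy the best score ending with a residue of y against a gap. The parameters `gap_open` (score of the first position of a gap) and `gap_extend` (score of each further position) are negative and named for this example only; in terms of an opening penalty h and extension penalty g charged per position, gap_open = -(h + g) and gap_extend = -g.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match, mismatch, gap_open, gap_extend):
    """Optimal global alignment score with affine gap scores."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):            # x[:i] against one long gap
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):            # y[:j] against one long gap
        Iy[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            Ix[i][j] = max(M[i - 1][j] + gap_open,
                           Ix[i - 1][j] + gap_extend,   # extend the same gap
                           Iy[i - 1][j] + gap_open)
            Iy[i][j] = max(M[i][j - 1] + gap_open,
                           Iy[i][j - 1] + gap_extend,   # extend the same gap
                           Ix[i][j - 1] + gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])


# AA----CC against AATTTTCC: four matches and one gap of length four,
# scored -5 -1 -1 -1 as in the LETVGYW example.
print(affine_global_score("AACC", "AATTTTCC", 1, -1, gap_open=-5, gap_extend=-1))
```

The example prints -4 (four matches minus a gap cost of 8). With a linear gap score of -5 per position the same alignment would score -16, which shows how the affine scheme favours one long gap.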

[Figures: worked example for the affine gap penalty case, scored with match = 1 and mismatch = -1; the matrices are not recoverable from the transcript]

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tup

• FASTA
• BLAST

Local alignment DPLocal alignment DP

Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution

matrix d is the linear gap penaltymatrix d is the linear gap penalty

0

11

11

max

000

djiFdjiF

yxsjiF

jiF

F

ji

Presented By Liu QiPresented By Liu Qi

Local DP in equation formLocal DP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

0

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the

matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch

Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00

GG 00

CC 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00 22 22 00

GG 00 00 00 44

CC 00 00 00 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00 22 22 00

GG 00 00 00 44

CC 00 00 00 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

AGAG

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

GG 00

AA 00

AA 00

GG 00

GG 00

CC 00

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

GG 00 00 00 22

AA 00 22 22 00

AA 00 22 44 00

GG 00 00 00 66

GG 00 00 00 22

CC 00 00 00 00

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

End-Space Free AlignmentEnd-Space Free Alignment

any number of indel operations at the end or at the beginning of the alignment contribute zero weight

X= - - c a c - t g t a c

Y= g a c a c t t g - - -

Presented By Liu QiPresented By Liu Qi

End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)

F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)

Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)

X = c a c t g t a c

Y= g a c a c t t g

Presented By Liu QiPresented By Liu Qi

思考题思考题Does a local alignment program always Does a local alignment program always

produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment

Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences

Presented By Liu QiPresented By Liu Qi

Affine gap penaltyAffine gap penalty

LETVGYW----L

-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and

gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm

Presented By Liu QiPresented By Liu Qi

Affine gap penaltyAffine gap penalty

a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a

stretch of characters ndash separated gaps are probably due to distinct mutational events

a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two

terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap

Presented By Liu QiPresented By Liu Qi

Gap penalty functionsGap penalty functions

Presented By Liu QiPresented By Liu Qi

Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case

need 3 matrices instead of 1

Presented By Liu QiPresented By Liu Qi

Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

match=1 mismatch=-1

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo

Presented By Liu QiPresented By Liu Qi

Word k-tup

FASTAFASTA

BLASTBLAST

Page 6: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Brief intro of methods

• Dot matrix: all possible matches between sequence residues are found. Used to compare two sequences to look for regions where they may align; very useful for finding indels and repeats in sequences; can be used as a first pass to see if there is any similarity between sequences.

• Dynamic programming: mathematically guaranteed to find the optimal alignment (global or local) between pairs of sequences, but computationally expensive: the number of steps grows with the product of the two sequence lengths.

• k-tuple (word) methods: used by FASTA and BLAST (previously described). Much faster than dynamic programming and ideal for database searches; they use heuristics that do not guarantee an optimal alignment but are nevertheless very reliable.

Dot matrix

1 - One sequence is listed along the top of the page and the second sequence along the side.

2 - Move across each row and put a dot in any column where the character is the same.

3 - Continue for each row until all possible character matches between the sequences are represented by dots.

4 - Diagonal rows of dots reveal sequence similarity (repeats and inverted repeats can also be found off the main diagonal).

5 - Isolated dots represent random similarity unrelated to the alignment.
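The five steps above can be sketched in a few lines of Python. This is only an illustration: the function names `dot_matrix` and `render` are made up for this example, and the text rendering with `*` and `.` is an assumption about how one might display the dots.

```python
def dot_matrix(top, side):
    """Return a grid of booleans: rows follow `side`, columns follow `top`.

    A True entry is a "dot": the two characters at that row/column are equal.
    """
    return [[s == t for t in top] for s in side]


def render(top, side):
    """Draw the matrix as text, with '*' for a dot and '.' for an empty cell."""
    lines = ["  " + top]
    for s, row in zip(side, dot_matrix(top, side)):
        lines.append(s + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Two short sequences; runs of '*' along a diagonal mark similar regions.
    print(render("AAG", "AGC"))
```

Printing a longer pair of sequences this way makes diagonals, and breaks in them, visible by eye, which is the point of the method.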

Dot matrix with noise reduction

To improve the visualisation of identical regions among sequences we use sliding windows. Instead of writing down a dot for every character that is common to both sequences, we compare a number of positions (the window size) and write down a dot whenever there is a minimum number (the stringency) of identical characters.
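A minimal sketch of the sliding-window filter follows. The function name `windowed_dot_matrix` is hypothetical, and placing the dot at the start of each window is an assumption made here for simplicity (some programs place it at the window centre instead).

```python
def windowed_dot_matrix(top, side, window, stringency):
    """Dot matrix with noise reduction.

    A dot is placed at (i, j) when at least `stringency` of the `window`
    character pairs side[i + k], top[j + k] (k = 0 .. window - 1) are identical.
    Rows follow `side`, columns follow `top`.
    """
    n_rows = max(len(side) - window + 1, 0)
    n_cols = max(len(top) - window + 1, 0)
    dots = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            matches = sum(side[i + k] == top[j + k] for k in range(window))
            row.append(matches >= stringency)
        dots.append(row)
    return dots


if __name__ == "__main__":
    grid = windowed_dot_matrix("ACGTACGT", "ACGT", window=4, stringency=4)
    print(grid)
```

With `window=1` and `stringency=1` the function reduces to the plain dot matrix; raising both values suppresses isolated dots while keeping long diagonals.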

Dot matrix

Caution is necessary regarding the window size and the stringency value. Generally they take different values for different problems. The optimal values will accentuate the regions of similarity between the two sequences.

• For DNA sequences, usually: sliding window = 15, stringency = 10

• For protein sequences: sliding window = 2 or 3, stringency = 2

Things to be considered

• Scoring matrix for distance correction

• Window size

• Threshold

The usefulness of dot plots

• Regions of similarity: diagonals
• Insertions/deletions: gaps
   - Can determine intron/exon structure
• Repeats: parallel diagonals
• Inverted repeats: perpendicular diagonals
   - Can be used to determine regions of base pairing of RNA molecules

Intra-sequence comparison

• Repeats
• Inverted repeats
• Low complexity

Examples

ABRACADABRACADABRACADABRACAD

Palindrome

Sequence: ATOYOTA

Repeats

Drosophila melanogaster SLIT protein against itself

Low complexity

Inter-sequence comparison

• Conserved domains
• Insertion and deletion

Insertion and deletion

Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN

Conserved domains

Translated DNA and protein comparison: exons and introns

Even more can be done with RNA

RNA comparisons of the reverse complement of a sequence to itself can often be very informative.

• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from Baker's yeast.

• The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
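The idea of comparing an RNA with its own reverse complement can be sketched as below. This is an illustrative toy, not the procedure used for the tRNA-Phe figures: the helper names, the short made-up hairpin sequence, and the restriction to Watson-Crick pairs (no G-U wobble pairs) are all assumptions.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse the sequence and complement each base (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def diagonal_runs(top, side, min_len):
    """Return (i, j, length) for each maximal diagonal run of matches.

    i indexes `side`, j indexes `top`; only runs of at least `min_len` are kept.
    """
    runs = []
    for i in range(len(side)):
        for j in range(len(top)):
            if side[i] != top[j]:
                continue
            # Skip cells that merely continue a run started further up-left.
            if i > 0 and j > 0 and side[i - 1] == top[j - 1]:
                continue
            k = 0
            while i + k < len(side) and j + k < len(top) and side[i + k] == top[j + k]:
                k += 1
            if k >= min_len:
                runs.append((i, j, k))
    return runs


def stem_candidates(rna, min_len):
    """Diagonal runs in the dot matrix of `rna` against its reverse complement."""
    return diagonal_runs(reverse_complement(rna), rna, min_len)


if __name__ == "__main__":
    print(stem_candidates("GGGAAACCC", 3))
```

A run `(i, j, k)` means that `rna[i:i+k]` could base-pair with the stretch of the same molecule ending `j` positions before its 3' end, so long diagonals are candidate stems.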

Structures of tRNA-Phe

RNA comparisons of the reverse complement of a sequence to itself

Programs for Dot Matrix

• Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
• SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html
• Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
• COMPARE, DOTPLOT in GCG

Conclusion

Advantages: Readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods. Lets your eyes/brain do the work: VERY EFFICIENT.

Disadvantages: Most dot matrix computer programs do not show an actual alignment. Does not return a score to indicate how 'optimal' a given alignment is.


Dynamic Programming

• Answers the question: what is the optimal alignment of two sequences (the one with the best score)?

• How many different alignments are there?

Alignment methods with DP

• Global alignment - Needleman-Wunsch (1970): maximizes the number of matches between the sequences along the entire length of the sequences.

• Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm giving the highest-scoring local match between two sequences.

Dynamic Programming

A simple example

[Figure not preserved: a diagram with nodes A to F and numeric labels.]

Conditions for applying dynamic programming

• Every sub-strategy of an optimal strategy is itself optimal (optimal substructure).

• No aftereffect: the states of earlier stages cannot directly affect future decisions.

• Trading space for time (overlapping subproblems).

DP Algorithm for Global Alignment

Two sequences X = x_1 ... x_n and Y = y_1 ... y_m.

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 ≤ i ≤ n, 0 ≤ j ≤ m).

$$
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
$$

DP in equation form

Each cell F(i, j) is computed from its three neighbours: the diagonal cell F(i-1, j-1) plus s(x_i, y_j), and the cells F(i-1, j) and F(i, j-1), each plus the gap penalty d.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5 and the substitution matrix s below.

        A    C    G    T
   A    2   -7   -5   -7
   C   -7    2   -7   -5
   G   -5   -7    2   -7
   T   -7   -5   -7    2

Each cell is filled using $F(i,j) = \max\{F(i-1,j-1) + s(x_i,y_j),\ F(i-1,j) + d,\ F(i,j-1) + d\}$. The corner cell is 0, the first row and column are then filled with multiples of d, and the remaining cells follow from the recurrence (AAG along the top, AGC down the side):

              A     A     G
         0   -5   -10   -15
   A    -5    2    -3    -8
   G   -10   -3    -3    -1
   C   -15   -8    -8    -6
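As a hedged illustration of how individual cells are obtained, the two computations below apply the recurrence to the first interior cell and to the bottom-right cell of this example. The indexing is an assumption made for the illustration: i runs over AGC (the side sequence) and j over AAG (the top sequence).

$$
\begin{aligned}
F(1,1) &= \max\{F(0,0) + s(A,A),\; F(0,1) + d,\; F(1,0) + d\} = \max\{0 + 2,\; -5 - 5,\; -5 - 5\} = 2 \\
F(3,3) &= \max\{F(2,2) + s(C,G),\; F(2,3) + d,\; F(3,2) + d\} = \max\{-3 - 7,\; -1 - 5,\; -8 - 5\} = -6
\end{aligned}
$$

The value in the bottom-right cell, -6, is the score of the optimal global alignment.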

Traceback

• Start from the lower right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

The cells visited by the optimal traceback paths are:

              A     A     G
         0   -5
   A          2    -3
   G                     -1
   C                     -6

Two alignments share the optimal score of -6:

   AAG-      AAG-
   -AGC      A-GC
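The fill and traceback steps can be put together in a short Python sketch of Needleman-Wunsch. The function names, the `slide_score` helper, and the tie-breaking order in the traceback (diagonal first, then a gap in `y`, then a gap in `x`) are choices made for this illustration, so it reports only one of the optimal alignments.

```python
def slide_score(a, b):
    """Substitution scores from the example: match 2, A<->G or C<->T -5, else -7."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def needleman_wunsch(x, y, score, d):
    """Return (best score, aligned x, aligned y) for a global alignment.

    score(a, b) is the substitution score; d is the (negative) linear gap penalty.
    """
    n, m = len(x), len(y)
    # F[i][j] = optimal score of aligning x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )

    # Traceback from the lower right corner to the upper left.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    print(needleman_wunsch("AAG", "AGC", slide_score, -5))
```

Collecting every optimal alignment would require following all tied predecessors rather than the first one that matches.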

Exercise

Find the global alignment of X = catgt and Y = acgctg.
Scoring: gap penalty d = -1, mismatch = -1, match = 2.

Local alignment

• A single-domain protein may be homologous to a region within a multi-domain protein.

• Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

$$
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
$$

Local DP in equation form

Each cell F(i, j) takes the largest of four candidates: 0, the diagonal cell F(i-1, j-1) plus s(x_i, y_j), and the cells F(i-1, j) and F(i, j-1), each plus the gap penalty d.

Local alignment

Two differences with respect to global alignment:

• No score is negative.
• Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5 and the substitution matrix s below.

        A    C    G    T
   A    2   -7   -5   -7
   C   -7    2   -7   -5
   G   -5   -7    2   -7
   T   -7   -5   -7    2

The first row and column are set to 0, and every other cell takes the maximum of 0 and the three usual candidates (AAG along the top, AGC down the side):

             A    A    G
        0    0    0    0
   A    0    2    2    0
   G    0    0    0    4
   C    0    0    0    0

The highest score is 4. Tracing back from that cell until a 0 is reached gives the local alignment:

   AG
   AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the substitution matrix s below.

        A    C    G    T
   A    2   -7   -5   -7
   C   -7    2   -7   -5
   G   -5   -7    2   -7
   T   -7   -5   -7    2

The completed matrix (AAG along the top, GAAGGC down the side):

             A    A    G
        0    0    0    0
   G    0    0    0    2
   A    0    2    2    0
   A    0    2    4    0
   G    0    0    0    6
   G    0    0    0    2
   C    0    0    0    0

The highest score is 6. Tracing back along the diagonal 6, 4, 2 aligns AAG with the AAG at positions 2 to 4 of GAAGGC.
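The two local alignment examples can be reproduced with a short Smith-Waterman sketch. As before, the function names and the `slide_score` helper are illustrative, and when several cells share the highest score this version simply keeps the first one it meets.

```python
def slide_score(a, b):
    """Substitution scores from the example: match 2, A<->G or C<->T -5, else -7."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def smith_waterman(x, y, score, d):
    """Return (best score, aligned part of x, aligned part of y)."""
    n, m = len(x), len(y)
    # First row and column stay 0; no cell is allowed to go negative.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, best_cell = F[i][j], (i, j)

    # Traceback starts at the highest score and stops at the first 0.
    ax, ay = [], []
    i, j = best_cell
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    print(smith_waterman("AAG", "AGC", slide_score, -5))
    print(smith_waterman("AAG", "GAAGGC", slide_score, -5))
```

The only changes relative to the global version are the extra 0 in the maximum, the zero-initialised borders, and the start and stop conditions of the traceback.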

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

   X = - - c a c - t g t a c
   Y = g a c a c t t g - - -

Base conditions: for all i, j: F(i, 0) = 0 and F(0, j) = 0.

Recurrence relation:

$$
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(X_i, Y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
$$

Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$.

Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$.

Define the alignment score as $\max\{F(n, j^*),\ F(i^*, m)\}$.
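A score-only sketch of the end-space free variant is given below. It follows the base conditions and the search over the last row and last column stated above; the function name and the simple match/mismatch scoring used in the usage lines are assumptions for the illustration.

```python
def end_space_free_score(x, y, score, d):
    """Best alignment score when leading and trailing gaps cost nothing.

    F(i, 0) = F(0, j) = 0 makes leading gaps free; taking the maximum over
    the last row and last column makes trailing gaps free.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_row, best_last_col)


def unit_score(a, b):
    """Match = 1, mismatch = -1."""
    return 1 if a == b else -1


if __name__ == "__main__":
    # "CG" sits inside "ACGT", so both end gaps are free.
    print(end_space_free_score("ACGT", "CG", unit_score, -1))
```

Recovering the alignment itself would start the traceback at the best cell of the last row or column and stop on reaching the first row or column.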

Exercise

Align the two sequences (match = 1, mismatch = -1, gap = -1):

   X = c a c t g t a c
   Y = g a c a c t t g

Questions for thought

• Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

• Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
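One standard way to approach the LCS question is the same kind of DP table used for alignment; the sketch below is offered as a possible answer, not as the intended one. It behaves like a global alignment scored with match = 1 and mismatch = gap = 0. The function name is illustrative, and when several subsequences are equally long only one of them is returned.

```python
def longest_common_subsequence(x, y):
    """Return one longest common subsequence of x and y."""
    n, m = len(x), len(y)
    # L[i][j] = length of an LCS of x[:i] and y[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Traceback: collect matched characters from the lower right corner.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(longest_common_subsequence("AAG", "AGC"))
```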

Affine gap penalty

   LETVGY
   W----L
    -5 -1 -1 -1

• Separate penalties for gap opening and gap extension.
• This requires modifying the DP algorithm.

Affine gap penalty

• A gap of length k is more probable than k gaps of length 1:
   - a gap may be due to a single mutational event that inserted/deleted a stretch of characters
   - separated gaps are probably due to distinct mutational events

• A linear gap penalty function treats these cases the same.

• It is more common to use gap penalty functions involving two terms:
   - a penalty h associated with opening a gap
   - a smaller penalty g for extending the gap
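The difference between the two penalty schemes can be made concrete with a small sketch. The convention used here is an assumption chosen to match the -5 -1 -1 -1 example: the first gap position costs the opening penalty and each further position costs the extension penalty. Other texts instead charge h once plus g for every position, which only shifts the constants.

```python
def linear_gap(k, d=-5):
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap(k, gap_open=-5, gap_extend=-1):
    """Affine penalty: gap_open for the first position, gap_extend for each further one."""
    if k <= 0:
        return 0
    return gap_open + (k - 1) * gap_extend


if __name__ == "__main__":
    # One gap of length 4 versus four separate gaps of length 1.
    print(affine_gap(4), 4 * affine_gap(1))
    print(linear_gap(4), 4 * linear_gap(1))
```

Under the affine scheme a single long gap is cheaper than the same number of scattered gap positions, whereas the linear scheme cannot tell the two cases apart.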

Gap penalty functions

[Figure not preserved.]

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
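The formulas for the three matrices did not survive extraction, so the sketch below uses one common three-matrix formulation (often attributed to Gotoh) rather than the slide's own. The matrix names `M`, `Ix`, `Iy`, the function name, and the convention that a gap of length k costs `gap_open + (k - 1) * gap_extend` are all assumptions. Only the score is computed.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, score, gap_open, gap_extend):
    """Global alignment score with affine gaps, using three DP matrices.

    M[i][j]  : best score with x[i-1] aligned to y[j-1]
    Ix[i][j] : best score with x[i-1] aligned to a gap
    Iy[i][j] : best score with y[j-1] aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Iy[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(x[i - 1], y[j - 1]) + max(
                M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]
            )
            # Open a new gap (from M or Iy) or extend the current one (from Ix).
            Ix[i][j] = max(
                M[i - 1][j] + gap_open,
                Ix[i - 1][j] + gap_extend,
                Iy[i - 1][j] + gap_open,
            )
            Iy[i][j] = max(
                M[i][j - 1] + gap_open,
                Iy[i][j - 1] + gap_extend,
                Ix[i][j - 1] + gap_open,
            )
    return max(M[n][m], Ix[n][m], Iy[n][m])


def unit_score(a, b):
    """Match = 1, mismatch = -1."""
    return 1 if a == b else -1


if __name__ == "__main__":
    # Two matches plus one gap of length 2: 2 + (-5) + (-1)
    print(affine_global_score("AAAA", "AA", unit_score, -5, -1))
```

Keeping a separate matrix for each "state" is what lets the algorithm charge less for continuing a gap than for starting one.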

[Figures not preserved. The accompanying example uses match = 1, mismatch = -1.]

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tup

• FASTA
• BLAST

Page 7: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Dot matrixDot matrix

1 - one sequence listed along top of page and second sequence listed along the side

2 - move across row and put dot in any column where the character is the same

3 - continue for each row until all possible character matches between thesequences are represented by dots

4 - diagonal rows of dots reveal sequencesimilarity (can also find repeats and invertedrepeats off the main diagonal)

5 - isolated dots represent random similarity unrelated to the alignment

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Dot matrix with noise reductionDot matrix with noise reduction

Presented By Liu QiPresented By Liu Qi

Dot matrixDot matrix

To improve visualisation of identical regions among sequences we use sliding windows Instead of writing down a dot for every character that is common in both sequences

We compare a number of positions (window size) and we write down a dot whenever there is minimum number (stringency) of identical characters

Presented By Liu QiPresented By Liu Qi

Dot matrixDot matrix

Caution is necessary regarding the window size and the stringency value Generally they assume different values for different problems The optimal values will accent the regions of similarity of the two sequences

1048698 For DNA sequence usually1048698 Sliding window=15 stringency=10

1048698 For Protein sequence1048698 Sliding window=2 or 3 stringency=2

Presented By Liu QiPresented By Liu Qi

Things to be consideredThings to be considered

Scoring matrix for distance correction

Window size Threshold

Presented By Liu QiPresented By Liu Qi

The useful of Dot plot The useful of Dot plot

Regions of similarity diagonalsRegions of similarity diagonals Insertionsdeletions gapsInsertionsdeletions gaps

Can determine intronexon structureCan determine intronexon structureRepeats parallel diagonalsRepeats parallel diagonals Inverted repeats perpendicular diagonalsInverted repeats perpendicular diagonals

Inverted repeatsInverted repeatsCan be used to determine regions of base Can be used to determine regions of base

pairing of RNA moleculespairing of RNA molecules

Presented By Liu QiPresented By Liu Qi

Intra-sequence comparisonIntra-sequence comparison

RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity

Presented By Liu QiPresented By Liu Qi

ABRACADABRACADABRACADABRACAD

ExamplesExamples

Presented By Liu QiPresented By Liu Qi

palindromepalindromeSequence ATOYOTA

Presented By Liu QiPresented By Liu Qi

RepeatsRepeats

Drosophila melanogaster SLIT protein against itself

Presented By Liu QiPresented By Liu Qi

Low complexityLow complexity

Presented By Liu QiPresented By Liu Qi

Inter sequence comparisonInter sequence comparison

Conserved domainsConserved domains Insertion and deletionInsertion and deletion

Presented By Liu QiPresented By Liu Qi

Insertion and deletionInsertion and deletion

Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN

Presented By Liu QiPresented By Liu Qi

Conserved domainsConserved domains

Presented By Liu QiPresented By Liu Qi

Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Even more can be done with RNAEven more can be done with RNA

RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative

bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast

bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))

Structures of tRNA-Phe

RNA comparisons of the reverse complement of a sequence to itself

Programs for Dot Matrix

Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html
Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
COMPARE, DOTPLOT in GCG

Conclusion

Advantages:
Readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods.
Lets your eyes/brain do the work: VERY EFFICIENT.

Disadvantages:
Most dot matrix computer programs do not show an actual alignment.
Does not return a score to indicate how 'optimal' a given alignment is.


Dynamic Programming

Answers the question: what is the optimal alignment of two sequences (the one with the best score)?

How many different alignments are there?
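One way to get a feel for the size of the search space, sketched under the assumption that every distinct path through the DP grid (diagonal, vertical and horizontal moves) counts as a separate alignment. The function name `count_alignments` is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of grid paths from (0, 0) to (n, m) using diagonal,
    vertical and horizontal steps, i.e. one path per alignment."""
    if n == 0 or m == 0:
        return 1  # only one way: a single run of gaps
    return (count_alignments(n - 1, m - 1)
            + count_alignments(n - 1, m)
            + count_alignments(n, m - 1))


for length in (3, 10, 20):
    print(length, count_alignments(length, length))
```

Two sequences of length 3 already give 63 paths, and the count grows exponentially with length, which is why exhaustive enumeration is replaced by dynamic programming.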

Alignment methods with DP

Global alignment - Needleman-Wunsch (1970): maximizes the number of matches between the sequences along the entire length of the sequences.

Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm giving the highest scoring local match between two sequences.

Dynamic Programming

A simple example

[Figure: weighted graph on nodes A to F; the edge weights (values between 2 and 9) cannot be matched to edges from the transcript.]

Exercise

Conditions for applying dynamic programming

Every sub-policy of an optimal policy is itself optimal (optimal substructure).
No aftereffect: the states of earlier stages cannot directly influence future decisions.
Trading space for time (overlapping subproblems).


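DP Algorithm for Global Alignment

Two sequences $X = x_1 \dots x_n$ and $Y = y_1 \dots y_m$.

Let $F(i, j)$ be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$), where $s$ is the substitution score and $d$ is the (negative) linear gap penalty.

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\]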
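A minimal Python sketch of the fill step defined by this recurrence, assuming the substitution scores and gap penalty d = -5 of the AAG/AGC example used in these slides. The names `SUB` and `nw_matrix` are illustrative and not part of the original slides; the first sequence runs along the top of the table and the second down the left side.

```python
# Substitution scores from the slide example (symmetric).
SUB = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def nw_matrix(top, left, d=-5, sub=SUB):
    """Fill the global-alignment table F; rows follow `left`, columns `top`."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):          # first row: only gaps so far
        F[0][j] = j * d
    for i in range(1, rows):          # first column: only gaps so far
        F[i][0] = i * d
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + sub[left[i - 1]][top[j - 1]],  # diagonal
                F[i - 1][j] + d,                                  # vertical
                F[i][j - 1] + d,                                  # horizontal
            )
    return F


for row in nw_matrix("AAG", "AGC"):
    print(row)
```

The bottom-right cell holds the optimal global score, -6 for this example.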

DP in equation form

[Diagram: cell F(i, j) receives three arrows: a diagonal arrow from F(i-1, j-1) labelled s(x_i, y_j), and arrows from F(i-1, j) and F(i, j-1), each labelled d.]

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution scores s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

Step 1: set F(0, 0) = 0.

Step 2: fill the first row and the first column with multiples of d.

            A    A    G
       0   -5  -10  -15
  A   -5
  G  -10
  C  -15

Step 3: fill the remaining cells with the recurrence.

            A    A    G
       0   -5  -10  -15
  A   -5    2   -3   -8
  G  -10   -3   -3   -1
  C  -15   -8   -8   -6

Traceback

Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the traceback paths:

            A    A    G
       0   -5
  A         2   -3
  G                  -1
  C                  -6

Optimal alignments (score -6):

AAG-    AAG-
-AGC    A-GC
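The traceback rules can be sketched in Python as follows, assuming the same substitution scores and d = -5 as in the example. The function name `needleman_wunsch` and the tie-breaking order (diagonal, then vertical, then horizontal) are choices made for this sketch, so it returns only one of the two optimal alignments.

```python
SUB = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def needleman_wunsch(top, left, d=-5, sub=SUB):
    """Return (score, aligned_top, aligned_left) for one optimal global alignment."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        F[0][j] = j * d
    for i in range(1, rows):
        F[i][0] = i * d
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(F[i - 1][j - 1] + sub[left[i - 1]][top[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    # Traceback from the lower right corner to the upper left.
    i, j = len(left), len(top)
    a_top, a_left = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub[left[i - 1]][top[j - 1]]:
            a_top.append(top[j - 1])      # diagonal: one character from each
            a_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            a_top.append("-")             # vertical: gap in the top sequence
            a_left.append(left[i - 1])
            i -= 1
        else:
            a_top.append(top[j - 1])      # horizontal: gap in the left sequence
            a_left.append("-")
            j -= 1
    # Characters were collected back to front, so reverse them.
    return F[-1][-1], "".join(reversed(a_top)), "".join(reversed(a_left))


score, aligned_top, aligned_left = needleman_wunsch("AAG", "AGC")
print(score)
print(aligned_top)
print(aligned_left)
```

Listing every optimal alignment would require following all tied moves instead of the first one that matches.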

Exercise

Find the global alignment of
X = catgt
Y = acgctg
Score: d = -1, mismatch = -1, match = 2

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein.
Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y.
F is the DP matrix, s is the substitution matrix, d is the linear gap penalty.

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\]

Local DP in equation form

[Diagram: cell F(i, j) receives a diagonal arrow from F(i-1, j-1) labelled s(x_i, y_j), arrows from F(i-1, j) and F(i, j-1) each labelled d, and the additional option 0.]

Local alignment

Two differences with respect to global alignment:
No score is negative.
Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution scores s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

Initialise the first row and the first column to 0, then fill the remaining cells with the local recurrence:

            A    A    G
       0    0    0    0
  A    0    2    2    0
  G    0    0    0    4
  C    0    0    0    0

Optimal local alignment (score 4):

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution scores s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

Initialise the first row and the first column to 0, then fill the remaining cells with the local recurrence:

            A    A    G
       0    0    0    0
  G    0    0    0    2
  A    0    2    2    0
  A    0    2    4    0
  G    0    0    0    6
  G    0    0    0    2
  C    0    0    0    0
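A minimal Smith-Waterman sketch in Python, assuming the substitution scores and d = -5 of the local alignment examples. The function name `smith_waterman`, the choice of the first highest-scoring cell, and the tie-breaking order in the traceback are assumptions of this sketch.

```python
SUB = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def smith_waterman(top, left, d=-5, sub=SUB):
    """Return (score, aligned_top, aligned_left) for one best local alignment."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + sub[left[i - 1]][top[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Traceback starts at the highest score and stops at the first 0.
    i, j = best_pos
    a_top, a_left = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + sub[left[i - 1]][top[j - 1]]:
            a_top.append(top[j - 1])
            a_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            a_top.append("-")
            a_left.append(left[i - 1])
            i -= 1
        else:
            a_top.append(top[j - 1])
            a_left.append("-")
            j -= 1
    return best, "".join(reversed(a_top)), "".join(reversed(a_left))


print(smith_waterman("AAG", "AGC"))
print(smith_waterman("AAG", "GAAGGC"))
```

For the two slide examples the highest cell values are 4 and 6, matching the filled matrices.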

End-Space Free Alignment

Any number of indel operations at the beginning or at the end of the alignment contribute zero weight.

X = - - c a c - t g t a c
Y = g a c a c t t g - - -

End-Space Free Alignment

Base conditions: $\forall i, j:\ F(i, 0) = 0,\ F(0, j) = 0$

Recurrence relation:

\[
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(X_i, Y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\]

Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$.
Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$.
Define the alignment score as $\max\{F(n, j^*),\ F(i^*, m)\}$.
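A short Python sketch of these conditions, assuming a simple match/mismatch scoring scheme in place of a full substitution matrix and non-empty input sequences. The function name `end_space_free_score` and its default scores are illustrative.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Score of the best alignment in which leading and trailing gaps are free."""
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Trailing gaps are free: take the best cell in the last column or last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))
```

The alignment shown on the previous slide has five matches and one internal gap, so with these scores the optimum for that pair is at least 4.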

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c
Y = g a c a c t t g

Questions for thought

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.

Affine gap penalty

LETVGYW----L
-5 -1 -1 -1 (one score per gap position: opening, then extension)

Separate penalties for gap opening and gap extension.
This requires modifying the DP algorithm.

Affine gap penalty

A gap of length k is more probable than k gaps of length 1:
  - a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  - separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same.

It is more common to use gap penalty functions involving two terms:
  - a penalty h associated with opening a gap
  - a smaller penalty g for extending the gap

Gap penalty functions

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
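The slide formulas for the three matrices are not preserved in this transcript, so what follows is only a sketch of one common three-matrix formulation (often attributed to Gotoh), not necessarily the one shown in the lecture. It assumes the first position of a gap costs `gap_open` and every further position costs `gap_extend`, as in the LETVGYW----L example (-5, -1, -1, -1). All names are illustrative.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Optimal global alignment score under an affine gap penalty."""
    n, m = len(x), len(y)
    # M: x[i] aligned to y[j]; Ix: x[i] aligned to a gap; Iy: y[j] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):       # leading gap of length i in y
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):       # leading gap of length j in x
        Iy[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] + gap_open,      # open a new gap
                           Ix[i - 1][j] + gap_extend,   # extend the current gap
                           Iy[i - 1][j] + gap_open)
            Iy[i][j] = max(M[i][j - 1] + gap_open,
                           Iy[i][j - 1] + gap_extend,
                           Ix[i][j - 1] + gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("AAAA", "AA"))
```

With these scores, aligning AAAA to AA gives two matches plus a single gap of length 2 (2 - 5 - 1 = -4), which beats two separate gaps of length 1 (2 - 5 - 5 = -8); a linear penalty would score both cases the same.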

Dynamic Programming for the Affine Gap Penalty Case

match = 1, mismatch = -1

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tup

FASTA

BLAST

RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity

Presented By Liu QiPresented By Liu Qi

ABRACADABRACADABRACADABRACAD

ExamplesExamples

Presented By Liu QiPresented By Liu Qi

palindromepalindromeSequence ATOYOTA

Presented By Liu QiPresented By Liu Qi

RepeatsRepeats

Drosophila melanogaster SLIT protein against itself

Presented By Liu QiPresented By Liu Qi

Low complexityLow complexity

Presented By Liu QiPresented By Liu Qi

Inter sequence comparisonInter sequence comparison

Conserved domainsConserved domains Insertion and deletionInsertion and deletion

Presented By Liu QiPresented By Liu Qi

Insertion and deletionInsertion and deletion

Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN

Presented By Liu QiPresented By Liu Qi

Conserved domainsConserved domains

Presented By Liu QiPresented By Liu Qi

Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Even more can be done with RNAEven more can be done with RNA

RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative

bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast

bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))

Presented By Liu QiPresented By Liu Qi

Structures of Structures of tRNA-PhetRNA-Phe

Presented By Liu QiPresented By Liu Qi

RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself

Presented By Liu QiPresented By Liu Qi

Programs for Dot MatrixPrograms for Dot Matrix

DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml

SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics

res_inf_sightmlres_inf_sightmlDotter Dotter

httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml

COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG

Presented By Liu QiPresented By Liu Qi

conclusionconclusion

Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods

letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT

DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is

Presented By Liu QiPresented By Liu Qi

ReferenceReference

Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111

Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669

Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Answer what is the optimal alignment of two sequences(the best score)

How many different alignments

Alignment methods with DPAlignment methods with DP

Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences

Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

A simple exampleA simple example

3

4

5

3

6

5 4

2

A

B

C

D

E

F

8

7

9

Presented By Liu QiPresented By Liu Qi

Exercise

Presented By Liu QiPresented By Liu Qi

动态规划的适用条件动态规划的适用条件

一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性

以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

DP Algorithm for Global DP Algorithm for Global AlignmentAlignment

Two sequences X = x1xn and Y = y1ym

F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)

djiF

djiF

yxsjiF

jiF

F

ji

1

1

11

max

000

Presented By Liu QiPresented By Liu Qi

DP in equation formDP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5

GG -10-10

CC -15-15 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5 22 -3-3 -8-8

GG -10-10 -3-3 -3-3 -1-1

CC -15-15 -8-8 -8-8 -6-6 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

TracebackTraceback

Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left

Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence

A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence

A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each

sequencesequence

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

AAG- AAG--AGC A-GC

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2

Presented By Liu QiPresented By Liu Qi

AnswerAnswer

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein

Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required

Presented By Liu QiPresented By Liu Qi

Local alignment DPLocal alignment DP

Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution

matrix d is the linear gap penaltymatrix d is the linear gap penalty

0

11

11

max

000

djiFdjiF

yxsjiF

jiF

F

ji

Presented By Liu QiPresented By Liu Qi

Local DP in equation formLocal DP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

0

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the

matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch

Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00

GG 00

CC 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00 22 22 00

GG 00 00 00 44

CC 00 00 00 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00 22 22 00

GG 00 00 00 44

CC 00 00 00 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

AGAG

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

GG 00

AA 00

AA 00

GG 00

GG 00

CC 00

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

GG 00 00 00 22

AA 00 22 22 00

AA 00 22 44 00

GG 00 00 00 66

GG 00 00 00 22

CC 00 00 00 00

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

End-Space Free AlignmentEnd-Space Free Alignment

any number of indel operations at the end or at the beginning of the alignment contribute zero weight

X= - - c a c - t g t a c

Y= g a c a c t t g - - -

Presented By Liu QiPresented By Liu Qi

End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)

F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)

Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)

X = c a c t g t a c

Y= g a c a c t t g

Presented By Liu QiPresented By Liu Qi

思考题思考题Does a local alignment program always Does a local alignment program always

produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment

Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences

Presented By Liu QiPresented By Liu Qi

Affine gap penaltyAffine gap penalty

LETVGYW----L

-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and

gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm

Affine gap penalty

- A gap of length k is more probable than k gaps of length 1:
  - a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  - separated gaps are probably due to distinct mutational events
- A linear gap penalty function treats these cases the same.
- It is more common to use gap penalty functions involving two terms:
  - a penalty h associated with opening a gap
  - a smaller penalty g for extending the gap
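The difference between the two penalty schemes can be shown with a short sketch. It assumes the convention suggested by the LETVGYW----L slide, where the first gap position costs the opening penalty (-5) and each further position costs the extension penalty (-1). Other texts define the affine function slightly differently, and the function names here are hypothetical.

```python
def linear_gap_penalty(k, d=-5):
    """Linear scheme: every gap position costs d."""
    return k * d


def affine_gap_penalty(k, gap_open=-5, gap_extend=-1):
    """Affine scheme (assumed convention): the first gap position costs
    gap_open (h), each additional position costs gap_extend (g)."""
    if k <= 0:
        return 0
    return gap_open + (k - 1) * gap_extend


# One gap of length 4 versus four separate gaps of length 1.
one_long_gap = affine_gap_penalty(4)
four_short_gaps = 4 * affine_gap_penalty(1)
```

Under the affine scheme a single gap of length 4 is penalised less than four isolated gaps, whereas the linear scheme gives both cases the same total.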

Gap penalty functions

[Figure: gap penalty functions; not recovered from the slide]

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
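The slide's own recurrences were not recovered, so the following is a sketch of a commonly used three-matrix formulation (often attributed to Gotoh), not necessarily the one shown in the lecture. The matrix names `M`, `Ix`, `Iy`, the function name and the convention that a gap of length k costs `gap_open + (k - 1) * gap_extend` are all assumptions.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Global alignment score with affine gaps, using three DP matrices."""
    n, m = len(x), len(y)
    # M : best score with x_i aligned to y_j
    # Ix: best score with x_i aligned to a gap
    # Iy: best score with y_j aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Iy[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            Ix[i][j] = max(
                M[i - 1][j] + gap_open,      # open a new gap
                Ix[i - 1][j] + gap_extend,   # extend the current gap
                Iy[i - 1][j] + gap_open,     # switch gap direction
            )
            Iy[i][j] = max(
                M[i][j - 1] + gap_open,
                Iy[i][j - 1] + gap_extend,
                Ix[i][j - 1] + gap_open,
            )
    return max(M[n][m], Ix[n][m], Iy[n][m])
```

Keeping separate matrices lets the algorithm remember whether the previous column was already a gap, which is what decides between the opening and the extension penalty.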

[Figures: further slides on the affine gap penalty case, including an example with match = 1 and mismatch = -1; not recovered from the slides]

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tup

- FASTA
- BLAST

Dot matrix with noise reduction

To improve the visualisation of identical regions between sequences we use sliding windows. Instead of writing down a dot for every character that is common to both sequences, we compare a number of positions at a time (the window size) and write down a dot whenever at least a minimum number (the stringency) of characters in the window are identical.
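The window and stringency filter can be sketched as below. This is a minimal illustration rather than the algorithm of any particular dot-plot program: the function name `dot_matrix` is hypothetical, and here a dot is recorded at the start position of each pair of windows.

```python
def dot_matrix(seq_x, seq_y, window=1, stringency=1):
    """Return the set of (i, j) start positions where the windows
    seq_x[i:i+window] and seq_y[j:j+window] share at least
    `stringency` identical characters at the same offsets."""
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            identical = sum(
                1 for k in range(window) if seq_x[i + k] == seq_y[j + k]
            )
            if identical >= stringency:
                dots.add((i, j))
    return dots


# With window=1 and stringency=1 every shared character gives a dot;
# a larger window suppresses isolated chance matches.
unfiltered = dot_matrix("ACGT", "ACGT", window=1, stringency=1)
filtered = dot_matrix("ACGT", "ACGT", window=2, stringency=2)
```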

Dot matrix

Caution is necessary regarding the window size and the stringency value. Generally they take different values for different problems. The optimal values will accentuate the regions of similarity between the two sequences.

- For DNA sequences, usually: sliding window = 15, stringency = 10
- For protein sequences: sliding window = 2 or 3, stringency = 2

Things to be considered

- Scoring matrix for distance correction
- Window size
- Threshold

Uses of the dot plot

- Regions of similarity: diagonals
- Insertions/deletions: gaps (can determine intron/exon structure)
- Repeats: parallel diagonals
- Inverted repeats: perpendicular diagonals (can be used to determine regions of base pairing in RNA molecules)

Intra-sequence comparison

- Repeats
- Inverted repeats
- Low complexity

Examples

ABRACADABRACAD

Palindrome: sequence ATOYOTA

Repeats: Drosophila melanogaster SLIT protein against itself [Figure not recovered]

Low complexity [Figure not recovered]

Inter-sequence comparison

- Conserved domains
- Insertion and deletion

Insertion and deletion

Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN

Conserved domains

[Figure not recovered]

Translated DNA and protein comparison: exons and introns

[Figure not recovered]

Even more can be done with RNA

RNA comparisons of the reverse complement of a sequence to itself can often be very informative.

- Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast.
- The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
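The idea of comparing an RNA sequence with its own reverse complement can be sketched as follows. The toy sequence and the names `reverse_complement_rna` and `stem_candidates` are hypothetical, only exact Watson-Crick matches are considered (G-U wobble pairs are ignored), and this is not a folding algorithm.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Reverse complement of an RNA string over A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def stem_candidates(seq, window=3):
    """Pairs (i, p) such that seq[i:i+window] could base-pair with
    seq[p:p+window], found as exact window matches between the
    sequence and its reverse complement."""
    rc = reverse_complement_rna(seq)
    n = len(seq)
    hits = []
    for i in range(n - window + 1):
        for j in range(n - window + 1):
            if seq[i:i + window] == rc[j:j + window]:
                # rc[j:j+window] is the reverse complement of
                # seq[n-j-window : n-j], so that is the pairing partner.
                hits.append((i, n - j - window))
    return hits


# Toy hairpin: GGG ... CCC can pair around an AAAU loop.
hits = stem_candidates("GGGAAAUCCC", window=3)
```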

Structures of tRNA-Phe

[Figure not recovered]

RNA comparisons of the reverse complement of a sequence to itself

[Figure not recovered]

Programs for Dot Matrix

- Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
- SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html
- Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
- COMPARE, DOTPLOT in GCG

Conclusion

Advantages

- Readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods.
- Lets your eyes/brain do the work: VERY EFFICIENT.

Disadvantages

- Most dot matrix computer programs do not show an actual alignment.
- Does not return a score to indicate how 'optimal' a given alignment is.


Dynamic Programming

- Answers the question: what is the optimal alignment of two sequences (the one with the best score)?
- How many different alignments are there?
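To get a feeling for the size of the search space, the number of global alignments can be counted with a small DP. The sketch assumes that two alignments are different whenever their sequences of columns differ, and that a column never pairs a gap with a gap. Under that assumption each alignment is a path of horizontal, vertical and diagonal steps through an (n+1) by (m+1) grid. The function name `count_alignments` is hypothetical.

```python
def count_alignments(n, m):
    """Number of distinct global alignments of sequences of length n and m,
    counted as monotone grid paths with horizontal, vertical and diagonal steps."""
    C = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            C[i][j] = C[i - 1][j] + C[i][j - 1] + C[i - 1][j - 1]
    return C[n][m]


# The count grows very quickly with sequence length.
counts = [count_alignments(k, k) for k in range(1, 4)]
```

The rapid growth is the reason exhaustive enumeration is impractical and dynamic programming is used instead.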

Alignment methods with DP

- Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along the entire length of the sequences.
- Local alignment: Smith-Waterman (1981) is a modification of the dynamic programming algorithm giving the highest-scoring local match between two sequences.

Dynamic Programming

A simple example

[Figure: weighted graph on nodes A to F; the edge weights could not be recovered from the slide]

Exercise

[Figure not recovered]

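Conditions under which dynamic programming applies

- Every sub-policy of an optimal policy is itself optimal.
- No aftereffect: the states of earlier stages cannot directly influence future decisions.
- Trading space for time (overlapping subproblems).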


DP Algorithm for Global Alignment

Two sequences $X = x_1 \ldots x_n$ and $Y = y_1 \ldots y_m$.

Let $F(i,j)$ be the optimal alignment score of $X_{1 \ldots i}$ and $Y_{1 \ldots j}$ ($0 \le i \le n$, $0 \le j \le m$).

$$F(0,0) = 0$$

$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$

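DP in equation form

Each cell F(i, j) is computed from three neighbouring cells:

- from F(i-1, j-1) by a diagonal move, adding s(x_i, y_j)
- from F(i-1, j) by a gap move, adding d
- from F(i, j-1) by a gap move, adding d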
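The three-way maximum can be turned into a matrix-filling routine along these lines. This is a minimal sketch: the names `S` and `needleman_wunsch_matrix` are hypothetical, the scoring values are taken from the slides' nucleotide matrix, and the first row and column are filled with multiples of d as in the worked example. The matrix is indexed as `F[i][j]` with i running over x, so it is the transpose of the layout drawn on the slides.

```python
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]
# S[a][b] is the substitution score s(a, b).
S = {a: dict(zip(BASES, row)) for a, row in zip(BASES, SCORES)}


def needleman_wunsch_matrix(x, y, s=S, d=-5):
    """Fill the global alignment matrix; F[i][j] scores x[:i] against y[:j]."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d  # x[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * d  # y[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]],  # diagonal move
                F[i - 1][j] + d,                          # gap in y
                F[i][j - 1] + d,                          # gap in x
            )
    return F


F = needleman_wunsch_matrix("AAG", "AGC")
optimal_score = F[3][3]
```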

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

Start with F(0,0) = 0 and fill the first row and first column with multiples of d:

            A    A    G
       0   -5  -10  -15
  A   -5
  G  -10
  C  -15

Then fill the remaining cells with the recurrence:

            A    A    G
       0   -5  -10  -15
  A   -5    2   -3   -8
  G  -10   -3   -3   -1
  C  -15   -8   -8   -6

Traceback

- Start from the lower right corner and trace back to the upper left.
- Each arrow introduces one character at the end of each aligned sequence.
- A horizontal move puts a gap in the left sequence.
- A vertical move puts a gap in the top sequence.
- A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells visited by the traceback:

            A    A    G
       0   -5
  A         2   -3
  G                  -1
  C                  -6

Optimal alignments:

AAG-    AAG-
-AGC    A-GC
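The traceback rules can be sketched as a small recursive walk that collects every optimal alignment. The names `S` and `global_alignments` are hypothetical, the scoring matrix and d = -5 come from the slides, and the recursion is only suitable for short sequences because the number of co-optimal alignments can grow quickly.

```python
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]
S = {a: dict(zip(BASES, row)) for a, row in zip(BASES, SCORES)}


def global_alignments(x, y, s=S, d=-5):
    """Return (optimal score, list of all optimal global alignments)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]],
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )

    results = []

    def walk(i, j, ax, ay):
        # Follow every move that could have produced F[i][j].
        if i == 0 and j == 0:
            results.append((ax, ay))
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]]:
            walk(i - 1, j - 1, x[i - 1] + ax, y[j - 1] + ay)  # one character from each
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            walk(i - 1, j, x[i - 1] + ax, "-" + ay)           # gap in y
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            walk(i, j - 1, "-" + ax, y[j - 1] + ay)           # gap in x

    walk(n, m, "", "")
    return F[n][m], results


score, alignments = global_alignments("AAG", "AGC")
```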

Exercise

Find the global alignment of X = catgt and Y = acgctg.
Scores: d = -1, mismatch = -1, match = 2.

Answer

[Figure not recovered]

Local alignment

- A single-domain protein may be homologous to a region within a multi-domain protein.
- Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

$$F(0,0) = 0$$

$$F(i,j) = \max \begin{cases} 0 \\ F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$

Local DP in equation form

Each cell F(i, j) is the maximum of 0 and three values computed from neighbouring cells:

- from F(i-1, j-1) by a diagonal move, adding s(x_i, y_j)
- from F(i-1, j) by a gap move, adding d
- from F(i, j-1) by a gap move, adding d

Local alignment

Two differences with respect to global alignment:

- No score is negative.
- Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

The first row and first column are 0. Filling the remaining cells with the local recurrence gives:

            A    A    G
       0    0    0    0
  A    0    2    2    0
  G    0    0    0    4
  C    0    0    0    0

Optimal local alignment:

AG
AG
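The local variant can be sketched in the same style: scores are floored at 0, and the traceback starts at the highest cell and stops at the first 0. The names `S` and `smith_waterman` are hypothetical, the scoring values come from the slides, and when several cells share the maximum this sketch reports only the first one found.

```python
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]
S = {a: dict(zip(BASES, row)) for a, row in zip(BASES, SCORES)}


def smith_waterman(x, y, s=S, d=-5):
    """Return (best local score, aligned part of x, aligned part of y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,                                        # start a new local alignment
                F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]],
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j
    # Trace back from the highest cell until a 0 is reached.
    ax, ay = "", ""
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]]:
            ax, ay = x[i - 1] + ax, y[j - 1] + ay
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            ax, ay = x[i - 1] + ax, "-" + ay
            i -= 1
        else:
            ax, ay = "-" + ax, y[j - 1] + ay
            j -= 1
    return best, ax, ay


result_short = smith_waterman("AAG", "AGC")
result_long = smith_waterman("AAG", "GAAGGC")
```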

Page 10: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Dot matrixDot matrix

To improve visualisation of identical regions among sequences we use sliding windows Instead of writing down a dot for every character that is common in both sequences

We compare a number of positions (window size) and we write down a dot whenever there is minimum number (stringency) of identical characters

Presented By Liu QiPresented By Liu Qi

Dot matrixDot matrix

Caution is necessary regarding the window size and the stringency value Generally they assume different values for different problems The optimal values will accent the regions of similarity of the two sequences

1048698 For DNA sequence usually1048698 Sliding window=15 stringency=10

1048698 For Protein sequence1048698 Sliding window=2 or 3 stringency=2

Presented By Liu QiPresented By Liu Qi

Things to be consideredThings to be considered

Scoring matrix for distance correction

Window size Threshold

Presented By Liu QiPresented By Liu Qi

The useful of Dot plot The useful of Dot plot

Regions of similarity diagonalsRegions of similarity diagonals Insertionsdeletions gapsInsertionsdeletions gaps

Can determine intronexon structureCan determine intronexon structureRepeats parallel diagonalsRepeats parallel diagonals Inverted repeats perpendicular diagonalsInverted repeats perpendicular diagonals

Inverted repeatsInverted repeatsCan be used to determine regions of base Can be used to determine regions of base

pairing of RNA moleculespairing of RNA molecules

Presented By Liu QiPresented By Liu Qi

Intra-sequence comparisonIntra-sequence comparison

RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity

Presented By Liu QiPresented By Liu Qi

ABRACADABRACADABRACADABRACAD

ExamplesExamples

Presented By Liu QiPresented By Liu Qi

palindromepalindromeSequence ATOYOTA

Presented By Liu QiPresented By Liu Qi

RepeatsRepeats

Drosophila melanogaster SLIT protein against itself

Presented By Liu QiPresented By Liu Qi

Low complexityLow complexity

Presented By Liu QiPresented By Liu Qi

Inter sequence comparisonInter sequence comparison

Conserved domainsConserved domains Insertion and deletionInsertion and deletion

Presented By Liu QiPresented By Liu Qi

Insertion and deletionInsertion and deletion

Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN

Presented By Liu QiPresented By Liu Qi

Conserved domainsConserved domains

Presented By Liu QiPresented By Liu Qi

Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Even more can be done with RNAEven more can be done with RNA

RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative

bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast

bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))

Presented By Liu QiPresented By Liu Qi

Structures of Structures of tRNA-PhetRNA-Phe

Presented By Liu QiPresented By Liu Qi

RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself

Presented By Liu QiPresented By Liu Qi

Programs for Dot MatrixPrograms for Dot Matrix

DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml

SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics

res_inf_sightmlres_inf_sightmlDotter Dotter

httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml

COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG

Presented By Liu QiPresented By Liu Qi

conclusionconclusion

Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods

letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT

DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is

Presented By Liu QiPresented By Liu Qi

ReferenceReference

Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111

Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669

Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Answer what is the optimal alignment of two sequences(the best score)

How many different alignments

Alignment methods with DPAlignment methods with DP

Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences

Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

A simple exampleA simple example

3

4

5

3

6

5 4

2

A

B

C

D

E

F

8

7

9

Presented By Liu QiPresented By Liu Qi

Exercise

Presented By Liu QiPresented By Liu Qi

动态规划的适用条件动态规划的适用条件

一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性

以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

DP Algorithm for Global DP Algorithm for Global AlignmentAlignment

Two sequences X = x1xn and Y = y1ym

F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)

djiF

djiF

yxsjiF

jiF

F

ji

1

1

11

max

000

Presented By Liu QiPresented By Liu Qi

DP in equation formDP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5

GG -10-10

CC -15-15 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5 22 -3-3 -8-8

GG -10-10 -3-3 -3-3 -1-1

CC -15-15 -8-8 -8-8 -6-6 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

TracebackTraceback

Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left

Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence

A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence

A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each

sequencesequence

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

AAG- AAG--AGC A-GC

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2

Presented By Liu QiPresented By Liu Qi

AnswerAnswer

Presented By Liu QiPresented By Liu Qi

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein

Usually an alignment that spans the complete length of both sequences is not required

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

Local DP in equation form

[Diagram: F(i, j) is computed from F(i-1, j-1) plus s(x_i, y_j), from F(i-1, j) plus d, from F(i, j-1) plus d, or is set to 0]

Local alignment

Two differences with respect to global alignment:
No score is negative
Traceback begins at the highest score in the matrix and continues until you reach 0

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman
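The two differences from global alignment can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the names `smith_waterman`, `s`, and `SCORES` are hypothetical, the scoring table is copied from the worked examples in this deck, and ties in the traceback are broken in favour of the diagonal move (an arbitrary choice).

```python
NUCLEOTIDES = "ACGT"
# Substitution scores copied from the deck's worked examples (rows/columns A, C, G, T).
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    """Substitution score for nucleotides a and b."""
    return SCORES[NUCLEOTIDES.index(a)][NUCLEOTIDES.index(b)]


def smith_waterman(x, y, d=-5):
    """Local alignment with a linear gap penalty d (a negative number).

    x labels the rows (left sequence), y labels the columns (top sequence).
    Returns (best score, aligned part of x, aligned part of y, DP matrix F).
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # borders stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,  # difference 1: no score is negative
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Difference 2: trace back from the highest cell until a 0 is reached.
    i, j = best_pos
    ax, ay = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay)), F


score, ax, ay, F = smith_waterman("AGC", "AAG")
print(score, ax, ay)  # 4 AG AG
```

Calling `smith_waterman("GAAGGC", "AAG")` fills the larger matrix of the second local example in the same way.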

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
A      2   -7   -5   -7
C     -7    2   -7   -5
G     -5   -7    2   -7
T     -7   -5   -7    2

DP matrix F (top sequence AAG, left sequence AGC):

            A    A    G
       0    0    0    0
A      0    2    2    0
G      0    0    0    4
C      0    0    0    0

Optimal local alignment:

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
A      2   -7   -5   -7
C     -7    2   -7   -5
G     -5   -7    2   -7
T     -7   -5   -7    2

DP matrix F (top sequence AAG, left sequence GAAGGC):

            A    A    G
       0    0    0    0
G      0    0    0    2
A      0    2    2    0
A      0    2    4    0
G      0    0    0    6
G      0    0    0    2
C      0    0    0    0

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight

X = - - c a c - t g t a c
Y = g a c a c t t g - - -

End-Space Free Alignment

Base conditions: \forall i, j: F(i, 0) = 0, F(0, j) = 0

Recurrence relation:

\[
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(X_i, Y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

Search for i^* such that F(i^*, m) = \max_{1 \le i \le n} F(i, m)

Search for j^* such that F(n, j^*) = \max_{1 \le j \le m} F(n, j)

Define the alignment score F(n, m) = \max \{ F(n, j^*), F(i^*, m) \}
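The base conditions, recurrence, and final search above can be sketched as follows. This is an illustrative sketch only: the function name `end_space_free_score` and its keyword parameters are hypothetical, a simple match/mismatch score stands in for a general s, and only the score (not the alignment itself) is returned.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Score of the best alignment in which leading and trailing gaps are free.

    x labels the rows, y labels the columns; d is the (negative) gap penalty.
    """
    n, m = len(x), len(y)
    # Base conditions: first row and first column are all 0 (free leading gaps).
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + sub,
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    # Free trailing gaps: take the best cell of the last column and last row.
    best_last_col = max((F[i][m] for i in range(1, n + 1)), default=0)
    best_last_row = max((F[n][j] for j in range(1, m + 1)), default=0)
    return max(best_last_col, best_last_row)


# The two sequences used on the slides, with match=1, mismatch=-1, gap=-1.
print(end_space_free_score("cactgtac", "gacacttg"))
```

Compared with global alignment, only the initialisation and the choice of the final cell change; the inner recurrence is the same.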

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1)

X = c a c t g t a c
Y = g a c a c t t g

Questions to think about

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences
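One possible approach to the LCS question, offered as a sketch rather than as the intended answer: the LCS length equals the optimal global alignment score when a match scores 1 and mismatches and gaps score 0, so the same kind of DP table works. The function name `lcs` is hypothetical.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Trace back from the lower right corner to recover one LCS.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AAG", "AGC"))  # AG
```

The table needs O(nm) time and space, the same as the alignment algorithms in this deck.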

Affine gap penalty

LETVGYW----L

-5 -1 -1 -1

Separate penalties for gap opening and gap extension

This requires modifying the DP algorithm

Affine gap penalty

A gap of length k is more probable than k gaps of length 1
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same

It is more common to use gap penalty functions involving two terms
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap
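The difference between the two penalty functions can be made concrete with a small sketch. The convention assumed here is that the first gap character costs h and each further one costs g, which matches the "-5 -1 -1 -1" illustration in this deck; other texts charge h + k*g instead. The function names are hypothetical.

```python
def linear_gap(k, d=-5):
    """Linear penalty: every gap character costs d."""
    return k * d


def affine_gap(k, h=-5, g=-1):
    """Affine penalty: h for opening the gap, g for each extension.

    Assumed convention: a gap of length k costs h + (k - 1) * g.
    """
    return 0 if k == 0 else h + (k - 1) * g


print(linear_gap(4))       # -20: one gap of length 4, linear
print(affine_gap(4))       # -8:  one gap of length 4, affine
print(4 * affine_gap(1))   # -20: four separate gaps of length 1, affine
```

Under the linear function one long gap and four short gaps cost the same; under the affine function the single long gap is penalised less, reflecting that it may come from one mutational event.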

Gap penalty functions (figure not preserved)

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1
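The slide's recurrences did not survive extraction, so the following is a sketch of a common three-matrix formulation (often attributed to Gotoh), not necessarily the one shown in the talk. The names `M`, `Ix`, `Iy`, and `affine_global_score` are assumptions, and a gap of length k is assumed to cost h + (k - 1) * g.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Global alignment score with an affine gap penalty, using three matrices.

    M[i][j]:  best score of x[:i] vs y[:j] ending with x[i-1] aligned to y[j-1]
    Ix[i][j]: best score ending with x[i-1] aligned to a gap
    Iy[i][j]: best score ending with y[j-1] aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g  # x[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g  # y[:j] aligned entirely to gaps

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # Open a new gap (h) or extend the current one (g).
            Ix[i][j] = max(M[i - 1][j] + h, Ix[i - 1][j] + g, Iy[i - 1][j] + h)
            Iy[i][j] = max(M[i][j - 1] + h, Iy[i][j - 1] + g, Ix[i][j - 1] + h)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("AAAA", "AA"))  # -4: two matches plus one gap of length 2
```

Keeping a separate matrix per "last column type" is what lets the recurrence tell an opened gap from an extended one while staying O(nm).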

Dynamic Programming for the Affine Gap Penalty Case (figures not preserved)

match = 1, mismatch = -1

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case"

Word k-tuple

FASTA

BLAST

Dot matrix

Caution is necessary regarding the window size and the stringency value. Generally, they assume different values for different problems. The optimal values will accentuate the regions of similarity of the two sequences.

• For DNA sequences, usually: sliding window = 15, stringency = 10

• For protein sequences: sliding window = 2 or 3, stringency = 2
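To make the roles of window size and stringency concrete, here is a minimal dot-plot sketch. It assumes that a dot is placed at (i, j) when at least `stringency` of the `window` positions starting at a[i] and b[j] are identical; real programs may centre the window or use a scoring matrix instead. The function names are hypothetical, and the two test sequences are the ones used in this deck's insertion/deletion example.

```python
def dot_plot(a, b, window=1, stringency=1):
    """Return the set of (i, j) where the windows a[i:i+window] and
    b[j:j+window] agree in at least `stringency` positions."""
    dots = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(a, b, dots):
    """Draw the plot as text: a down the side, b across the top."""
    lines = ["  " + b]
    for i, ch in enumerate(a):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(b)))
        lines.append(ch + " " + row)
    return "\n".join(lines)


seq1 = "DOROTHYCROWFOOTHODGKIN"
seq2 = "DOROTHYHODGKIN"
dots = dot_plot(seq2, seq1, window=3, stringency=3)
print(render(seq2, seq1, dots))
```

With window = 3 and stringency = 3 the plot shows two diagonal runs offset from each other, which is the signature of the insertion in seq1; lowering the stringency adds more background dots.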

Things to be considered

Scoring matrix for distance correction

Window size

Threshold

The usefulness of dot plots

Regions of similarity: diagonals
Insertions/deletions: gaps
- Can determine intron/exon structure
Repeats: parallel diagonals
Inverted repeats: perpendicular diagonals
- Can be used to determine regions of base pairing of RNA molecules

Intra-sequence comparison

Repeats
Inverted repeats
Low complexity

Examples

ABRACADABRACADABRACADABRACAD

Palindrome

Sequence: ATOYOTA

Repeats

Drosophila melanogaster SLIT protein against itself

Low complexity

Inter-sequence comparison

Conserved domains
Insertion and deletion

Insertion and deletion

Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN

Conserved domains

Translated DNA and protein comparison: exons and introns

Even more can be done with RNA

RNA comparisons of the reverse complement of a sequence to itself can often be very informative

• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from Baker's yeast

• The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms)

Structures of tRNA-Phe

RNA comparisons of the reverse complement of a sequence to itself

Programs for Dot Matrix

Dotlet
http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html

SIGNAL
http://innovation.swmed.edu/research/informatics/res_inf_sig.html

Dotter
http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html

COMPARE, DOTPLOT in GCG

Conclusion

Advantages: Readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods

Lets your eyes/brain do the work: VERY EFFICIENT

Disadvantages: Most dot matrix computer programs do not show an actual alignment. Does not return a score to indicate how 'optimal' a given alignment is


Dynamic Programming

Answers the question: what is the optimal alignment of two sequences (the one with the best score)?

How many different alignments are there?

Alignment methods with DP

Global alignment - Needleman-Wunsch (1970) maximizes the number of matches between the sequences along the entire length of the sequences

Local alignment - Smith-Waterman (1981) is a modification of the dynamic programming algorithm giving the highest scoring local match between two sequences

Dynamic Programming

A simple example (figure not preserved; it showed nodes A to F with the numeric labels 3, 4, 5, 3, 6, 5, 4, 2, 8, 7, 9)

Exercise (figure not preserved)

Conditions for applying dynamic programming

Every sub-policy of an optimal policy is itself optimal (optimal substructure)

No aftereffect
- the states of earlier stages cannot directly affect future decisions

Trading space for time (overlapping subproblems)

Dynamic Programming (four slides; figures not preserved)

DP Algorithm for Global Alignment

Two sequences X = x_1 ... x_n and Y = y_1 ... y_m

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 \le i \le n, 0 \le j \le m)

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

DP in equation form

[Diagram: F(i, j) is computed from F(i-1, j-1) plus s(x_i, y_j), from F(i-1, j) plus d, and from F(i, j-1) plus d]
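The recurrence can be turned into a short matrix-filling routine. This is a sketch under assumptions: the names `nw_matrix`, `s`, and `SCORES` are hypothetical, the first row and column are initialised to multiples of d (which is what the recurrence gives when only the available predecessor is used), and the scoring table is the one from the worked example in this deck.

```python
NUCLEOTIDES = "ACGT"
# Substitution scores copied from the deck's worked example (rows/columns A, C, G, T).
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    """Substitution score for nucleotides a and b."""
    return SCORES[NUCLEOTIDES.index(a)][NUCLEOTIDES.index(b)]


def nw_matrix(x, y, d=-5):
    """Fill the global alignment DP matrix F.

    x labels the rows (left sequence), y labels the columns (top sequence),
    d is the (negative) linear gap penalty. F[n][m] is the optimal score.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d  # x[:i] aligned to gaps only
    for j in range(1, m + 1):
        F[0][j] = j * d  # y[:j] aligned to gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    return F


for row in nw_matrix("AGC", "AAG"):
    print(row)
```

For AAG (top) and AGC (left) with d = -5, the last cell is -6, the optimal global alignment score.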

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
A      2   -7   -5   -7
C     -7    2   -7   -5
G     -5   -7    2   -7
T     -7   -5   -7    2

DP matrix F (top sequence AAG, left sequence AGC):

             A     A     G
       0    -5   -10   -15
A     -5     2    -3    -8
G    -10    -3    -3    -1
C    -15    -8    -8    -6

Traceback

Start from the lower right corner and trace back to the upper left

Each arrow introduces one character at the end of each aligned sequence

A horizontal move puts a gap in the left sequence

A vertical move puts a gap in the top sequence

A diagonal move uses one character from each sequence
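The traceback rules can be sketched as follows. This is an illustration, not the presenter's code: `needleman_wunsch`, `s`, and `SCORES` are hypothetical names, the scoring table is copied from the deck's worked example, and when several moves are possible the diagonal is tried first, so only one of the optimal alignments is returned.

```python
NUCLEOTIDES = "ACGT"
# Substitution scores copied from the deck's worked example (rows/columns A, C, G, T).
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    """Substitution score for nucleotides a and b."""
    return SCORES[NUCLEOTIDES.index(a)][NUCLEOTIDES.index(b)]


def needleman_wunsch(left, top, d=-5):
    """Global alignment of `left` (rows) and `top` (columns).

    Returns (score, aligned left sequence, aligned top sequence).
    """
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )

    # Start from the lower right corner and trace back to the upper left.
    i, j = n, m
    a_left, a_top = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            # Diagonal move: one character from each sequence.
            a_left.append(left[i - 1]); a_top.append(top[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            # Vertical move: gap in the top sequence.
            a_left.append(left[i - 1]); a_top.append("-"); i -= 1
        else:
            # Horizontal move: gap in the left sequence.
            a_left.append("-"); a_top.append(top[j - 1]); j -= 1
    return F[n][m], "".join(reversed(a_left)), "".join(reversed(a_top))


score, a_left, a_top = needleman_wunsch("AGC", "AAG")
print(score)   # -6
print(a_top)   # AAG-
print(a_left)  # -AGC
```

Because each move prepends to the alignment being built, the collected characters are reversed at the end.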


Page 12: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Things to be consideredThings to be considered

Scoring matrix for distance correction

Window size Threshold

Presented By Liu QiPresented By Liu Qi

The useful of Dot plot The useful of Dot plot

Regions of similarity diagonalsRegions of similarity diagonals Insertionsdeletions gapsInsertionsdeletions gaps

Can determine intronexon structureCan determine intronexon structureRepeats parallel diagonalsRepeats parallel diagonals Inverted repeats perpendicular diagonalsInverted repeats perpendicular diagonals

Inverted repeatsInverted repeatsCan be used to determine regions of base Can be used to determine regions of base

pairing of RNA moleculespairing of RNA molecules

Presented By Liu QiPresented By Liu Qi

Intra-sequence comparisonIntra-sequence comparison

RepeatsRepeatsInverted repeatsInverted repeatsLow complexityLow complexity

Presented By Liu QiPresented By Liu Qi

ABRACADABRACADABRACADABRACAD

ExamplesExamples

Presented By Liu QiPresented By Liu Qi

palindromepalindromeSequence ATOYOTA

Presented By Liu QiPresented By Liu Qi

RepeatsRepeats

Drosophila melanogaster SLIT protein against itself

Presented By Liu QiPresented By Liu Qi

Low complexityLow complexity

Presented By Liu QiPresented By Liu Qi

Inter sequence comparisonInter sequence comparison

Conserved domainsConserved domains Insertion and deletionInsertion and deletion

Presented By Liu QiPresented By Liu Qi

Insertion and deletionInsertion and deletion

Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN

Presented By Liu QiPresented By Liu Qi

Conserved domainsConserved domains

Presented By Liu QiPresented By Liu Qi

Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Even more can be done with RNAEven more can be done with RNA

RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative

bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast

bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))

Presented By Liu QiPresented By Liu Qi

Structures of Structures of tRNA-PhetRNA-Phe

Presented By Liu QiPresented By Liu Qi

RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself

Presented By Liu QiPresented By Liu Qi

Programs for Dot MatrixPrograms for Dot Matrix

DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml

SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics

res_inf_sightmlres_inf_sightmlDotter Dotter

httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml

COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG

Presented By Liu QiPresented By Liu Qi

conclusionconclusion

Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods

letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT

DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is

Presented By Liu QiPresented By Liu Qi

ReferenceReference

Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111

Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669

Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Answer what is the optimal alignment of two sequences(the best score)

How many different alignments

Alignment methods with DPAlignment methods with DP

Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences

Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

A simple exampleA simple example

3

4

5

3

6

5 4

2

A

B

C

D

E

F

8

7

9

Presented By Liu QiPresented By Liu Qi

Exercise

Presented By Liu QiPresented By Liu Qi

动态规划的适用条件动态规划的适用条件

一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性

以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

DP Algorithm for Global DP Algorithm for Global AlignmentAlignment

Two sequences X = x1xn and Y = y1ym

F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)

djiF

djiF

yxsjiF

jiF

F

ji

1

1

11

max

000

Presented By Liu QiPresented By Liu Qi

DP in equation formDP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Presented By Liu QiPresented By Liu Qi

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

        A    C    G    T
   A    2   -7   -5   -7
   C   -7    2   -7   -5
   G   -5   -7    2   -7
   T   -7   -5   -7    2

Step 1: set F(0, 0) = 0
Step 2: fill the first row and the first column with multiples of d
Step 3: fill the remaining cells with the recurrence

Completed DP matrix F (AAG along the top, AGC down the left):

             A    A    G
        0   -5  -10  -15
   A   -5    2   -3   -8
   G  -10   -3   -3   -1
   C  -15   -8   -8   -6
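The matrix fill can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: the names `substitution_score` and `global_matrix` are hypothetical, and the scoring function simply encodes the substitution table of the example (match 2, A/G or C/T mismatch -5, any other mismatch -7). The first row and column are filled with multiples of d, as in the worked example.

```python
def substitution_score(a, b):
    """Scores of the example: match 2, A<->G or C<->T -5, otherwise -7."""
    if a == b:
        return 2
    transitions = {frozenset("AG"), frozenset("CT")}
    return -5 if frozenset((a, b)) in transitions else -7


def global_matrix(top, left, d=-5):
    """Fill the global-alignment matrix F; rows follow `left`, columns `top`."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):          # first row: leading gaps only
        F[0][j] = F[0][j - 1] + d
    for i in range(1, rows):          # first column: leading gaps only
        F[i][0] = F[i - 1][0] + d
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + substitution_score(left[i - 1], top[j - 1]),
                F[i - 1][j] + d,      # vertical move
                F[i][j - 1] + d,      # horizontal move
            )
    return F


if __name__ == "__main__":
    for row in global_matrix("AAG", "AGC"):
        print(row)
```

The bottom-right cell holds the optimal global score; time and memory both grow with the product of the two sequence lengths.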

Traceback

Start from the lower right corner and trace back to the upper left

Each arrow introduces one character at the end of each aligned sequence

A horizontal move puts a gap in the left sequence

A vertical move puts a gap in the top sequence

A diagonal move uses one character from each sequence

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the traceback paths (AAG along the top, AGC down the left):

             A    A    G
        0   -5
   A         2   -3
   G                  -1
   C                  -6

Optimal alignments:

AAG-    AAG-
-AGC    A-GC
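The traceback rules can be sketched as a recursive walk that follows every move consistent with the recurrence, so that all co-optimal alignments are reported. This is a sketch under stated assumptions: the helper names (`score`, `fill`, `all_tracebacks`) are hypothetical, and the scoring function encodes the substitution table of the AAG/AGC example.

```python
def score(a, b):
    """Example scores: match 2, A<->G or C<->T -5, otherwise -7."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def fill(top, left, d):
    """Global-alignment matrix; rows follow `left`, columns follow `top`."""
    F = [[0] * (len(top) + 1) for _ in range(len(left) + 1)]
    for j in range(1, len(top) + 1):
        F[0][j] = j * d
    for i in range(1, len(left) + 1):
        F[i][0] = i * d
        for j in range(1, len(top) + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d, F[i][j - 1] + d)
    return F


def all_tracebacks(top, left, d=-5):
    """Return (optimal score, list of (top_aligned, left_aligned))."""
    F = fill(top, left, d)
    results = []

    def walk(i, j, top_aln, left_aln):
        # Strings are built from the end of the alignment, then reversed.
        if i == 0 and j == 0:
            results.append((top_aln[::-1], left_aln[::-1]))
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(left[i - 1], top[j - 1]):
            walk(i - 1, j - 1, top_aln + top[j - 1], left_aln + left[i - 1])  # diagonal
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            walk(i - 1, j, top_aln + "-", left_aln + left[i - 1])  # vertical: gap in top
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            walk(i, j - 1, top_aln + top[j - 1], left_aln + "-")   # horizontal: gap in left

    walk(len(left), len(top), "", "")
    return F[len(left)][len(top)], results


if __name__ == "__main__":
    best, alignments = all_tracebacks("AAG", "AGC")
    print(best)
    for top_aln, left_aln in alignments:
        print(top_aln, left_aln)
```

The number of co-optimal alignments can grow quickly with sequence length, so practical tools usually report a single path; enumerating all of them is only reasonable for small examples like this one.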

Exercise

Find the global alignment
X = catgt
Y = acgctg
Score: d = -1, mismatch = -1, match = 2

Answer

[Answer slide: figure only]

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein

Usually an alignment that spans the complete length of both sequences is not required

Local alignment DP

Align sequences x and y

F is the DP matrix; s is the substitution matrix; d is the linear gap penalty

$$
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
$$

Local DP in equation form

[Figure: cell F(i, j) is reached from F(i-1, j-1) by a diagonal arrow labelled s(x_i, y_j), from F(i-1, j) and F(i, j-1) by arrows labelled d, or is set to 0]

Local alignment

Two differences with respect to global alignment:

No score is negative

Traceback begins at the highest score in the matrix and continues until you reach 0

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

        A    C    G    T
   A    2   -7   -5   -7
   C   -7    2   -7   -5
   G   -5   -7    2   -7
   T   -7   -5   -7    2

Step 1: fill the first row and the first column with 0
Step 2: fill the remaining cells with the local recurrence (no cell goes below 0)

Completed DP matrix F (AAG along the top, AGC down the left):

             A    A    G
        0    0    0    0
   A    0    2    2    0
   G    0    0    0    4
   C    0    0    0    0

Optimal local alignment:

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

        A    C    G    T
   A    2   -7   -5   -7
   C   -7    2   -7   -5
   G   -5   -7    2   -7
   T   -7   -5   -7    2

Completed DP matrix F (AAG along the top, GAAGGC down the left):

             A    A    G
        0    0    0    0
   G    0    0    0    2
   A    0    2    2    0
   A    0    2    4    0
   G    0    0    0    6
   G    0    0    0    2
   C    0    0    0    0
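The local recurrence and its traceback can be sketched as follows. This is a minimal sketch rather than a reference implementation: the names `score` and `smith_waterman` are hypothetical, the scoring function encodes the substitution table of the examples, and when several cells share the highest score only the first one found is traced back.

```python
def score(a, b):
    """Example scores: match 2, A<->G or C<->T -5, otherwise -7."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def smith_waterman(top, left, d=-5):
    """Return (best score, top_aligned, left_aligned, F)."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Traceback starts at the highest score and stops at the first 0.
    i, j = best_pos
    top_aln, left_aln = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(left[i - 1], top[j - 1]):
            top_aln.append(top[j - 1])
            left_aln.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:     # vertical: gap in top
            top_aln.append("-")
            left_aln.append(left[i - 1])
            i -= 1
        else:                                # horizontal: gap in left
            top_aln.append(top[j - 1])
            left_aln.append("-")
            j -= 1
    return best, "".join(reversed(top_aln)), "".join(reversed(left_aln)), F


if __name__ == "__main__":
    print(smith_waterman("AAG", "AGC")[:3])
    print(smith_waterman("AAG", "GAAGGC")[:3])
```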

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight

X = - - c a c - t g t a c

Y = g a c a c t t g - - -

End-Space Free Alignment

Base conditions: $\forall i, j:\; F(i, 0) = 0,\; F(0, j) = 0$

Recurrence relation:

$$
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(X_i, Y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
$$

Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$

Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$

Define the alignment score as $\max\{F(n, j^*),\; F(i^*, m)\}$
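The end-space free variant changes only the initialisation and where the final score is read. The sketch below is illustrative: the function name `end_space_free_score` and the simple match/mismatch scoring are assumptions, and only the score is computed, not the alignment.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Best alignment score when leading and trailing gaps cost nothing."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0  # nothing to align; assumption for the degenerate case
    # Base conditions: first row and first column are all 0.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Trailing gaps are free: take the best cell of the last row or column.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_row, best_last_column)


if __name__ == "__main__":
    print(end_space_free_score("cactgtac", "gacacttg"))
```

A traceback would start from the chosen cell in the last row or column and stop on reaching the first row or column.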

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1)

X = c a c t g t a c

Y = g a c a c t t g

Questions to think about

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences

Affine gap penalty

LETVGY
W----L
 -5 -1 -1 -1

Separate penalties for gap opening and gap extension

This requires modifying the DP algorithm

Affine gap penalty

A gap of length k is more probable than k gaps of length 1
  - a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  - separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same

It is more common to use gap penalty functions involving two terms
  - a penalty h associated with opening a gap
  - a smaller penalty g for extending the gap
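The difference between the two penalty functions can be made concrete with a small sketch. The function names are hypothetical, and the convention is an assumption read off the LETVGY example (-5 -1 -1 -1): h is charged for the first gap position and g for each further one.

```python
def linear_gap_cost(k, d=-5):
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap_cost(k, h=-5, g=-1):
    """Affine penalty (assumed convention): h for the first position, g after."""
    return 0 if k == 0 else h + (k - 1) * g


if __name__ == "__main__":
    # One gap of length 4 versus four separate gaps of length 1.
    print(linear_gap_cost(4), 4 * linear_gap_cost(1))
    print(affine_gap_cost(4), 4 * affine_gap_cost(1))
```

Under the linear function both cases cost the same; under the affine function the single long gap is penalised less than four separate gaps.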

Gap penalty functions

[Figure: gap penalty functions]

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1

[Figure slides; the only recoverable text is: match = 1, mismatch = -1]
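The formulas on these slides did not survive extraction, so the following is not the slides' formulation but a sketch of the standard three-matrix scheme for global alignment with affine gaps. Assumptions: M ends in an aligned pair, Ix ends in a gap in y, Iy ends in a gap in x; h is charged for the first gap position and g for each further one; the function name is hypothetical; only the score is returned.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Global alignment score with affine gaps, using three matrices."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # x_i aligned to y_j
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # x_i aligned to a gap
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # y_j aligned to a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g    # one leading gap of length i
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g    # one leading gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] + h,    # open a gap after a pair
                           Ix[i - 1][j] + g,   # extend the current gap
                           Iy[i - 1][j] + h)   # open after a gap in x
            Iy[i][j] = max(M[i][j - 1] + h,
                           Iy[i][j - 1] + g,
                           Ix[i][j - 1] + h)
    return max(M[n][m], Ix[n][m], Iy[n][m])


if __name__ == "__main__":
    print(affine_global_score("LETVGY", "WL"))
```

Keeping a separate matrix per ending state is what lets the recurrence tell "open a new gap" apart from "extend the current gap".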

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case"

Word k-tup

FASTA

BLAST

Uses of the dot plot

Regions of similarity: diagonals

Insertions/deletions: gaps

Can determine intron/exon structure

Repeats: parallel diagonals

Inverted repeats: perpendicular diagonals

Inverted repeats can be used to determine regions of base pairing of RNA molecules

Intra-sequence comparison

Repeats

Inverted repeats

Low complexity

Examples

ABRACADABRACAD

Palindrome

Sequence: ATOYOTA

Repeats

Drosophila melanogaster SLIT protein against itself

Low complexity

Inter-sequence comparison

Conserved domains

Insertion and deletion

Insertion and deletion

Seq1: DOROTHYCROWFOOTHODGKIN

Seq2: DOROTHYHODGKIN
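How an insertion shows up as a shifted diagonal can be checked with a small dot-matrix sketch. The helper names (`dot_matrix`, `diagonal_runs`) are hypothetical, and exact character identity with a minimum run length stands in for the window and threshold settings of real dot-plot programs.

```python
def dot_matrix(seq1, seq2):
    """matrix[i][j] is True where seq1[i] equals seq2[j]."""
    return [[a == b for b in seq2] for a in seq1]


def diagonal_runs(seq1, seq2, min_len=5):
    """Return (start in seq1, start in seq2, length) for each long diagonal."""
    runs = []
    n, m = len(seq1), len(seq2)
    for i in range(n):
        for j in range(m):
            if seq1[i] != seq2[j]:
                continue
            if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1]:
                continue  # inside a run that started earlier
            length = 0
            while (i + length < n and j + length < m
                   and seq1[i + length] == seq2[j + length]):
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


if __name__ == "__main__":
    seq1 = "DOROTHYCROWFOOTHODGKIN"
    seq2 = "DOROTHYHODGKIN"
    for row in dot_matrix(seq1, seq2):
        print("".join("*" if dot else "." for dot in row))
    print(diagonal_runs(seq1, seq2))
```

The two runs lie on diagonals that are offset by eight positions, the length of the CROWFOOT insertion in Seq1.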

Conserved domains

Translated DNA and protein comparison: exons and introns

Even more can be done with RNA

RNA comparisons of the reverse complement of a sequence to itself can often be very informative

• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from Baker's yeast

• The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms)
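The idea of comparing a sequence with its own reverse complement can be sketched on a toy hairpin rather than on tRNA-Phe itself. Assumptions: only Watson-Crick pairs (A-U, G-C) are considered, G-U wobble pairs are ignored, and the names `reverse_complement` and `candidate_stems` are hypothetical. A diagonal run in this comparison marks two segments of the molecule that could base-pair.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string over the alphabet A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def candidate_stems(rna, min_len=3):
    """Return (i, partner, length): rna[i + k] can pair with rna[partner - k]."""
    rc = reverse_complement(rna)
    n = len(rna)
    stems = []
    for i in range(n):
        for j in range(n):
            if rna[i] != rc[j]:
                continue
            if i > 0 and j > 0 and rna[i - 1] == rc[j - 1]:
                continue  # inside a run that started earlier
            length = 0
            while (i + length < n and j + length < n
                   and rna[i + length] == rc[j + length]):
                length += 1
            partner = n - 1 - j  # position in rna that rc[j] was derived from
            # Each stem appears twice (mirror image); keep the 5' copy only.
            if length >= min_len and i < partner:
                stems.append((i, partner, length))
    return stems


if __name__ == "__main__":
    print(candidate_stems("GGGAAAUCCC"))
```

This only lists candidate helices; choosing a consistent set of them is the job of a folding algorithm.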

Structures of tRNA-Phe

RNA comparisons of the reverse complement of a sequence to itself

Programs for Dot Matrix

Dotlet
http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html

SIGNAL
http://innovation.swmed.edu/research/informatics/res_inf_sig.html

Dotter
http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html

COMPARE, DOTPLOT in GCG

Conclusion

Advantages: Readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods

Lets your eyes/brain do the work: VERY EFFICIENT

Disadvantages: Most dot matrix computer programs do not show an actual alignment

Does not return a score to indicate how 'optimal' a given alignment is


Dynamic Programming

Answers: what is the optimal alignment of two sequences (the best score)?

How many different alignments are there?

Alignment methods with DP

Global alignment - Needleman-Wunsch (1970) maximizes the number of matches between the sequences along the entire length of the sequences

Local alignment - Smith-Waterman (1981) is a modification of the dynamic programming algorithm giving the highest-scoring local match between two sequences



Dynamic ProgrammingDynamic Programming

Answer what is the optimal alignment of two sequences(the best score)

How many different alignments

Alignment methods with DP

Global alignment - Needleman-Wunsch (1970) maximizes the number of matches between the sequences along the entire length of the sequences.

Local alignment - Smith-Waterman (1981) is a modification of the dynamic programming algorithm giving the highest-scoring local match between two sequences.

Dynamic Programming

A simple example

[Figure: graph with nodes A to F and numeric edge labels between 2 and 9; layout not recoverable from the transcript.]

Exercise

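Conditions for applying dynamic programming

Optimal substructure: every sub-strategy of an optimal strategy is itself optimal.

No aftereffect: the states of earlier stages cannot directly influence future decisions.

Trading space for time: subproblems overlap, so each is solved once and its result is reused.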
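These conditions can be illustrated with a small memoised shortest-path search. This is only a sketch: the node names A to F echo the example figure, but the edges and weights in `EDGES` are hypothetical because the figure's layout is not recoverable from the transcript.

```python
from functools import lru_cache

# Hypothetical directed acyclic graph (weights are made up for illustration).
EDGES = {
    "A": {"B": 3, "C": 4},
    "B": {"D": 5, "E": 3},
    "C": {"D": 6, "E": 5},
    "D": {"F": 4},
    "E": {"F": 2},
    "F": {},
}


@lru_cache(maxsize=None)
def shortest(node: str, target: str = "F") -> float:
    """Length of the shortest path from node to target."""
    if node == target:
        return 0
    # Optimal substructure: the best path from `node` extends the best path
    # from one of its successors. The cache stores each sub-result once, so
    # overlapping subproblems (D and E are reached twice) are not recomputed.
    return min(
        (weight + shortest(nxt, target) for nxt, weight in EDGES[node].items()),
        default=float("inf"),
    )


print(shortest("A"))  # 8, via A -> B -> E -> F in this hypothetical graph
```

The result for a node depends only on that node, not on how it was reached, which is the "no aftereffect" condition in practice.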


DP Algorithm for Global Alignment

Two sequences X = x_1 ... x_n and Y = y_1 ... y_m.

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 ≤ i ≤ n, 0 ≤ j ≤ m).

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\]

DP in equation form

[Figure: cell F(i, j) is reached from F(i-1, j-1) by a diagonal move scoring s(x_i, y_j), and from F(i-1, j) or F(i, j-1) by a gap move scoring d.]
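The recurrence translates almost line by line into a matrix-filling routine. The following is a minimal sketch, not the presenter's code: the function name `nw_matrix` and the dictionary `S` are illustrative, the values in `S` are the nucleotide substitution matrix used in this deck's worked example, and d is added to the score, so it is given as a negative number.

```python
from typing import Callable, List

# Substitution matrix from the deck's worked example.
S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def nw_matrix(x: str, y: str, s: Callable[[str, str], int], d: int) -> List[List[int]]:
    """Fill F so that F[i][j] is the best global score of x[:i] against y[:j]."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d  # x[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * d  # y[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # diagonal move
                F[i - 1][j] + d,                          # gap in y
                F[i][j - 1] + d,                          # gap in x
            )
    return F


# Rows follow AGC and columns follow AAG, as in the worked example.
for row in nw_matrix("AGC", "AAG", lambda a, b: S[a][b], d=-5):
    print(row)
```

With these inputs the last printed row ends in -6, the optimal global score for AAG and AGC.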

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2

Step 1. Set F(0, 0) = 0.

Step 2. Fill the first row and the first column with multiples of d:

          A    A    G
     0   -5  -10  -15
A   -5
G  -10
C  -15

Step 3. Fill the remaining cells with the recurrence:

          A    A    G
     0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6

Traceback

Start from the lower right corner and trace back to the upper left.

Each arrow introduces one character at the end of each aligned sequence.

A horizontal move puts a gap in the left sequence.

A vertical move puts a gap in the top sequence.

A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the optimal traceback paths:

          A    A    G
     0   -5
A         2   -3
G                  -1
C                  -6

Two optimal alignments (score -6):

AAG-      AAG-
-AGC      A-GC
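The traceback rules can be sketched in code as follows. This is an illustrative implementation under stated assumptions, not the presenter's: the name `needleman_wunsch` and the tie-breaking order (diagonal first, then a gap in y, then a gap in x) are choices made here, so only one of the optimal alignments is returned. The dictionary `S` holds the substitution matrix from the worked example.

```python
from typing import Callable, Tuple

S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def needleman_wunsch(
    x: str, y: str, s: Callable[[str, str], int], d: int
) -> Tuple[int, str, str]:
    """Return (score, aligned x, aligned y) for one optimal global alignment."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )

    # Walk back from the lower right corner to the upper left corner.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])  # diagonal: one character from each sequence
            ay.append(y[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])  # character of x faces a gap in y
            ay.append("-")
            i -= 1
        else:
            ax.append("-")       # character of y faces a gap in x
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, top, bottom = needleman_wunsch("AAG", "AGC", lambda a, b: S[a][b], d=-5)
print(score)
print(top)
print(bottom)
```

For AAG and AGC this sketch reports a score of -6 with AAG- over -AGC, one of the two optimal alignments; a different tie-breaking order would surface the other one.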

Exercise

Find the global alignment of X = catgt and Y = acgctg.

Score: d = -1, mismatch = -1, match = 2.

Answer

[Slide image not recoverable from the transcript.]

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein.

Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\]

Local DP in equation form

[Figure: cell F(i, j) takes the best of 0, a diagonal move from F(i-1, j-1) scoring s(x_i, y_j), and a gap move from F(i-1, j) or F(i, j-1) scoring d.]

Local alignment

Two differences with respect to global alignment:

No score is negative.

Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch.

Local alignment algorithm: Smith-Waterman.
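Both differences show up directly in code. The sketch below is an assumption-laden illustration rather than the presenter's implementation: the name `smith_waterman` is chosen here, ties for the highest score are resolved by taking the first cell found, and `S` is the substitution matrix from the deck's worked examples.

```python
from typing import Callable, Tuple

S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def smith_waterman(
    x: str, y: str, s: Callable[[str, str], int], d: int
) -> Tuple[int, str, str]:
    """Return (score, aligned part of x, aligned part of y) for one best local alignment."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,                                        # difference 1: never negative
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j

    # Difference 2: start at the highest score and stop at the first 0.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i -= 1
            j -= 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


sub = lambda a, b: S[a][b]
print(smith_waterman("AAG", "AGC", sub, d=-5))
print(smith_waterman("AAG", "GAAGGC", sub, d=-5))
```

For AAG against AGC the sketch returns a score of 4 with AG aligned to AG, and for AAG against GAAGGC a score of 6 with AAG aligned to AAG.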

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2

Step 1. Set the first row and the first column to 0:

          A    A    G
     0    0    0    0
A    0
G    0
C    0

Step 2. Fill the remaining cells with the local recurrence:

          A    A    G
     0    0    0    0
A    0    2    2    0
G    0    0    0    4
C    0    0    0    0

Step 3. Trace back from the highest score (4) until a 0 is reached:

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2

Step 1. Set the first row and the first column to 0:

          A    A    G
     0    0    0    0
G    0
A    0
A    0
G    0
G    0
C    0

Step 2. Fill the remaining cells with the local recurrence:

          A    A    G
     0    0    0    0
G    0    0    0    2
A    0    2    2    0
A    0    2    4    0
G    0    0    0    6
G    0    0    0    2
C    0    0    0    0

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

X = - - c a c - t g t a c
Y = g a c a c t t g - - -

Base conditions: for all i, j: F(i, 0) = 0 and F(0, j) = 0.

Recurrence relation:

\[
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\]

Search for i* such that F(i*, m) = max over 1 ≤ i ≤ n of F(i, m).

Search for j* such that F(n, j*) = max over 1 ≤ j ≤ m of F(n, j).

Define the alignment score as max{ F(n, j*), F(i*, m) }.
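The base conditions and the final search can be sketched as below. This is a minimal score-only illustration: the function name `end_space_free_score` and the simple match/mismatch scoring are assumptions made here, and both sequences are assumed to be non-empty.

```python
def end_space_free_score(
    x: str, y: str, match: int = 1, mismatch: int = -1, d: int = -1
) -> int:
    """Best score when leading and trailing gaps are not penalised."""
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0, so leading gaps are free.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    # Trailing gaps are free: take the best cell of the last column and last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("ACGT", "CG"))      # 2: CG sits inside ACGT at no cost
print(end_space_free_score("AAACG", "CGTTT"))  # 2: suffix CG overlaps prefix CG
```

The second call shows why this variant suits overlap detection: only the overlapping ends contribute to the score.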

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c
Y = g a c a c t t g

Questions for thought

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
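One possible approach to the LCS question reuses the same table-filling idea as global alignment, with a match worth 1 and everything else worth 0. The sketch below is one answer among several, not the intended solution from the slides; the function name `lcs` and the test strings are arbitrary.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # L[i][j] = length of the LCS of a[:i] and b[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Trace back from the lower right corner, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AGGTAB", "GXTXAYB"))  # GTAB
```

When several subsequences share the maximal length, the tie-breaking in the traceback decides which one is returned.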

Affine gap penalty

LETVGYW----L

Penalties for the four gap positions: -5 -1 -1 -1

Separate penalties for gap opening and gap extension.

This requires modifying the DP algorithm.

A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same.

It is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap

Gap penalty functions

[Figure: not recoverable from the transcript.]
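The difference between the two penalty schemes is easy to check numerically. The sketch below assumes the convention suggested by the LETVGYW----L example, where the opening penalty h covers the first gap position and each further position costs g; other texts add h on top of k extensions instead. Function names are illustrative.

```python
def linear_gap(k: int, d: int = -5) -> int:
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap(k: int, h: int = -5, g: int = -1) -> int:
    """Affine penalty (assumed convention): h for the first position, g for each further one."""
    if k == 0:
        return 0
    return h + (k - 1) * g


k = 4
print("one gap of length 4, affine:", affine_gap(k))       # -5 -1 -1 -1 = -8
print("four gaps of length 1, affine:", k * affine_gap(1))  # -20
print("one gap of length 4, linear:", linear_gap(k))        # -20
print("four gaps of length 1, linear:", k * linear_gap(1))  # -20
```

Under the affine scheme a single long gap is cheaper than several short ones, while the linear scheme cannot tell the two cases apart.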

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.

[Slides with figures not recoverable from the transcript; one is labeled match = 1, mismatch = -1.]
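Since the slide formulas did not survive extraction, the sketch below uses the standard three-state formulation commonly attributed to Gotoh, which may differ in notation from what the slides showed. The names `M`, `Ix`, `Iy` and `affine_global_score` are chosen here; h and g are negative, and a gap of length k is assumed to cost h + (k - 1) g.

```python
NEG_INF = float("-inf")


def affine_global_score(
    x: str, y: str, match: int = 1, mismatch: int = -1, h: int = -5, g: int = -1
) -> float:
    """Best global alignment score with affine gap penalties (score only)."""
    n, m = len(x), len(y)
    # M: x[i-1] aligned to y[j-1]; Ix: x[i-1] faces a gap; Iy: y[j-1] faces a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g  # x[:i] against one long gap
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g  # y[:j] against one long gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(
                M[i - 1][j] + h,   # open a new gap after a match/mismatch
                Ix[i - 1][j] + g,  # extend the current gap
                Iy[i - 1][j] + h,  # open a gap right after a gap in the other sequence
            )
            Iy[i][j] = max(
                M[i][j - 1] + h,
                Iy[i][j - 1] + g,
                Ix[i][j - 1] + h,
            )
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("ACGT", "AT"))  # -4: A--T against ACGT, one gap of length 2
```

Keeping three matrices lets the recurrence know whether the previous column was already a gap, which is what makes the cheaper extension penalty possible.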

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tuple

FASTA

BLAST
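The common idea behind word-based heuristics is to look up short exact word matches first and only then examine the promising regions. The sketch below shows that seeding step only, under simplifying assumptions; it is not the actual FASTA or BLAST implementation, and the names `word_hits` and `hits_per_diagonal` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def word_hits(query: str, subject: str, k: int = 3) -> List[Tuple[int, int]]:
    """Return (query position, subject position) pairs of identical k-letter words."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)  # lookup table of query words
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            hits.append((i, j))
    return hits


def hits_per_diagonal(hits: List[Tuple[int, int]]) -> Dict[int, int]:
    """Count hits on each diagonal (subject position minus query position)."""
    counts: Dict[int, int] = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    return dict(counts)


hits = word_hits("ACGTAC", "TTACGT", k=3)
print(hits)
print(hits_per_diagonal(hits))
```

Diagonals that collect many hits are the candidates a heuristic method would then score or extend in more detail.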


Examples

ABRACADABRACAD

Palindrome

Sequence: ATOYOTA

Repeats

Drosophila melanogaster SLIT protein against itself

Low complexity

Inter-sequence comparison

Conserved domains

Insertion and deletion

Insertion and deletion

Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
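The two sequences above make a convenient test case for a small dot-plot routine. The sketch below is illustrative only: `dot_plot` and `diagonal_runs` are names chosen here, and no window or threshold filtering is applied, unlike most real dot-matrix programs.

```python
from typing import List, Tuple


def dot_plot(top: str, left: str) -> List[str]:
    """One text row per character of `left`, with '*' wherever the characters match."""
    return ["".join("*" if t == c else "." for t in top) for c in left]


def diagonal_runs(top: str, left: str, min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (top start, left start, length) for maximal diagonal runs of matches."""
    runs = []
    for j in range(len(left)):
        for i in range(len(top)):
            if top[i] != left[j]:
                continue
            if i > 0 and j > 0 and top[i - 1] == left[j - 1]:
                continue  # inside a run that started earlier on this diagonal
            k = 0
            while i + k < len(top) and j + k < len(left) and top[i + k] == left[j + k]:
                k += 1
            if k >= min_len:
                runs.append((i, j, k))
    return runs


seq1 = "DOROTHYCROWFOOTHODGKIN"
seq2 = "DOROTHYHODGKIN"
print("  " + seq1)
for label, row in zip(seq2, dot_plot(seq1, seq2)):
    print(label + " " + row)
print(diagonal_runs(seq1, seq2, min_len=5))
```

The two long runs start at (0, 0) and (15, 7); the diagonal shifts by 8 positions between them, which is the length of the inserted segment CROWFOOT.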

Conserved domains

Translated DNA and protein comparison: exons and introns

Even more can be done with RNA

RNA comparisons of the reverse complement of a sequence to itself can often be very informative.

• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from Baker's yeast.

• The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).

Structures of tRNA-Phe

RNA comparisons of the reverse complement of a sequence to itself
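The idea can be tried on a toy example. The sketch below compares a short, hypothetical hairpin sequence (not the tRNA-Phe sequence, which the transcript does not include) with its own reverse complement; diagonal runs of matches then mark segments that can base-pair with another part of the same molecule. Only Watson-Crick pairs are considered, and the function names are illustrative.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse the sequence and replace every base by its Watson-Crick partner."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def diagonal_runs(top: str, left: str, min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (top start, left start, length) for maximal diagonal runs of matches."""
    runs = []
    for j in range(len(left)):
        for i in range(len(top)):
            if top[i] != left[j]:
                continue
            if i > 0 and j > 0 and top[i - 1] == left[j - 1]:
                continue  # inside a run that started earlier on this diagonal
            k = 0
            while i + k < len(top) and j + k < len(left) and top[i + k] == left[j + k]:
                k += 1
            if k >= min_len:
                runs.append((i, j, k))
    return runs


hairpin = "GGGAAAUCCC"  # hypothetical stem-loop: GGGA pairs with UCCC
rc = reverse_complement_rna(hairpin)
print(rc)
print(diagonal_runs(hairpin, rc, min_len=3))
```

Here the two runs correspond to the two arms of the same stem, which is the kind of signal a reverse-complement dot plot makes visible without any folding algorithm.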

Programs for Dot Matrix

Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html

SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html

Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html

COMPARE, DOTPLOT in GCG


Page 16: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

palindromepalindromeSequence ATOYOTA

Presented By Liu QiPresented By Liu Qi

RepeatsRepeats

Drosophila melanogaster SLIT protein against itself

Presented By Liu QiPresented By Liu Qi

Low complexityLow complexity

Presented By Liu QiPresented By Liu Qi

Inter sequence comparisonInter sequence comparison

Conserved domainsConserved domains Insertion and deletionInsertion and deletion

Presented By Liu QiPresented By Liu Qi

Insertion and deletionInsertion and deletion

Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN

Presented By Liu QiPresented By Liu Qi

Conserved domainsConserved domains

Presented By Liu QiPresented By Liu Qi

Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Even more can be done with RNAEven more can be done with RNA

RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative

bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast

bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))

Presented By Liu QiPresented By Liu Qi

Structures of Structures of tRNA-PhetRNA-Phe

Presented By Liu QiPresented By Liu Qi

RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself

Presented By Liu QiPresented By Liu Qi

Programs for Dot MatrixPrograms for Dot Matrix

DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml

SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics

res_inf_sightmlres_inf_sightmlDotter Dotter

httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml

COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG

Presented By Liu QiPresented By Liu Qi

conclusionconclusion

Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods

letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT

DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is


Dynamic Programming

Answers the question: what is the optimal alignment of two sequences (the one with the best score)?

How many different alignments are there?
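One way to make the second question concrete: if every alignment is counted as a distinct sequence of columns (residue/residue, residue/gap, gap/residue), the count obeys a three-term recurrence and grows very quickly, which is why enumerating all alignments is not practical. The function name below is illustrative, and the counting convention is an assumption (alignments that differ only in the order of adjacent gaps are counted separately).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of column-wise distinct alignments of sequences of length n and m."""
    if n == 0 or m == 0:
        return 1  # only one way: the remaining residues all face gaps
    # The last column is (x_n, y_m), (x_n, -) or (-, y_m).
    return (count_alignments(n - 1, m - 1)
            + count_alignments(n - 1, m)
            + count_alignments(n, m - 1))


print(count_alignments(3, 3))   # 63
for length in (5, 10, 20):
    print(length, count_alignments(length, length))
```

Even two sequences of length 3 already admit 63 alignments under this convention, so an exhaustive search is hopeless for realistic lengths and dynamic programming is used instead.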

Alignment methods with DP

Global alignment - Needleman-Wunsch (1970): maximizes the number of matches between the sequences along the entire length of the sequences.

Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.

Dynamic Programming

A simple example

[Figure: weighted graph with nodes A to F; the assignment of edge weights is not recoverable from the transcript]

Exercise

Conditions for applying dynamic programming

Every sub-strategy of an optimal strategy is itself optimal.

No aftereffect: the states of earlier stages cannot directly influence future decisions.

Trading space for time (overlapping subproblems).


DP Algorithm for Global Alignment

Two sequences X = x_1 ... x_n and Y = y_1 ... y_m

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 ≤ i ≤ n, 0 ≤ j ≤ m)

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

DP in equation form

[Diagram: cell F(i, j) is reached from F(i-1, j-1) with score s(x_i, y_j), and from F(i-1, j) and F(i, j-1) with gap penalty d]

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

DP matrix after filling in the first row and column:

             A    A    G
        0   -5  -10  -15
  A    -5
  G   -10
  C   -15

Completed DP matrix:

             A    A    G
        0   -5  -10  -15
  A    -5    2   -3   -8
  G   -10   -3   -3   -1
  C   -15   -8   -8   -6

Traceback

Start from the lower right corner and trace back to the upper left.

Each arrow introduces one character at the end of each aligned sequence.

A horizontal move puts a gap in the left sequence.

A vertical move puts a gap in the top sequence.

A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the traceback paths:

             A    A    G
        0   -5
  A          2   -3
  G                   -1
  C                   -6

Optimal alignments (score -6):

AAG-    AAG-
-AGC    A-GC
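The fill and traceback steps for this example can be sketched in a few lines of Python. This is a minimal sketch, not the presenter's code: the function name, the dictionary form of the substitution matrix and the tie-breaking order in the traceback (diagonal, then horizontal, then vertical) are all assumptions made for illustration. Rows of F follow the left sequence and columns follow the top sequence, as on the slides.

```python
BASES = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
S = {a: dict(zip(BASES, row)) for a, row in zip(BASES, ROWS)}  # S[a][b] = s(a, b)


def needleman_wunsch(x, y, s, d):
    """Global alignment of x (top) and y (left) with a linear gap penalty d (negative)."""
    n, m = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, n + 1):
        F[0][i] = F[0][i - 1] + d
    for j in range(1, m + 1):
        F[j][0] = F[j - 1][0] + d
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            F[j][i] = max(F[j - 1][i - 1] + s[x[i - 1]][y[j - 1]],  # diagonal
                          F[j][i - 1] + d,                           # horizontal
                          F[j - 1][i] + d)                           # vertical
    # Traceback from the lower right corner to the upper left.
    top, left = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[j][i] == F[j - 1][i - 1] + s[x[i - 1]][y[j - 1]]:
            top.append(x[i - 1]); left.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[j][i] == F[j][i - 1] + d:
            top.append(x[i - 1]); left.append("-"); i -= 1   # gap in the left sequence
        else:
            top.append("-"); left.append(y[j - 1]); j -= 1   # gap in the top sequence
    return F, "".join(reversed(top)), "".join(reversed(left))


F, top, left = needleman_wunsch("AAG", "AGC", S, -5)
for row in F:
    print(row)
print(top)    # AAG-
print(left)   # -AGC
```

With this tie-breaking order the sketch returns the first of the two optimal alignments; following the other tie at the second column would give AAG- over A-GC, which has the same score of -6.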

Exercise

Find the global alignment of
X = catgt
Y = acgctg
Score: d = -1, mismatch = -1, match = 2

Answer

[The answer was shown as a figure in the original slides.]

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein.

Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y.

F is the DP matrix, s is the substitution matrix, d is the linear gap penalty.

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

Local DP in equation form

[Diagram: cell F(i, j) is reached from F(i-1, j-1) with score s(x_i, y_j), from F(i-1, j) and F(i, j-1) with gap penalty d, or is reset to 0]

Local alignment

Two differences with respect to global alignment:

No score is negative.

Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

DP matrix after filling in the first row and column:

             A    A    G
        0    0    0    0
  A     0
  G     0
  C     0

Completed DP matrix:

             A    A    G
        0    0    0    0
  A     0    2    2    0
  G     0    0    0    4
  C     0    0    0    0

Optimal local alignment (score 4):

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

DP matrix after filling in the first row and column:

             A    A    G
        0    0    0    0
  G     0
  A     0
  A     0
  G     0
  G     0
  C     0

Completed DP matrix:

             A    A    G
        0    0    0    0
  G     0    0    0    2
  A     0    2    2    0
  A     0    2    4    0
  G     0    0    0    6
  G     0    0    0    2
  C     0    0    0    0
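Both local alignment examples can be reproduced with a short Python sketch of the recurrence and traceback rules stated earlier. The function name, the dictionary form of the substitution matrix, and the choice to report the first cell that reaches the maximum score are assumptions made for illustration, not part of the slides.

```python
BASES = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
S = {a: dict(zip(BASES, row)) for a, row in zip(BASES, ROWS)}  # S[a][b] = s(a, b)


def smith_waterman(x, y, s, d):
    """Local alignment of x (top) and y (left) with a linear gap penalty d (negative)."""
    n, m = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            F[j][i] = max(0,
                          F[j - 1][i - 1] + s[x[i - 1]][y[j - 1]],
                          F[j][i - 1] + d,
                          F[j - 1][i] + d)
            if F[j][i] > best:
                best, best_pos = F[j][i], (j, i)
    # Traceback starts at the highest score and stops when a 0 is reached.
    top, left = [], []
    j, i = best_pos
    while i > 0 and j > 0 and F[j][i] > 0:
        if F[j][i] == F[j - 1][i - 1] + s[x[i - 1]][y[j - 1]]:
            top.append(x[i - 1]); left.append(y[j - 1]); i -= 1; j -= 1
        elif F[j][i] == F[j][i - 1] + d:
            top.append(x[i - 1]); left.append("-"); i -= 1
        else:
            top.append("-"); left.append(y[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(left))


print(smith_waterman("AAG", "AGC", S, -5))      # (4, 'AG', 'AG')
print(smith_waterman("AAG", "GAAGGC", S, -5))   # (6, 'AAG', 'AAG')
```

For the second example the highest cell (6) sits where the final G of AAG meets the first G of the GG pair in GAAGGC, so the whole of AAG is aligned against the AAG inside the longer sequence.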

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

X= - - c a c - t g t a c

Y= g a c a c t t g - - -

End-Space Free Alignment

Base conditions: \( \forall i, j:\; F(i,0) = 0,\; F(0,j) = 0 \)

Recurrence relation:

\[
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(X_i, Y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

Search for \( i^* \) such that \( F(i^*, m) = \max_{1 \le i \le n} F(i, m) \)

Search for \( j^* \) such that \( F(n, j^*) = \max_{1 \le j \le m} F(n, j) \)

Define the alignment score as \( \max\{ F(n, j^*),\, F(i^*, m) \} \)
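A minimal Python sketch of this scheme is given below. It only computes the score (no traceback), and the function name and the simple match/mismatch scoring are assumptions chosen for illustration; any substitution function s could be used instead.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Score of the best alignment in which leading and trailing gaps are free."""
    n, m = len(x), len(y)
    # F[i][j] scores x[:i] against y[:j]; the first row and column stay 0.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_row, best_last_column)


print(end_space_free_score("ACGT", "CG"))            # 2
print(end_space_free_score("cactgtac", "gacacttg"))
```

In the first call the short sequence CG sits inside ACGT, so the flanking A and T cost nothing and the score is just the two matches. Reading the result from the last row or last column, rather than from F(n, m) alone, is what makes the trailing gaps free.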

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1)

X = c a c t g t a c

Y = g a c a c t t g

Questions to think about

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.

Affine gap penalty

LETVGYW----L
-5 -1 -1 -1 (penalties for the four gap positions)

Separate penalties for gap opening and gap extension.

This requires modifying the DP algorithm.

Affine gap penalty

A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same.

It is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap

Gap penalty functions

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
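The slide formulas for the three matrices did not survive in this transcript, so the following is only a sketch of one common three-matrix formulation (often attributed to Gotoh), not necessarily the one shown in the talk. It assumes that the first position of a gap costs gap_open and each further position costs gap_extend, which matches the -5 -1 -1 -1 illustration; the function name and the match/mismatch scoring are also assumptions.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Global alignment score with affine gaps, using three DP matrices."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]    # x_i aligned with y_j
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # x_i aligned with a gap
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # y_j aligned with a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Iy[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] + gap_open,       # open a new gap
                           Ix[i - 1][j] + gap_extend,    # extend the current gap
                           Iy[i - 1][j] + gap_open)
            Iy[i][j] = max(M[i][j - 1] + gap_open,
                           Iy[i][j - 1] + gap_extend,
                           Ix[i][j - 1] + gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("AAAA", "AA"))                               # -4
print(affine_global_score("AAAA", "AA", gap_open=-5, gap_extend=-5))   # -8
```

The two calls show the effect described above: with affine penalties a single gap of length 2 costs -6, so the best score is 2 - 6 = -4, whereas a linear penalty of -5 per position gives 2 - 10 = -8.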

[Figure: worked example with match = 1, mismatch = -1]

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tuple

FASTA

BLAST

Page 17: Pairwise Sequence  Alignment

Repeats

Drosophila melanogaster SLIT protein against itself

Low complexity

Inter-sequence comparison

Conserved domains

Insertion and deletion

Insertion and deletion

Seq1: DOROTHYCROWFOOTHODGKIN
Seq2: DOROTHYHODGKIN
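What a dot plot of these two strings would show can be sketched without drawing anything: list the diagonal runs of identical residues and look at how the diagonal shifts. The function name and the minimum run length of 5 are assumptions chosen for this illustration.

```python
def diagonal_runs(seq_a, seq_b, min_len=5):
    """Return (start_a, start_b, length) for each maximal diagonal run of identical residues."""
    runs = []
    for i in range(len(seq_a)):
        for j in range(len(seq_b)):
            if seq_a[i] != seq_b[j]:
                continue
            # Skip cells that only continue a run started further up the diagonal.
            if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]:
                continue
            length = 0
            while (i + length < len(seq_a) and j + length < len(seq_b)
                   and seq_a[i + length] == seq_b[j + length]):
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


seq1 = "DOROTHYCROWFOOTHODGKIN"
seq2 = "DOROTHYHODGKIN"
for start1, start2, length in diagonal_runs(seq1, seq2):
    print(seq1[start1:start1 + length], "diagonal offset", start1 - start2)
# DOROTHY diagonal offset 0
# HODGKIN diagonal offset 8
```

The jump in diagonal offset from 0 to 8 corresponds to the eight residues (CROWFOOT) present in Seq1 but absent from Seq2, which is how an insertion or deletion appears in a dot matrix.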

Conserved domains

Translated DNA and protein comparison: exons and introns

Even more can be done with RNA

RNA comparisons of the reverse complement of a sequence to itself can often be very informative.

• Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from Baker's yeast.

• The sequence and structure of this molecule is also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).


Page 18: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Low complexityLow complexity

Presented By Liu QiPresented By Liu Qi

Inter sequence comparisonInter sequence comparison

Conserved domainsConserved domains Insertion and deletionInsertion and deletion

Presented By Liu QiPresented By Liu Qi

Insertion and deletionInsertion and deletion

Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN

Presented By Liu QiPresented By Liu Qi

Conserved domainsConserved domains

Presented By Liu QiPresented By Liu Qi

Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Even more can be done with RNAEven more can be done with RNA

RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative

bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast

bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))

Presented By Liu QiPresented By Liu Qi

Structures of Structures of tRNA-PhetRNA-Phe

Presented By Liu QiPresented By Liu Qi

RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself

Presented By Liu QiPresented By Liu Qi

Programs for Dot MatrixPrograms for Dot Matrix

DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml

SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics

res_inf_sightmlres_inf_sightmlDotter Dotter

httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml

COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG

Conclusion

Advantages: Readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods.

Lets your eyes/brain do the work: VERY EFFICIENT.

Disadvantages: Most dot matrix computer programs do not show an actual alignment. Does not return a score to indicate how 'optimal' a given alignment is.

Reference

Gibbs, A. J. & McIntyre, G. A. (1970). The diagram, a method for comparing sequences. Its use with amino acid and nucleotide sequences. Eur. J. Biochem. 16, 1-11.

Maizel, J. V. Jr. & Lenk, R. P. (1981). Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc. Natl. Acad. Sci. 78, 7665-7669.

Staden, R. (1982). An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences. Nucl. Acids Res. 10(9), 2951-2961.

Dynamic Programming

Answers the question: what is the optimal alignment of two sequences (the one with the best score)?

How many different alignments are there?

Alignment methods with DP

Global alignment - Needleman-Wunsch (1970) maximizes the number of matches between the sequences along the entire length of the sequences.

Local alignment - Smith-Waterman (1981) is a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
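To get a feel for the question "how many different alignments", the sketch below counts alignments under one particular convention, which is an assumption made here and not taken from the slides: every distinct sequence of columns (residue over residue, residue over gap, gap over residue) counts as a separate alignment. The function name `count_alignments` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of gapped alignments of sequences of length n and m,
    counting every distinct sequence of columns."""
    if n == 0 or m == 0:
        return 1  # only one way left: gap out the remaining residues
    return (
        count_alignments(n - 1, m - 1)  # last column pairs two residues
        + count_alignments(n - 1, m)    # last column: residue over gap
        + count_alignments(n, m - 1)    # last column: gap over residue
    )


for length in (3, 10, 20):
    print(length, count_alignments(length, length))
```

The count grows so quickly with sequence length that enumerating every alignment is impractical, which is the motivation for dynamic programming. Note that the counting function itself reuses subproblem results through the cache.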

Dynamic Programming

A simple example

[Figure: labeled diagram with nodes A, B, C, D, E and F and numeric weights 3, 4, 5, 3, 6, 5, 4, 2, 8, 7, 9; the layout is not recoverable from the transcript]

Exercise

[Figure: exercise slide; content not recoverable from the transcript]

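Conditions for applying dynamic programming

• Optimal substructure: every sub-policy of an optimal policy is itself optimal.

• No aftereffect: the states of earlier stages cannot directly influence future decisions.

• Trading space for time (overlapping subproblems).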


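DP Algorithm for Global Alignment

Two sequences X = x_1 ... x_n and Y = y_1 ... y_m.

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 \le i \le n, 0 \le j \le m).

F(0, 0) = 0

F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}

Along the first row and the first column only one predecessor exists, so F(i, 0) = i \cdot d and F(0, j) = j \cdot d.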

DP in equation form

[Diagram: cell F(i, j) is reached from F(i-1, j-1) by a diagonal arrow labeled s(x_i, y_j), and from F(i-1, j) and F(i, j-1) by arrows labeled d]

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2

Step 1: set F(0, 0) = 0.

Step 2: fill the first row and the first column with multiples of d.

          A    A    G
     0   -5  -10  -15
A   -5
G  -10
C  -15

Step 3: fill the remaining cells with the recurrence.

          A    A    G
     0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6
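The matrix for AAG and AGC can be reproduced with a few lines of code. This is a minimal sketch, not the presenter's implementation: the names `BASES`, `SCORES`, `s` and `fill_global` are chosen for this example, and the matrix is stored with rows for the left sequence (AGC) and columns for the top sequence (AAG), as drawn on the slide.

```python
# Substitution scores from the slide: match 2, transition -5, transversion -7.
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    """Look up the substitution score for residues a and b."""
    return SCORES[BASES.index(a)][BASES.index(b)]


def fill_global(top, left, d):
    """Return F with F[i][j] = best score of left[:i] aligned to top[:j]."""
    F = [[0] * (len(top) + 1) for _ in range(len(left) + 1)]
    for j in range(1, len(top) + 1):
        F[0][j] = F[0][j - 1] + d          # first row: gaps only
    for i in range(1, len(left) + 1):
        F[i][0] = F[i - 1][0] + d          # first column: gaps only
        for j in range(1, len(top) + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),  # diagonal
                F[i - 1][j] + d,                               # from above
                F[i][j - 1] + d,                               # from the left
            )
    return F


F = fill_global("AAG", "AGC", d=-5)
for row in F:
    print(row)
```

The bottom-right cell holds the score of the optimal global alignment.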

Traceback

• Start from the lower right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the optimal traceback paths:

          A    A    G
     0   -5
A         2   -3
G                  -1
C                  -6

Optimal alignments (score -6):

AAG-    AAG-
-AGC    A-GC
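The traceback rules can be sketched in code as follows. This is an illustrative version that enumerates every optimal path, which is a design choice made here; many programs report only one. The function names are hypothetical, and `s` encodes the slide's substitution matrix (match 2, A/G or C/T substitution -5, anything else -7).

```python
def s(a, b):
    """Slide scores: match 2, transition (A<->G, C<->T) -5, otherwise -7."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def global_alignments(top, left, d):
    """Return (score, alignments); each alignment is a (top_row, left_row) pair."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        F[i][0] = i * d
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d, F[i][j - 1] + d)

    def walk(i, j):
        """Yield every optimal alignment of left[:i] with top[:j]."""
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            for t, l in walk(i - 1, j - 1):   # diagonal: one residue from each
                yield t + top[j - 1], l + left[i - 1]
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            for t, l in walk(i, j - 1):       # horizontal: gap in left sequence
                yield t + top[j - 1], l + "-"
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            for t, l in walk(i - 1, j):       # vertical: gap in top sequence
                yield t + "-", l + left[i - 1]

    return F[n][m], sorted(walk(n, m))


score, alignments = global_alignments("AAG", "AGC", d=-5)
print(score)
for top_row, left_row in alignments:
    print(top_row)
    print(left_row)
    print()
```

Ties in the maximum are what produce more than one optimal alignment: in this example the cell holding -3 in the first row can be reached both diagonally and horizontally.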

Exercise

Find the global alignment of X = catgt and Y = acgctg.

Scoring: d = -1, mismatch = -1, match = 2.

Answer

[Slide image not recoverable from the transcript]

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein.

Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

F(0, 0) = 0

F(i, j) = \max \begin{cases} 0 \\ F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}

Local DP in equation form

[Diagram: cell F(i, j) is reached from F(i-1, j-1) by a diagonal arrow labeled s(x_i, y_j), from F(i-1, j) and F(i, j-1) by arrows labeled d, or is set to 0]

Local alignment

Two differences with respect to global alignment:

• No score is negative.
• Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2

Step 1: set the first row and the first column to 0.

          A    A    G
     0    0    0    0
A    0
G    0
C    0

Step 2: fill the remaining cells with the local recurrence.

          A    A    G
     0    0    0    0
A    0    2    2    0
G    0    0    0    4
C    0    0    0    0

Step 3: trace back from the highest score (4) until a 0 is reached.

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the same substitution matrix s as in the previous example.

          A    A    G
     0    0    0    0
G    0    0    0    2
A    0    2    2    0
A    0    2    4    0
G    0    0    0    6
G    0    0    0    2
C    0    0    0    0

The highest score is 6; tracing back along the diagonal 6, 4, 2 to 0 aligns AAG with the AAG inside GAAGGC.
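Both local examples can be checked with a small sketch of the local recurrence and its traceback. The function name `local_alignment` is hypothetical, `s` encodes the slide's substitution matrix, and only one best-scoring alignment is returned (the first maximum found when scanning row by row), which is a simplification made for this example.

```python
def s(a, b):
    """Slide scores: match 2, transition (A<->G, C<->T) -5, otherwise -7."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def local_alignment(top, left, d):
    """Return (best_score, top_segment, left_segment) for one best local alignment."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Trace back from the highest cell until a 0 is reached.
    i, j = best_pos
    top_row, left_row = "", ""
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            top_row, left_row = top[j - 1] + top_row, left[i - 1] + left_row
            i, j = i - 1, j - 1
        elif F[i][j] == F[i][j - 1] + d:
            top_row, left_row = top[j - 1] + top_row, "-" + left_row
            j -= 1
        else:
            top_row, left_row = "-" + top_row, left[i - 1] + left_row
            i -= 1
    return best, top_row, left_row


print(local_alignment("AAG", "AGC", d=-5))
print(local_alignment("AAG", "GAAGGC", d=-5))
```

Compared with the global version, the only changes are the extra 0 in the maximum, the zero-filled borders, and the start and stop conditions of the traceback.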

End-Space Free Alignment

Any number of indel operations at the beginning or at the end of the alignment contribute zero weight.

X = - - c a c - t g t a c

Y = g a c a c t t g - - -

End-Space Free Alignment

Base conditions: \forall i, j: F(i, 0) = 0, F(0, j) = 0

Recurrence relation:

F(i, j) = \max \begin{cases} F(i-1, j-1) + s(X_i, Y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}

Search for i^* such that F(i^*, m) = \max_{1 \le i \le n} F(i, m)

Search for j^* such that F(n, j^*) = \max_{1 \le j \le m} F(n, j)

Define the alignment score as \max \{ F(n, j^*), F(i^*, m) \}
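A minimal sketch of this scheme is shown below. The function name and the simple match/mismatch scoring (instead of a full substitution matrix s) are assumptions made for the example; the default values follow the exercise parameters match = 1, mismatch = -1, gap = -1. Both sequences are assumed to be non-empty.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Score of the best alignment in which leading and trailing gaps are free."""
    n, m = len(x), len(y)
    # Base conditions: leading gaps are free, so row 0 and column 0 stay 0.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,   # align x[i-1] with y[j-1]
                          F[i - 1][j] + d,         # gap in y
                          F[i][j - 1] + d)         # gap in x
    # Trailing gaps are free: the alignment may end anywhere
    # in the last column or in the last row.
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_col, best_last_row)


print(end_space_free_score("cg", "acgt"))
print(end_space_free_score("cactgtac", "gacacttg"))
```

A traceback would start from the cell that attains this maximum and stop on reaching row 0 or column 0; it is omitted here to keep the sketch short.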

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c

Y = g a c a c t t g

Questions to think about

• Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

• Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.

Affine gap penalty

LETVGYW----L

Penalties for the four gap positions: -5 -1 -1 -1

Separate penalties for gap opening and gap extension.

This requires modifying the DP algorithm.

Affine gap penalty

• A gap of length k is more probable than k gaps of length 1
  - a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  - separated gaps are probably due to distinct mutational events
• A linear gap penalty function treats these cases the same
• It is more common to use gap penalty functions involving two terms
  - a penalty h associated with opening a gap
  - a smaller penalty g for extending the gap
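The difference between the two penalty schemes can be made concrete with a short sketch. The convention used here is an assumption read off the LETVGYW----L example (the first gap position costs h = -5 and each further position costs g = -1); other texts define the opening term slightly differently. The function names are hypothetical.

```python
def linear_gap(k, d):
    """Total penalty of a length-k gap when every gap position costs d."""
    return k * d


def affine_gap(k, h, g):
    """Total penalty of a length-k gap: h for the first position,
    g for each further position."""
    return h + (k - 1) * g if k > 0 else 0


# One gap of length 4 versus four separate gaps of length 1.
one_long = affine_gap(4, h=-5, g=-1)
four_short = 4 * affine_gap(1, h=-5, g=-1)
print(one_long, four_short)

# A linear penalty cannot tell the two cases apart.
print(linear_gap(4, d=-5), 4 * linear_gap(1, d=-5))
```

Under the affine scheme a single long gap is penalized much less than the same number of gap positions scattered over several gaps, which matches the biological argument on the slide.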

Gap penalty functions

[Figure: gap penalty functions; plot not recoverable from the transcript]

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
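One common way to organize the three matrices for global alignment is sketched below. The slides' own recurrences were images and are not available, so this is an assumed formulation rather than the presenter's: `M` holds alignments ending in a residue pair, `X` those ending with a residue of x over a gap, and `Y` those ending with a residue of y over a gap. The first gap position costs `h` and each further one costs `g`, and simple match/mismatch scoring stands in for a substitution matrix.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Best global alignment score of x and y under an affine gap penalty."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x[i-1] over y[j-1]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x[i-1] over a gap
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a gap over y[j-1]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = h + (i - 1) * g      # one leading gap of length i
    for j in range(1, m + 1):
        Y[0][j] = h + (j - 1) * g      # one leading gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            # Open a new gap (cost h) or extend the current one (cost g).
            X[i][j] = max(M[i - 1][j] + h, X[i - 1][j] + g, Y[i - 1][j] + h)
            Y[i][j] = max(M[i][j - 1] + h, Y[i][j - 1] + g, X[i][j - 1] + h)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("ACGT", "ACGT"))
print(affine_global_score("AAAAT", "AT"))
```

Keeping separate matrices lets the recurrence know whether the previous column already was a gap, which a single matrix cannot record.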

Dynamic Programming for the Affine Gap Penalty Case

[Slide images not recoverable from the transcript; one surviving label reads: match = 1, mismatch = -1]

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tuple

FASTA

BLAST


Presented By Liu QiPresented By Liu Qi

Even more can be done with RNAEven more can be done with RNA

RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative

bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast

bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))

Presented By Liu QiPresented By Liu Qi

Structures of Structures of tRNA-PhetRNA-Phe

Presented By Liu QiPresented By Liu Qi

RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself

Presented By Liu QiPresented By Liu Qi

Programs for Dot MatrixPrograms for Dot Matrix

DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml

SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics

res_inf_sightmlres_inf_sightmlDotter Dotter

httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml

COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG

Presented By Liu QiPresented By Liu Qi

conclusionconclusion

Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods

letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT

DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is

Presented By Liu QiPresented By Liu Qi

ReferenceReference

Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111

Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669

Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Answer what is the optimal alignment of two sequences(the best score)

How many different alignments

Alignment methods with DPAlignment methods with DP

Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences

Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

A simple exampleA simple example

3

4

5

3

6

5 4

2

A

B

C

D

E

F

8

7

9

Presented By Liu QiPresented By Liu Qi

Exercise

Presented By Liu QiPresented By Liu Qi

动态规划的适用条件动态规划的适用条件

一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性

以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

DP Algorithm for Global DP Algorithm for Global AlignmentAlignment

Two sequences X = x1xn and Y = y1ym

F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)

djiF

djiF

yxsjiF

jiF

F

ji

1

1

11

max

000

Presented By Liu QiPresented By Liu Qi

DP in equation formDP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5

GG -10-10

CC -15-15 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5 22 -3-3 -8-8

GG -10-10 -3-3 -3-3 -1-1

CC -15-15 -8-8 -8-8 -6-6 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

TracebackTraceback

Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left

Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence

A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence

A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each

sequencesequence

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

AAG- AAG--AGC A-GC

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2

Presented By Liu QiPresented By Liu Qi

AnswerAnswer

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein

Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required

Presented By Liu QiPresented By Liu Qi

Local alignment DPLocal alignment DP

Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution

matrix d is the linear gap penaltymatrix d is the linear gap penalty

0

11

11

max

000

djiFdjiF

yxsjiF

jiF

F

ji

Presented By Liu QiPresented By Liu Qi

Local DP in equation formLocal DP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

0

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the

matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch

Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00

GG 00

CC 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00 22 22 00

GG 00 00 00 44

CC 00 00 00 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00 22 22 00

GG 00 00 00 44

CC 00 00 00 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

AGAG

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

GG 00

AA 00

AA 00

GG 00

GG 00

CC 00

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

GG 00 00 00 22

AA 00 22 22 00

AA 00 22 44 00

GG 00 00 00 66

GG 00 00 00 22

CC 00 00 00 00

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

End-Space Free AlignmentEnd-Space Free Alignment

any number of indel operations at the end or at the beginning of the alignment contribute zero weight

X= - - c a c - t g t a c

Y= g a c a c t t g - - -

Presented By Liu QiPresented By Liu Qi

End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)

F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)

Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)

X = c a c t g t a c

Y= g a c a c t t g

Presented By Liu QiPresented By Liu Qi

思考题思考题Does a local alignment program always Does a local alignment program always

produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment

Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences

Presented By Liu QiPresented By Liu Qi

Affine gap penaltyAffine gap penalty

LETVGYW----L

-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and

gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm

Presented By Liu QiPresented By Liu Qi

Affine gap penaltyAffine gap penalty

a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a

stretch of characters ndash separated gaps are probably due to distinct mutational events

a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two

terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap

Presented By Liu QiPresented By Liu Qi

Gap penalty functionsGap penalty functions

Presented By Liu QiPresented By Liu Qi

Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case

need 3 matrices instead of 1

Presented By Liu QiPresented By Liu Qi

Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

match=1 mismatch=-1

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo

Presented By Liu QiPresented By Liu Qi

Word k-tup

FASTAFASTA

BLASTBLAST

Page 20: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Insertion and deletionInsertion and deletion

Seq1DOROTHYCROWFOOTHODGKINSeq1DOROTHYCROWFOOTHODGKINSeq2DOROTHYHODGKINSeq2DOROTHYHODGKIN

Presented By Liu QiPresented By Liu Qi

Conserved domainsConserved domains

Presented By Liu QiPresented By Liu Qi

Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Even more can be done with RNAEven more can be done with RNA

RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative

bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast

bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))

Structures of tRNA-Phe

RNA comparisons of the reverse complement of a sequence to itself

Programs for Dot Matrix

Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html

SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html

Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html

COMPARE, DOTPLOT in GCG

Conclusion

Advantages: Readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods.

Lets your eyes/brain do the work: VERY EFFICIENT.

Disadvantages: Most dot matrix computer programs do not show an actual alignment. Does not return a score to indicate how 'optimal' a given alignment is.

Dynamic Programming

Answers the question: what is the optimal alignment of two sequences (the one with the best score)?

How many different alignments are there?

Alignment methods with DP

Global alignment - Needleman-Wunsch (1970): maximizes the number of matches between the sequences along the entire length of the sequences.

Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
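To get a feeling for the "how many different alignments" question, the sketch below counts alignments of a length-n and a length-m sequence under one common convention, stated here as an assumption: an alignment is a sequence of columns, each column being a match/mismatch, a gap in the first sequence, or a gap in the second, and alignments that differ only in the order of adjacent gaps count as distinct. The function name is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of gapped alignments of sequences with lengths n and m."""
    if n == 0 or m == 0:
        return 1  # only one way: gap out everything that remains
    # The last column is a gap in sequence 2, a gap in sequence 1,
    # or a pair of residues.
    return (count_alignments(n - 1, m)
            + count_alignments(n, m - 1)
            + count_alignments(n - 1, m - 1))


for size in (3, 5, 10, 20):
    print(size, count_alignments(size, size))
```

The counts grow exponentially with sequence length, so enumerating all alignments is hopeless; the memoised recursion itself is already a small instance of dynamic programming.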

Dynamic Programming

A simple example

[Figure: weighted graph on nodes A, B, C, D, E, F; edge weights shown: 3, 4, 5, 3, 6, 5, 4, 2, 8, 7, 9]

Exercise

Conditions for applying dynamic programming

Every sub-policy of an optimal policy is always optimal.

No after-effect: the states of earlier stages cannot directly affect future decisions.

Space is traded for time (overlapping subproblems).

DP Algorithm for Global Alignment

Two sequences X = x_1 ... x_n and Y = y_1 ... y_m.

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 ≤ i ≤ n, 0 ≤ j ≤ m).

$$
F(0, 0) = 0
$$

$$
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
$$

DP in equation form

[Figure: cell F(i, j) is filled from three neighbours: diagonally from F(i-1, j-1) with score s(x_i, y_j), and from F(i-1, j) and F(i, j-1) with gap penalty d each]

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

DP matrix F (AAG along the top, AGC down the left):

            A    A    G
       0   -5  -10  -15
A     -5    2   -3   -8
G    -10   -3   -3   -1
C    -15   -8   -8   -6

Traceback

Start from the lower right corner and trace back to the upper left.

Each arrow introduces one character at the end of each aligned sequence.

A horizontal move puts a gap in the left sequence.

A vertical move puts a gap in the top sequence.

A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the optimal traceback paths (AAG along the top, AGC down the left):

            A    A    G
       0   -5
A           2   -3
G                    -1
C                    -6

Optimal alignments (score -6):

AAG-    AAG-
-AGC    A-GC
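The fill and traceback steps for this example can be sketched as follows. This is a minimal illustration rather than a reference implementation: it uses the substitution matrix and d = -5 from the example, fills the first row and column with multiples of d (as in the worked DP matrix), and returns only one optimal alignment, preferring diagonal moves when several moves tie. All names are illustrative.

```python
BASES = "ACGT"
SCORES = [[2, -7, -5, -7],
          [-7, 2, -7, -5],
          [-5, -7, 2, -7],
          [-7, -5, -7, 2]]
SUBST = {a: dict(zip(BASES, row)) for a, row in zip(BASES, SCORES)}


def needleman_wunsch(top, left, s=SUBST, d=-5):
    """Global alignment; rows follow `left`, columns follow `top`."""
    rows, cols = len(left), len(top)
    F = [[0] * (cols + 1) for _ in range(rows + 1)]
    for j in range(1, cols + 1):
        F[0][j] = j * d
    for i in range(1, rows + 1):
        F[i][0] = i * d
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            F[i][j] = max(F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Traceback from the lower right corner to the upper left.
    a_top, a_left = [], []
    i, j = rows, cols
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]]:
            a_top.append(top[j - 1]); a_left.append(left[i - 1])  # diagonal move
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            a_top.append("-"); a_left.append(left[i - 1])  # vertical: gap in top
            i -= 1
        else:
            a_top.append(top[j - 1]); a_left.append("-")  # horizontal: gap in left
            j -= 1
    return F, "".join(reversed(a_top)), "".join(reversed(a_left))


F, aligned_top, aligned_left = needleman_wunsch("AAG", "AGC")
print(F[-1][-1])
print(aligned_top)
print(aligned_left)
```

Because ties are broken in a fixed order, the sketch reports a single alignment; collecting every tied move during traceback would be needed to list all co-optimal alignments.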

Exercise

Find the global alignment of X = catgt and Y = acgctg.
Score: d = -1, mismatch = -1, match = 2

Answer

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein.

Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

$$
F(0, 0) = 0
$$

$$
F(i, j) = \max
\begin{cases}
0 \\
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
$$

Local DP in equation form

[Figure: cell F(i, j) is filled from three neighbours, diagonally from F(i-1, j-1) with score s(x_i, y_j) and from F(i-1, j) and F(i, j-1) with gap penalty d each, or set to 0]

Local alignment

Two differences with respect to global alignment:
No score is negative.
Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

DP matrix F (AAG along the top, AGC down the left):

          A    A    G
     0    0    0    0
A    0    2    2    0
G    0    0    0    4
C    0    0    0    0

Optimal local alignment:

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

DP matrix F (AAG along the top, GAAGGC down the left):

          A    A    G
     0    0    0    0
G    0    0    0    2
A    0    2    2    0
A    0    2    4    0
G    0    0    0    6
G    0    0    0    2
C    0    0    0    0
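A minimal sketch of the local DP and its traceback for the two examples, assuming the same substitution matrix and d = -5. It starts the traceback at the first cell holding the highest score and stops when a 0 is reached; names are illustrative and only one optimal local alignment is returned.

```python
BASES = "ACGT"
SCORES = [[2, -7, -5, -7],
          [-7, 2, -7, -5],
          [-5, -7, 2, -7],
          [-7, -5, -7, 2]]
SUBST = {a: dict(zip(BASES, row)) for a, row in zip(BASES, SCORES)}


def smith_waterman(top, left, s=SUBST, d=-5):
    """Local alignment; rows follow `left`, columns follow `top`."""
    rows, cols = len(left), len(top)
    F = [[0] * (cols + 1) for _ in range(rows + 1)]  # first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    # Traceback from the highest score until a 0 is reached.
    a_top, a_left = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]]:
            a_top.append(top[j - 1]); a_left.append(left[i - 1])
            i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + d:
            a_top.append("-"); a_left.append(left[i - 1])
            i -= 1
        else:
            a_top.append(top[j - 1]); a_left.append("-")
            j -= 1
    return best, "".join(reversed(a_top)), "".join(reversed(a_left))


print(smith_waterman("AAG", "AGC"))
print(smith_waterman("AAG", "GAAGGC"))
```

For AAG against GAAGGC the highest cell is the 6 in the fourth row, and tracing back along the diagonal through 4 and 2 recovers AAG aligned with the AAG inside GAAGGC.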

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

X = - - c a c - t g t a c
Y = g a c a c t t g - - -

End-Space Free Alignment

Base conditions: $\forall i, j:\ F(i, 0) = 0,\ F(0, j) = 0$

Recurrence relation:

$$
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
$$

Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$.

Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$.

Define the alignment score as $\max\{F(n, j^*),\ F(i^*, m)\}$.
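The base conditions and the final search can be sketched as below. This is a score-only illustration under assumed scoring (match = 1, mismatch = -1, d = -1, the values used in the exercise that follows); the function name is hypothetical and both sequences are assumed to be non-empty.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Best score when leading and trailing gaps cost nothing."""
    n, m = len(x), len(y)
    # F(i, 0) = 0 and F(0, j) = 0: leading gaps are free.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Trailing gaps are free: take the best cell in the last column or last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("ACGT", "CG"))
print(end_space_free_score("cactgtac", "gacacttg"))
```

To recover the alignment itself, the traceback would start at the cell that attains this maximum and stop on reaching the first row or first column.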

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c
Y = g a c a c t t g

Questions to think about

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
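One possible approach to the LCS question, sketched with the same table-filling idea as the alignment algorithms: a match extends the subsequence by one, and otherwise the better of the two neighbouring cells is carried over. This is an illustrative answer, not the one from the slides, and it returns a single LCS when several exist.

```python
def longest_common_subsequence(x, y):
    """Return one longest common subsequence of x and y."""
    n, m = len(x), len(y)
    # L[i][j] = length of an LCS of x[:i] and y[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Traceback from the lower right corner.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

The same table can be viewed as a global alignment with match = 1, mismatch = 0 and gap = 0, which is why the structure mirrors the DP above.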

Affine gap penalty

LETVGYW----L
-5 -1 -1 -1

Separate penalties for gap opening and gap extension.

This requires modifying the DP algorithm.

Affine gap penalty

• A gap of length k is more probable than k gaps of length 1:
  - a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  - separated gaps are probably due to distinct mutational events
• A linear gap penalty function treats these cases the same.
• It is more common to use gap penalty functions involving two terms:
  - a penalty h associated with opening a gap
  - a smaller penalty g for extending the gap
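The difference between the two penalty functions can be made concrete with a small sketch. It assumes the convention suggested by the LETVGYW----L example, where the first gap position costs the opening penalty (-5) and each further position costs the extension penalty (-1); other texts charge the opening penalty in addition to an extension for every position, so treat the exact formula as an assumption.

```python
def linear_gap_cost(k, d=-5):
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap_cost(k, gap_open=-5, gap_extend=-1):
    """Affine penalty: first position costs gap_open, the rest gap_extend."""
    if k == 0:
        return 0
    return gap_open + (k - 1) * gap_extend


# One gap of length 4 versus four separate gaps of length 1.
print(affine_gap_cost(4), 4 * affine_gap_cost(1))
print(linear_gap_cost(4), 4 * linear_gap_cost(1))
```

Under the affine function a single gap of length 4 is penalised far less than four separate gaps, whereas the linear function cannot tell the two cases apart.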

Gap penalty functions

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
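A common way to organise the three matrices is sketched below for the global, score-only case. The formulation is an assumption, since the slide's own equations did not survive: M holds the best score ending in an aligned pair, Ix the best score ending with a residue of x against a gap, and Iy the best score ending with a residue of y against a gap. A gap of length k is assumed to cost gap_open + (k - 1) * gap_extend, and all names and default scores are illustrative.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Global alignment score with affine gap penalties (three-matrix DP)."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):  # x[:i] aligned entirely against gaps
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):  # y[:j] aligned entirely against gaps
        Iy[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            Ix[i][j] = max(M[i - 1][j] + gap_open,     # open a new gap
                           Ix[i - 1][j] + gap_extend,  # extend the current gap
                           Iy[i - 1][j] + gap_open)
            Iy[i][j] = max(M[i][j - 1] + gap_open,
                           Iy[i][j - 1] + gap_extend,
                           Ix[i][j - 1] + gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("ACGT", "AT"))
```

Keeping separate matrices lets the recurrence tell "open a gap" from "extend a gap", which a single matrix F cannot do because it does not record how the previous column ended.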

[Figure: worked example with match = 1, mismatch = -1]

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tuple

FASTA

BLAST

Page 22: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Translated DNA and protein Translated DNA and protein comparison Exons and intronscomparison Exons and introns

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Even more can be done with RNAEven more can be done with RNA

RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative

bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast

bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))

Presented By Liu QiPresented By Liu Qi

Structures of Structures of tRNA-PhetRNA-Phe

Presented By Liu QiPresented By Liu Qi

RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself

Programs for Dot Matrix

Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html

SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html

Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html

COMPARE and DOTPLOT in GCG

Conclusion

Advantages: readily reveals the presence of insertions/deletions and of direct and inverted repeats that are more difficult to find by the other, more automated methods.

Lets your eyes/brain do the work: VERY EFFICIENT.

Disadvantages: most dot matrix computer programs do not show an actual alignment. Does not return a score to indicate how 'optimal' a given alignment is.


Dynamic Programming

Answers the question: what is the optimal alignment of two sequences (the one with the best score)?

How many different alignments are there?
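One way to get a feel for the size of the search space is to count alignments directly. The sketch below assumes that every distinct sequence of columns (match/mismatch, gap in one sequence, gap in the other) counts as a different alignment, so the count equals the number of lattice paths with diagonal, vertical and horizontal steps. The function name is illustrative and not from the slides.

```python
def count_alignments(n, m):
    """Count paths from (0, 0) to (n, m) built from diagonal, down and right steps."""
    # Along either border there is exactly one path (gaps only).
    counts = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            counts[i][j] = (
                counts[i - 1][j - 1]  # last column pairs two residues
                + counts[i - 1][j]    # last column is a gap in the second sequence
                + counts[i][j - 1]    # last column is a gap in the first sequence
            )
    return counts[n][m]


for length in (1, 2, 3, 10, 20):
    print(length, count_alignments(length, length))
```

Under this counting convention two sequences of length 3 already admit 63 alignments, and the count grows exponentially with length, which is why enumerating alignments is not practical and dynamic programming is used instead.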

Alignment methods with DP

Global alignment - Needleman-Wunsch (1970): maximizes the number of matches between the sequences along the entire length of the sequences.

Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.

Dynamic Programming

A simple example

[Figure: a weighted graph with nodes A to F and edge weights between 2 and 9; the edge layout is not recoverable from the extracted text.]

Exercise

[Exercise figure not recoverable from the extracted text.]

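Conditions for applying dynamic programming

Every sub-policy of an optimal policy is itself optimal.

No aftereffect: the states of earlier stages cannot directly influence future decisions.

Trading space for time (overlapping subproblems).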
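The "space for time" condition can be illustrated with a memoized recursion for a global alignment score. This is only a sketch under assumed scoring values (match = 2, mismatch = -1, gap d = -1); the function name and the use of `functools.lru_cache` are choices made for this illustration, not something taken from the slides.

```python
from functools import lru_cache


def global_score(x, y, match=2, mismatch=-1, d=-1):
    """Return (optimal global score, number of distinct subproblems solved)."""

    @lru_cache(maxsize=None)
    def F(i, j):
        # Base cases: one prefix is empty, so only gaps are possible.
        if i == 0:
            return j * d
        if j == 0:
            return i * d
        s = match if x[i - 1] == y[j - 1] else mismatch
        # The best alignment of the two prefixes ends in one of three ways.
        return max(F(i - 1, j - 1) + s, F(i - 1, j) + d, F(i, j - 1) + d)

    score = F(len(x), len(y))
    return score, F.cache_info().misses


score, solved = global_score("AAG", "AGC")
print(score, solved)
```

Each pair of prefixes is solved once and then looked up, so only (n + 1)(m + 1) subproblems are evaluated, whereas the same recursion without the cache would revisit them exponentially often.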


DP Algorithm for Global Alignment

Two sequences $X = x_1 \dots x_n$ and $Y = y_1 \dots y_m$.

Let $F(i, j)$ be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$).

$$
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
$$

DP in equation form

[Diagram: cell F(i, j) is reached from F(i-1, j-1) by a diagonal move scoring s(x_i, y_j), and from F(i-1, j) or F(i, j-1) by a move scoring d.]

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution scores s(a, b):

       A    C    G    T
A      2   -7   -5   -7
C     -7    2   -7   -5
G     -5   -7    2   -7
T     -7   -5   -7    2

Step 1: set F(0, 0) = 0 and fill the first row and the first column with multiples of d.

            A     A     G
       0   -5   -10   -15
A     -5
G    -10
C    -15

Step 2: fill the remaining cells with the recurrence.

            A     A     G
       0   -5   -10   -15
A     -5    2    -3    -8
G    -10   -3    -3    -1
C    -15   -8    -8    -6
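The matrix fill for this example can be reproduced with a short Python sketch. The names `BASES`, `TABLE`, `s` and `global_matrix` are illustrative; the scores are copied from the slide's substitution table, and the border initialisation with multiples of d follows the worked example.

```python
# Substitution scores from the slide (rows and columns in the order A, C, G, T).
BASES = "ACGT"
TABLE = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    """Look up the substitution score for residues a and b."""
    return TABLE[BASES.index(a)][BASES.index(b)]


def global_matrix(top, left, d=-5):
    """Fill F with rows indexed by `left` and columns indexed by `top`."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: gaps only
        F[i][0] = i * d
    for j in range(1, cols):          # first row: gaps only
        F[0][j] = j * d
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),  # diagonal
                F[i - 1][j] + d,                               # vertical
                F[i][j - 1] + d,                               # horizontal
            )
    return F


F = global_matrix("AAG", "AGC")
for row in F:
    print(row)
```

The bottom-right cell holds the optimal global score, which is -6 for this pair of sequences.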

Traceback

Start from the lower right corner and trace back to the upper left.

Each arrow introduces one character at the end of each aligned sequence.

A horizontal move puts a gap in the left sequence.

A vertical move puts a gap in the top sequence.

A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the traceback paths:

            A     A     G
       0   -5
A           2    -3
G                      -1
C                      -6

The two optimal alignments:

AAG-    AAG-
-AGC    A-GC
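The traceback rules can be sketched as a recursive walk that follows every move consistent with the filled matrix, so that all co-optimal alignments are reported. The names `slide_score` and `align_global` are assumptions made for this sketch; the scores reproduce the slide's table (2 for identity, -5 for A/G or C/T, -7 otherwise).

```python
def slide_score(a, b):
    """Scores from the slide's table: 2 for identity, -5 for A/G or C/T, else -7."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def align_global(top, left, score, d):
    """Return (best score, sorted list of all optimal (top, left) alignments)."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * d
    for j in range(1, cols):
        F[0][j] = j * d
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    def walk(i, j):
        """Yield every optimal alignment of top[:j] with left[:i]."""
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(left[i - 1], top[j - 1]):
            for a, b in walk(i - 1, j - 1):   # diagonal: one character from each
                yield a + top[j - 1], b + left[i - 1]
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            for a, b in walk(i - 1, j):       # vertical: gap in the top sequence
                yield a + "-", b + left[i - 1]
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            for a, b in walk(i, j - 1):       # horizontal: gap in the left sequence
                yield a + top[j - 1], b + "-"

    return F[-1][-1], sorted(walk(rows - 1, cols - 1))


best, alignments = align_global("AAG", "AGC", slide_score, -5)
print(best)
for top_row, left_row in alignments:
    print(top_row)
    print(left_row)
    print()
```

Following every tied predecessor is a design choice for illustration; many programs report only one of the co-optimal alignments.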

Exercise

Find the global alignment of X = catgt and Y = acgctg.

Score: d = -1, mismatch = -1, match = 2.

Answer

[Answer figure not recoverable from the extracted text.]

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein.

Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

$$
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
$$

Local DP in equation form

[Diagram: cell F(i, j) is reached from F(i-1, j-1) by a diagonal move scoring s(x_i, y_j), from F(i-1, j) or F(i, j-1) by a move scoring d, or is reset to 0.]

Local alignment

Two differences with respect to global alignment:

No score is negative.

Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman
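Both differences can be seen in a compact Python sketch of the local recurrence. The function names are illustrative, the scoring function reproduces the slide's nucleotide table, and when several moves tie the traceback simply takes the first one that fits, so only one of possibly several optimal local alignments is returned.

```python
def slide_score(a, b):
    """Scores from the slide's table: 2 for identity, -5 for A/G or C/T, else -7."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def local_align(top, left, score, d):
    """Return (best score, aligned top segment, aligned left segment, matrix F)."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]      # borders stay at 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(0,                                        # no negative scores
                          F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Traceback starts at the highest-scoring cell and stops at the first 0.
    i, j = best_pos
    top_row, left_row = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(left[i - 1], top[j - 1]):
            top_row.append(top[j - 1]); left_row.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            top_row.append("-"); left_row.append(left[i - 1])
            i -= 1
        else:
            top_row.append(top[j - 1]); left_row.append("-")
            j -= 1
    return best, "".join(reversed(top_row)), "".join(reversed(left_row)), F


print(local_align("AAG", "AGC", slide_score, -5)[:3])
print(local_align("AAG", "GAAGGC", slide_score, -5)[:3])
```

For AAG against AGC the best local score is 4 (AG aligned with AG); for AAG against GAAGGC it is 6 (AAG aligned with AAG).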

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution scores s(a, b):

       A    C    G    T
A      2   -7   -5   -7
C     -7    2   -7   -5
G     -5   -7    2   -7
T     -7   -5   -7    2

Step 1: fill the first row and the first column with 0.

           A    A    G
      0    0    0    0
A     0
G     0
C     0

Step 2: fill the remaining cells with the local recurrence.

           A    A    G
      0    0    0    0
A     0    2    2    0
G     0    0    0    4
C     0    0    0    0

The optimal local alignment:

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution scores s(a, b):

       A    C    G    T
A      2   -7   -5   -7
C     -7    2   -7   -5
G     -5   -7    2   -7
T     -7   -5   -7    2

Filled DP matrix:

           A    A    G
      0    0    0    0
G     0    0    0    2
A     0    2    2    0
A     0    2    4    0
G     0    0    0    6
G     0    0    0    2
C     0    0    0    0

End-Space Free Alignment

Any number of indel operations at the beginning or at the end of the alignment contributes zero weight.

X = - - c a c - t g t a c

Y = g a c a c t t g - - -

Base conditions: $F(i, 0) = 0$ and $F(0, j) = 0$ for all $i, j$.

Recurrence relation:

$$
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(X_i, Y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
$$

Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$.

Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$.

Define the alignment score as $\max\{F(n, j^*),\, F(i^*, m)\}$.
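A score-only sketch of this procedure in Python is given below. The function name and the default scoring (match = 1, mismatch = -1, d = -1) are assumptions made for illustration; the zero borders and the search over the last row and last column follow the base conditions and search steps stated above.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Best alignment score when leading and trailing gaps cost nothing."""
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0, so leading gaps are free.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + d, F[i][j - 1] + d)
    # Trailing gaps are free: take the best cell in the last column or last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))
```

With these assumed scores the pair cactgtac and gacacttg scores 4, which agrees with the example alignment shown above (five matches and one internal gap).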

Exercise

Align the two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c

Y = g a c a c t t g

Questions for thought

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
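One possible approach to the LCS question, offered as a sketch rather than as the intended answer, reuses the same table-filling idea: the LCS length equals the best global alignment score when a match scores 1 and both mismatches and gaps score 0. The function name is illustrative.

```python
def longest_common_subsequence(x, y):
    """Return one longest common subsequence of x and y."""
    n, m = len(x), len(y)
    # L[i][j] is the LCS length of the prefixes x[:i] and y[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Trace back from the lower right corner, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(longest_common_subsequence("AAG", "AGC"))
```

When several subsequences share the maximal length, this traceback returns only one of them.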

Affine gap penalty

LETVGY
W----L

Penalties for the four gap positions: -5 -1 -1 -1

Separate penalties for gap opening and gap extension.

This requires modifying the DP algorithm.

Affine gap penalty

A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same.

It is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap
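The difference between the two penalty schemes can be made concrete with a small sketch. The convention used here is an assumption read off the "-5 -1 -1 -1" annotation on the earlier slide: the first gapped position costs h and every further position costs g, both given as negative numbers. Other texts define the affine penalty as h + g·k instead, so the exact form on the original slide may differ.

```python
def linear_gap(k, d):
    """Linear penalty: every gapped position costs d."""
    return k * d


def affine_gap(k, h, g):
    """Affine penalty (assumed convention): h for the first position, g for each further one."""
    if k == 0:
        return 0
    return h + (k - 1) * g


# One gap of length 4 versus four separate gaps of length 1, with h = -5, g = -1.
one_long_gap = affine_gap(4, h=-5, g=-1)
four_short_gaps = 4 * affine_gap(1, h=-5, g=-1)
print(one_long_gap, four_short_gaps, linear_gap(4, d=-5))
```

Under the affine scheme the single long gap is penalised far less than four separate gaps, whereas the linear scheme cannot tell the two situations apart.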

Gap penalty functions

[Figure not recoverable from the extracted text.]

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
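Since the slide's own formulas did not survive extraction, the following is only a sketch of one common three-matrix formulation for global alignment with affine gaps, not necessarily the one shown in the lecture. The matrix names `M`, `Ix`, `Iy`, the function names, and the convention that a gap of length k costs gap_open + (k - 1) * gap_extend (both negative) are assumptions.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, score, gap_open, gap_extend):
    """Best global score; a gap of length k costs gap_open + (k - 1) * gap_extend."""
    n, m = len(x), len(y)
    # M: x_i aligned to y_j; Ix: x_i aligned to a gap; Iy: y_j aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Iy[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(x[i - 1], y[j - 1]) + max(
                M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] + gap_open,       # open a new gap
                           Iy[i - 1][j] + gap_open,      # switch gap side
                           Ix[i - 1][j] + gap_extend)    # extend the current gap
            Iy[i][j] = max(M[i][j - 1] + gap_open,
                           Ix[i][j - 1] + gap_open,
                           Iy[i][j - 1] + gap_extend)
    return max(M[n][m], Ix[n][m], Iy[n][m])


def unit_score(a, b):
    """match = 1, mismatch = -1."""
    return 1 if a == b else -1


print(affine_global_score("AAAA", "AA", unit_score, gap_open=-5, gap_extend=-1))
print(affine_global_score("AAAA", "AA", unit_score, gap_open=-5, gap_extend=-5))
```

Setting gap_extend equal to gap_open reduces the scheme to the linear gap penalty, which gives a quick sanity check of an implementation.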

[Worked example slides for the affine gap case; only the scoring survives extraction: match = 1, mismatch = -1.]

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tuple

FASTA

BLAST


Page 24: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Even more can be done with RNAEven more can be done with RNA

RNA comparisons of the reverse complement of RNA comparisons of the reverse complement of a sequence to itself can often be very informativea sequence to itself can often be very informative

bull Consider the following set of examples from the Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from phenylalanine transfer RNA (tRNA-Phe) molecule from Bakerrsquos yeastBakerrsquos yeast

bull The sequence and structure of this molecule is also The sequence and structure of this molecule is also known the illustration will show how simple dot-matrix known the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural procedures can quickly lead to functional and structural insights (insights (even without complex folding algorithmseven without complex folding algorithms))

Presented By Liu QiPresented By Liu Qi

Structures of Structures of tRNA-PhetRNA-Phe

Presented By Liu QiPresented By Liu Qi

RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself

Presented By Liu QiPresented By Liu Qi

Programs for Dot MatrixPrograms for Dot Matrix

DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml

SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics

res_inf_sightmlres_inf_sightmlDotter Dotter

httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml

COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG

Conclusion

Advantages:
• Readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods.
• Lets your eyes/brain do the work: VERY EFFICIENT.

Disadvantages:
• Most dot matrix computer programs do not show an actual alignment.
• Does not return a score to indicate how 'optimal' a given alignment is.


Dynamic Programming

• Answers the question: what is the optimal alignment of two sequences (the one with the best score)?
• How many different alignments are there?
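To get a feel for the size of the search space, the sketch below counts alignments under one specific convention (an assumption of this example, not stated on the slide): every distinct path through the DP grid counts as one alignment, so each column is either a diagonal step (two residues), a vertical step, or a horizontal step (a residue against a gap).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of grid paths from (0, 0) to (n, m) using diagonal,
    vertical and horizontal steps (one path per alignment)."""
    if n == 0 or m == 0:
        return 1  # only one way: a single run of gaps
    return (count_alignments(n - 1, m - 1)   # last column aligns two residues
            + count_alignments(n - 1, m)     # last column has a gap in one sequence
            + count_alignments(n, m - 1))    # last column has a gap in the other


for length in (1, 2, 3, 10, 20):
    print(length, count_alignments(length, length))
```

The count grows exponentially with sequence length, which is the motivation for dynamic programming: the optimal score can be found by filling an n-by-m table instead of enumerating every alignment.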

Alignment methods with DP

• Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along the entire length of the sequences.
• Local alignment: Smith-Waterman (1981) is a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.

Dynamic Programming

A simple example

[Figure: graph with nodes A to F and numeric edge weights]

Exercise

Conditions under which dynamic programming applies

• Every sub-strategy of an optimal strategy is itself optimal.
• No aftereffect: the states of earlier stages cannot directly affect future decisions.
• Trading space for time (overlapping subproblems).


DP Algorithm for Global Alignment

Two sequences $X = x_1 \dots x_n$ and $Y = y_1 \dots y_m$.

Let $F(i, j)$ be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$).

$$F(0, 0) = 0$$

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}$$

DP in equation form

[Diagram: cell F(i, j) receives a diagonal arrow from F(i-1, j-1) labeled s(x_i, y_j), and arrows from F(i-1, j) and F(i, j-1), each labeled d.]

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution scores s:

         A    C    G    T
    A    2   -7   -5   -7
    C   -7    2   -7   -5
    G   -5   -7    2   -7
    T   -7   -5   -7    2

DP matrix after setting F(0, 0) = 0 and initializing the first row and column (top sequence AAG, left sequence AGC):

              A     A     G
         0   -5   -10   -15
    A   -5
    G  -10
    C  -15

Completed DP matrix:

              A     A     G
         0   -5   -10   -15
    A   -5    2    -3    -8
    G  -10   -3    -3    -1
    C  -15   -8    -8    -6
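The matrix fill for this example can be sketched in a few lines. The names `BASES`, `SCORES`, `s` and `nw_fill` are chosen for this illustration only; the substitution scores and d = -5 are the ones given in the example.

```python
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],   # A
    [-7, 2, -7, -5],   # C
    [-5, -7, 2, -7],   # G
    [-7, -5, -7, 2],   # T
]


def s(a, b):
    """Substitution score for aligning residue a with residue b."""
    return SCORES[BASES.index(a)][BASES.index(b)]


def nw_fill(top, left, d=-5):
    """Fill the global-alignment matrix F; rows follow `left`, columns follow `top`."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + d          # first row: gaps only
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + d          # first column: gaps only
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),  # diagonal
                F[i - 1][j] + d,                               # vertical
                F[i][j - 1] + d,                               # horizontal
            )
    return F


for row in nw_fill("AAG", "AGC"):
    print(row)
```

The lower right cell holds the optimal global score; recovering the alignment itself requires a separate traceback step.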

Traceback

• Start from the lower right corner and trace back to the upper left.
• Each arrow introduces one character at the end of each aligned sequence.
• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the traceback paths (top sequence AAG, left sequence AGC):

              A     A     G
         0   -5
    A         2    -3
    G                    -1
    C                    -6

The two optimal alignments:

    AAG-      AAG-
    -AGC      A-GC
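Because ties in the recurrence can be broken in more than one way, a traceback may branch. The sketch below follows every branch and returns all co-optimal alignments; the function name `global_alignments` and the recursive helper `walk` are assumptions of this example, and the scoring values are those of the AAG/AGC example.

```python
def global_alignments(top, left, score, d):
    """Return (optimal score, list of (aligned_top, aligned_left)) for a
    global alignment with linear gap penalty d."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        F[i][0] = i * d
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    results = []

    def walk(i, j, a_top, a_left):
        # a_top / a_left are built back to front and reversed at the end.
        if i == 0 and j == 0:
            results.append((a_top[::-1], a_left[::-1]))
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(left[i - 1], top[j - 1]):
            walk(i - 1, j - 1, a_top + top[j - 1], a_left + left[i - 1])  # diagonal
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            walk(i - 1, j, a_top + "-", a_left + left[i - 1])             # vertical: gap in top
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            walk(i, j - 1, a_top + top[j - 1], a_left + "-")              # horizontal: gap in left

    walk(n, m, "", "")
    return F[n][m], results


def s(a, b):
    """Scores from the example: 2 for a match, -5 for A/G or C/T, -7 otherwise."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


best, alignments = global_alignments("AAG", "AGC", s, d=-5)
print(best)
for a_top, a_left in alignments:
    print(a_top)
    print(a_left)
    print()
```

Enumerating every branch is only practical for small examples; production tools usually report a single optimal alignment chosen by a fixed tie-breaking rule.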

Exercise

Find the global alignment of
X = catgt
Y = acgctg
Scoring: d = -1, mismatch = -1, match = 2

Answer

Local alignment

• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

$$F(0, 0) = 0$$

$$F(i, j) = \max \begin{cases} 0 \\ F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}$$

Local DP in equation form

[Diagram: cell F(i, j) receives a diagonal arrow from F(i-1, j-1) labeled s(x_i, y_j), arrows from F(i-1, j) and F(i, j-1), each labeled d, and the additional option 0.]

Local alignment

Two differences with respect to global alignment:
• No score is negative.
• Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution scores s:

         A    C    G    T
    A    2   -7   -5   -7
    C   -7    2   -7   -5
    G   -5   -7    2   -7
    T   -7   -5   -7    2

DP matrix after initializing the first row and column to 0 (top sequence AAG, left sequence AGC):

              A    A    G
         0    0    0    0
    A    0
    G    0
    C    0

Completed DP matrix:

              A    A    G
         0    0    0    0
    A    0    2    2    0
    G    0    0    0    4
    C    0    0    0    0

Optimal local alignment:

    AG
    AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution scores s:

         A    C    G    T
    A    2   -7   -5   -7
    C   -7    2   -7   -5
    G   -5   -7    2   -7
    T   -7   -5   -7    2

Completed DP matrix (top sequence AAG, left sequence GAAGGC):

              A    A    G
         0    0    0    0
    G    0    0    0    2
    A    0    2    2    0
    A    0    2    4    0
    G    0    0    0    6
    G    0    0    0    2
    C    0    0    0    0
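The local-alignment procedure (clamp every cell at 0, start the traceback at the highest cell, stop at a 0) can be sketched as follows. The function name `smith_waterman` and the scoring helper `s` are assumptions of this example; the scores and d = -5 are those used in the examples above. When several cells tie for the maximum, this sketch keeps the first one it meets.

```python
def smith_waterman(top, left, score, d):
    """Return (best score, aligned_top, aligned_left) for one optimal
    local alignment with linear gap penalty d."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Traceback from the highest-scoring cell until a 0 is reached.
    i, j = best_pos
    a_top, a_left = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(left[i - 1], top[j - 1]):
            a_top.append(top[j - 1])
            a_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            a_top.append("-")
            a_left.append(left[i - 1])
            i -= 1
        else:
            a_top.append(top[j - 1])
            a_left.append("-")
            j -= 1
    return best, "".join(reversed(a_top)), "".join(reversed(a_left))


def s(a, b):
    """Scores from the examples: 2 for a match, -5 for A/G or C/T, -7 otherwise."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


print(smith_waterman("AAG", "AGC", s, d=-5))
print(smith_waterman("AAG", "GAAGGC", s, d=-5))
```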

End-Space Free Alignment

Any number of indel operations at the beginning or at the end of the alignment contribute zero weight.

    X = - - c a c - t g t a c
    Y = g a c a c t t g - - -

End-Space Free Alignment

Base conditions: $\forall i, j:\ F(i, 0) = 0,\ F(0, j) = 0$

Recurrence relation:

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(X_i, Y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}$$

Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$.

Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$.

Define the alignment score as $\max\{F(n, j^*),\ F(i^*, m)\}$.
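A minimal sketch of the end-space free score, following the base conditions and the final search over the last row and last column described above. The function name and the simple match/mismatch scoring are assumptions of this example; the default values match = 1, mismatch = -1, d = -1 are used only for illustration.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Best alignment score when leading and trailing gaps cost nothing."""
    n, m = len(x), len(y)
    # F(i, 0) = F(0, j) = 0: leading gaps are free.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Trailing gaps are free: take the best cell in the last column or last row.
    best_last_column = max((F[i][m] for i in range(1, n + 1)), default=0)
    best_last_row = max((F[n][j] for j in range(1, m + 1)), default=0)
    return max(best_last_column, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))
```

The only changes relative to the global algorithm are the zero-initialized borders and where the final score is read from, so the traceback starts at that best border cell rather than at the lower right corner.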

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

    X = c a c t g t a c
    Y = g a c a c t t g

Questions for thought

• Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?
• Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
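One possible answer to the LCS question, given as a sketch rather than the intended solution: the standard dynamic programming table, which can also be read as a global alignment scored with match = 1, mismatch = 0 and gap = 0. The function name `lcs` is an assumption of this example, and when several longest subsequences exist it returns just one of them.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # L[i][j] = length of the LCS of a[:i] and b[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Trace back from the lower right corner to recover the subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AAG", "AGC"))
```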

Affine gap penalty

    LETVGYW----L

Penalties for the four gap positions: -5 -1 -1 -1

• Separate penalties for gap opening and gap extension.
• This requires modifying the DP algorithm.

Affine gap penalty

• A gap of length k is more probable than k gaps of length 1:
  – a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  – separated gaps are probably due to distinct mutational events
• A linear gap penalty function treats these cases the same.
• It is more common to use gap penalty functions involving two terms:
  – a penalty h associated with opening a gap
  – a smaller penalty g for extending the gap
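The difference between the two penalty schemes can be made concrete with a small sketch. It assumes the convention suggested by the LETVGYW----L example, where the first gap position costs h and each further position costs g; other texts charge h plus g for every position, so treat the exact formula as an assumption. The function names are made up for this illustration.

```python
def linear_gap(k, d=-5):
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap(k, h=-5, g=-1):
    """Affine penalty: h for the first gap position, g for each additional one."""
    return 0 if k == 0 else h + (k - 1) * g


k = 4
print("one gap of length 4, linear:", linear_gap(k))
print("four gaps of length 1, linear:", k * linear_gap(1))
print("one gap of length 4, affine:", affine_gap(k))
print("four gaps of length 1, affine:", k * affine_gap(1))
```

Under the linear scheme both cases cost the same, while the affine scheme makes the single long gap cheaper than several scattered gaps.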

Gap penalty functions

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
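The slide's formulas for the three matrices did not survive in this transcript, so the sketch below uses one common formulation (often attributed to Gotoh) as a stand-in; it is not necessarily the one shown in the lecture. `M` holds alignments ending in a residue pair, `Ix` those ending with a residue of x against a gap, and `Iy` those ending with a residue of y against a gap. The names, the match/mismatch scoring, and the gap convention (h for the first gap position, g for each further one) are assumptions of this example.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Optimal global alignment score with an affine gap penalty."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g      # x[:i] against one leading gap
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g      # y[:j] against one leading gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] + h,     # open a gap after a pair
                           Iy[i - 1][j] + h,    # open a gap after the other gap type
                           Ix[i - 1][j] + g)    # extend the current gap
            Iy[i][j] = max(M[i][j - 1] + h,
                           Ix[i][j - 1] + h,
                           Iy[i][j - 1] + g)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("AAAA", "A"))
print(affine_global_score("ACGT", "AT"))
```

Keeping three matrices lets the recurrence tell whether the previous column already ended in a gap, which is what decides between charging the opening penalty h and the extension penalty g.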

match = 1, mismatch = -1

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tuple

FASTA

BLAST


Page 26: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

RNA comparisons of the reverse RNA comparisons of the reverse complement of a sequence to itselfcomplement of a sequence to itself

Presented By Liu QiPresented By Liu Qi

Programs for Dot MatrixPrograms for Dot Matrix

DotletDotlethttpwwwisrecisb-sibchjavadotletDotlethtml

SIGNALSIGNALhttpinnovationswmededuresearchinformaticshttpinnovationswmededuresearchinformatics

res_inf_sightmlres_inf_sightmlDotter Dotter

httpwwwcgbkisecgbgroupssonnhammerhttpwwwcgbkisecgbgroupssonnhammerDotterhtmlDotterhtml

COMPARE DOTPLOT in GCGCOMPARE DOTPLOT in GCG

Presented By Liu QiPresented By Liu Qi

conclusionconclusion

Advantages Advantages Readily reveals the presence of Readily reveals the presence of insertionsdeletions insertionsdeletions and direct and inverted and direct and inverted repeats repeats that are more difficult to that are more difficult to find by the other more automated methodsfind by the other more automated methods

letrsquos your eyesbrain do the work ndashVERY EFFICIENTletrsquos your eyesbrain do the work ndashVERY EFFICIENT

DisadvantagesDisadvantagesMost dot matrix computer programs Most dot matrix computer programs do not show an do not show an actual alignmenactual alignment Does not return a t Does not return a score score to indicate to indicate how lsquooptimalrsquo a given alignment ishow lsquooptimalrsquo a given alignment is

Presented By Liu QiPresented By Liu Qi

ReferenceReference

Gibbs A J amp McIntyre G A (1970) The diagram Gibbs A J amp McIntyre G A (1970) The diagram method for comparing sequences its The diagram method for comparing sequences its The diagram method for comparing sequences its use with amino method for comparing sequences its use with amino acid and nucleotide sequencesEur J Biochem 16 1-acid and nucleotide sequencesEur J Biochem 16 1-1111

Maizel JV Jr and Lenk RP (1981) nhanced graphic Maizel JV Jr and Lenk RP (1981) nhanced graphic matrix analysis of nucleic acid and protein sequences matrix analysis of nucleic acid and protein sequences Proc Natl Acad Sci 78 7665- 7669Proc Natl Acad Sci 78 7665- 7669

Staden R (1982) An interactive graphics program for Staden R (1982) An interactive graphics program for comparing and aligning nucleic-acid and amino-acid comparing and aligning nucleic-acid and amino-acid acid sequences Nucl Acid Res 10 (9) 2951-2961acid sequences Nucl Acid Res 10 (9) 2951-2961

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Answer what is the optimal alignment of two sequences(the best score)

How many different alignments

Alignment methods with DPAlignment methods with DP

Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences

Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

A simple exampleA simple example

3

4

5

3

6

5 4

2

A

B

C

D

E

F

8

7

9

Presented By Liu QiPresented By Liu Qi

Exercise

Presented By Liu QiPresented By Liu Qi

动态规划的适用条件动态规划的适用条件

一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性

以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)

DP Algorithm for Global Alignment

Two sequences X = x_1 ... x_n and Y = y_1 ... y_m

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 ≤ i ≤ n, 0 ≤ j ≤ m)

\[
F(0, 0) = 0, \qquad
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
\]

DP in equation form

[Figure: cell F(i, j) is filled from F(i-1, j-1) along a diagonal arrow labeled s(x_i, y_j), and from F(i-1, j) and F(i, j-1) along arrows labeled d]

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

Step 1: set F(0, 0) = 0.

Step 2: fill the first row and the first column with multiples of d.

Step 3: fill the remaining cells with the recurrence. AAG runs along the top, AGC down the left:

       -    A    A    G
  -    0   -5  -10  -15
  A   -5    2   -3   -8
  G  -10   -3   -3   -1
  C  -15   -8   -8   -6
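The matrix for this example can be reproduced with a few lines of Python. This is a minimal sketch of the fill step only; the names `S` and `nw_matrix` are chosen for the illustration, and the layout (top sequence across the columns, left sequence down the rows) follows the slide.

```python
BASES = "ACGT"
ROWS = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]
# S[a][b] is the substitution score s(a, b) from the slide.
S = {a: dict(zip(BASES, row)) for a, row in zip(BASES, ROWS)}


def nw_matrix(top: str, left: str, S: dict, d: int) -> list:
    """Fill the global-alignment matrix F; rows follow left, columns follow top."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = j * d  # leading gaps in the left sequence
    for i in range(1, n + 1):
        F[i][0] = i * d  # leading gaps in the top sequence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + S[left[i - 1]][top[j - 1]],  # diagonal
                F[i - 1][j] + d,                               # vertical
                F[i][j - 1] + d,                               # horizontal
            )
    return F


for row in nw_matrix("AAG", "AGC", S, d=-5):
    print(row)
```

The last cell, F[3][3] = -6, is the score of the optimal global alignment.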

Traceback

Start from the lower right corner and trace back to the upper left

Each arrow introduces one character at the end of each aligned sequence

A horizontal move puts a gap in the left sequence

A vertical move puts a gap in the top sequence

A diagonal move uses one character from each sequence

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the traceback paths (the other cells are omitted):

       -    A    A    G
  -    0   -5
  A         2   -3
  G                  -1
  C                  -6

The two optimal alignments, both with score -6:

  AAG-    AAG-
  -AGC    A-GC
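The traceback rules can be sketched as a recursive walk that follows every move consistent with the recurrence, so that all co-optimal alignments are returned rather than just one. The function name `global_alignments` and the choice to enumerate all paths are assumptions made for this illustration; production tools usually report a single path.

```python
BASES = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
S = {a: dict(zip(BASES, row)) for a, row in zip(BASES, ROWS)}


def global_alignments(top: str, left: str, S: dict, d: int):
    """Return (score, list of (aligned_top, aligned_left)) for all optimal paths."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        F[i][0] = i * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + S[left[i - 1]][top[j - 1]],
                          F[i - 1][j] + d, F[i][j - 1] + d)

    def walk(i: int, j: int) -> list:
        """All optimal alignments of left[:i] with top[:j]."""
        if i == 0 and j == 0:
            return [("", "")]
        found = []
        # Diagonal move: one character from each sequence.
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + S[left[i - 1]][top[j - 1]]:
            found += [(t + top[j - 1], l + left[i - 1]) for t, l in walk(i - 1, j - 1)]
        # Vertical move: gap in the top sequence.
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            found += [(t + "-", l + left[i - 1]) for t, l in walk(i - 1, j)]
        # Horizontal move: gap in the left sequence.
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            found += [(t + top[j - 1], l + "-") for t, l in walk(i, j - 1)]
        return found

    return F[n][m], walk(n, m)


score, alignments = global_alignments("AAG", "AGC", S, d=-5)
print(score)
for aligned_top, aligned_left in alignments:
    print(aligned_top)
    print(aligned_left)
    print()
```

Enumerating every path can take exponential time when many alignments tie, so this approach only suits small examples like the one on the slide.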

Exercise

Find the global alignment of
X = catgt
Y = acgctg
Score: d = -1, mismatch = -1, match = 2

Answer

[Figure not recovered]

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein

Usually an alignment that spans the complete length of both sequences is not required

Local alignment DP

Align sequences x and y

F is the DP matrix, s is the substitution matrix, d is the linear gap penalty

\[
F(0, 0) = 0, \qquad
F(i, j) = \max
\begin{cases}
0 \\
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
\]

Local DP in equation form

[Figure: cell F(i, j) is filled from F(i-1, j-1) along a diagonal arrow labeled s(x_i, y_j), from F(i-1, j) and F(i, j-1) along arrows labeled d, or set to 0]

Local alignment

Two differences with respect to global alignment:
No score is negative
Traceback begins at the highest score in the matrix and continues until you reach 0

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

Step 1: set the first row and the first column to 0.

Step 2: fill the remaining cells with the local recurrence. AAG runs along the top, AGC down the left:

       -    A    A    G
  -    0    0    0    0
  A    0    2    2    0
  G    0    0    0    4
  C    0    0    0    0

Step 3: trace back from the highest score, 4, until a 0 is reached. The local alignment is

  AG
  AG
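A compact sketch of the local procedure, using the same substitution matrix and layout as the example. The function name `smith_waterman` and its return shape are choices made for this illustration; when several cells share the highest score, this version keeps the first one it meets and reports a single alignment.

```python
BASES = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
S = {a: dict(zip(BASES, row)) for a, row in zip(BASES, ROWS)}


def smith_waterman(top: str, left: str, S: dict, d: int):
    """Return (best score, aligned part of top, aligned part of left)."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,  # difference 1: no score is negative
                F[i - 1][j - 1] + S[left[i - 1]][top[j - 1]],
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Difference 2: start at the highest score and stop at the first 0.
    i, j = best_pos
    out_top, out_left = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + S[left[i - 1]][top[j - 1]]:
            out_top.append(top[j - 1]); out_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            out_top.append("-"); out_left.append(left[i - 1])
            i -= 1
        else:
            out_top.append(top[j - 1]); out_left.append("-")
            j -= 1
    return best, "".join(reversed(out_top)), "".join(reversed(out_left))


print(smith_waterman("AAG", "AGC", S, d=-5))  # (4, 'AG', 'AG')
```

Only the two marked lines differ from the global fill and traceback: the extra 0 in the maximum, and the start and stop conditions of the walk.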

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

AAG runs along the top, GAAGGC down the left:

       -    A    A    G
  -    0    0    0    0
  G    0    0    0    2
  A    0    2    2    0
  A    0    2    4    0
  G    0    0    0    6
  G    0    0    0    2
  C    0    0    0    0

The highest score is 6. Tracing back from it along the diagonal (6, 4, 2, 0) aligns AAG with the AAG inside GAAGGC.

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight

X = - - c a c - t g t a c

Y = g a c a c t t g - - -

End-Space Free Alignment

Base conditions: for all i, j: F(i, 0) = 0, F(0, j) = 0

Recurrence relation:

\[
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
\]

Search for i* such that F(i*, m) = max_{1 ≤ i ≤ n} F(i, m)

Search for j* such that F(n, j*) = max_{1 ≤ j ≤ m} F(n, j)

Define the alignment score as max{ F(n, j*), F(i*, m) }
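The base conditions and the final search can be sketched directly from the definitions above. The function name `end_space_free_score` and the simple match/mismatch scoring are assumptions for this illustration, and both sequences are assumed to be non-empty. The sequences are the ones shown in the slide's alignment, scored with match = 1, mismatch = -1, d = -1.

```python
def end_space_free_score(x: str, y: str, match: int = 1,
                         mismatch: int = -1, d: int = -1) -> int:
    """Score of the best alignment in which end gaps cost nothing.

    Assumes x and y are both non-empty.
    """
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0 (free leading gaps).
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + d, F[i][j - 1] + d)
    # Free trailing gaps: take the best cell in the last column or last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_row, best_last_column)


print(end_space_free_score("cactgtac", "gacacttg"))  # 4
```

The score 4 corresponds to the alignment shown on the slide: five matches (c, a, c, t, g) and one internal gap, with the unaligned ends of both sequences contributing nothing.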

Exercise

Align the two sequences (match = 1, mismatch = -1, gap = -1)

X = c a c t g t a c

Y = g a c a c t t g

Questions to think about

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences
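For the LCS question, one possible approach (a sketch, not necessarily the intended answer) reuses the same table-filling idea as alignment: L(i, j) is the LCS length of the first i characters of one sequence and the first j of the other. The function name `lcs` is made up for this illustration, and when several subsequences tie, only one is returned.

```python
def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y."""
    n, m = len(x), len(y)
    # L[i][j] = length of an LCS of x[:i] and y[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Trace back from the lower right corner to recover the characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AAG", "AGC"))  # AG
```

This is equivalent to a global alignment with match = 1, mismatch = 0 and gap = 0, which is one way to connect the question to the algorithms above.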

Affine gap penalty

LETVGYW----L

-5 -1 -1 -1

Separate penalties for gap opening and gap extension

This requires modifying the DP algorithm

Affine gap penalty

A gap of length k is more probable than k gaps of length 1
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same

It is more common to use gap penalty functions involving two terms
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap
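The difference between the two penalty schemes is easy to check numerically. The sketch below assumes the convention suggested by the -5 -1 -1 -1 example: the first position of a gap costs h and every further position costs g. Other texts charge h + k*g instead, so treat the exact formula, like the function names, as an assumption.

```python
def linear_gap_cost(k: int, d: int) -> int:
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap_cost(k: int, h: int, g: int) -> int:
    """Affine penalty: h for the first gap position, g for each extension."""
    if k == 0:
        return 0
    return h + (k - 1) * g


# One gap of length 4 versus four separate gaps of length 1.
h, g, d = -5, -1, -5
print(affine_gap_cost(4, h, g), 4 * affine_gap_cost(1, h, g))  # -8 -20
print(linear_gap_cost(4, d), 4 * linear_gap_cost(1, d))        # -20 -20
```

Under the affine scheme the single long gap is penalized less than four scattered ones, while the linear scheme cannot tell them apart.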

Gap penalty functions

[Figure not recovered]

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1
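Since the slide's formulas did not survive extraction, here is a hedged sketch of one standard three-matrix formulation for the global case (often attributed to Gotoh). It is not necessarily the form the slides used. M holds alignments ending in a residue pair, Ix those ending with a character of x against a gap, Iy those ending with a character of y against a gap. The names, and the convention that a gap of length k costs h + (k-1)*g, are assumptions for this illustration.

```python
NEG_INF = float("-inf")


def affine_global_score(x: str, y: str, score, h: int, g: int):
    """Best global alignment score with gap-open h and gap-extend g."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # ends with x_i paired to y_j
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with x_i against a gap
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with y_j against a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g  # x[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g  # y[:j] aligned entirely to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(x[i - 1], y[j - 1]) + max(
                M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # Opening a gap costs h, extending the same gap costs g.
            Ix[i][j] = max(M[i - 1][j] + h, Iy[i - 1][j] + h, Ix[i - 1][j] + g)
            Iy[i][j] = max(M[i][j - 1] + h, Ix[i][j - 1] + h, Iy[i][j - 1] + g)
    return max(M[n][m], Ix[n][m], Iy[n][m])


def simple(a: str, b: str) -> int:
    """match = 1, mismatch = -1."""
    return 1 if a == b else -1


print(affine_global_score("AAAA", "A", simple, h=-5, g=-1))  # -6: one match, one gap of length 3
```

Setting h equal to g collapses the three matrices back to the single-matrix linear case, which gives a quick sanity check of the recurrences.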

[Figures not recovered; the one surviving label reads match = 1, mismatch = -1]

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case"

Word k-tuple

FASTA

BLAST

Programs for Dot Matrix

Dotlet
http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html

SIGNAL
http://innovation.swmed.edu/research/informatics/res_inf_sig.html

Dotter
http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html

COMPARE, DOTPLOT in GCG

Conclusion

Advantages: Readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods. Lets your eyes/brain do the work - VERY EFFICIENT

Disadvantages: Most dot matrix computer programs do not show an actual alignment. Does not return a score to indicate how 'optimal' a given alignment is

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Answer what is the optimal alignment of two sequences(the best score)

How many different alignments

Alignment methods with DPAlignment methods with DP

Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences

Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

A simple exampleA simple example

3

4

5

3

6

5 4

2

A

B

C

D

E

F

8

7

9

Presented By Liu QiPresented By Liu Qi

Exercise

Presented By Liu QiPresented By Liu Qi

动态规划的适用条件动态规划的适用条件

一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性

以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

DP Algorithm for Global DP Algorithm for Global AlignmentAlignment

Two sequences X = x1xn and Y = y1ym

F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)

djiF

djiF

yxsjiF

jiF

F

ji

1

1

11

max

000

Presented By Liu QiPresented By Liu Qi

DP in equation formDP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5

GG -10-10

CC -15-15 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5 22 -3-3 -8-8

GG -10-10 -3-3 -3-3 -1-1

CC -15-15 -8-8 -8-8 -6-6 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

TracebackTraceback

Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left

Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence

A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence

A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each

sequencesequence

Presented By Liu QiPresented By Liu Qi

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the traceback paths (AAG along the top, AGC down the left):

             A     A     G
        0   -5
  A          2    -3
  G                     -1
  C                     -6

Optimal alignments:

  AAG-      AAG-
  -AGC      A-GC
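The traceback rules can be sketched as follows. This is an illustrative Python version under stated assumptions: `s` and `global_alignments` are hypothetical names, the top sequence is `x`, the left sequence is `y`, and every tied path is followed so that all co-optimal alignments are returned.

```python
def s(a, b):
    """Substitution score from the slide's symmetric matrix (hypothetical helper)."""
    pairs = {"AA": 2, "CC": 2, "GG": 2, "TT": 2,
             "AC": -7, "AG": -5, "AT": -7, "CG": -7, "CT": -5, "GT": -7}
    return pairs.get(a + b, pairs.get(b + a))


def global_alignments(x, y, d=-5):
    """Return (optimal score, list of all optimal alignments as (top, left))."""
    rows, cols = len(y) + 1, len(x) + 1
    F = [[0] * cols for _ in range(rows)]
    for c in range(1, cols):
        F[0][c] = F[0][c - 1] + d
    for r in range(1, rows):
        F[r][0] = F[r - 1][0] + d
    for r in range(1, rows):
        for c in range(1, cols):
            F[r][c] = max(F[r - 1][c - 1] + s(x[c - 1], y[r - 1]),
                          F[r - 1][c] + d,
                          F[r][c - 1] + d)

    results = []

    def walk(r, c, top, left):
        # top/left are built back to front and reversed at the origin.
        if r == 0 and c == 0:
            results.append((top[::-1], left[::-1]))
            return
        if r > 0 and c > 0 and F[r][c] == F[r - 1][c - 1] + s(x[c - 1], y[r - 1]):
            walk(r - 1, c - 1, top + x[c - 1], left + y[r - 1])  # diagonal move
        if c > 0 and F[r][c] == F[r][c - 1] + d:
            walk(r, c - 1, top + x[c - 1], left + "-")  # horizontal: gap in left sequence
        if r > 0 and F[r][c] == F[r - 1][c] + d:
            walk(r - 1, c, top + "-", left + y[r - 1])  # vertical: gap in top sequence

    walk(rows - 1, cols - 1, "", "")
    return F[-1][-1], results


score, alignments = global_alignments("AAG", "AGC")
print(score)
for top, left in alignments:
    print(top)
    print(left)
    print()
```

Following every tie is exponential in the worst case; a production tool would usually report a single path.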

Exercise

Find the global alignment of X = catgt and Y = acgctg.
Score: d = -1, mismatch = -1, match = 2.

Answer

[Figure: worked answer, not recoverable from the transcript]

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein.

Usually an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

Local DP in equation form

[Diagram: cell F(i, j) is reached from F(i-1, j-1) by a diagonal step scoring s(x_i, y_j), from F(i-1, j) or F(i, j-1) by a step scoring d, or is reset to 0.]

Local alignment

Two differences with respect to global alignment:

No score is negative.

Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

Each cell takes the best of its three neighbors or 0:

\[
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

DP matrix after initializing the first row and column (AAG along the top, AGC down the left):

            A    A    G
       0    0    0    0
  A    0
  G    0
  C    0

Completed DP matrix:

            A    A    G
       0    0    0    0
  A    0    2    2    0
  G    0    0    0    4
  C    0    0    0    0

Optimal local alignment:

  AG
  AG
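A short Python sketch of the local variant, for illustration only. The function names are hypothetical, and two choices are assumptions rather than something the slides specify: the first cell that reaches the maximum is used as the traceback start, and ties during traceback prefer the diagonal.

```python
def s(a, b):
    """Substitution score from the slide's symmetric matrix (hypothetical helper)."""
    pairs = {"AA": 2, "CC": 2, "GG": 2, "TT": 2,
             "AC": -7, "AG": -5, "AT": -7, "CG": -7, "CT": -5, "GT": -7}
    return pairs.get(a + b, pairs.get(b + a))


def local_align(x, y, d=-5):
    """Return (best local score, aligned part of x, aligned part of y)."""
    rows, cols = len(y) + 1, len(x) + 1
    F = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for r in range(1, rows):
        for c in range(1, cols):
            F[r][c] = max(0,                                         # restart here
                          F[r - 1][c - 1] + s(x[c - 1], y[r - 1]),   # diagonal
                          F[r - 1][c] + d,                           # from above
                          F[r][c - 1] + d)                           # from the left
            if F[r][c] > best:
                best, best_pos = F[r][c], (r, c)

    # Trace back from the highest score until a 0 cell is reached.
    r, c = best_pos
    top, left = [], []
    while r > 0 and c > 0 and F[r][c] > 0:
        if F[r][c] == F[r - 1][c - 1] + s(x[c - 1], y[r - 1]):
            top.append(x[c - 1]); left.append(y[r - 1]); r -= 1; c -= 1
        elif F[r][c] == F[r][c - 1] + d:
            top.append(x[c - 1]); left.append("-"); c -= 1
        else:
            top.append("-"); left.append(y[r - 1]); r -= 1
    return best, "".join(reversed(top)), "".join(reversed(left))


print(local_align("AAG", "AGC"))
print(local_align("AAG", "GAAGGC"))
```

The only changes relative to a global fill are the extra 0 in the max, the zero borders, and the traceback start and stop conditions.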

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5 and the same substitution matrix s as in the previous example.

Completed DP matrix (AAG along the top, GAAGGC down the left):

            A    A    G
       0    0    0    0
  G    0    0    0    2
  A    0    2    2    0
  A    0    2    4    0
  G    0    0    0    6
  G    0    0    0    2
  C    0    0    0    0

The highest score is 6; tracing back from that cell aligns AAG with the AAG inside GAAGGC.

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

  X = - - c a c - t g t a c
  Y = g a c a c t t g - - -

End-Space Free Alignment

Base conditions: \( \forall i, j:\; F(i,0) = 0,\; F(0,j) = 0 \)

Recurrence relation:

\[
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(X_i, Y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

Search for \( i^* \) such that \( F(i^*, m) = \max_{1 \le i \le n} F(i, m) \).

Search for \( j^* \) such that \( F(n, j^*) = \max_{1 \le j \le m} F(n, j) \).

Define the alignment score as \( \max\{ F(n, j^*),\; F(i^*, m) \} \).
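The base conditions and the final search can be sketched in Python as below. This is an illustrative version that returns only the score; the function name and the match/mismatch parameters are assumptions, with defaults taken from the exercise scoring (match = 1, mismatch = -1, gap = -1).

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Score of the best alignment in which leading and trailing gaps are free."""
    n, m = len(x), len(y)
    # F[i][0] = F[0][j] = 0: leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,   # align x[i] with y[j]
                          F[i - 1][j] + d,         # x[i] against a gap
                          F[i][j - 1] + d)         # y[j] against a gap
    # Trailing gaps are free, so the alignment may end anywhere in the
    # last column (all of y used) or the last row (all of x used).
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_col, best_last_row)


print(end_space_free_score("ACGT", "CG"))
print(end_space_free_score("cactgtac", "gacacttg"))
```

Recovering the alignment itself would start the traceback at the cell that achieved the maximum and stop on reaching the first row or column.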

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

  X = c a c t g t a c
  Y = g a c a c t t g

Questions for thought

Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.

Affine gap penalty

  LETVGY
  W----L

Gap penalties: -5 -1 -1 -1

Separate penalties for gap opening and gap extension.

This requires modifying the DP algorithm.

Affine gap penalty

A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same.

It is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap
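As a worked illustration of the two-term penalty, one possible convention charges h for the first gap position and g for each further one. The convention and the values h = -5, g = -1 are assumptions read off the -5 -1 -1 -1 example, not something the slides state, and d = -5 is used for the linear comparison:

```latex
\[
w_{\text{affine}}(k) = h + (k-1)\,g, \qquad w_{\text{linear}}(k) = k\,d
\]
\[
w_{\text{affine}}(4) = -5 + 3(-1) = -8, \qquad w_{\text{linear}}(4) = 4(-5) = -20
\]
```

Some texts write the affine penalty as h + k g instead, so it is worth fixing one convention before comparing scores between tools.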

Gap penalty functions

[Figure not recoverable from the transcript]

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
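One common way to use three matrices is sketched below in Python. It is an assumption-laden illustration rather than the slide's own formulation: the matrix names M, Ix and Iy, the function name, and the convention that a gap of length k costs h + (k - 1) g are all choices made for this example.

```python
NEG = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Best global score when a gap of length k costs h + (k - 1) * g (assumed)."""
    n, m = len(x), len(y)
    # M:  best score with x[i] aligned to y[j]
    # Ix: best score with x[i] aligned to a gap
    # Iy: best score with y[j] aligned to a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):          # x prefix against one long gap
        Ix[i][0] = h + (i - 1) * g
    for j in range(1, m + 1):          # y prefix against one long gap
        Iy[0][j] = h + (j - 1) * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + sub
            # Opening a gap costs h; extending an open gap costs g.
            Ix[i][j] = max(M[i - 1][j] + h, Iy[i - 1][j] + h, Ix[i - 1][j] + g)
            Iy[i][j] = max(M[i][j - 1] + h, Ix[i][j - 1] + h, Iy[i][j - 1] + g)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("AAAA", "A"))
print(affine_global_score("ACGT", "ACGT"))
```

Keeping the gap states separate is what lets the recurrence tell "open a new gap" apart from "extend the current one", which a single matrix cannot do.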

[Figures: worked example scored with match = 1, mismatch = -1; the matrices are not recoverable from the transcript]

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tup

FASTA

BLAST



Dynamic Programming

Answer: what is the optimal alignment of two sequences (the best score)?

How many different alignments are there?
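To get a feel for how fast the number of alignments grows, here is a small Python sketch. It assumes that every alignment column is one of three kinds (residue over residue, gap in the first sequence, gap in the second) and that alignments differing only in the order of adjacent gap columns are counted separately; the function name is hypothetical.

```python
def count_alignments(n, m):
    """Number of gapped alignments of sequences of lengths n and m."""
    # N[i][j] counts alignments of the first i and first j residues.
    # With one sequence empty there is exactly one alignment: all gaps.
    N = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            N[i][j] = (N[i - 1][j - 1]   # last column pairs two residues
                       + N[i - 1][j]     # last column: residue over gap
                       + N[i][j - 1])    # last column: gap over residue
    return N[n][m]


for k in (1, 2, 3, 10):
    print(k, count_alignments(k, k))
```

The count grows exponentially with sequence length, which is why enumerating alignments is impractical and dynamic programming is used instead.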

Alignment methods with DP

Global alignment - Needleman-Wunsch (1970): maximizes the number of matches between the sequences along the entire length of the sequences.

Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.

Dynamic Programming

A simple example

[Figure: weighted graph on nodes A to F; the edge layout is not recoverable from the transcript]

Exercise

[Figure not recoverable from the transcript]

Conditions for applying dynamic programming

Optimal substructure: every sub-policy of an optimal policy is itself optimal.

No aftereffect: the states of earlier stages cannot directly affect future decisions.

Trading space for time (overlapping subproblems).


DP Algorithm for Global Alignment

Two sequences X = x_1 ... x_n and Y = y_1 ... y_m.

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 ≤ i ≤ n, 0 ≤ j ≤ m).

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

DP in equation form

[Diagram: cell F(i, j) is reached from F(i-1, j-1) by a diagonal step scoring s(x_i, y_j), and from F(i-1, j) or F(i, j-1) by a step scoring d.]


Page 30: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Answer what is the optimal alignment of two sequences(the best score)

How many different alignments

Alignment methods with DPAlignment methods with DP

Global alignment Global alignment - Needleman-Wunsch - Needleman-Wunsch (1970) maximizes the number of matches (1970) maximizes the number of matches between the sequences along the entire between the sequences along the entire length of the sequenceslength of the sequences

Local alignment Local alignment - Smith-Waterman - Smith-Waterman (1981) is a modification of the dynamic (1981) is a modification of the dynamic programming algorithm giving the highest programming algorithm giving the highest scoring local match between two scoring local match between two sequencessequences

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

A simple exampleA simple example

3

4

5

3

6

5 4

2

A

B

C

D

E

F

8

7

9

Presented By Liu QiPresented By Liu Qi

Exercise

Presented By Liu QiPresented By Liu Qi

动态规划的适用条件动态规划的适用条件

一个最优化策略的子策略总是最优的一个最优化策略的子策略总是最优的 无后向性无后向性

以前各阶段的状态无法直接影响它未来的决策以前各阶段的状态无法直接影响它未来的决策 空间换时间(子问题的重叠性)空间换时间(子问题的重叠性)

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

DP Algorithm for Global DP Algorithm for Global AlignmentAlignment

Two sequences X = x1xn and Y = y1ym

F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)

djiF

djiF

yxsjiF

jiF

F

ji

1

1

11

max

000

Presented By Liu QiPresented By Liu Qi

DP in equation formDP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5

GG -10-10

CC -15-15 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5 22 -3-3 -8-8

GG -10-10 -3-3 -3-3 -1-1

CC -15-15 -8-8 -8-8 -6-6 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

TracebackTraceback

Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left

Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence

A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence

A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each

sequencesequence

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

AAG- AAG--AGC A-GC

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2

Presented By Liu QiPresented By Liu Qi

AnswerAnswer

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein

Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required

Presented By Liu QiPresented By Liu Qi

Local alignment DPLocal alignment DP

Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution

matrix d is the linear gap penaltymatrix d is the linear gap penalty

0

11

11

max

000

djiFdjiF

yxsjiF

jiF

F

ji

Presented By Liu QiPresented By Liu Qi

Local DP in equation formLocal DP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

0

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the

matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch

Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00

GG 00

CC 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00 22 22 00

GG 00 00 00 44

CC 00 00 00 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00 22 22 00

GG 00 00 00 44

CC 00 00 00 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

AGAG

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

GG 00

AA 00

AA 00

GG 00

GG 00

CC 00

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

GG 00 00 00 22

AA 00 22 22 00

AA 00 22 44 00

GG 00 00 00 66

GG 00 00 00 22

CC 00 00 00 00

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and GAAGGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

End-Space Free AlignmentEnd-Space Free Alignment

any number of indel operations at the end or at the beginning of the alignment contribute zero weight

X= - - c a c - t g t a c

Y= g a c a c t t g - - -

Presented By Liu QiPresented By Liu Qi

End-Space Free AlignmentEnd-Space Free Alignment Base conditions foralli j F (i 0) = 0 F(0 j) = 0 Recurrence relation F (i j) = maxF(i -1 j - 1) + s(Xi Yj)

F(i -1 j) + dF (ij - 1) + d Search for i such that F (im) = max1leilen F (i m)

Search for j such that F(n j) =max1lejlem F (n j) Define alignment score F(n m) =max F(n j)F (im)

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Align two sequence Align two sequence (( match=1mismatch=-1gap=-1) match=1mismatch=-1gap=-1)

X = c a c t g t a c

Y= g a c a c t t g

Presented By Liu QiPresented By Liu Qi

思考题思考题Does a local alignment program always Does a local alignment program always

produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment

Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences

Presented By Liu QiPresented By Liu Qi

Affine gap penaltyAffine gap penalty

LETVGYW----L

-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and

gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm

Affine gap penalty

A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same.

It is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap
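The difference between the two penalty functions can be made concrete with a small sketch. The function names are hypothetical, and the convention used here (h is charged for the position that opens the gap, g for each further position) is an assumption chosen to match a gap scored -5 -1 -1 -1; some texts instead charge h + k·g for a gap of length k.

```python
def linear_gap_cost(k, d=-5):
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap_cost(k, h=-5, g=-1):
    """Affine penalty: h for the position that opens the gap, g for each extension."""
    if k == 0:
        return 0
    return h + (k - 1) * g


k = 4
# Linear: one gap of length k costs the same as k separate gaps of length 1.
print(linear_gap_cost(k), k * linear_gap_cost(1))
# Affine: one gap of length k is cheaper than k separate gaps of length 1.
print(affine_gap_cost(k), k * affine_gap_cost(1))
```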

Gap penalty functions [figure]

Dynamic Programming for the Affine Gap Penalty Case

We need 3 matrices instead of 1.
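One common way to organise the three matrices is sketched below for the global score only. This formulation is usually attributed to Gotoh; the matrix names `M`, `Ix`, `Iy`, the function name, and the gap convention (h for the opening position, g for each extension, both negative) are assumptions for illustration and may differ from the slides' own notation.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Global alignment score with affine gaps, using three DP matrices."""
    n, m = len(x), len(y)
    # M: x_i aligned to y_j; Ix: x_i aligned to a gap; Iy: y_j aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g  # a single gap covering x_1..x_i
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g  # a single gap covering y_1..y_j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # Open a new gap (cost h) or extend the current one (cost g).
            Ix[i][j] = max(M[i - 1][j] + h, Ix[i - 1][j] + g, Iy[i - 1][j] + h)
            Iy[i][j] = max(M[i][j - 1] + h, Iy[i][j - 1] + g, Ix[i][j - 1] + h)
    return max(M[n][m], Ix[n][m], Iy[n][m])


# ACGGGT vs AC---T: three matches (+3) and one gap of length 3 (-5 -1 -1).
print(affine_global_score("ACGGGT", "ACT"))
```

Keeping a separate matrix per "last column type" is what lets the recurrence tell an extension (cost g) from a fresh opening (cost h).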

Example scoring: match = 1, mismatch = -1

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tuple

FASTA
BLAST


Alignment methods with DP

Global alignment - Needleman-Wunsch (1970): maximizes the number of matches between the sequences along the entire length of the sequences.

Local alignment - Smith-Waterman (1981): a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.

Dynamic Programming

A simple example

[Figure: weighted graph on nodes A to F]

Exercise

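Conditions for applying dynamic programming

Every sub-policy of an optimal policy is itself optimal (optimal substructure).

No after-effect: the states of earlier stages cannot directly influence future decisions.

Trading space for time (overlapping subproblems).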
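The third condition, trading space for time when subproblems overlap, can be illustrated with a small memoized recursion. The example below is an assumption chosen for illustration (it does not come from the slides): it counts the paths through an alignment-style grid that use right, down, and diagonal steps. Without the cache the same cells would be recomputed exponentially often; with it, each cell is solved once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def grid_paths(i, j):
    """Number of paths from cell (0, 0) to cell (i, j) using right, down, and diagonal steps."""
    if i == 0 or j == 0:
        return 1  # only one way along an edge of the grid
    # Each cell depends on the same three neighbours as an alignment matrix cell.
    return grid_paths(i - 1, j) + grid_paths(i, j - 1) + grid_paths(i - 1, j - 1)


print(grid_paths(3, 3))
print(grid_paths(20, 20))  # fast, because each of the 21 x 21 cells is computed once
```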


DP Algorithm for Global Alignment

Two sequences X = x_1 ... x_n and Y = y_1 ... y_m.

Let F(i, j) be the optimal alignment score of X_{1..i} and Y_{1..j} (0 ≤ i ≤ n, 0 ≤ j ≤ m).

\[ F(0, 0) = 0 \]

\[ F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases} \]

DP in equation form

Each cell F(i, j) is reached from one of three neighbours: from F(i-1, j-1) by a diagonal step scoring s(x_i, y_j), or from F(i-1, j) or F(i, j-1) by a gap step scoring d.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
A      2   -7   -5   -7
C     -7    2   -7   -5
G     -5   -7    2   -7
T     -7   -5   -7    2

Start with F(0, 0) = 0 and fill the first row and first column with multiples of d:

            A    A    G
       0   -5  -10  -15
A     -5
G    -10
C    -15

Fill the remaining cells with the recurrence:

            A    A    G
       0   -5  -10  -15
A     -5    2   -3   -8
G    -10   -3   -3   -1
C    -15   -8   -8   -6
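The matrix fill for this example can be reproduced with a short sketch. The dictionary `S` transcribes the substitution matrix of the example; the function name `global_matrix` and the argument names `left` (row sequence) and `top` (column sequence) are assumptions made for illustration.

```python
# Substitution scores from the example: 2 for a match, -5 for A<->G and C<->T,
# -7 for every other mismatch.
S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def global_matrix(left, top, s=S, d=-5):
    """Fill the global-alignment matrix F; rows follow `left`, columns follow `top`."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d  # first column: only gaps
    for j in range(1, m + 1):
        F[0][j] = j * d  # first row: only gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]],  # diagonal
                          F[i - 1][j] + d,                               # vertical
                          F[i][j - 1] + d)                               # horizontal
    return F


F = global_matrix("AGC", "AAG")
for row in F:
    print(row)
```

The lower right cell, F[3][3], holds the optimal global score for AAG and AGC.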

Traceback

Start from the lower right corner and trace back to the upper left.

Each arrow introduces one character at the end of each aligned sequence.

A horizontal move puts a gap in the left sequence.

A vertical move puts a gap in the top sequence.

A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the traceback paths:

            A    A    G
       0   -5
A           2   -3
G                    -1
C                    -6

Optimal alignments (score -6):

AAG-      AAG-
-AGC      A-GC
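The traceback rules can be sketched as follows. The code recomputes the matrix so that it runs on its own; the function name `global_align` and the tie-breaking order (diagonal, then horizontal, then vertical) are assumptions, so it returns only one of the optimal alignments when several exist.

```python
S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def global_align(left, top, s=S, d=-5):
    """Return (score, aligned top sequence, aligned left sequence)."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Start from the lower right corner and trace back to the upper left.
    i, j = n, m
    top_aln, left_aln = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]]:
            # Diagonal move: one character from each sequence.
            top_aln.append(top[j - 1])
            left_aln.append(left[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and F[i][j] == F[i][j - 1] + d:
            # Horizontal move: gap in the left sequence.
            top_aln.append(top[j - 1])
            left_aln.append("-")
            j -= 1
        else:
            # Vertical move: gap in the top sequence.
            top_aln.append("-")
            left_aln.append(left[i - 1])
            i -= 1
    return F[n][m], "".join(reversed(top_aln)), "".join(reversed(left_aln))


score, top_row, left_row = global_align("AGC", "AAG")
print(score)
print(top_row)
print(left_row)
```

Characters are collected in reverse because the traceback runs from the end of the alignment to its start.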

Exercise

Find the global alignment of
X = catgt
Y = acgctg
Score: d = -1, mismatch = -1, match = 2

Answer [figure]

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein.

Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

\[ F(0, 0) = 0 \]

\[ F(i, j) = \max \begin{cases} 0 \\ F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases} \]

Local DP in equation form

Each cell F(i, j) is the largest of four values: 0, the diagonal neighbour F(i-1, j-1) plus s(x_i, y_j), and the neighbours F(i-1, j) and F(i, j-1) plus d.

Local alignment

Two differences with respect to global alignment:
- No score is negative.
- Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
A      2   -7   -5   -7
C     -7    2   -7   -5
G     -5   -7    2   -7
T     -7   -5   -7    2

The first row and first column are all 0; the remaining cells are filled with the local recurrence:

            A    A    G
       0    0    0    0
A      0    2    2    0
G      0    0    0    4
C      0    0    0    0

Optimal local alignment (score 4):

AG
AG
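The local fill and traceback for this example can be sketched in Python. The dictionary `S` transcribes the example's substitution matrix; the function name `local_align`, the argument names, and the choice to report only the first highest-scoring cell are assumptions for illustration.

```python
S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def local_align(left, top, s=S, d=-5):
    """Return (best score, aligned part of top, aligned part of left)."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,  # no score is negative
                          F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    # Traceback begins at the highest score and continues until a 0 is reached.
    i, j = best_pos
    top_aln, left_aln = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s[left[i - 1]][top[j - 1]]:
            top_aln.append(top[j - 1])
            left_aln.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i][j - 1] + d:
            top_aln.append(top[j - 1])
            left_aln.append("-")
            j -= 1
        else:
            top_aln.append("-")
            left_aln.append(left[i - 1])
            i -= 1
    return best, "".join(reversed(top_aln)), "".join(reversed(left_aln))


print(local_align("AGC", "AAG"))
print(local_align("GAAGGC", "AAG"))
```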

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
A      2   -7   -5   -7
C     -7    2   -7   -5
G     -5   -7    2   -7
T     -7   -5   -7    2

DP matrix:

            A    A    G
       0    0    0    0
G      0    0    0    2
A      0    2    2    0
A      0    2    4    0
G      0    0    0    6
G      0    0    0    2
C      0    0    0    0




思考题思考题Does a local alignment program always Does a local alignment program always

produce a local alignment and a global produce a local alignment and a global alignment program always produce a alignment program always produce a global alignmentglobal alignment

Develop an algorithm to find the longest Develop an algorithm to find the longest common subsequence (LCS) of two given common subsequence (LCS) of two given sequencessequences

Presented By Liu QiPresented By Liu Qi

Affine gap penaltyAffine gap penalty

LETVGYW----L

-5 -1 -1 -1Separate penalties for Separate penalties for gap openinggap opening and and

gap extensiongap extensionThis requires modifying the DP algorithmThis requires modifying the DP algorithm

Presented By Liu QiPresented By Liu Qi

Affine gap penaltyAffine gap penalty

a gap of length k is more probable than k gaps of length 1 ndash a gap may be due to a single mutational event that inserteddeleted a

stretch of characters ndash separated gaps are probably due to distinct mutational events

a linear gap penalty function treats these cases the same it is more common to use gap penalty functions involving two

terms ndash a penalty h associated with opening a gap ndash a smaller penalty g for extending the gap

Presented By Liu QiPresented By Liu Qi

Gap penalty functionsGap penalty functions

Presented By Liu QiPresented By Liu Qi

Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case

need 3 matrices instead of 1

Presented By Liu QiPresented By Liu Qi

Dynamic Programming for theDynamic Programming for theAffine Gap Penalty CaseAffine Gap Penalty Case

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

match=1 mismatch=-1

Presented By Liu QiPresented By Liu Qi

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Write the formula for ldquoLocal Alignment DP for the Affine Gap Penalty Caserdquo

Presented By Liu QiPresented By Liu Qi

Word k-tup

FASTAFASTA

BLASTBLAST

Page 34: Pairwise Sequence  Alignment

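Conditions for applying dynamic programming

Optimal substructure: every sub-policy of an optimal policy is itself optimal.

No aftereffect: the states of earlier stages cannot directly influence future decisions.

Trading space for time: subproblems overlap, so their solutions are stored and reused.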

Dynamic Programming

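DP Algorithm for Global Alignment

Two sequences $X = x_1 \dots x_n$ and $Y = y_1 \dots y_m$

Let $F(i, j)$ be the optimal alignment score of $X_{1..i}$ and $Y_{1..j}$ ($0 \le i \le n$, $0 \le j \le m$), where $s$ is the substitution score and $d$ is the linear gap penalty.

$$
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
$$

DP in equation form

[Figure: cell F(i, j) is computed from three neighbours: F(i-1, j-1) by a diagonal step scoring s(x_i, y_j), and F(i-1, j) and F(i, j-1) by steps scoring d.]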
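The recurrence can be sketched in Python as follows. This is a minimal illustration, not the presenter's code: the helper `s` and the function `global_score_matrix` are hypothetical names, and `s` hard-codes the nucleotide scores used in this deck's AAG/AGC example (match 2, purine-purine or pyrimidine-pyrimidine mismatch -5, other mismatches -7). The first row and column are filled with multiples of d, which is what the recurrence gives when only the gap term is available.

```python
PURINES = {"A", "G"}

def s(a, b):
    """Substitution score for two nucleotides (hypothetical helper)."""
    if a == b:
        return 2
    # A<->G and C<->T score -5, every other mismatch scores -7.
    return -5 if (a in PURINES) == (b in PURINES) else -7

def global_score_matrix(x, y, d=-5):
    """Fill F for global alignment; rows follow x, columns follow y."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # first column: only gaps so far
        F[i][0] = i * d
    for j in range(1, m + 1):          # first row: only gaps so far
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # diagonal step
                F[i - 1][j] + d,                          # vertical step
                F[i][j - 1] + d,                          # horizontal step
            )
    return F

# AGC down the side, AAG across the top.
F = global_score_matrix("AGC", "AAG")
for row in F:
    print(row)
```

The bottom-right cell `F[-1][-1]` holds the optimal global score; the loop costs O(nm) time and space.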

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2

Start with F(0, 0) = 0, fill the first row and first column with multiples of d, then fill the remaining cells with the recurrence. The completed DP matrix (AAG across the top, AGC down the side):

          A    A    G
     0   -5  -10  -15
A   -5    2   -3   -8
G  -10   -3   -3   -1
C  -15   -8   -8   -6

Traceback

Start from the lower right corner and trace back to the upper left.

Each arrow introduces one character at the end of each aligned sequence.

A horizontal move puts a gap in the left sequence.

A vertical move puts a gap in the top sequence.

A diagonal move uses one character from each sequence.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the optimal traceback paths (AAG across the top, AGC down the side):

          A    A    G
     0   -5
A         2   -3
G                  -1
C                  -6

The two paths give two alignments, each scoring -6:

AAG-      AAG-
-AGC      A-GC
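The traceback rules can be sketched in Python as below. This is an illustrative sketch under assumptions: `needleman_wunsch` and `s` are hypothetical names, the scores are those of the AAG/AGC example, and ties are broken in the order diagonal, horizontal, vertical, so only one of the optimal alignments is returned.

```python
def s(a, b):
    """Substitution score from the example (hypothetical helper)."""
    if a == b:
        return 2
    purines = "AG"
    return -5 if (a in purines) == (b in purines) else -7

def needleman_wunsch(top, left, d=-5):
    """Return (score, aligned_top, aligned_left) for one optimal alignment."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    # Walk back from the lower right corner to the upper left.
    i, j = n, m
    a_top, a_left = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            # Diagonal move: one character from each sequence.
            a_top.append(top[j - 1]); a_left.append(left[i - 1])
            i -= 1; j -= 1
        elif j > 0 and F[i][j] == F[i][j - 1] + d:
            # Horizontal move: gap in the left sequence.
            a_top.append(top[j - 1]); a_left.append("-")
            j -= 1
        else:
            # Vertical move: gap in the top sequence.
            a_top.append("-"); a_left.append(left[i - 1])
            i -= 1
    return F[n][m], "".join(reversed(a_top)), "".join(reversed(a_left))

score, aligned_top, aligned_left = needleman_wunsch("AAG", "AGC")
print(score)         # -6
print(aligned_top)   # AAG-
print(aligned_left)  # -AGC
```

Recovering every optimal alignment would require following all tied predecessors at each cell instead of the first match.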

Exercise

Find the global alignment of X = catgt and Y = acgctg.

Score: d = -1, mismatch = -1, match = 2.

Answer

[Figure: answer slide, not preserved in the text.]

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein.

Usually an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

$$
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
$$

Local DP in equation form

[Figure: cell F(i, j) is computed from F(i-1, j-1) by a diagonal step scoring s(x_i, y_j), from F(i-1, j) and F(i, j-1) by steps scoring d, or is set to 0.]

Local alignment

Two differences with respect to global alignment:

No score is negative.

Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman
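The two differences from global alignment can be sketched in Python as below. This is a minimal sketch, not a reference implementation: `smith_waterman` and `s` are hypothetical names, the scores are the nucleotide scores used in this deck's examples, and when several cells share the highest score the first one found in row order is used.

```python
def s(a, b):
    """Substitution score from the deck's examples (hypothetical helper)."""
    if a == b:
        return 2
    purines = "AG"
    return -5 if (a in purines) == (b in purines) else -7

def smith_waterman(top, left, d=-5):
    """Return (score, aligned_top, aligned_left) for one best local alignment."""
    n, m = len(left), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,                     # difference 1: never negative
                          F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Difference 2: start at the highest score and stop at the first 0.
    i, j = best_pos
    a_top, a_left = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            a_top.append(top[j - 1]); a_left.append(left[i - 1])
            i -= 1; j -= 1
        elif F[i][j] == F[i][j - 1] + d:
            a_top.append(top[j - 1]); a_left.append("-")
            j -= 1
        else:
            a_top.append("-"); a_left.append(left[i - 1])
            i -= 1
    return best, "".join(reversed(a_top)), "".join(reversed(a_left))

print(smith_waterman("AAG", "AGC"))     # (4, 'AG', 'AG')
print(smith_waterman("AAG", "GAAGGC"))  # (6, 'AAG', 'AAG')
```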

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A    C    G    T
A    2   -7   -5   -7
C   -7    2   -7   -5
G   -5   -7    2   -7
T   -7   -5   -7    2

The first row and first column are all 0. The completed DP matrix (AAG across the top, AGC down the side):

          A    A    G
     0    0    0    0
A    0    2    2    0
G    0    0    0    4
C    0    0    0    0

Traceback from the highest score, 4, gives the local alignment:

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5, with the same substitution matrix s as in the previous example.

The completed DP matrix (AAG across the top, GAAGGC down the side):

          A    A    G
     0    0    0    0
G    0    0    0    2
A    0    2    2    0
A    0    2    4    0
G    0    0    0    6
G    0    0    0    2
C    0    0    0    0

The highest score is 6. Tracing back along the diagonal 6, 4, 2 until a 0 is reached aligns AAG with the AAG at positions 2 to 4 of GAAGGC.

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

X = - - c a c - t g t a c
Y = g a c a c t t g - - -

Base conditions: $\forall i, j:\ F(i, 0) = 0,\ F(0, j) = 0$

Recurrence relation:

$$
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(X_i, Y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
$$

Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$

Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$

Define the alignment score as $\max\{F(n, j^*),\ F(i^*, m)\}$
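A minimal Python sketch of the end-space free score follows. The function name `end_space_free_score` and the match/mismatch parameters are assumptions for illustration; the defaults (match = 1, mismatch = -1, d = -1) are hypothetical values chosen for the sketch. Only the score is computed; a traceback would start from the best cell in the last row or last column.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Score of the best alignment in which leading and trailing gaps are free."""
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Trailing gaps are free: take the best cell in the last column or last row.
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_col, best_last_row)

# "CG" sits inside "ACGT"; the unmatched A and T at the ends are not penalized.
print(end_space_free_score("ACGT", "CG"))  # 2
```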

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c

Y = g a c a c t t g

Questions for thought

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.

Affine gap penalty

LETVGYW----L

The four gap positions are penalized -5, -1, -1, -1.

Separate penalties for gap opening and gap extension.

This requires modifying the DP algorithm.

A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same.

It is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap

Gap penalty functions

[Figure: gap penalty functions, not preserved in the text.]
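The difference between the two penalty schemes can be illustrated with a short Python sketch. The convention here is an assumption inferred from the "-5 -1 -1 -1" example: the first gap position costs h and each further position costs g, so a gap of length k costs h + (k - 1) g. Other texts use h + k g instead. The function names are hypothetical.

```python
def linear_gap(k, d=-5):
    """Linear penalty: every gap position costs d."""
    return k * d

def affine_gap(k, h=-5, g=-1):
    """Affine penalty (assumed convention): h to open, g per extra position."""
    return 0 if k == 0 else h + (k - 1) * g

# One gap of length 4 versus four separate gaps of length 1.
print(affine_gap(4), 4 * affine_gap(1))   # -8 -20
print(linear_gap(4), 4 * linear_gap(1))   # -20 -20
```

Under the affine scheme the single long gap is penalized far less than four separate gaps, while the linear scheme cannot tell the two cases apart.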

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.

[Figures not preserved in the text; one worked example uses match = 1, mismatch = -1.]
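One common way to organize the three matrices is sketched below in Python. This is the standard three-state formulation, not necessarily the notation on the original slides: M holds alignments ending in a match or mismatch, Ix those ending with a character of x against a gap, and Iy those ending with a character of y against a gap. The names and the gap values h = -5, g = -1 are assumptions, with a gap of length k costing h + (k - 1) g.

```python
NEG_INF = float("-inf")

def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Optimal global alignment score with affine gaps, using three matrices."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # x[i] aligned to y[j]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # x[i] aligned to a gap
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # y[j] aligned to a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g    # one leading gap of length i
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g    # one leading gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + sub
            Ix[i][j] = max(M[i - 1][j] + h,    # open a new gap
                           Ix[i - 1][j] + g,   # extend the current gap
                           Iy[i - 1][j] + h)   # switch gap direction
            Iy[i][j] = max(M[i][j - 1] + h,
                           Iy[i][j - 1] + g,
                           Ix[i][j - 1] + h)
    return max(M[n][m], Ix[n][m], Iy[n][m])

print(affine_global_score("ACGT", "ACGT"))  # 4
print(affine_global_score("AAAA", "A"))     # -6: one match, one gap of length 3
```

Keeping the gap states separate is what lets the recurrence charge h only when a gap starts and g when it continues, which a single matrix cannot distinguish.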

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tuple

FASTA

BLAST

Page 36: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

Dynamic ProgrammingDynamic Programming

Presented By Liu QiPresented By Liu Qi

DP Algorithm for Global DP Algorithm for Global AlignmentAlignment

Two sequences X = x1xn and Y = y1ym

F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)

djiF

djiF

yxsjiF

jiF

F

ji

1

1

11

max

000

Presented By Liu QiPresented By Liu Qi

DP in equation formDP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5

GG -10-10

CC -15-15 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5 22 -3-3 -8-8

GG -10-10 -3-3 -3-3 -1-1

CC -15-15 -8-8 -8-8 -6-6 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

TracebackTraceback

Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left

Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence

A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence

A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each

sequencesequence

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

AAG- AAG--AGC A-GC

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2

Presented By Liu QiPresented By Liu Qi

AnswerAnswer

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein

Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required

Presented By Liu QiPresented By Liu Qi

Local alignment DPLocal alignment DP

Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution

matrix d is the linear gap penaltymatrix d is the linear gap penalty

0

11

11

max

000

djiFdjiF

yxsjiF

jiF

F

ji

Presented By Liu QiPresented By Liu Qi

Local DP in equation formLocal DP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

0

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the

matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch

Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman

Presented By Liu QiPresented By Liu Qi

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2

DP matrix F:

         A   A   G
     0   0   0   0
A    0   2   2   0
G    0   0   0   4
C    0   0   0   0

Optimal local alignment:

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2

DP matrix F:

         A   A   G
     0   0   0   0
G    0   0   0   2
A    0   2   2   0
A    0   2   4   0
G    0   0   0   6
G    0   0   0   2
C    0   0   0   0

The highest score is 6. Tracing back from that cell until a 0 is reached aligns AAG with the AAG at positions 2-4 of GAAGGC.
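The two local alignment examples above can be reproduced with a short script. The following is a minimal sketch, not the presenter's code: the function name `smith_waterman` and the dictionary `S` are illustrative names, and the tie-breaking order in the traceback (diagonal, then vertical, then horizontal) is an assumption.

```python
# Substitution scores copied from the example slides.
S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def smith_waterman(x, y, s=S, d=-5):
    """Local alignment with a linear gap penalty d.

    x indexes the rows and y the columns of F.
    Returns (best_score, aligned_x, aligned_y, F).
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,  # no score is negative
                F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]],
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j

    # Traceback starts at the highest score and stops at the first 0.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]]:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay)), F


score, ax, ay, F = smith_waterman("AGC", "AAG")
print(score, ax, ay)  # 4 AG AG
```

Calling `smith_waterman("GAAGGC", "AAG")` gives a best score of 6 with AAG aligned to AAG, matching the second matrix.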

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

X = - - c a c - t g t a c

Y = g a c a c t t g - - -

End-Space Free Alignment

Base conditions:

\[ \forall i, j: \quad F(i, 0) = 0, \quad F(0, j) = 0 \]

Recurrence relation:

\[
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(X_i, Y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

Search for \(i^*\) such that \( F(i^*, m) = \max_{1 \le i \le n} F(i, m) \)

Search for \(j^*\) such that \( F(n, j^*) = \max_{1 \le j \le m} F(n, j) \)

Define the alignment score as \( \max\{ F(n, j^*),\ F(i^*, m) \} \)
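The base conditions, recurrence and final search above translate almost line for line into code. This is a sketch under stated assumptions: the function name `end_space_free_score` is hypothetical, the substitution score is reduced to a simple match/mismatch pair, and both sequences are assumed to be non-empty.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Score of the best end-space free alignment of x and y.

    Leading gaps are free because row 0 and column 0 are all zero.
    Trailing gaps are free because the score is read from the best
    cell in the last row or last column instead of from F[n][m].
    Assumes x and y are non-empty.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # F(i, 0) = F(0, j) = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_row, best_last_column)


# A short sequence embedded in a longer one: the overhangs cost nothing.
print(end_space_free_score("ACGT", "TTACGTTT"))  # 4
# A suffix of x overlapping a prefix of y.
print(end_space_free_score("AAACGT", "CGTTTT"))  # 3
```

For the pair shown on the slide (X = cactgtac, Y = gacacttg), the alignment drawn there scores 4 under match = 1, mismatch = -1, gap = -1, so the function returns at least 4 for that input.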

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1)

X = c a c t g t a c

Y = g a c a c t t g

Questions for thought

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.

Affine gap penalty

LETVGYW----L
       -5 -1 -1 -1

Separate penalties for gap opening and gap extension
This requires modifying the DP algorithm

Affine gap penalty

A gap of length k is more probable than k gaps of length 1
  - a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  - separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same

It is more common to use gap penalty functions involving two terms
  - a penalty h associated with opening a gap
  - a smaller penalty g for extending the gap
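A small numeric comparison makes the point concrete. The sketch below assumes the convention suggested by the -5 -1 -1 -1 illustration: the first gap character costs the opening penalty and each further character costs the extension penalty. The function names and default values are illustrative, and other texts define the affine penalty slightly differently.

```python
def linear_gap(k, d=-5):
    """Linear penalty: every gap character costs d."""
    return k * d


def affine_gap(k, gap_open=-5, gap_extend=-1):
    """Affine penalty (assumed convention): the first gap character
    costs gap_open, each additional one costs gap_extend."""
    if k == 0:
        return 0
    return gap_open + (k - 1) * gap_extend


# One gap of length 4 versus four separate gaps of length 1.
print(linear_gap(4), 4 * linear_gap(1))  # -20 -20
print(affine_gap(4), 4 * affine_gap(1))  # -8 -20
```

The linear function cannot tell the two cases apart, while the affine function penalizes four separate events more heavily than one long gap.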

Gap penalty functions

Dynamic Programming for the Affine Gap Penalty Case

need 3 matrices instead of 1
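The slide formulas for the three matrices did not survive in this transcript, so the following is only one common way to set them up (often attributed to Gotoh), not necessarily the presenter's formulation. Assumptions: the names `M`, `Ix`, `Iy` and `affine_global_score` are illustrative, the alignment is global, and the first gap character costs `gap_open` while each further one costs `gap_extend`.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Global alignment score with an affine gap penalty.

    M[i][j]  : best score with x[i-1] aligned to y[j-1]
    Ix[i][j] : best score with x[i-1] aligned to a gap
    Iy[i][j] : best score with y[j-1] aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):  # a single leading gap of length i
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):  # a single leading gap of length j
        Iy[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            Ix[i][j] = max(
                M[i - 1][j] + gap_open,      # open a new gap
                Ix[i - 1][j] + gap_extend,   # extend the current gap
                Iy[i - 1][j] + gap_open,     # switch gap direction
            )
            Iy[i][j] = max(
                M[i][j - 1] + gap_open,
                Iy[i][j - 1] + gap_extend,
                Ix[i][j - 1] + gap_open,
            )
    return max(M[n][m], Ix[n][m], Iy[n][m])


# Four matches followed by one gap of length 4: 4 + (-5 - 1 - 1 - 1) = -4.
print(affine_global_score("AAAACCCC", "AAAA"))  # -4
```

With a linear penalty of -5 per gap character the same pair would score 4 - 20 = -16, which shows how the affine scheme favors one long gap.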

Example (scoring: match = 1, mismatch = -1)

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case"

Word k-tup

FASTA

BLAST

Dynamic Programming

DP Algorithm for Global Alignment

Two sequences \(X = x_1 \ldots x_n\) and \(Y = y_1 \ldots y_m\)

Let F(i, j) be the optimal alignment score of \(X_{1 \ldots i}\) and \(Y_{1 \ldots j}\) \((0 \le i \le n,\ 0 \le j \le m)\)

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

DP in equation form

F(i, j) is filled from three neighboring cells: the diagonal cell F(i-1, j-1) plus s(x_i, y_j), and the cells F(i-1, j) and F(i, j-1) plus d each.

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

     A   C   G   T
A    2  -7  -5  -7
C   -7   2  -7  -5
G   -5  -7   2  -7
T   -7  -5  -7   2

DP matrix F:

           A    A    G
      0   -5  -10  -15
A    -5    2   -3   -8
G   -10   -3   -3   -1
C   -15   -8   -8   -6

Traceback

Start from the lower right corner and trace back to the upper left
Each arrow introduces one character at the end of each aligned sequence
A horizontal move puts a gap in the left sequence
A vertical move puts a gap in the top sequence
A diagonal move uses one character from each sequence

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the traceback paths:

           A    A    G
      0   -5
A          2   -3
G                   -1
C                   -6

Optimal alignments:

AAG-    AAG-
-AGC    A-GC
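The fill and traceback steps for the AAG / AGC example can be checked with a short script. This is a minimal sketch rather than the presenter's implementation: `needleman_wunsch` and `S` are illustrative names, the border cells are initialized to multiples of d as in the example matrix, and the traceback reports only one optimal alignment, preferring diagonal, then vertical, then horizontal moves (an assumed tie-breaking rule).

```python
# Substitution scores copied from the example slides.
S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def needleman_wunsch(x, y, s=S, d=-5):
    """Global alignment with a linear gap penalty d.

    x indexes the rows (left sequence) and y the columns (top sequence).
    Returns (score, aligned_x, aligned_y, F).
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]],
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )

    # Traceback from the lower right corner to the upper left.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]]:
            ax.append(x[i - 1])   # diagonal: one character from each
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])   # vertical: gap in the top sequence
            ay.append("-")
            i -= 1
        else:
            ax.append("-")        # horizontal: gap in the left sequence
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay)), F


score, left, top, F = needleman_wunsch("AGC", "AAG")
print(score)  # -6
print(top)    # AAG-
print(left)   # -AGC
```

The other optimal alignment listed in the example (AAG- over A-GC) has the same score of -6; recovering it would need a traceback that follows every tied predecessor.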



Page 39: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

DP Algorithm for Global DP Algorithm for Global AlignmentAlignment

Two sequences X = x1xn and Y = y1ym

F(i j) be the optimal alignment score of X1i and Y1j (0 le i le n 0 le j le m)

djiF

djiF

yxsjiF

jiF

F

ji

1

1

11

max

000

Presented By Liu QiPresented By Liu Qi

DP in equation formDP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5

GG -10-10

CC -15-15 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5 22 -3-3 -8-8

GG -10-10 -3-3 -3-3 -1-1

CC -15-15 -8-8 -8-8 -6-6 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

TracebackTraceback

Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left

Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence

A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence

A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each

sequencesequence

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

AAG- AAG--AGC A-GC

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2

Presented By Liu QiPresented By Liu Qi

AnswerAnswer

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein

Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required

Presented By Liu QiPresented By Liu Qi

Local alignment DPLocal alignment DP

Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution

matrix d is the linear gap penaltymatrix d is the linear gap penalty

0

11

11

max

000

djiFdjiF

yxsjiF

jiF

F

ji

Presented By Liu QiPresented By Liu Qi

Local DP in equation formLocal DP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

0

Presented By Liu QiPresented By Liu Qi

Local alignment

Two differences with respect to global alignment:

   No score is negative.
   Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch

Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

         A    C    G    T
   A     2   -7   -5   -7
   C    -7    2   -7   -5
   G    -5   -7    2   -7
   T    -7   -5   -7    2

Filled DP matrix (first row and column set to 0):

              A    A    G
         0    0    0    0
   A     0    2    2    0
   G     0    0    0    4
   C     0    0    0    0

Optimal local alignment (traceback from the highest score, 4, until a 0 is reached):

   AG
   AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

         A    C    G    T
   A     2   -7   -5   -7
   C    -7    2   -7   -5
   G    -5   -7    2   -7
   T    -7   -5   -7    2

Filled DP matrix (first row and column set to 0):

              A    A    G
         0    0    0    0
   G     0    0    0    2
   A     0    2    2    0
   A     0    2    4    0
   G     0    0    0    6
   G     0    0    0    2
   C     0    0    0    0

The highest score in the matrix is 6; traceback starts from that cell.
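The two local-alignment examples can be checked with a short script. This is a minimal sketch of the Smith-Waterman recurrence with a linear gap penalty; the function name `smith_waterman` and the dictionary `S` are illustrative names, not from the slides, and the first argument indexes the rows of F (the transpose of the slide layout, which does not change the score).

```python
# Substitution scores copied from the example slides.
S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def smith_waterman(x, y, s=S, d=-5):
    """Return (score, aligned_x, aligned_y) for one optimal local alignment."""
    n, m = len(x), len(y)
    # F[i][j]: best score of a local alignment ending at x[i-1], y[j-1].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,                                        # start a new alignment
                F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]],  # diagonal move
                F[i - 1][j] + d,                          # gap in y
                F[i][j - 1] + d,                          # gap in x
            )
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j

    # Traceback: start at the highest score, stop when a 0 is reached.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]]:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("AAG", "AGC"))     # (4, 'AG', 'AG')
print(smith_waterman("AAG", "GAAGGC"))  # (6, 'AAG', 'AAG')
```

When several cells share the highest score, this sketch keeps the first one it meets; reporting all optimal local alignments would need a fuller traceback.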

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

   X = - - c a c - t g t a c
   Y = g a c a c t t g - - -

Base conditions, for all i, j:

\[ F(i, 0) = 0, \qquad F(0, j) = 0 \]

Recurrence relation:

\[
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(X_i, Y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
\]

With n and m the lengths of X and Y:

Search for i* such that \( F(i^*, m) = \max_{1 \le i \le n} F(i, m) \).

Search for j* such that \( F(n, j^*) = \max_{1 \le j \le m} F(n, j) \).

Define the alignment score as \( \max \{ F(n, j^*),\ F(i^*, m) \} \).
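The base conditions, recurrence, and final search above can be sketched as follows. This is a score-only illustration under stated assumptions: the function name `end_space_free_score` is hypothetical, and a simple match/mismatch score stands in for the substitution function s.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Score of the best alignment in which leading and trailing gaps are free."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return 0  # nothing to align; only free end gaps remain

    # Base conditions: F(i, 0) = 0 and F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + sub,  # align x[i-1] with y[j-1]
                F[i - 1][j] + d,        # x[i-1] against a gap
                F[i][j - 1] + d,        # y[j-1] against a gap
            )

    # Trailing gaps are free: take the best cell in the last column or last row.
    best_last_col = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_col, best_last_row)


# The suffix "CGT" of the first sequence overlaps the prefix "CGT" of the second.
print(end_space_free_score("AAACGT", "CGTTT"))  # 3
```

Unlike the local recurrence, there is no 0 inside the max, so interior mismatches and gaps are always charged; only the unaligned ends are free.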

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

   X = c a c t g t a c
   Y = g a c a c t t g

Questions to think about

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
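One possible starting point for the LCS question is the same table-filling idea used for alignment, with a match worth 1 and mismatches and gaps worth 0. The sketch below is one standard dynamic-programming approach, not necessarily the answer the slides intend; the function name `lcs` is illustrative.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y."""
    n, m = len(x), len(y)
    # L[i][j]: length of an LCS of x[:i] and y[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Traceback from the lower right corner, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AAG", "AGC"))  # AG
```

Several subsequences can share the maximum length; this sketch returns whichever one its tie-breaking rule reaches.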

Affine gap penalty

   L  E  T  V  G  Y
   W  -  -  -  -  L
     -5 -1 -1 -1

Separate penalties for gap opening and gap extension.

This requires modifying the DP algorithm.

A gap of length k is more probable than k gaps of length 1:
   - a gap may be due to a single mutational event that inserted/deleted a stretch of characters
   - separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same.

It is more common to use gap penalty functions involving two terms:
   - a penalty h associated with opening a gap
   - a smaller penalty g for extending the gap
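The difference between the two penalty functions can be made concrete with a few lines of code. This sketch assumes the convention suggested by the "-5 -1 -1 -1" example: the first gap position costs h and each further position costs g. Other texts charge opening and extension differently, so treat the exact formula, and the names `linear_gap` and `affine_gap`, as assumptions.

```python
def linear_gap(k, d=-5):
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap(k, h=-5, g=-1):
    """Affine penalty: h for opening the gap, g for each extension."""
    return 0 if k == 0 else h + (k - 1) * g


# One gap of length 4 versus four separate gaps of length 1.
print(linear_gap(4), 4 * linear_gap(1))  # -20 -20  (linear cannot tell them apart)
print(affine_gap(4), 4 * affine_gap(1))  # -8 -20   (affine prefers the single gap)
```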

Gap penalty functions

[Figure not preserved in the transcript.]

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
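The slide's own recurrences are not in the transcript, so the following is a sketch of a common textbook three-matrix formulation (often attributed to Gotoh) for global alignment, not a reconstruction of the slide. The names `M`, `Ix`, `Iy` and `affine_global_score` are illustrative, and a gap of length k is assumed to cost h + (k - 1) g.

```python
NEG = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Best global alignment score with an affine gap penalty."""
    n, m = len(x), len(y)
    # M:  best score with x[i-1] aligned to y[j-1]
    # Ix: best score with x[i-1] aligned to a gap
    # Iy: best score with y[j-1] aligned to a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g  # a single leading gap of length i
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g  # a single leading gap of length j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + sub
            # Open a new gap (cost h) or extend the current one (cost g).
            Ix[i][j] = max(M[i - 1][j] + h, Ix[i - 1][j] + g, Iy[i - 1][j] + h)
            Iy[i][j] = max(M[i][j - 1] + h, Iy[i][j - 1] + g, Ix[i][j - 1] + h)
    return max(M[n][m], Ix[n][m], Iy[n][m])


# Four matches (+4) and one gap of length 4 (-5 -1 -1 -1 = -8).
print(affine_global_score("ACGTACGT", "ACGT"))  # -4
```

Keeping a separate matrix per "last column type" is what lets the recurrence know whether the next gap position opens a new gap or extends an existing one.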

[Figure slides not preserved in the transcript; the only surviving text is the scoring used: match = 1, mismatch = -1.]

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tuple

FASTA

BLAST

Page 40: Pairwise Sequence  Alignment

DP in equation form

[Figure: cell F(i, j) is reached from F(i-1, j-1) by adding s(x_i, y_j), or from F(i-1, j) or F(i, j-1) by adding d.]

\[
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
\]

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

         A    C    G    T
   A     2   -7   -5   -7
   C    -7    2   -7   -5
   G    -5   -7    2   -7
   T    -7   -5   -7    2

Filled DP matrix (the first row and column hold multiples of d):

               A     A     G
         0    -5   -10   -15
   A    -5     2    -3    -8
   G   -10    -3    -3    -1
   C   -15    -8    -8    -6

Traceback

Start from the lower right corner and trace back to the upper left.

Each arrow introduces one character at the end of each aligned sequence.

A horizontal move puts a gap in the left sequence.

A vertical move puts a gap in the top sequence.

A diagonal move uses one character from each sequence.

Cells visited by the traceback:

               A     A     G
         0    -5
   A           2    -3
   G                      -1
   C                      -6

Optimal alignments (score -6):

   AAG-      AAG-
   -AGC      A-GC
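The fill and traceback steps for the AAG / AGC example can be reproduced with a short script. This is a minimal sketch of Needleman-Wunsch with a linear gap penalty; the names `needleman_wunsch` and `S` are illustrative, and the first argument indexes the rows of F, so the table is the transpose of the slide layout (the score is unaffected).

```python
# Substitution scores copied from the example slides.
S = {
    "A": {"A": 2, "C": -7, "G": -5, "T": -7},
    "C": {"A": -7, "C": 2, "G": -7, "T": -5},
    "G": {"A": -5, "C": -7, "G": 2, "T": -7},
    "T": {"A": -7, "C": -5, "G": -7, "T": 2},
}


def needleman_wunsch(x, y, s=S, d=-5):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    n, m = len(x), len(y)
    # F[i][j]: best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]],  # diagonal move
                F[i - 1][j] + d,                          # gap in y
                F[i][j - 1] + d,                          # gap in x
            )

    # Traceback from the lower right corner to the upper left.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]]:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("AAG", "AGC"))  # (-6, 'AAG-', '-AGC')
```

The traceback here follows a fixed preference (diagonal, then gap in y, then gap in x), so it returns only one of the two optimal alignments listed above; the other has the same score.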



Page 42: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5

GG -10-10

CC -15-15 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 -5-5 -10-10 -15-15

AA -5-5 22 -3-3 -8-8

GG -10-10 -3-3 -3-3 -1-1

CC -15-15 -8-8 -8-8 -6-6 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

TracebackTraceback

Start from the lower right corner and trace back Start from the lower right corner and trace back to the upper leftto the upper left

Each arrow introduces one character at the end Each arrow introduces one character at the end of each aligned sequenceof each aligned sequence

A A horizontalhorizontal move puts a gap in the move puts a gap in the leftleft sequencesequence

A A verticalvertical move puts a gap in the move puts a gap in the toptop sequence sequence A diagonal move uses one character from each A diagonal move uses one character from each

sequencesequence

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

Presented By Liu QiPresented By Liu Qi

Start from the lower right Start from the lower right corner and trace back to corner and trace back to the upper leftthe upper left

Each arrow introduces one Each arrow introduces one character at the end of character at the end of each aligned sequenceeach aligned sequence

A horizontal move puts a A horizontal move puts a gap in the left sequencegap in the left sequence

A vertical move puts a gap A vertical move puts a gap in the top sequencein the top sequence

A diagonal move uses one A diagonal move uses one character from each character from each sequencesequence

A simple exampleA simple example

AA AA GG

00 -5-5

AA 22 -3-3

GG -1-1

CC -6-6

Find the optimal alignment of AAG and AGCUse a gap penalty of d=-5

AAG- AAG--AGC A-GC

Presented By Liu QiPresented By Liu Qi

ExerciseExercise

Find Global alignmentFind Global alignmentX=catgtX=catgtY=acgctgY=acgctgScore d=-1 mismatch=-1 match=2Score d=-1 mismatch=-1 match=2

Presented By Liu QiPresented By Liu Qi

AnswerAnswer

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein

Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required

Presented By Liu QiPresented By Liu Qi

Local alignment DPLocal alignment DP

Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution

matrix d is the linear gap penaltymatrix d is the linear gap penalty

0

11

11

max

000

djiFdjiF

yxsjiF

jiF

F

ji

Presented By Liu QiPresented By Liu Qi

Local DP in equation formLocal DP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

0

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the

matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch

Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman

Presented By Liu QiPresented By Liu Qi

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Each cell F(i, j) is the maximum of 0, F(i-1, j-1) + s(x_i, y_j), F(i-1, j) + d and F(i, j-1) + d.

Step 1: set the first row and the first column to 0.

           A    A    G
      0    0    0    0
A     0
G     0
C     0

Step 2: fill in the remaining cells.

           A    A    G
      0    0    0    0
A     0    2    2    0
G     0    0    0    4
C     0    0    0    0

Step 3: trace back from the highest score (4) until a 0 is reached. The optimal local alignment is

AG
AG


Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Each cell F(i, j) is the maximum of 0, F(i-1, j-1) + s(x_i, y_j), F(i-1, j) + d and F(i, j-1) + d.

Step 1: set the first row and the first column to 0.

           A    A    G
      0    0    0    0
G     0
A     0
A     0
G     0
G     0
C     0

Step 2: fill in the remaining cells.

           A    A    G
      0    0    0    0
G     0    0    0    2
A     0    2    2    0
A     0    2    4    0
G     0    0    0    6
G     0    0    0    2
C     0    0    0    0
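The traceback for this example is not shown in the transcript, so here is a hedged Python sketch of it. The matrix `F` is typed in from the filled table for AAG and GAAGGC; the function name `local_traceback` and the helper `s` are assumptions made for the illustration. It starts at the highest score and stops at the first 0, as the local-alignment rules state.

```python
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    """Substitution score from the example slides."""
    return SCORES[BASES.index(a)][BASES.index(b)]


def local_traceback(F, top, left, d=-5):
    """Recover one optimal local alignment from a filled matrix F."""
    cells = [(i, j) for i in range(len(F)) for j in range(len(F[0]))]
    i, j = max(cells, key=lambda c: F[c[0]][c[1]])  # highest-scoring cell
    score = F[i][j]
    a_top, a_left = [], []
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            a_top.append(top[j - 1])       # diagonal: one character from each
            a_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            a_top.append("-")              # vertical: gap in the top sequence
            a_left.append(left[i - 1])
            i -= 1
        else:
            a_top.append(top[j - 1])       # horizontal: gap in the left sequence
            a_left.append("-")
            j -= 1
    return "".join(reversed(a_top)), "".join(reversed(a_left)), score


# Filled matrix for top = AAG, left = GAAGGC, copied from the slide.
F = [
    [0, 0, 0, 0],
    [0, 0, 0, 2],
    [0, 2, 2, 0],
    [0, 2, 4, 0],
    [0, 0, 0, 6],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
print(local_traceback(F, "AAG", "GAAGGC"))
```

Under these assumptions the path runs diagonally from the cell holding 6 through 4 and 2 to a 0, pairing AAG with the AAG inside GAAGGC.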


End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

X = - - c a c - t g t a c
Y = g a c a c t t g - - -


End-Space Free Alignment

Base conditions: $\forall i, j:\ F(i, 0) = 0,\ F(0, j) = 0$

Recurrence relation:

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(X_i, Y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}$$

Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$

Search for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$

Define the alignment score as $\max\{F(n, j^*),\ F(i^*, m)\}$
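The base conditions, recurrence and final search can be put together in a short Python sketch. It is illustrative only: the function name `end_space_free_score` and the default scores (match = 1, mismatch = -1, d = -1) are assumptions, and both sequences are assumed to be non-empty.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Score of the best alignment in which leading and trailing gaps are free.

    x has length n (rows), y has length m (columns); both must be non-empty.
    """
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + sub,  # align x_i with y_j
                F[i - 1][j] + d,        # x_i against a gap
                F[i][j - 1] + d,        # y_j against a gap
            )
    # Trailing gaps are free: take the best cell in the last column or last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("acgt", "ttacgt"))
```

In the call above the two leading characters of the second sequence are skipped at no cost, so only the four matching positions contribute to the score.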


Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c

Y = g a c a c t t g


Questions for thought

Does a local alignment program always produce a local alignment, and does a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
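One possible approach to the LCS question, offered as a hedged sketch rather than the intended answer: the LCS can be computed with the same kind of DP table as a global alignment, scoring 1 for a match and 0 for gaps, and never pairing unequal characters. The function name `lcs` is an assumption for the illustration.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y."""
    n, m = len(x), len(y)
    # L[i][j] = length of the LCS of x[:i] and y[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back from the lower right corner to recover the subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AAG", "AGC"))
```

Several subsequences of the same maximal length may exist; this sketch returns only one of them, chosen by the tie-breaking order in the traceback.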


Affine gap penalty

LETVGYW----L

Penalties for the four gap positions: -5, -1, -1, -1

Separate penalties for gap opening and gap extension
This requires modifying the DP algorithm


Affine gap penalty

A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same.

It is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap
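To make the difference concrete, the two penalty functions can be compared in a few lines of Python. This is a sketch under a stated assumption: following the LETVGYW----L example (-5, -1, -1, -1), the opening penalty h is charged for the first gap position and the extension penalty g for each further position. Other texts charge h plus g for every position, so the convention here is an assumption, as are the function names.

```python
def linear_gap_cost(k, d):
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap_cost(k, h, g):
    """Affine penalty: h for the first gap position, g for each further one."""
    if k <= 0:
        return 0
    return h + (k - 1) * g


# One gap of length 4 versus four separate gaps of length 1 (h = -5, g = -1).
one_long_gap = affine_gap_cost(4, h=-5, g=-1)
four_short_gaps = 4 * affine_gap_cost(1, h=-5, g=-1)
print(one_long_gap, four_short_gaps, linear_gap_cost(4, d=-5))
```

With the affine function the single long gap is penalized less than four scattered gaps, whereas the linear function cannot tell the two cases apart.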


Gap penalty functions


Dynamic Programming for the Affine Gap Penalty Case

need 3 matrices instead of 1
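The slide's own formulas are not recoverable from this transcript, so the following is a sketch of one common three-matrix formulation for global alignment (often attributed to Gotoh), not necessarily the one presented. Assumptions: a gap of length k costs h + (k-1)g, with h = -5 and g = -1 borrowed from the earlier example; match = 1 and mismatch = -1; the names `M`, `Ix`, `Iy` and `affine_global_score` are hypothetical.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Global alignment score with affine gaps, using three DP matrices.

    M[i][j]:  best score with x_i aligned to y_j
    Ix[i][j]: best score with x_i aligned to a gap
    Iy[i][j]: best score with y_j aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = h + (i - 1) * g  # x[:i] against one leading gap run
    for j in range(1, m + 1):
        Iy[0][j] = h + (j - 1) * g  # y[:j] against one leading gap run
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # Open a new gap (h) or extend an existing one (g).
            Ix[i][j] = max(M[i - 1][j] + h, Ix[i - 1][j] + g, Iy[i - 1][j] + h)
            Iy[i][j] = max(M[i][j - 1] + h, Iy[i][j - 1] + g, Ix[i][j - 1] + h)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("AAATTTTGGG", "AAAGGG"))
```

Keeping a separate matrix per "last column type" is what lets the recurrence tell whether a gap is being opened or extended; a single matrix F cannot carry that information.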


match = 1, mismatch = -1


Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case"


Word k-tup

FASTA

BLAST

Page 43: Pairwise Sequence  Alignment


A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Each cell F(i, j) is the maximum of F(i-1, j-1) + s(x_i, y_j), F(i-1, j) + d and F(i, j-1) + d.

Step 1: fill the first row and the first column with multiples of d.

           A    A    G
      0   -5  -10  -15
A    -5
G   -10
C   -15

Step 2: fill in the remaining cells.

           A    A    G
      0   -5  -10  -15
A    -5    2   -3   -8
G   -10   -3   -3   -1
C   -15   -8   -8   -6
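The global fill can be checked with a short Python sketch. It is an illustration under assumptions, not code from the slides: the names `global_fill`, `top` and `left` are made up for the example, and the scores are those of the substitution matrix above with d = -5.

```python
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    """Substitution score from the example slides."""
    return SCORES[BASES.index(a)][BASES.index(b)]


def global_fill(top, left, d=-5):
    """Fill the global-alignment matrix; rows follow `left`, columns `top`."""
    F = [[0] * (len(top) + 1) for _ in range(len(left) + 1)]
    for j in range(1, len(top) + 1):
        F[0][j] = j * d  # leading gaps in the left sequence
    for i in range(1, len(left) + 1):
        F[i][0] = i * d  # leading gaps in the top sequence
    for i in range(1, len(left) + 1):
        for j in range(1, len(top) + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),  # diagonal
                F[i - 1][j] + d,                               # vertical
                F[i][j - 1] + d,                               # horizontal
            )
    return F


for row in global_fill("AAG", "AGC"):
    print(row)
```

Unlike the local version, no 0 appears among the candidates and the borders are negative, so the score in the lower right corner (-6 here) is the score of the best end-to-end alignment.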


Traceback

Start from the lower right corner and trace back to the upper left
Each arrow introduces one character at the end of each aligned sequence
A horizontal move puts a gap in the left sequence
A vertical move puts a gap in the top sequence
A diagonal move uses one character from each sequence

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Cells on the traceback paths:

           A    A    G
      0   -5
A          2   -3
G                   -1
C                   -6

The two optimal alignments:

AAG-    AAG-
-AGC    A-GC
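The traceback rules can be sketched in Python so that every optimal path is followed, which is why two alignments come out. This is a hedged illustration: the matrix `F` is typed in from the filled table of the AAG/AGC example, and the names `all_tracebacks` and `walk` are assumptions.

```python
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    """Substitution score from the example slides."""
    return SCORES[BASES.index(a)][BASES.index(b)]


def all_tracebacks(F, top, left, d=-5):
    """Return every optimal global alignment as (top_row, left_row) pairs."""
    results = []

    def walk(i, j, a_top, a_left):
        if i == 0 and j == 0:
            # Characters were collected back to front, so reverse them.
            results.append((a_top[::-1], a_left[::-1]))
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            walk(i - 1, j - 1, a_top + top[j - 1], a_left + left[i - 1])  # diagonal
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            walk(i - 1, j, a_top + "-", a_left + left[i - 1])  # vertical: gap in top
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            walk(i, j - 1, a_top + top[j - 1], a_left + "-")  # horizontal: gap in left

    walk(len(left), len(top), "", "")
    return results


# Filled matrix for top = AAG, left = AGC, copied from the example.
F = [
    [0, -5, -10, -15],
    [-5, 2, -3, -8],
    [-10, -3, -3, -1],
    [-15, -8, -8, -6],
]
for top_row, left_row in all_tracebacks(F, "AAG", "AGC"):
    print(top_row)
    print(left_row)
    print()
```

The path splits at the cell holding -3 in row A, where both the diagonal and the horizontal predecessor give the same value; that tie is the source of the two equally good alignments.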


Exercise

Find the global alignment of
X = catgt
Y = acgctg
Score: d = -1, mismatch = -1, match = 2


Answer


Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein
Usually an alignment that spans the complete length of both sequences is not required




Presented By Liu QiPresented By Liu Qi

A simple example

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.

DP matrix (cells on the traceback paths):

             A     A     G
        0   -5
   A         2    -3
   G                    -1
   C                    -6

Optimal alignments:

AAG-     AAG-
-AGC     A-GC
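The fill and traceback steps above can be sketched as follows. This is an illustrative sketch, not the presenter's code: the function and variable names are hypothetical, and the substitution scores are assumed to be those of the A/C/G/T matrix shown in the local alignment examples (they reproduce the cell values 2, -3, -1 and -6 on this slide). When several optimal paths exist, the traceback returns only one of them.

```python
BASES = "ACGT"
# Substitution scores, assumed from the matrix used in the local alignment examples.
MATRIX = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    """Substitution score for residues a and b."""
    return MATRIX[BASES.index(a)][BASES.index(b)]


def needleman_wunsch(top, left, d=-5):
    """Return (score, aligned_top, aligned_left) for one optimal global alignment."""
    rows, cols = len(left), len(top)
    F = [[0] * (cols + 1) for _ in range(rows + 1)]
    for j in range(1, cols + 1):
        F[0][j] = j * d
    for i in range(1, rows + 1):
        F[i][0] = i * d
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),  # diagonal move
                F[i - 1][j] + d,                               # vertical move
                F[i][j - 1] + d,                               # horizontal move
            )

    # Trace back from the lower right corner to the upper left.
    a_top, a_left = [], []
    i, j = rows, cols
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            a_top.append(top[j - 1])
            a_left.append(left[i - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            a_top.append("-")          # vertical move: gap in the top sequence
            a_left.append(left[i - 1])
            i -= 1
        else:
            a_top.append(top[j - 1])   # horizontal move: gap in the left sequence
            a_left.append("-")
            j -= 1
    return F[rows][cols], "".join(reversed(a_top)), "".join(reversed(a_left))


if __name__ == "__main__":
    score, a, b = needleman_wunsch("AAG", "AGC")
    print(score)
    print(a)
    print(b)
```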

Exercise

Find the global alignment of X = catgt and Y = acgctg.
Scoring: d = -1, mismatch = -1, match = 2.

Answer

[Answer figure not recovered.]

Local alignment

A single-domain protein may be homologous to a region within a multi-domain protein.
Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP

Align sequences x and y.
F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

\[
F(0, 0) = 0
\]
\[
F(i, j) = \max
\begin{cases}
0 \\
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
\]

Local DP in equation form

[Diagram: F(i, j) is reached from F(i-1, j-1) with score s(x_i, y_j), from F(i-1, j) or F(i, j-1) with gap penalty d, or is set to 0.]

Local alignment

Two differences with respect to global alignment:
– No score is negative.
– Traceback begins at the highest score in the matrix and continues until you reach 0.

Global alignment algorithm: Needleman-Wunsch
Local alignment algorithm: Smith-Waterman

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix s:

             A     C     G     T
   A         2    -7    -5    -7
   C        -7     2    -7    -5
   G        -5    -7     2    -7
   T        -7    -5    -7     2

DP matrix F:

                   A     A     G
             0     0     0     0
   A         0     2     2     0
   G         0     0     0     4
   C         0     0     0     0

Optimal local alignment:

AG
AG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

             A     C     G     T
   A         2    -7    -5    -7
   C        -7     2    -7    -5
   G        -5    -7     2    -7
   T        -7    -5    -7     2

DP matrix F:

                   A     A     G
             0     0     0     0
   G         0     0     0     2
   A         0     2     2     0
   A         0     2     4     0
   G         0     0     0     6
   G         0     0     0     2
   C         0     0     0     0
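Both local alignment examples can be checked with a short Smith-Waterman sketch. The function names are hypothetical and the code is only an illustration of the recurrence and traceback rule described on the earlier slides. It returns a single best-scoring local alignment, choosing the first cell that reaches the maximum.

```python
BASES = "ACGT"
# Substitution matrix from the example slides.
MATRIX = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    """Substitution score for residues a and b."""
    return MATRIX[BASES.index(a)][BASES.index(b)]


def smith_waterman(top, left, d=-5):
    """Return (score, aligned_top, aligned_left) for one optimal local alignment."""
    rows, cols = len(left), len(top)
    F = [[0] * (cols + 1) for _ in range(rows + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            F[i][j] = max(
                0,                                             # no score is negative
                F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Traceback starts at the highest score and stops when a 0 is reached.
    a_top, a_left = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            a_top.append(top[j - 1])
            a_left.append(left[i - 1])
            i -= 1
            j -= 1
        elif F[i][j] == F[i - 1][j] + d:
            a_top.append("-")
            a_left.append(left[i - 1])
            i -= 1
        else:
            a_top.append(top[j - 1])
            a_left.append("-")
            j -= 1
    return best, "".join(reversed(a_top)), "".join(reversed(a_left))


if __name__ == "__main__":
    print(smith_waterman("AAG", "AGC"))
    print(smith_waterman("AAG", "GAAGGC"))
```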

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

X = - - c a c - t g t a c
Y = g a c a c t t g - - -

End-Space Free Alignment

Base conditions:
\[
\forall i, j: \quad F(i, 0) = 0, \quad F(0, j) = 0
\]

Recurrence relation:
\[
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(X_i, Y_j) \\
F(i-1, j) + d \\
F(i, j-1) + d
\end{cases}
\]

Search for i* such that
\[
F(i^*, m) = \max_{1 \le i \le n} F(i, m)
\]

Search for j* such that
\[
F(n, j^*) = \max_{1 \le j \le m} F(n, j)
\]

Define the alignment score as
\[
\max \{ F(n, j^*), \; F(i^*, m) \}
\]
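The procedure above can be sketched as a scoring function. This is an illustration under assumptions: the function name is hypothetical, a simple match/mismatch score stands in for s, and the default values match = 1, mismatch = -1, d = -1 are borrowed from the exercise that follows. Only the score is computed, with no traceback.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Score of the best alignment in which leading and trailing gaps are free."""
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + sub,
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    # Trailing gaps are free: take the best value in the last column and last row.
    best_last_col = max((F[i][m] for i in range(1, n + 1)), default=0)
    best_last_row = max((F[n][j] for j in range(1, m + 1)), default=0)
    return max(best_last_col, best_last_row)


if __name__ == "__main__":
    # The suffix ACG of the first sequence overlaps the prefix ACG of the second.
    print(end_space_free_score("TTTACG", "ACGAAA"))
```

This variant is the natural choice when one sequence is expected to overlap the end of the other, since the unaligned overhangs are not penalised.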

Exercise

Align the two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c
Y = g a c a c t t g

Questions to think about

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
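One possible approach to the LCS question is the standard dynamic programming table, sketched below. The function name is hypothetical, and when several longest common subsequences exist the traceback returns just one of them. The problem can also be viewed as a global alignment in which a match scores 1 and mismatches and gaps score 0.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y."""
    n, m = len(x), len(y)
    # L[i][j] is the LCS length of the prefixes x[:i] and y[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Trace back from the lower right corner, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(lcs("AAG", "AGC"))
```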

Page 50: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

A single-domain protein may be homologous to A single-domain protein may be homologous to a region within a multi-domain proteina region within a multi-domain protein

Usually an alignment that spans the complete Usually an alignment that spans the complete length of both sequences is not requiredlength of both sequences is not required

Presented By Liu QiPresented By Liu Qi

Local alignment DPLocal alignment DP

Align sequence x and yAlign sequence x and yF is the DP matrix s is the substitution F is the DP matrix s is the substitution

matrix d is the linear gap penaltymatrix d is the linear gap penalty

0

11

11

max

000

djiFdjiF

yxsjiF

jiF

F

ji

Presented By Liu QiPresented By Liu Qi

Local DP in equation formLocal DP in equation form

11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

0

Presented By Liu QiPresented By Liu Qi

Local alignmentLocal alignment

Two differences with respect to global Two differences with respect to global alignmentalignmentNo score is negativeNo score is negativeTraceback begins at the highest score in the Traceback begins at the highest score in the

matrix and continues until you reach 0matrix and continues until you reach 0Global alignment algorithm Global alignment algorithm Needleman-Needleman-WunschWunsch

Local alignment algorithm Local alignment algorithm Smith-Smith-WatermanWaterman

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

AA

GG

CC 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00

GG 00

CC 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00 22 22 00

GG 00 00 00 44

CC 00 00 00 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00 22 22 00

GG 00 00 00 44

CC 00 00 00 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0

AGAG

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution matrix s:

       A    C    G    T
A      2   -7   -5   -7
C     -7    2   -7   -5
G     -5   -7    2   -7
T     -7   -5   -7    2

Set the first row and the first column to 0, then fill in the remaining cells with

$$F(i,j) = \max\{\,0,\; F(i-1,j-1) + s(x_i,y_j),\; F(i-1,j) + d,\; F(i,j-1) + d\,\}$$

            A    A    G
       0    0    0    0
G      0    0    0    2
A      0    2    2    0
A      0    2    4    0
G      0    0    0    6
G      0    0    0    2
C      0    0    0    0
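A sketch of the full procedure, fill plus traceback, for the AAG and GAAGGC example. The function name `smith_waterman` and the tie-breaking order in the traceback (diagonal first, then up, then left) are assumptions made for illustration; the slides do not specify them.

```python
# Substitution scores copied from the slide (rows and columns in A, C, G, T order).
BASES = "ACGT"
SCORE_ROWS = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]
SCORES = {a: dict(zip(BASES, row)) for a, row in zip(BASES, SCORE_ROWS)}


def smith_waterman(x, y, s, d):
    """Return (score, aligned_x, aligned_y) for one optimal local alignment."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,
                F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]],
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:  # remember where the highest score sits
                best, bi, bj = F[i][j], i, j

    # Traceback: start at the highest score and stop at the first 0.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]]:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


score, aligned_x, aligned_y = smith_waterman("AAG", "GAAGGC", SCORES, d=-5)
print(score)
print(aligned_x)
print(aligned_y)
```

Run on these two sequences, the sketch starts its traceback at the 6 in the matrix and aligns AAG with the AAG inside GAAGGC.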

End-Space Free Alignment

Any number of indel operations at the end or at the beginning of the alignment contribute zero weight.

X = - - c a c - t g t a c
Y = g a c a c t t g - - -

Base conditions:

$$\forall i, j:\quad F(i,0) = 0,\quad F(0,j) = 0$$

Recurrence relation:

$$F(i,j) = \max\{\,F(i-1,j-1) + s(X_i,Y_j),\; F(i-1,j) + d,\; F(i,j-1) + d\,\}$$

Search for $i^*$ such that $F(i^*,m) = \max_{1 \le i \le n} F(i,m)$.

Search for $j^*$ such that $F(n,j^*) = \max_{1 \le j \le m} F(n,j)$.

Define the alignment score as $\max\{\,F(n,j^*),\; F(i^*,m)\,\}$.
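The base conditions, recurrence and final search could be written as follows. This is a sketch under assumptions: the function name `end_space_free_score` is hypothetical, and a simple match/mismatch score (defaults 1 and -1) with gap score d = -1 stands in for a general s(X_i, Y_j).

```python
def end_space_free_score(X, Y, match=1, mismatch=-1, d=-1):
    """Score of the best alignment in which leading and trailing gaps are free."""
    n, m = len(X), len(Y)
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if X[i - 1] == Y[j - 1] else mismatch
            # No 0 option here: only the ends of the alignment are free.
            F[i][j] = max(
                F[i - 1][j - 1] + s,
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    # Trailing gaps are free: take the best cell in the last column or last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


# The suffix GT of the first sequence overlaps the prefix GT of the second.
print(end_space_free_score("ACGT", "GTAA"))
print(end_space_free_score("ACGT", "ACGT"))
```

Unlike local alignment, interior cells may go negative here; only the border and the final search differ from the global algorithm.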

Exercise

Align the two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c
Y = g a c a c t t g

Questions to think about

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.

Affine gap penalty

LETVGYW----L

-5 -1 -1 -1

Separate penalties for gap opening and gap extension.

This requires modifying the DP algorithm.

Affine gap penalty

A gap of length k is more probable than k gaps of length 1:
  - a gap may be due to a single mutational event that inserted/deleted a stretch of characters
  - separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same.

It is more common to use gap penalty functions involving two terms:
  - a penalty h associated with opening a gap
  - a smaller penalty g for extending the gap
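The difference between the two kinds of penalty can be made concrete with a small sketch. The function names are hypothetical, and the convention is an assumption read off the -5 -1 -1 -1 example: the first gap position costs h and each further position costs g. Other texts use slightly different conventions.

```python
def linear_gap(k, d):
    """Total score of a gap of length k when every gap position costs d."""
    return k * d


def affine_gap(k, h, g):
    """Total score of a gap of length k: h for the first position, g for each further one.

    Assumed convention; h and g are negative scores, with g smaller in magnitude.
    """
    if k <= 0:
        return 0
    return h + (k - 1) * g


h, g, d = -5, -1, -5

# One gap of length 4 versus four separate gaps of length 1.
print(affine_gap(4, h, g), 4 * affine_gap(1, h, g))   # affine: the single long gap is cheaper
print(linear_gap(4, d), 4 * linear_gap(1, d))         # linear: both cases score the same
```

With h = -5 and g = -1, a single gap of length 4 scores -8 while four separate gaps score -20, which is the behaviour the two-term penalty is meant to capture.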

Gap penalty functions

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
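The slide's own formulas are not preserved in this transcript, so the following is a sketch of one common three-matrix formulation for global alignment (often attributed to Gotoh), not necessarily the one presented. Assumptions: the names `M`, `Ix`, `Iy` and `affine_global_score` are hypothetical, scoring is simple match/mismatch, and a gap of length k scores h + (k - 1) * g.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, h=-5, g=-1):
    """Global alignment score with affine gaps, using three DP matrices.

    M[i][j]  : best score with x[i-1] aligned to y[j-1]
    Ix[i][j] : best score with x[i-1] aligned to a gap
    Iy[i][j] : best score with y[j-1] aligned to a gap
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0
    for i in range(1, n + 1):          # leading gap in y
        Ix[i][0] = h + (i - 1) * g
    for j in range(1, m + 1):          # leading gap in x
        Iy[0][j] = h + (j - 1) * g

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            # Open a new gap (h) or extend the current one (g).
            Ix[i][j] = max(M[i - 1][j] + h, Ix[i - 1][j] + g, Iy[i - 1][j] + h)
            Iy[i][j] = max(M[i][j - 1] + h, Iy[i][j - 1] + g, Ix[i][j - 1] + h)

    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("AAG", "AAG"))
print(affine_global_score("ACGT", "AT"))
```

Keeping separate matrices for "ends in a gap" states is what lets the recurrence tell opening a gap apart from extending one; a single matrix cannot carry that information.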

Dynamic Programming for the Affine Gap Penalty Case: worked example with match = 1, mismatch = -1 (the matrices are not preserved in this transcript)

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".

Word k-tuple

FASTA

BLAST

Local alignment DP

Align sequences x and y. F is the DP matrix, s is the substitution matrix, and d is the linear gap penalty.

$$F(0,0) = 0$$

$$F(i,j) = \max \begin{cases} 0 \\ F(i-1,j-1) + s(x_i,y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$

Local DP in equation form

F(i,j) is the maximum of four candidates:
  - F(i-1,j-1) + s(x_i,y_j), reached diagonally
  - F(i-1,j) + d, reached through a gap
  - F(i,j-1) + d, reached through a gap
  - 0
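As a worked instance of the recurrence, take the scoring used in the AAG / AGC example of this deck, s(G,G) = 2 and d = -5, and consider the cell where the two G's meet. Its diagonal neighbour holds 2 and its other two neighbours hold 0, so

$$F(i,j) = \max\{\,0,\; 2 + s(G,G),\; 0 + d,\; 0 + d\,\} = \max\{\,0,\; 4,\; -5,\; -5\,\} = 4$$

The neighbour values are taken from that example's filled matrix; the cell indices are left symbolic because the orientation of i and j on the slide is not recoverable.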






Page 56: Pairwise Sequence  Alignment

Presented By Liu QiPresented By Liu Qi

A simple exampleA simple example

AA CC GG TT

AA 22 -7-7 -5-5 -7-7

CC -7-7 22 -7-7 -5-5

GG -5-5 -7-7 22 -7-7

TT -7-7 -5-5 -7-7 22

AA AA GG

00 00 00 00

AA 00 22 22 00

GG 00 00 00 44

CC 00 00 00 00 11 jiF

jiF jiF 1

1 jiF

d

d ji yxs

Find the optimal local alignment of AAG and AGCUse a gap penalty of d=-5

0


Page 57: Pairwise Sequence  Alignment

A simple example

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution scores s(x, y):

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Recurrence:

F(i, j) = max { 0,
                F(i-1, j-1) + s(x_i, y_j),
                F(i-1, j) + d,
                F(i, j-1) + d }

Matrix F:

           A    A    G
      0    0    0    0
A     0    2    2    0
G     0    0    0    4
C     0    0    0    0

Optimal local alignment (score 4):

AG
AG
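The matrix on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: the names BASES, SCORES, s and local_matrix are illustrative and not from the slides, and rows are indexed by AGC and columns by AAG to match the layout above.

```python
# Substitution scores copied from the slide; the list-of-lists layout is an assumption.
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]


def s(a, b):
    """Look up the substitution score for two bases."""
    return SCORES[BASES.index(a)][BASES.index(b)]


def local_matrix(x, y, d=-5):
    """Fill the local-alignment matrix F; rows follow y, columns follow x."""
    F = [[0] * (len(x) + 1) for _ in range(len(y) + 1)]
    for i in range(1, len(y) + 1):
        for j in range(1, len(x) + 1):
            F[i][j] = max(
                0,                                   # start a new local alignment
                F[i - 1][j - 1] + s(y[i - 1], x[j - 1]),  # align two residues
                F[i - 1][j] + d,                     # gap in x
                F[i][j - 1] + d,                     # gap in y
            )
    return F


F = local_matrix("AAG", "AGC")
best = max(max(row) for row in F)  # score of the optimal local alignment
for row in F:
    print(row)
print("best score:", best)
```

The largest entry is the score of the optimal local alignment; here it sits in the cell where the G of AGC meets the G of AAG.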


Page 58: Pairwise Sequence  Alignment

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution scores s(x, y):

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Recurrence:

F(i, j) = max { 0,
                F(i-1, j-1) + s(x_i, y_j),
                F(i-1, j) + d,
                F(i, j-1) + d }

Matrix F (boundary initialized to 0, remaining cells to be filled):

           A    A    G
      0    0    0    0
G     0
A     0
A     0
G     0
G     0
C     0


Page 59: Pairwise Sequence  Alignment

Local alignment

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Substitution scores s(x, y):

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Recurrence:

F(i, j) = max { 0,
                F(i-1, j-1) + s(x_i, y_j),
                F(i-1, j) + d,
                F(i, j-1) + d }

Matrix F:

           A    A    G
      0    0    0    0
G     0    0    0    2
A     0    2    2    0
A     0    2    4    0
G     0    0    0    6
G     0    0    0    2
C     0    0    0    0
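Once the matrix is filled, the alignment itself is recovered by tracing back from the highest-scoring cell until a 0 is reached. The sketch below is one way to do that, under stated assumptions: the function names are illustrative, and s encodes the slide's table as 2 for a match, -5 for A/G or C/T, and -7 otherwise.

```python
def s(a, b):
    """Slide's substitution table: 2 match, -5 for A<->G or C<->T, -7 otherwise."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def smith_waterman(x, y, d=-5):
    """Return (score, aligned_x, aligned_y, F); rows of F follow y, columns follow x."""
    rows, cols = len(y), len(x)
    F = [[0] * (cols + 1) for _ in range(rows + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            F[i][j] = max(
                0,
                F[i - 1][j - 1] + s(x[j - 1], y[i - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Trace back from the best cell, stopping at the first cell with score 0.
    i, j = best_pos
    ax, ay = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[j - 1], y[i - 1]):
            ax.append(x[j - 1])
            ay.append(y[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append("-")
            ay.append(y[i - 1])
            i -= 1
        else:
            ax.append(x[j - 1])
            ay.append("-")
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay)), F


score, aligned_x, aligned_y, F = smith_waterman("AAG", "GAAGGC")
print(score, aligned_x, aligned_y)
```

For AAG against GAAGGC the traceback starts at the cell holding 6 and follows three diagonal steps, so AAG is aligned with the AAG inside GAAGGC.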


Page 60: Pairwise Sequence  Alignment

End-Space Free Alignment

Any number of indel operations at the beginning or at the end of the alignment contribute zero weight.

X = - - c a c - t g t a c
Y = g a c a c t t g - - -


Page 61: Pairwise Sequence  Alignment

End-Space Free Alignment

Base conditions, for all i, j: F(i, 0) = 0, F(0, j) = 0

Recurrence relation:

F(i, j) = max { F(i-1, j-1) + s(X_i, Y_j),
                F(i-1, j) + d,
                F(i, j-1) + d }

Search for i* such that F(i*, m) = max over 1 <= i <= n of F(i, m)

Search for j* such that F(n, j*) = max over 1 <= j <= m of F(n, j)

Define the alignment score as max { F(n, j*), F(i*, m) }
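The base conditions, recurrence and final search above can be sketched as follows. The function name and the simple match/mismatch scoring are assumptions made for illustration; the sequences cactgtac and gacacttg and the scores match = 1, mismatch = -1, gap = -1 are the ones used in this deck. Both sequences are assumed to be non-empty.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """Score of the best alignment in which leading and trailing gaps are free."""
    n, m = len(x), len(y)
    # Base conditions: F(i, 0) = 0 and F(0, j) = 0, so leading gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + sub,  # align X_i with Y_j
                F[i - 1][j] + d,        # X_i against a gap
                F[i][j - 1] + d,        # Y_j against a gap
            )
    # Trailing gaps are free: take the best value in the last column or last row.
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


print(end_space_free_score("cactgtac", "gacacttg"))
```

With these inputs the sketch returns 4, which corresponds to five matches and one internal gap, with the unmatched ends contributing nothing.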


Page 62: Pairwise Sequence  Alignment

Exercise

Align two sequences (match = 1, mismatch = -1, gap = -1):

X = c a c t g t a c
Y = g a c a c t t g


Page 63: Pairwise Sequence  Alignment

Questions to think about

Does a local alignment program always produce a local alignment, and a global alignment program always produce a global alignment?

Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.
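One possible approach to the LCS question, offered as a sketch rather than as the intended answer: LCS can be treated as a global alignment in which a match scores 1 and mismatches and gaps score 0, so the same dynamic-programming table applies. The function name lcs and the test strings are illustrative.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y via dynamic programming."""
    n, m = len(x), len(y)
    # L[i][j] is the LCS length of the prefixes x[:i] and y[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Trace back from the bottom-right corner to recover the subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AGGTAB", "GXTXAYB"))
```

The table takes O(nm) time and space, the same as the alignment algorithms in this deck.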


Page 64: Pairwise Sequence  Alignment

Affine gap penalty

L  E  T  V  G  Y
W  -  -  -  -  L
  -5 -1 -1 -1

Separate penalties for gap opening and gap extension.

This requires modifying the DP algorithm.


Page 65: Pairwise Sequence  Alignment

Affine gap penalty

A gap of length k is more probable than k gaps of length 1:
- a gap may be due to a single mutational event that inserted/deleted a stretch of characters
- separated gaps are probably due to distinct mutational events

A linear gap penalty function treats these cases the same.

It is more common to use gap penalty functions involving two terms:
- a penalty h associated with opening a gap
- a smaller penalty g for extending the gap
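The difference between the two penalty functions shows up in a small calculation. The sketch below assumes one common convention, in which the first gap position costs h and each further position costs g; the values h = -5 and g = -1 follow the figure used elsewhere in this deck, while the linear penalty d = -2 and the function names are illustrative choices.

```python
def linear_gap_cost(k, d=-2):
    """Linear penalty: every gap position costs d."""
    return k * d


def affine_gap_cost(k, h=-5, g=-1):
    """Affine penalty: h for the first gap position, g for each extension."""
    return 0 if k == 0 else h + (k - 1) * g


# One gap of length 4 versus four separate gaps of length 1.
print(linear_gap_cost(4), 4 * linear_gap_cost(1))   # linear: both cases cost the same
print(affine_gap_cost(4), 4 * affine_gap_cost(1))   # affine: the single long gap is cheaper
```

Under the linear function both scenarios cost -8, whereas the affine function charges -8 for the single long gap and -20 for four separate gaps.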


Page 66: Pairwise Sequence  Alignment

Gap penalty functions


Page 67: Pairwise Sequence  Alignment

Dynamic Programming for the Affine Gap Penalty Case

Need 3 matrices instead of 1.
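A sketch of what the three matrices might look like for global alignment follows. The names M, Ix and Iy and the function name are assumptions, not taken from the slides: M holds alignments ending in a residue pair, Ix those ending with a residue of x against a gap, and Iy those ending with a residue of y against a gap. The first gap position is charged gap_open and each further position gap_extend.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Global alignment score with affine gaps, using three DP matrices."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    # A leading gap of length k costs gap_open + (k - 1) * gap_extend.
    for i in range(1, n + 1):
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Iy[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            # End with x[i-1] aligned to y[j-1], coming from any state.
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + sub
            # End with x[i-1] against a gap: open a new gap or extend one.
            Ix[i][j] = max(M[i - 1][j] + gap_open,
                           Ix[i - 1][j] + gap_extend,
                           Iy[i - 1][j] + gap_open)
            # End with y[j-1] against a gap: open a new gap or extend one.
            Iy[i][j] = max(M[i][j - 1] + gap_open,
                           Iy[i][j - 1] + gap_extend,
                           Ix[i][j - 1] + gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("ACGT", "AT"))
```

Keeping a separate matrix per ending state is what lets the recurrence tell "open a gap" apart from "extend a gap" while staying at O(nm) time.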


Page 68: Pairwise Sequence  Alignment

Dynamic Programming for the Affine Gap Penalty Case


Page 69: Pairwise Sequence  Alignment


Page 70: Pairwise Sequence  Alignment


Page 71: Pairwise Sequence  Alignment

match = 1, mismatch = -1


Page 72: Pairwise Sequence  Alignment


Page 73: Pairwise Sequence  Alignment

Exercise

Write the formula for "Local Alignment DP for the Affine Gap Penalty Case".
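One common formulation is sketched below for checking an answer; it is not necessarily the form the lecturer intended. The symbols M, I_x and I_y are assumptions (alignments ending in a residue pair, in a residue of x against a gap, and in a residue of y against a gap), and h and g are taken as negative scores that are added, with h charged for the first gap position and g for each extension.

```latex
\begin{aligned}
M(i,j)   &= \max\bigl\{\, 0,\;
            M(i-1,j-1) + s(x_i,y_j),\;
            I_x(i-1,j-1) + s(x_i,y_j),\;
            I_y(i-1,j-1) + s(x_i,y_j) \,\bigr\} \\
I_x(i,j) &= \max\bigl\{\, M(i-1,j) + h,\; I_x(i-1,j) + g,\; I_y(i-1,j) + h \,\bigr\} \\
I_y(i,j) &= \max\bigl\{\, M(i,j-1) + h,\; I_y(i,j-1) + g,\; I_x(i,j-1) + h \,\bigr\} \\
M(i,0)   &= M(0,j) = 0, \qquad
I_x(i,0) = I_x(0,j) = I_y(i,0) = I_y(0,j) = -\infty \\
\text{score} &= \max_{i,j} M(i,j)
\end{aligned}
```

The 0 inside the first maximum is what makes the alignment local: any prefix with a negative score is discarded, and the traceback starts from the best cell of M.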


Page 74: Pairwise Sequence  Alignment

Word k-tuple

FASTA

BLAST
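Word-based heuristics begin by locating short exact matches (k-tuples) shared by the two sequences instead of filling a full DP matrix. The sketch below illustrates only that first seeding step, with hypothetical function names; it is not the FASTA or BLAST implementation, both of which add scoring, extension and statistics on top.

```python
from collections import defaultdict


def shared_ktuples(x, y, k=2):
    """Return (i, j) pairs where x[i:i+k] == y[j:j+k], using a word index on x."""
    index = defaultdict(list)
    for i in range(len(x) - k + 1):
        index[x[i:i + k]].append(i)
    hits = []
    for j in range(len(y) - k + 1):
        for i in index.get(y[j:j + k], ()):
            hits.append((i, j))
    return hits


def diagonal_counts(hits):
    """Count word hits per diagonal i - j; hits on one diagonal suggest an ungapped match."""
    counts = defaultdict(int)
    for i, j in hits:
        counts[i - j] += 1
    return dict(counts)


hits = shared_ktuples("GAAGGC", "AAG", k=2)
print(hits, diagonal_counts(hits))
```

Diagonals that collect many hits are the candidate regions a word-based method would then examine more closely.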