Overview of Pairwise Sequence Alignment


Upload: hamish-hartman

Post on 03-Jan-2016


Page 1: Overview of Pairwise Sequence Alignment

Overview of Pairwise Sequence Alignment

• Dynamic Programming
  – Applied to optimization problems
  – Useful when
    • The problem can be recursively divided into sub-problems
    • Sub-problems are not independent (they overlap, so each one is solved once and its result is reused)

• Needleman-Wunsch is a global alignment technique that fills a scoring matrix iteratively. In its simplest form it uses no gap penalty; it extends naturally to a fixed per-symbol gap penalty.

• Smith-Waterman is a local alignment technique defined by a recurrence over the same kind of matrix. The recurrence can be seen as an extension of the Longest Common Subsequence (LCS) problem and can be generalized to cover both local and global alignment.

Presenter: 林哲鋒

Page 2: Overview of Pairwise Sequence Alignment

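The Longest Common Subsequence (LCS) Problem

• First, what is a subsequence? A subsequence is the sequence obtained by deleting some (possibly zero) characters from a sequence. For example, pred, sdn, and predent are all subsequences of "president".

• Given two sequences, the LCS problem is to find a sequence such that (1) it is a subsequence of both sequences and (2) its length is as large as possible.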
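
The definition of a subsequence can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `is_subsequence` is chosen here for illustration.

```python
def is_subsequence(candidate: str, sequence: str) -> bool:
    """Return True if `candidate` can be obtained from `sequence`
    by deleting zero or more characters (order is preserved)."""
    remaining = iter(sequence)
    # `ch in remaining` consumes the iterator up to the first occurrence
    # of ch, so later characters must be found further to the right.
    return all(ch in remaining for ch in candidate)


if __name__ == "__main__":
    for word in ("pred", "sdn", "predent"):
        print(word, is_subsequence(word, "president"))
```

A common subsequence of two sequences is then any string for which this check succeeds against both of them.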

Page 3: Overview of Pairwise Sequence Alignment

LCS

Example:

Sequence 1: president

Sequence 2: providence

One LCS is priden (PResIDENt, PRovIDENce).

Page 4: Overview of Pairwise Sequence Alignment

LCS

Another example:

Sequence 1: algorithm

Sequence 2: alignment

One LCS is algm or algt (ALGorithM, ALiGnMent).

Page 5: Overview of Pairwise Sequence Alignment

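How to compute LCS?

• Given two sequences A = a_1 a_2 ... a_m and B = b_1 b_2 ... b_n, let len(i, j) denote the length of an LCS of a_1 a_2 ... a_i and b_1 b_2 ... b_j. The following recurrence computes len(i, j):

\[
\mathit{len}(i, j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
\mathit{len}(i-1, j-1) + 1 & \text{if } i, j > 0 \text{ and } a_i = b_j, \\
\max(\mathit{len}(i, j-1),\ \mathit{len}(i-1, j)) & \text{if } i, j > 0 \text{ and } a_i \neq b_j.
\end{cases}
\]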
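
The recurrence can be evaluated directly with memoization. The sketch below is an illustration rather than the slides' own procedure; the names `lcs_length` and `length` are assumptions (the latter avoids shadowing Python's built-in `len`).

```python
from functools import lru_cache


def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b."""

    @lru_cache(maxsize=None)
    def length(i: int, j: int) -> int:
        # length(i, j) is the LCS length of a[:i] and b[:j].
        if i == 0 or j == 0:
            return 0
        if a[i - 1] == b[j - 1]:          # a_i = b_j (slides are 1-indexed)
            return length(i - 1, j - 1) + 1
        return max(length(i, j - 1), length(i - 1, j))

    return length(len(a), len(b))


if __name__ == "__main__":
    print(lcs_length("president", "providence"))
    print(lcs_length("algorithm", "alignment"))
```

The cache is what makes this dynamic programming: each pair (i, j) is computed once even though many recursive paths reach it.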

Page 6: Overview of Pairwise Sequence Alignment

procedure LCS-Length(A, B)

1. for i ← 0 to m do len(i, 0) = 0
2. for j ← 1 to n do len(0, j) = 0
3. for i ← 1 to m do
4.     for j ← 1 to n do
5.         if a_i = b_j then len(i, j) = len(i-1, j-1) + 1; prev(i, j) = "↖"
6.         else if len(i-1, j) ≥ len(i, j-1)
7.             then len(i, j) = len(i-1, j); prev(i, j) = "↑"
8.         else len(i, j) = len(i, j-1); prev(i, j) = "←"
9. return len and prev

Annotations: the two non-matching branches (lines 7 and 8) are the insertion and deletion cases.
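
A bottom-up version of the procedure above might look like the following sketch. The direction labels "diag", "up" and "left" are stand-ins chosen here for the arrow symbols stored in prev, and the tie-break (prefer "up" when both neighbours are equal) is an assumption.

```python
def lcs_table(a: str, b: str):
    """Fill the len and prev tables for sequences a (rows) and b (columns)."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay at 0, which covers the two initialisation loops.
    length = [[0] * (n + 1) for _ in range(m + 1)]
    prev = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
                prev[i][j] = "diag"
            elif length[i - 1][j] >= length[i][j - 1]:
                length[i][j] = length[i - 1][j]
                prev[i][j] = "up"
            else:
                length[i][j] = length[i][j - 1]
                prev[i][j] = "left"
    return length, prev


if __name__ == "__main__":
    length, _ = lcs_table("president", "providence")
    for row in length:
        print(" ".join(str(v) for v in row))
```

Both loops together visit each of the m·n cells once, so the table is filled in O(mn) time.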

Page 7: Overview of Pairwise Sequence Alignment

i  j    0    1    2    3    4    5    6    7    8    9   10
             p    r    o    v    i    d    e    n    c    e
0       0    0    0    0    0    0    0    0    0    0    0
1  p    0    1    1    1    1    1    1    1    1    1    1
2  r    0    1    2    2    2    2    2    2    2    2    2
3  e    0    1    2    2    2    2    2    3    3    3    3
4  s    0    1    2    2    2    2    2    3    3    3    3
5  i    0    1    2    2    2    3    3    3    3    3    3
6  d    0    1    2    2    2    3    4    4    4    4    4
7  e    0    1    2    2    2    3    4    5    5    5    5
8  n    0    1    2    2    2    3    4    5    6    6    6
9  t    0    1    2    2    2    3    4    5    6    6    6

Figure: Computing the LCS of president and providence with LCS-Length.

Page 8: Overview of Pairwise Sequence Alignment

procedure Output-LCS(A, prev, i, j)

1 if i = 0 or j = 0 then return
2 if prev(i, j) = "↖" then Output-LCS(A, prev, i-1, j-1); print a_i
3 else if prev(i, j) = "↑" then Output-LCS(A, prev, i-1, j)
4 else Output-LCS(A, prev, i, j-1)
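
The traceback can also be written as a loop. The sketch below is a variation on Output-LCS under stated assumptions: instead of reading a stored prev table it re-derives each direction from the len table, and it returns the LCS as a string rather than printing it. The function name `lcs` is chosen here for illustration.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])

    # Walk back from the bottom-right corner, collecting matched characters.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:            # diagonal step: a_i is in the LCS
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1                           # step up
        else:
            j -= 1                           # step left
    return "".join(reversed(out))            # characters were collected backwards


if __name__ == "__main__":
    print(lcs("president", "providence"))
```

The recursive procedure prints after the recursive call returns, which yields the characters in forward order; the loop achieves the same by reversing at the end.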

Page 9: Overview of Pairwise Sequence Alignment

i  j    0    1    2    3    4    5    6    7    8    9   10
             p    r    o    v    i    d    e    n    c    e
0       0    0    0    0    0    0    0    0    0    0    0
1  p    0   [1]   1    1    1    1    1    1    1    1    1
2  r    0    1   [2]   2    2    2    2    2    2    2    2
3  e    0    1    2    2    2    2    2    3    3    3    3
4  s    0    1    2    2    2    2    2    3    3    3    3
5  i    0    1    2    2    2   [3]   3    3    3    3    3
6  d    0    1    2    2    2    3   [4]   4    4    4    4
7  e    0    1    2    2    2    3    4   [5]   5    5    5
8  n    0    1    2    2    2    3    4    5   [6]   6    6
9  t    0    1    2    2    2    3    4    5    6    6    6

Figure: Traceback route of Output-LCS. The dark-shaded cells (shown here in brackets) mark where the LCS, priden, is found.

Output: priden

Page 10: Overview of Pairwise Sequence Alignment

Identification of Common Molecular Subsequences

T. F. Smith and M. S. Waterman

J. Mol. Biol. (1981), 147, 195-197

Page 11: Overview of Pairwise Sequence Alignment

ABSTRACT

• The identification of maximally homologous subsequences among sets of long sequences is an important problem.

• The goal is to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity.

Page 12: Overview of Pairwise Sequence Alignment

Algorithm

• The two molecular sequences are A = a_1 a_2 ... a_n and B = b_1 b_2 ... b_m.

• A similarity s(a, b) is given between sequence elements a and b.

• Deletions of length k are given weight W_k.

• Set up a matrix H. First set

\[ H_{k0} = H_{0l} = 0 \quad \text{for } 0 \le k \le n \text{ and } 0 \le l \le m. \]

Page 13: Overview of Pairwise Sequence Alignment

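Algorithm cont.

• H_{ij} is the maximum similarity of two segments ending in a_i and b_j.

• These values are obtained from the relationship

\[ H_{ij} = \max\Bigl\{ H_{i-1,j-1} + s(a_i, b_j),\ \max_{k \ge 1}\{H_{i-k,j} - W_k\},\ \max_{l \ge 1}\{H_{i,j-l} - W_l\},\ 0 \Bigr\}, \quad 1 \le i \le n,\ 1 \le j \le m. \]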

Page 14: Overview of Pairwise Sequence Alignment

H_{ij} follows by considering the possibilities for ending the segments at any a_i and b_j:

• (1) If a_i and b_j are associated, the similarity is H_{i-1,j-1} + s(a_i, b_j).

• (2) If a_i is at the end of a deletion of length k, the similarity is H_{i-k,j} - W_k.

• (3) If b_j is at the end of a deletion of length l, the similarity is H_{i,j-l} - W_l.

• (4) Finally, a zero is included to prevent calculated negative similarity, indicating no similarity up to a_i and b_j.

Page 15: Overview of Pairwise Sequence Alignment

• The pair of segments with maximum similarity is found by first locating the maximum element of H.

• The other matrix elements leading to this maximum value are then sequentially determined with a traceback procedure ending with an element of H equal to zero.
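
The matrix H and the location of its maximum element can be computed as in the sketch below. It is an illustration only: the function name and the convention that the caller supplies the similarity s(a, b) and the gap weight W_k as Python callables `s` and `w` are assumptions made here.

```python
def smith_waterman_matrix(a, b, s, w):
    """Fill H for sequences a (rows, length n) and b (columns, length m).

    s(p, q) is the similarity of two elements; w(k) is the weight W_k of a
    deletion of length k. Returns (H, best score, (i, j) of the best score).
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # H[k][0] = H[0][l] = 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            associated = H[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            # Deletions of every possible length ending at a_i or at b_j.
            del_a = max(H[i - k][j] - w(k) for k in range(1, i + 1))
            del_b = max(H[i][j - l] - w(l) for l in range(1, j + 1))
            H[i][j] = max(associated, del_a, del_b, 0)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return H, best, best_pos


if __name__ == "__main__":
    similarity = lambda p, q: 8 if p == q else -5
    weight = lambda k: 3 * k
    _, best, pos = smith_waterman_matrix("CTTAACT", "CGGATCAT", similarity, weight)
    print(best, pos)
```

Because every gap length is tried at every cell, this general form takes O(nm(n + m)) time; when W_k grows linearly in k only the k = 1 and l = 1 terms are needed.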

Page 16: Overview of Pairwise Sequence Alignment

• In Figure 1, a match (a_i = b_j) has similarity s(a_i, b_j) = 1, and a mismatch has similarity minus one-third.

Page 17: Overview of Pairwise Sequence Alignment

Local vs. global alignment

Page 18: Overview of Pairwise Sequence Alignment


Global Alignment vs. Local Alignment

• global alignment:

• local alignment:

Page 19: Overview of Pairwise Sequence Alignment

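Global Alignment vs. Local Alignment

Local:

\[
s_{i,j} = \max
\begin{cases}
0 \\
s_{i-1,j} + w(a_i, -) \\
s_{i,j-1} + w(-, b_j) \\
s_{i-1,j-1} + w(a_i, b_j)
\end{cases}
\]

Global:

\[
s_{i,j} = \max
\begin{cases}
s_{i-1,j} + w(a_i, -) \\
s_{i,j-1} + w(-, b_j) \\
s_{i-1,j-1} + w(a_i, b_j)
\end{cases}
\]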

Page 20: Overview of Pairwise Sequence Alignment

Local alignment example

Match: 8

Mismatch: -5

Gap symbol: -3

          C    G    G    A    T    C    A    T
     0    0    0    0    0    0    0    0    0
C    0    8    5    2    0    0    8    5    2
T    0    5    3    0    0    8    5    3   13
T    0    2    0    0    0    8    5    2   11
A    0    0    0    0    8    5    3   13   10
A    0    0    0    0    8    5    2   11    8
C    0    8    5    2    5    3   13   10    7
T    0    5    3    0    2   13   10    8   18

A - C - T
A T C A T

8 - 3 + 8 - 3 + 8 = 18
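
The local example can be reproduced with a short script. This is a sketch under stated assumptions: the function name `local_align`, its default parameters (taken from the scores above), and the tie-breaking order in the traceback (diagonal, then up, then left) are choices made here for illustration.

```python
def local_align(a, b, match=8, mismatch=-5, gap=-3):
    """Local alignment with a fixed per-symbol gap score.

    Returns (best score, aligned segment of a, aligned segment of b).
    """
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            s[i][j] = max(0, s[i - 1][j] + gap, s[i][j - 1] + gap,
                          s[i - 1][j - 1] + w)
            if s[i][j] > best:
                best, bi, bj = s[i][j], i, j

    # Trace back from the maximum cell until a zero cell is reached.
    top, bottom = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and s[i][j] > 0:
        w = match if a[i - 1] == b[j - 1] else mismatch
        if s[i][j] == s[i - 1][j - 1] + w:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif s[i][j] == s[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(local_align("CTTAACT", "CGGATCAT"))
```

Here the row sequence is CTTAACT and the column sequence is CGGATCAT, matching the layout of the table.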

Page 21: Overview of Pairwise Sequence Alignment

global alignment

• Needleman-Wunsch (1970)
• Three steps in dynamic programming
  • Initialization
  • Matrix fill (scoring)
  • Traceback (alignment)

• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-,x)=w(x,-)=-3)

Page 22: Overview of Pairwise Sequence Alignment

Global alignment example 1

          C    G    G    A    T    C    A    T
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3    8    5    2   -1   -4   -7  -10  -13
T   -6    5    3    0   -3    7    4    1   -2
T   -9    2    0   -2   -5    5    2   -1    9
A  -12   -1   -3   -5    6    3    0   10    7
A  -15   -4   -6   -8    3    1   -2    8    5
C  -18   -7   -9  -11    0   -2    9    6    3
T  -21  -10  -12  -14   -3    8    6    4   14

C T T A A C - T
C G G A T C A T

8 - 5 - 5 + 8 - 5 + 8 - 3 + 8 = 14

Page 23: Overview of Pairwise Sequence Alignment

Global alignment example 2

          G    A    A    T    C    T    G    C
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3   -5   -8  -11  -14   -4   -7  -10  -13
A   -6   -8    3    0   -3   -6   -9  -12  -15
A   -9  -11    0   11    8    5    2   -1   -4
T  -12  -14   -3    8   19   16   13   10    7
T  -15  -17   -6    5   16   14   24   21   18
G  -18   -7   -9    2   13   11   21   32   29
A  -21  -10    1   -1   10    8   18   29   27

C A A T - T G A
G A A T C T G C

-5 + 8 + 8 + 8 - 3 + 8 + 8 - 5 = 27
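
The three steps (initialization, matrix fill, traceback) can be sketched as follows for the two global examples. The function name `global_align`, its default scores, and the traceback tie-breaking order (diagonal, then up, then left) are assumptions made for this illustration.

```python
def global_align(a, b, match=8, mismatch=-5, gap=-3):
    """Global alignment with a fixed per-symbol gap score.

    Returns (score, aligned a, aligned b).
    """
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]

    # 1. Initialization: a prefix aligned against nothing is all gaps.
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap

    def w(p, q):
        return match if p == q else mismatch

    # 2. Matrix fill.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j] + gap, s[i][j - 1] + gap,
                          s[i - 1][j - 1] + w(a[i - 1], b[j - 1]))

    # 3. Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return s[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(global_align("CTTAACT", "CGGATCAT"))
    print(global_align("CAATTGA", "GAATCTGC"))
```

Unlike the local case, the traceback always starts at the last cell and runs to cell (0, 0), so both sequences appear in full in the result.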

Page 24: Overview of Pairwise Sequence Alignment

Affine gap penalties

• A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty.

Three cases for alignment endings:

1. ...x / ...x : an aligned pair

2. ...x / ...- : a deletion

3. ...- / ...x : an insertion

Page 25: Overview of Pairwise Sequence Alignment

Affine gap penalties

• Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion.

• Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion.

• Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

Page 26: Overview of Pairwise Sequence Alignment

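Affine gap penalties

\[ D(i, j) = \max \begin{cases} D(i-1, j) - y \\ S(i-1, j) - x - y \end{cases} \]

\[ I(i, j) = \max \begin{cases} I(i, j-1) - y \\ S(i, j-1) - x - y \end{cases} \]

\[ S(i, j) = \max \begin{cases} S(i-1, j-1) + w(a_i, b_j) \\ D(i, j) \\ I(i, j) \end{cases} \]

(A gap of length k is penalized x + k·y.)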
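
The three recurrences can be filled in together, one cell at a time. The sketch below computes only the optimal score; the boundary conditions (a leading gap of length i costs x + i·y, and D or I is minus infinity where no such ending exists) are not stated on the slides and are an assumption here, as are the function name and default values.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, x=4, y=3, match=8, mismatch=-5):
    """Optimal global alignment score when a gap of length k costs x + k*y."""
    n, m = len(a), len(b)
    S = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    I = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    # Assumed boundaries: the first row/column consist of one long gap.
    S[0][0] = 0
    for i in range(1, n + 1):
        D[i][0] = S[i][0] = -x - i * y
    for j in range(1, m + 1):
        I[0][j] = S[0][j] = -x - j * y

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend an existing gap (-y) or open a new one (-x - y).
            D[i][j] = max(D[i - 1][j] - y, S[i - 1][j] - x - y)
            I[i][j] = max(I[i][j - 1] - y, S[i][j - 1] - x - y)
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w, D[i][j], I[i][j])
    return S[n][m]


if __name__ == "__main__":
    print(affine_global_score("CTTAACT", "CGGATCAT"))
    print(affine_global_score("CTTAACT", "CGGATCAT", x=0))
```

With x = 0 the sketch reduces to the fixed per-symbol gap score. With x = 4 and y = 3, aligning a 7-letter sequence with an 8-letter one needs at least one gap, so the best achievable score is the single-gap alignment's score minus 4.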

Page 27: Overview of Pairwise Sequence Alignment

Affine gap penalties

• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-,x)=w(x,-)=-3)
• Each gap is charged an extra gap-open penalty: -4.

C - - - T T A A C T
C G G A T C A - - T

+8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12

Gap-open penalties: -4 -4 (one for each of the two gaps)

Alignment score: 12 - 4 - 4 = 4
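
The arithmetic above can be checked by scoring a given alignment column by column. This is a minimal sketch; the function name and parameter names are chosen here for illustration, and it assumes no column contains a gap symbol in both rows.

```python
def affine_alignment_score(top, bottom, match=8, mismatch=-5,
                           gap_symbol=-3, gap_open=-4):
    """Score an existing alignment (two equal-length strings with '-' gaps)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    prev_gap_row = None  # which row held the gap in the previous column
    for p, q in zip(top, bottom):
        if p == "-" or q == "-":
            gap_row = "top" if p == "-" else "bottom"
            score += gap_symbol
            if gap_row != prev_gap_row:   # first symbol of a new gap
                score += gap_open
            prev_gap_row = gap_row
        else:
            score += match if p == q else mismatch
            prev_gap_row = None
    return score


if __name__ == "__main__":
    print(affine_alignment_score("C---TTAACT", "CGGATCA--T"))
```

Setting gap_open=0 gives the per-symbol subtotal, so the two figures in the example (12 and 4) can be checked separately.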

Page 28: Overview of Pairwise Sequence Alignment


END