Sequence Alignment (I)



Post on 25-Jan-2016


Page 1: Sequence Alignment (I)

Sequence Alignment (I)

Kun-Mao Chao (趙坤茂)
Department of Computer Science and Information Engineering
National Taiwan University, Taiwan

E-mail: [email protected]

WWW: http://www.csie.ntu.edu.tw/~kmchao

Page 2

Useful Websites
• MIT Biology Hypertextbook
  – http://www.mit.edu:8001/afs/athena/course/other/esgbio/www/7001main.html
• The International Society for Computational Biology
  – http://www.iscb.org/
• National Center for Biotechnology Information (NCBI, NIH)
  – http://www.ncbi.nlm.nih.gov/
• European Bioinformatics Institute (EBI)
  – http://www.ebi.ac.uk/
• DNA Data Bank of Japan (DDBJ)
  – http://www.ddbj.nig.ac.jp/

Page 3

orz's sequence evolution
• orz (kid)
• OTZ (adult)
• Orz (big head)
• Crz (motorcycle driver)
• on_ (soldier)
• or2 (bottom up)
• oΩ (back high)
• STO (the other way around)
• Oroz (me)

The origin?
Their evolutionary relationships?
Their putative functional relationships?

Page 4

What?

THETR UTHIS MOREI MPORT ANTTH ANTHE FACTS

The truth is more important than the facts.

Page 5

Dot Matrix
Sequence A: CTTAACT
Sequence B: CGGATCAT

[Figure: dot matrix with Sequence B (C G G A T C A T) across the top and Sequence A (C T T A A C T) down the side]
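A dot matrix puts one sequence on each axis and marks every cell where the two residues are identical. Diagonal runs of marks point to similar regions. Below is a minimal Python sketch that uses the same layout as the slide, with Sequence B across the top and Sequence A down the side. The function names `dot_matrix` and `render` are illustrative and do not come from the original material.

```python
def dot_matrix(a: str, b: str) -> list[list[bool]]:
    """Cell [i][j] is True when a[i] and b[j] are the same residue."""
    return [[x == y for y in b] for x in a]


def render(a: str, b: str) -> str:
    """Draw the dot matrix as text: B across the top, A down the side, '*' for a match."""
    lines = ["  " + " ".join(b)]
    for x, row in zip(a, dot_matrix(a, b)):
        lines.append(x + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("CTTAACT", "CGGATCAT"))
```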

Page 6

Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT

An alignment of A and B:

Sequence A: C---TTAACT
Sequence B: CGGATCA--T

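Page 7

Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT

An alignment of A and B:

Sequence A: C---TTAACT
Sequence B: CGGATCA--T

• Match: columns 1, 5, 7, and 10
• Mismatch: column 6 (T against C)
• Insertion gap: columns 2-4 (gap symbols in A against GGA in B)
• Deletion gap: columns 8-9 (AC in A against gap symbols in B)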
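The four kinds of columns can be identified mechanically. This sketch follows the convention of the later slides: a gap symbol in A marks an insertion, and a gap symbol in B marks a deletion. `classify` is a hypothetical helper name.

```python
def classify(row_a: str, row_b: str) -> list[str]:
    """Label each column of a pairwise alignment ('-' is the gap symbol)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    labels = []
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column may not pair two gap symbols")
        if x == "-":
            labels.append("insertion")  # gap in A against a character of B
        elif y == "-":
            labels.append("deletion")   # character of A against a gap in B
        elif x == y:
            labels.append("match")
        else:
            labels.append("mismatch")
    return labels


print(classify("C---TTAACT", "CGGATCA--T"))
```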

Page 8

Alignment Graph
Sequence A: CTTAACT
Sequence B: CGGATCAT

[Figure: alignment graph with Sequence B across the top and Sequence A down the side; the alignment C---TTAACT / CGGATCA--T is drawn as a path through the grid]
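Every alignment corresponds to a path from the top-left corner to the bottom-right corner of the alignment graph. An aligned pair is a diagonal step, an insertion is a horizontal step, and a deletion is a vertical step. This sketch lists the grid nodes the path visits, with A down the side and B across the top as in the score tables. `alignment_path` is a made-up name.

```python
def alignment_path(row_a: str, row_b: str) -> list[tuple[int, int]]:
    """Nodes (i, j) of the alignment graph visited by an alignment.

    i counts characters of A consumed (moving down), j counts characters
    of B consumed (moving right); the path starts at (0, 0).
    """
    i = j = 0
    path = [(i, j)]
    for x, y in zip(row_a, row_b):
        if x != "-":
            i += 1
        if y != "-":
            j += 1
        path.append((i, j))
    return path


print(alignment_path("C---TTAACT", "CGGATCA--T"))
```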

Page 9

A simple scoring scheme
• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-,x) = w(x,-) = -3)

C - - - T T A A C T
C G G A T C A - - T

Alignment score: +8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12
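The alignment score is the sum of the column scores. Here is a minimal sketch of this scheme. The names `w` and `alignment_score` are illustrative.

```python
MATCH, MISMATCH, GAP = 8, -5, -3  # the simple scoring scheme above


def w(x: str, y: str) -> int:
    """Score of one alignment column; '-' is the gap symbol."""
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH


def alignment_score(row_a: str, row_b: str) -> int:
    """Sum the column scores of a pairwise alignment."""
    return sum(w(x, y) for x, y in zip(row_a, row_b))


print(alignment_score("C---TTAACT", "CGGATCA--T"))  # 12
```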

Page 10

An optimal alignment: the alignment of maximum score
• Let A = a1a2…am and B = b1b2…bn.
• Si,j: the score of an optimal alignment between a1a2…ai and b1b2…bj
• With proper initializations, Si,j can be computed as follows.

$$
S_{i,j} = \max \begin{cases}
S_{i-1,j-1} + w(a_i, b_j) \\
S_{i-1,j} + w(a_i, -) \\
S_{i,j-1} + w(-, b_j)
\end{cases}
$$
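The table can be filled row by row. Each cell depends only on its upper, left, and upper-left neighbours, so the whole computation takes O(mn) time. The Python sketch below uses the simple scoring scheme and 0-based strings, so `a[i - 1]` plays the role of ai. `global_scores` is a hypothetical name.

```python
def w(x: str, y: str) -> int:
    """Simple scheme: match +8, mismatch -5, gap symbol -3."""
    if x == "-" or y == "-":
        return -3
    return 8 if x == y else -5


def global_scores(a: str, b: str) -> list[list[int]]:
    """Return S where S[i][j] scores an optimal alignment of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # column 0: a[:i] against gap symbols only
        S[i][0] = S[i - 1][0] + w(a[i - 1], "-")
    for j in range(1, n + 1):  # row 0: gap symbols against b[:j] only
        S[0][j] = S[0][j - 1] + w("-", b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),  # a_i aligned with b_j
                S[i - 1][j] + w(a[i - 1], "-"),          # a_i against a gap
                S[i][j - 1] + w("-", b[j - 1]),          # a gap against b_j
            )
    return S


S = global_scores("CTTAACT", "CGGATCAT")
print(S[7][8])  # 14
```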

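Page 11

Computing Si,j

[Figure: cell (i, j) is reached from (i-1, j) with weight w(ai,-), from (i, j-1) with weight w(-,bj), and from (i-1, j-1) with weight w(ai,bj); the final score Sm,n sits in the bottom-right cell]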

Page 12

Initializations

        C   G   G   A   T   C   A   T
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3
T  -6
T  -9
A -12
A -15
C -18
T -21
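The first row and column come from aligning a prefix against nothing but gap symbols. With the -3 gap-symbol score, this gives the values in this table:

```latex
S_{0,0} = 0, \qquad
S_{i,0} = S_{i-1,0} + w(a_i, -) = -3i, \qquad
S_{0,j} = S_{0,j-1} + w(-, b_j) = -3j
```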

Page 13

S3,5 = ?

        C   G   G   A   T   C   A   T
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3   8   5   2  -1  -4  -7 -10 -13
T  -6   5   3   0  -3   7   4   1  -2
T  -9   2   0  -2  -5   ?
A -12
A -15
C -18
T -21

Page 14

S3,5 = 5

        C   G   G   A   T   C   A   T
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3   8   5   2  -1  -4  -7 -10 -13
T  -6   5   3   0  -3   7   4   1  -2
T  -9   2   0  -2  -5   5   2  -1   9
A -12  -1  -3  -5   6   3   0  10   7
A -15  -4  -6  -8   3   1  -2   8   5
C -18  -7  -9 -11   0  -2   9   6   3
T -21 -10 -12 -14  -3   8   6   4  14

Optimal score: S7,8 = 14

Page 15

Sequence A: C T T A A C - T
Sequence B: C G G A T C A T

        C   G   G   A   T   C   A   T
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3   8   5   2  -1  -4  -7 -10 -13
T  -6   5   3   0  -3   7   4   1  -2
T  -9   2   0  -2  -5   5   2  -1   9
A -12  -1  -3  -5   6   3   0  10   7
A -15  -4  -6  -8   3   1  -2   8   5
C -18  -7  -9 -11   0  -2   9   6   3
T -21 -10 -12 -14  -3   8   6   4  14

+8 -5 -5 +8 -5 +8 -3 +8 = 14
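Tracing back from Sm,n recovers an optimal alignment. At each cell, step to a neighbour whose value plus the matching w(·,·) reproduces the cell's value. The sketch below runs on the score table for A = CTTAACT and B = CGGATCAT, typed in by hand with values recomputed from the recurrence. `traceback` is an illustrative name. When several neighbours qualify, this sketch prefers the diagonal, so a different tie-breaking rule could return another alignment with the same score.

```python
A, B = "CTTAACT", "CGGATCAT"
# S[i][j] for prefixes of A (rows) and B (columns), filled with the recurrence above.
S = [
    [0, -3, -6, -9, -12, -15, -18, -21, -24],
    [-3, 8, 5, 2, -1, -4, -7, -10, -13],
    [-6, 5, 3, 0, -3, 7, 4, 1, -2],
    [-9, 2, 0, -2, -5, 5, 2, -1, 9],
    [-12, -1, -3, -5, 6, 3, 0, 10, 7],
    [-15, -4, -6, -8, 3, 1, -2, 8, 5],
    [-18, -7, -9, -11, 0, -2, 9, 6, 3],
    [-21, -10, -12, -14, -3, 8, 6, 4, 14],
]


def w(x: str, y: str) -> int:
    """Simple scheme: match +8, mismatch -5, gap symbol -3."""
    if x == "-" or y == "-":
        return -3
    return 8 if x == y else -5


def traceback(S: list[list[int]], a: str, b: str) -> tuple[str, str]:
    """Walk back from S[m][n] to S[0][0], rebuilding the alignment columns."""
    i, j = len(a), len(b)
    cols = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            cols.append((a[i - 1], b[j - 1]))  # diagonal step: aligned pair
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + w(a[i - 1], "-"):
            cols.append((a[i - 1], "-"))       # vertical step: deletion
            i -= 1
        else:
            cols.append(("-", b[j - 1]))       # horizontal step: insertion
            j -= 1
    cols.reverse()
    return "".join(x for x, _ in cols), "".join(y for _, y in cols)


print(traceback(S, A, B))  # ('CTTAAC-T', 'CGGATCAT')
```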

Page 16

Now try this example in class
Sequence A: CAATTGA
Sequence B: GAATCTGC

Their optimal alignment?

Page 17

Initializations

        G   A   A   T   C   T   G   C
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3
A  -6
A  -9
T -12
T -15
G -18
A -21

Page 18

S4,2 = ?

        G   A   A   T   C   T   G   C
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3  -5  -8 -11 -14  -4  -7 -10 -13
A  -6  -8   3   0  -3  -6  -9 -12 -15
A  -9 -11   0  11   8   5   2  -1  -4
T -12 -14   ?
T -15
G -18
A -21

Page 19

S5,5 = ?

        G   A   A   T   C   T   G   C
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3  -5  -8 -11 -14  -4  -7 -10 -13
A  -6  -8   3   0  -3  -6  -9 -12 -15
A  -9 -11   0  11   8   5   2  -1  -4
T -12 -14  -3   8  19  16  13  10   7
T -15 -17  -6   5  16   ?
G -18
A -21

Page 20

S5,5 = 14

        G   A   A   T   C   T   G   C
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3  -5  -8 -11 -14  -4  -7 -10 -13
A  -6  -8   3   0  -3  -6  -9 -12 -15
A  -9 -11   0  11   8   5   2  -1  -4
T -12 -14  -3   8  19  16  13  10   7
T -15 -17  -6   5  16  14  24  21  18
G -18  -7  -9   2  13  11  21  32  29
A -21 -10   1  -1  10   8  18  29  27

Optimal score: S7,8 = 27

Page 21

        G   A   A   T   C   T   G   C
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3  -5  -8 -11 -14  -4  -7 -10 -13
A  -6  -8   3   0  -3  -6  -9 -12 -15
A  -9 -11   0  11   8   5   2  -1  -4
T -12 -14  -3   8  19  16  13  10   7
T -15 -17  -6   5  16  14  24  21  18
G -18  -7  -9   2  13  11  21  32  29
A -21 -10   1  -1  10   8  18  29  27

Sequence A: C A A T - T G A
Sequence B: G A A T C T G C
-5 +8 +8 +8 -3 +8 +8 -5 = 27

Page 22

Global Alignment vs. Local Alignment

• global alignment:

• local alignment:

Page 23

An optimal local alignment
• Si,j: the score of an optimal local alignment ending at ai and bj
• With proper initializations, Si,j can be computed as follows.

$$
S_{i,j} = \max \begin{cases}
0 \\
S_{i-1,j-1} + w(a_i, b_j) \\
S_{i-1,j} + w(a_i, -) \\
S_{i,j-1} + w(-, b_j)
\end{cases}
$$
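Two changes turn the global recurrence into a local one. The extra 0 option lets an alignment start at any cell, and the answer is the largest entry in the table rather than Sm,n. The sketch below uses the approach commonly known as Smith-Waterman and traces back from the best cell until it reaches a cell scoring 0. `local_align` and its tie-breaking rule (first maximum found, diagonal preferred) are illustrative choices that do not come from the slides.

```python
def w(x: str, y: str) -> int:
    """Simple scheme: match +8, mismatch -5, gap symbol -3."""
    if x == "-" or y == "-":
        return -3
    return 8 if x == y else -5


def local_align(a: str, b: str) -> tuple[int, str, str]:
    """Best local alignment score plus one alignment achieving it."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                0,  # start a fresh alignment at this cell
                S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),
                S[i - 1][j] + w(a[i - 1], "-"),
                S[i][j - 1] + w("-", b[j - 1]),
            )
            if S[i][j] > best:  # keep the first cell reaching the maximum
                best, bi, bj = S[i][j], i, j
    # Trace back from the best cell; stop at the first cell whose score is 0.
    cols = []
    i, j = bi, bj
    while S[i][j] > 0:
        if S[i][j] == S[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            cols.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + w(a[i - 1], "-"):
            cols.append((a[i - 1], "-"))
            i -= 1
        else:
            cols.append(("-", b[j - 1]))
            j -= 1
    cols.reverse()
    return best, "".join(x for x, _ in cols), "".join(y for _, y in cols)


print(local_align("CTTAACT", "CGGATCAT"))  # (18, 'A-C-T', 'ATCAT')
```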

Page 24

Local alignment

        C   G   G   A   T   C   A   T
    0   0   0   0   0   0   0   0   0
C   0   8   5   2   0   0   8   5   2
T   0   5   3   0   0   8   5   3  13
T   0   2   0   0   0   8   5   2  11
A   0   0   0   0   8   5   3   ?
A   0
C   0
T   0

Match: 8
Mismatch: -5
Gap symbol: -3

Page 25

Local alignment

        C   G   G   A   T   C   A   T
    0   0   0   0   0   0   0   0   0
C   0   8   5   2   0   0   8   5   2
T   0   5   3   0   0   8   5   3  13
T   0   2   0   0   0   8   5   2  11
A   0   0   0   0   8   5   3  13  10
A   0   0   0   0   8   5   2  11   8
C   0   8   5   2   5   3  13  10   7
T   0   5   3   0   2  13  10   8  18

Match: 8
Mismatch: -5
Gap symbol: -3

The best score: 18

Page 26

        C   G   G   A   T   C   A   T
    0   0   0   0   0   0   0   0   0
C   0   8   5   2   0   0   8   5   2
T   0   5   3   0   0   8   5   3  13
T   0   2   0   0   0   8   5   2  11
A   0   0   0   0   8   5   3  13  10
A   0   0   0   0   8   5   2  11   8
C   0   8   5   2   5   3  13  10   7
T   0   5   3   0   2  13  10   8  18

The best score: 18

Sequence A: A - C - T
Sequence B: A T C A T
8 - 3 + 8 - 3 + 8 = 18

Page 27

Now try this example in class
Sequence A: CAATTGA
Sequence B: GAATCTGC

Their optimal local alignment?

Page 28

Did you get it right?

        G   A   A   T   C   T   G   C
    0   0   0   0   0   0   0   0   0
C   0   0   0   0   0   8   5   2   8
A   0   0   8   8   5   5   3   0   5
A   0   0   8  16  13  10   7   4   2
T   0   0   5  13  24  21  18  15  12
T   0   0   2  10  21  19  29  26  23
G   0   8   5   7  18  16  26  37  34
A   0   5  16  13  15  13  23  34  32

Page 29

        G   A   A   T   C   T   G   C
    0   0   0   0   0   0   0   0   0
C   0   0   0   0   0   8   5   2   8
A   0   0   8   8   5   5   3   0   5
A   0   0   8  16  13  10   7   4   2
T   0   0   5  13  24  21  18  15  12
T   0   0   2  10  21  19  29  26  23
G   0   8   5   7  18  16  26  37  34
A   0   5  16  13  15  13  23  34  32

Sequence A: A A T - T G
Sequence B: A A T C T G
8 + 8 + 8 - 3 + 8 + 8 = 37

Page 30

Affine gap penalties
• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-,x) = w(x,-) = -3)
• Each gap is charged an extra gap-open penalty: -4.

C - - - T T A A C T
C G G A T C A - - T
+8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12
Gap-open penalties: -4 (gap in A), -4 (gap in B)

Alignment score: 12 - 4 - 4 = 4

Page 31

Affine gap penalties
• A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty.

Three cases for alignment endings:
1. ...x over ...x: an aligned pair
2. ...x over ...-: a deletion
3. ...- over ...x: an insertion

Page 32

Affine gap penalties

• Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion.

• Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion.

• Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

Page 33

Affine gap penalties

$$
\begin{aligned}
D(i,j) &= \max\{\, D(i-1,j) - y,\; S(i-1,j) - x - y \,\} \\
I(i,j) &= \max\{\, I(i,j-1) - y,\; S(i,j-1) - x - y \,\} \\
S(i,j) &= \max\{\, S(i-1,j-1) + w(a_i,b_j),\; D(i,j),\; I(i,j) \,\}
\end{aligned}
$$

(A gap of length k is penalized x + k·y.)
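The three recurrences translate directly into three tables. Below is a sketch of a global-alignment version. It takes x and y as positive penalties (x = 4 and y = 3 match the earlier example) and uses pair scores of +8/-5. `affine_global` is an illustrative name, and -inf marks impossible states.

```python
NEG = float("-inf")  # marks impossible states, e.g. an insertion ending in column 0


def w(x: str, y: str) -> int:
    """Pair score for two residues: match +8, mismatch -5."""
    return 8 if x == y else -5


def affine_global(a: str, b: str, x: int = 4, y: int = 3) -> int:
    """Optimal global score when a gap of length k costs x + k*y."""
    m, n = len(a), len(b)
    S = [[NEG] * (n + 1) for _ in range(m + 1)]
    D = [[NEG] * (n + 1) for _ in range(m + 1)]  # alignments ending with a deletion
    I = [[NEG] * (n + 1) for _ in range(m + 1)]  # alignments ending with an insertion
    S[0][0] = 0
    for i in range(1, m + 1):  # a[:i] against a single gap of length i
        D[i][0] = S[i][0] = -(x + i * y)
    for j in range(1, n + 1):  # a single gap of length j against b[:j]
        I[0][j] = S[0][j] = -(x + j * y)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j] - y, S[i - 1][j] - x - y)  # extend or open
            I[i][j] = max(I[i][j - 1] - y, S[i][j - 1] - x - y)
            S[i][j] = max(S[i - 1][j - 1] + w(a[i - 1], b[j - 1]), D[i][j], I[i][j])
    return S[m][n]


print(affine_global("AC", "A"))  # 1
```

With y = 0 the -y terms vanish, and the same code follows the constant-gap recurrences. With x = 0 it falls back to the linear -3 gap score used earlier in the deck.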

Page 34

Affine gap penalties

[Figure: the alignment graph drawn as three layers S, I, and D; edge labels -y, -x-y, and w(ai,bj)]

Page 35

Constant gap penalties
• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: 0 (w(-,x) = w(x,-) = 0)
• Each gap is charged a constant penalty: -4.

C - - - T T A A C T
C G G A T C A - - T
+8 0 0 0 +8 -5 +8 0 0 +8 = +27
Gap penalties: -4 (gap in A), -4 (gap in B)

Alignment score: 27 - 4 - 4 = 19

Page 36

Constant gap penalties

• Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion.

• Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion.

• Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

Page 37

Constant gap penalties

$$
\begin{aligned}
D(i,j) &= \max\{\, D(i-1,j),\; S(i-1,j) - x \,\} \\
I(i,j) &= \max\{\, I(i,j-1),\; S(i,j-1) - x \,\} \\
S(i,j) &= \max\{\, S(i-1,j-1) + w(a_i,b_j),\; D(i,j),\; I(i,j) \,\}
\end{aligned}
$$

where x is a constant penalty for a gap.

Page 38

Restricted affine gap penalties
• A gap of length k is penalized x + f(k)·y, where f(k) = k for k <= c and f(k) = c for k > c.

Five cases for alignment endings:
1. ...x over ...x: an aligned pair
2. ...x over ...-: a deletion
3. ...- over ...x: an insertion
4. and 5. for long gaps

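Page 39

Restricted affine gap penalties

$$
\begin{aligned}
D(i,j) &= \max\{\, D(i-1,j) - y,\; S(i-1,j) - x - y \,\} \\
D'(i,j) &= \max\{\, D'(i-1,j),\; S(i-1,j) - x - cy \,\} \\
I(i,j) &= \max\{\, I(i,j-1) - y,\; S(i,j-1) - x - y \,\} \\
I'(i,j) &= \max\{\, I'(i,j-1),\; S(i,j-1) - x - cy \,\} \\
S(i,j) &= \max\{\, S(i-1,j-1) + w(a_i,b_j),\; D(i,j),\; D'(i,j),\; I(i,j),\; I'(i,j) \,\}
\end{aligned}
$$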
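Adding the D' and I' tables is a mechanical extension of the three-table scheme. The sketch below uses pair scores of +8/-5 and takes x, y, and c as positive numbers. `restricted_affine_global` is an illustrative name, and `D2` and `I2` stand for D' and I'.

```python
NEG = float("-inf")  # marks impossible states


def w(x: str, y: str) -> int:
    """Pair score for two residues: match +8, mismatch -5."""
    return 8 if x == y else -5


def restricted_affine_global(a: str, b: str, x: int = 4, y: int = 3, c: int = 2) -> int:
    """Optimal global score when a gap of length k costs x + min(k, c)*y."""
    m, n = len(a), len(b)

    def grid() -> list[list[float]]:
        return [[NEG] * (n + 1) for _ in range(m + 1)]

    S, D, D2, I, I2 = grid(), grid(), grid(), grid(), grid()
    S[0][0] = 0
    for i in range(1, m + 1):  # boundary: one gap of length i
        S[i][0] = -(x + min(i, c) * y)
    for j in range(1, n + 1):
        S[0][j] = -(x + min(j, c) * y)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j] - y, S[i - 1][j] - x - y)      # gap cost x + k*y
            D2[i][j] = max(D2[i - 1][j], S[i - 1][j] - x - c * y)   # gap cost x + c*y
            I[i][j] = max(I[i][j - 1] - y, S[i][j - 1] - x - y)
            I2[i][j] = max(I2[i][j - 1], S[i][j - 1] - x - c * y)
            S[i][j] = max(S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),
                          D[i][j], D2[i][j], I[i][j], I2[i][j])
    return S[m][n]


print(restricted_affine_global("A", "ATTTT"))  # -2: one gap of length 4 costs 4 + 2*3
```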

Page 40

D(i, j) vs. D'(i, j)
• Case 1: the best alignment ending at (i, j) with a deletion at the end has a last deletion gap of length <= c. Then D(i, j) >= D'(i, j).
• Case 2: the best alignment ending at (i, j) with a deletion at the end has a last deletion gap of length >= c. Then D(i, j) <= D'(i, j).
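The two cases combine into one identity. For a gap of length k and y >= 0, the better of the two penalties is exactly the restricted affine penalty. That is why S(i, j) only needs the maximum over D and D' (and over I and I'):

```latex
\max\{\,-(x + k y),\; -(x + c y)\,\} = -\bigl(x + \min(k, c)\, y\bigr) = -\bigl(x + f(k)\, y\bigr)
```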

Page 41

max{S(i,j) - x - ky, S(i,j) - x - cy}

[Figure: this maximum plotted against the gap length k; it falls linearly until k = c and then stays level at S(i,j) - x - cy]