Pairwise Sequence Alignment Part 2

Upload: darius

Post on 11-Jan-2016


Page 1: Pairwise Sequence Alignment  Part 2

Pairwise Sequence Alignment Part 2


Outline

• Global alignments (continued)

• Local versus Global

• BLAST algorithms

• Evaluating significance of alignments


Global Alignment (continued)

Needleman-Wunsch Alignment

• Global alignment between sequences
– Compare entire sequence against another
• Create scoring table
– Sequence A across top, B down left
• Cell at column i and row j contains the score of best alignment between the first i elements of A and the first j elements of B
– Global alignment score is bottom right cell
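The table fill described above can be sketched in Python. This is a minimal sketch, not the slides' own code: the function name is hypothetical, and the default scores (match +2, mismatch -1, gap -1) are an assumption, chosen because they are consistent with the worked table for ACGCTG and CATGT in these slides. The slides themselves do not state the scoring scheme.

```python
def needleman_wunsch_table(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the global-alignment table; table[j][i] scores a[:i] against b[:j].

    The default scores are assumed, not taken from the slides.
    """
    cols, rows = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, cols):
        table[0][i] = table[0][i - 1] + gap
    for j in range(1, rows):
        table[j][0] = table[j - 1][0] + gap
    # Every other cell takes the best of three moves.
    for j in range(1, rows):
        for i in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            table[j][i] = max(
                table[j - 1][i - 1] + step,  # align a[i-1] with b[j-1]
                table[j - 1][i] + gap,       # gap in A
                table[j][i - 1] + gap,       # gap in B
            )
    return table


table = needleman_wunsch_table("ACGCTG", "CATGT")
print(table[-1][-1])  # global alignment score (bottom right cell)
```

Sequence A runs across the columns and B down the rows, matching the layout on the slide.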

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0
C 1
A 2
T 3
G 4
T 5

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0  -1
C 1
A 2
T 3
G 4
T 5

A
-

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0  -1  -2  -3  -4  -5  -6
C 1
A 2
T 3
G 4
T 5

ACGCTG
------

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0  -1  -2  -3  -4  -5  -6
C 1  -1
A 2  -2
T 3  -3
G 4  -4
T 5  -5

-----
CATGT

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0  -1  -2  -3  -4  -5  -6
C 1  -1  -1
A 2  -2
T 3  -3
G 4  -4
T 5  -5

A
C

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0  -1  -2  -3  -4  -5  -6
C 1  -1  -1   1
A 2  -2
T 3  -3
G 4  -4
T 5  -5

AC
-C

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0  -1  -2  -3  -4  -5  -6
C 1  -1  -1   1   0
A 2  -2
T 3  -3
G 4  -4
T 5  -5

ACG
-C-

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0  -1  -2  -3  -4  -5  -6
C 1  -1  -1   1   0  -1
A 2  -2
T 3  -3
G 4  -4
T 5  -5

ACGC
-C--

ACGC
---C

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0  -1  -2  -3  -4  -5  -6
C 1  -1  -1   1   0  -1  -2  -3
A 2  -2   1   0   0
T 3  -3
G 4  -4
T 5  -5

ACG
-CA

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0  -1  -2  -3  -4  -5  -6
C 1  -1  -1   1   0  -1  -2  -3
A 2  -2   1   0   0  -1  -2  -3
T 3  -3   0   0  -1  -1   1   0
G 4  -4  -1  -1   2   1   0   3
T 5  -5  -2  -2   1   1   3   2

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0  -1
C 1  -1       1   0
A 2       1       0  -1
T 3           0           1
G 4               2   1       3
T 5                       3   2

ACGCTG-
-C-ATGT

ACGCTG-
-CA-TGT

-ACGCTG
CATG-T-
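The traceback shown on these slides can be sketched as follows: fill the table, then walk back from the bottom right cell along every move that explains the cell's score. This is an illustrative sketch only. The function name is hypothetical, and the scores (match +2, mismatch -1, gap -1) are assumed because they are consistent with the table values above; the slides do not state them.

```python
def global_alignments(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, sorted list of all optimal global alignments).

    Each alignment is a (top, bottom) pair of gapped strings.
    Scores are assumed defaults, not values stated in the slides.
    """
    cols, rows = len(a) + 1, len(b) + 1
    t = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):
        t[0][i] = i * gap
    for j in range(1, rows):
        t[j][0] = j * gap
        for i in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            t[j][i] = max(t[j - 1][i - 1] + step,
                          t[j - 1][i] + gap,
                          t[j][i - 1] + gap)

    def walk(j, i):
        # Yield every (top, bottom) pair whose path ends at cell (j, i).
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if t[j][i] == t[j - 1][i - 1] + step:      # diagonal move
                for x, y in walk(j - 1, i - 1):
                    yield x + a[i - 1], y + b[j - 1]
        if j > 0 and t[j][i] == t[j - 1][i] + gap:     # gap in A
            for x, y in walk(j - 1, i):
                yield x + "-", y + b[j - 1]
        if i > 0 and t[j][i] == t[j][i - 1] + gap:     # gap in B
            for x, y in walk(j, i - 1):
                yield x + a[i - 1], y + "-"

    return t[-1][-1], sorted(walk(rows - 1, cols - 1))


score, alignments = global_alignments("ACGCTG", "CATGT")
for top, bottom in alignments:
    print(top, bottom, sep="\n", end="\n\n")
```

Under these assumed scores the sketch yields a score of 2 and the same three alignments shown on the slides. Enumerating every path is exponential in the worst case, so practical tools usually report a single traceback.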

Global Alignment versus Local Alignment

Global Alignment

ATTGCAGTG-TCGAGCGTCAGGCT
ATTGCGTCGATCGCAC-GCACGCT

Local Alignment

CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT

Global vs. Local alignment

Global alignment:
DOROTHY--------HODGKIN
DOROTHYCROWFOOTHODGKIN

Local alignment:
DOROTHY    HODGKIN
DOROTHY    HODGKIN

Local Alignment

• Best score for aligning part of sequences
– Often beats global alignment score
• Similar algorithm: Smith-Waterman
– Table cells never score below zero

          T   A   C   T   A   A
      0   1   2   3   4   5   6
  0   0   0   0   0   0   0   0
T 1   0   1   0   0   1   0   0
A 2   0   0   2   0   0   2   1
A 3   0   0   1   1   0   1   3
T 4   0   1   0   0   2   0   1
A 5   0   0   2   0   0   3   1

TACTA
TAATA

TAA
TAA
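A minimal Smith-Waterman sketch for the example above, with sequence A = TACTAA across the top and B = TAATA down the left. The function name is hypothetical, and the scores (match +1, mismatch -1, gap -2) are an assumption inferred from the table values; the slides do not state them. The only change from the global fill is that each cell is clamped at zero and the traceback starts at the highest-scoring cell instead of the bottom right corner.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b).

    Scores are assumed defaults, not values stated in the slides.
    """
    cols, rows = len(a) + 1, len(b) + 1
    t = [[0] * cols for _ in range(rows)]
    best, best_j, best_i = 0, 0, 0
    for j in range(1, rows):
        for i in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            t[j][i] = max(0,                        # never below zero
                          t[j - 1][i - 1] + step,
                          t[j - 1][i] + gap,
                          t[j][i - 1] + gap)
            if t[j][i] > best:                      # keep the first maximum
                best, best_j, best_i = t[j][i], j, i

    # Trace back from the best cell until a zero cell is reached.
    top, bottom = [], []
    j, i = best_j, best_i
    while j > 0 and i > 0 and t[j][i] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if t[j][i] == t[j - 1][i - 1] + step:
            top.append(a[i - 1]); bottom.append(b[j - 1]); j -= 1; i -= 1
        elif t[j][i] == t[j - 1][i] + gap:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
        else:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


result = smith_waterman("TACTAA", "TAATA")
print(result)
```

Two cells tie for the best score of 3 in this example. The sketch reports the first one it meets in row order, which gives TAA over TAA. Starting from the other cell gives TACTA over TAATA.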

Problems with DP for sequence alignments

• The computational cost is high: the number of table cells grows with the product of the two sequence lengths

• Given a score, how do we evaluate the significance of the alignment?

Complexity

• Complexity is determined by size of table
– Aligning a sequence of length m against one of length n requires calculating (m × n) cells
• Time of calculation
– Let's say we calculate 10^8 cells per second on a one-processor PC
– Aligning two mRNA sequences of 8,000 bp requires 64,000,000 cells → 0.64 seconds
– Aligning an mRNA and a 10^7 bp chromosome requires ~10^11 cells → 1,000 secs ≈ 15 minutes

Complexity for large databases

• Let's say a database contains 3 × 10^10 base pairs
– Searching an mRNA against the database will require ~2.5 × 10^14 cells → 2.5 × 10^6 secs ≈ 1 month!
• We need a more efficient algorithm to cut down on alignment time
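The estimates on these two slides follow from dividing the number of table cells by the assumed rate of 10^8 cells per second. A small sketch of that arithmetic, where the function name is hypothetical and the 8,000 bp mRNA length is the figure used on the slides:

```python
CELLS_PER_SECOND = 1e8  # rate assumed on the slide


def dp_seconds(m, n, rate=CELLS_PER_SECOND):
    """Seconds needed to fill an m-by-n dynamic-programming table."""
    return m * n / rate


mrna = 8_000
two_mrnas = dp_seconds(mrna, mrna)             # seconds
chromosome = dp_seconds(mrna, 10**7)           # seconds
database = dp_seconds(mrna, 3 * 10**10)        # seconds

print(f"mRNA vs mRNA:       {two_mrnas:.2f} s")
print(f"mRNA vs chromosome: {chromosome / 60:.1f} min")
print(f"mRNA vs database:   {database / 86_400:.1f} days")
```

The exact products are 8 × 10^10 and 2.4 × 10^14 cells. The slides round these up to ~10^11 and ~2.5 × 10^14, which is why they quote 1,000 seconds and about a month.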

BLAST

• Basic Local Alignment Search Tool
• A set of tools developed at NCBI (BLASTN, BLASTP, ...)
• BLAST benefits
– Search speed
– Ease of use
– Statistical rigor

BLAST

• A good alignment contains subsequences of absolute identity:
– First, identify very short (almost) exact matches.
– Next, the best short hits from the 1st step are extended to longer regions of similarity.
– Finally, the best hits are optimized using the Smith-Waterman algorithm.

BLAST Algorithm

(1) Break the query sequence into words of length W (W default = 11)

(2) Compare the word list to the database and identify exact matches
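Steps (1) and (2) can be sketched as a word index over the query followed by a scan of one database sequence. This is a simplified illustration, not NCBI's implementation: the function name `find_seeds` and the toy sequences are hypothetical, and the example uses a small W so that the short strings produce a hit.

```python
from collections import defaultdict


def find_seeds(query, subject, w=11):
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    # Step 1: list every word of length w in the query, with its position.
    index = defaultdict(list)
    for q in range(len(query) - w + 1):
        index[query[q:q + w]].append(q)
    # Step 2: slide over the subject and look each word up in the index.
    seeds = []
    for s in range(len(subject) - w + 1):
        for q in index.get(subject[s:s + w], []):
            seeds.append((q, s))
    return seeds


seeds = find_seeds("ACGCTG", "TTACGCAA", w=4)
print(seeds)
```

Here the only shared word is ACGC, found at position 0 of the query and position 2 of the subject. Each lookup costs roughly constant time, so the scan is linear in the database length rather than proportional to the full m × n table.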

(3) For each word match, extend alignment in both directions

(4) Score the alignments using Dynamic Programming

(5) Evaluate the statistical significance
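Step (3) can be illustrated with a simple ungapped extension: starting from an exact word match, walk outward in each direction and stop once the running score drops too far below the best score seen. This is a hedged sketch. The function name, the scores, and the drop-off threshold `x_drop` are all hypothetical, and BLAST's actual extension heuristics are more elaborate.

```python
def extend_seed(query, subject, q, s, w, match=1, mismatch=-1, x_drop=2):
    """Extend an exact word match query[q:q+w] == subject[s:s+w] without gaps.

    Returns (score, q_start, q_end, s_start), with q_end exclusive.
    All scoring parameters are illustrative assumptions.
    """
    def grow(step, qi, si):
        # Walk outward in one direction, remember the best running score,
        # and stop once the score falls x_drop below that best.
        best = run = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            run += match if query[qi] == subject[si] else mismatch
            length += 1
            if run > best:
                best, best_len = run, length
            elif best - run >= x_drop:
                break
            qi += step
            si += step
        return best, best_len

    right, right_len = grow(+1, q + w, s + w)
    left, left_len = grow(-1, q - 1, s - 1)
    score = w * match + left + right
    return score, q - left_len, q + w + right_len, s - left_len


query, subject = "GGACGTACCC", "TTACGTACAA"
hit = extend_seed(query, subject, q=3, s=3, w=3)   # seed word is CGT
print(hit, query[hit[1]:hit[2]])
```

In this toy case the seed CGT grows to the shared region ACGTAC. The extended region is the part that would then be passed to the dynamic-programming scoring in step (4).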

Database Searches

• Using pairwise comparison, each database search normally yields two groups of scores: genuinely related and unrelated sequences, with some overlap between them.

• A good search method should separate the two score groups as completely as possible.

[Figure: scores of random versus related sequences]

E-value

• The number of hits (with the same or a better similarity score) one can "expect" to see just by chance when searching the given string in a database of a particular size.
• Higher E-value → lower similarity
– "Sequences with E-value of less than 0.01 are almost always found to be homologous"
• The lower bound is normally 0 (we want to find the best)

Expectation Values

• Increases linearly with length of query sequence
• Increases linearly with length of database
• Decreases exponentially with score of alignment
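These three properties match the standard Karlin-Altschul form E = K · m · n · e^(-λS), where m is the query length, n the database length and S the alignment score. A small sketch follows; the function name and the default values of K and λ are placeholders for illustration only, since the real values depend on the scoring system in use.

```python
import math


def expected_hits(score, query_len, db_len, k=0.1, lam=0.3):
    """E = K * m * n * exp(-lambda * S).

    k and lam are illustrative placeholders, not values from the slides.
    """
    return k * query_len * db_len * math.exp(-lam * score)


base = expected_hits(score=50, query_len=200, db_len=10**6)
longer_query = expected_hits(score=50, query_len=400, db_len=10**6)
bigger_db = expected_hits(score=50, query_len=200, db_len=2 * 10**6)
higher_score = expected_hits(score=60, query_len=200, db_len=10**6)

print(longer_query / base)   # doubling the query length doubles E
print(bigger_db / base)      # doubling the database size doubles E
print(higher_score / base)   # raising the score shrinks E by exp(-lam * 10)
```

Because E scales with the database size, the same alignment score becomes less significant as the database grows.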