multiple sequence alignment

Multiple sequence alignment Tutorial 5

Upload: nerys

Post on 21-Jan-2016



Page 1: Multiple sequence alignment

Multiple sequence alignment

Tutorial 5

Page 2: Multiple sequence alignment

Multiple Sequence Alignment – When?

• More than two sequences
  – DNA
  – Protein

• Evolutionary relation
  – Homology → Phylogenetic tree
  – Detect motif

Unaligned sequences:

GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC

Aligned sequences:

GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC

[Figure: diagram with nodes labeled A, B, C and D]

Page 3: Multiple sequence alignment

Multiple Sequence Alignment – How?

• Dynamic Programming
  – Optimal alignment
  – Running time exponential in the number of sequences

• Progressive
  – Efficient
  – Heuristic (the result is not guaranteed to be optimal)
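To get a feel for why exact dynamic programming does not scale, the sketch below counts the cells of the full alignment lattice for the four short example sequences. It is only a rough illustration: the function names are hypothetical, and the count ignores the cost of scoring each column.

```python
from math import prod

def dp_lattice_size(lengths):
    """Number of cells in the dynamic-programming lattice when all
    sequences are aligned simultaneously: (n1 + 1) * (n2 + 1) * ..."""
    return prod(n + 1 for n in lengths)

def predecessors_per_cell(k):
    """Each cell is reached from every non-empty subset of the k
    sequences that contributes a residue to the last column."""
    return 2 ** k - 1

# The four example DNA sequences from the "When?" slide.
seqs = [
    "GTCGTAGTCGGCTCGAC",
    "GTCTAGCGAGCGTGAT",
    "GCGAAGAGGCGAGC",
    "GCCGTCGCGTCGTAAC",
]
lengths = [len(s) for s in seqs]
cells = dp_lattice_size(lengths)
work = cells * predecessors_per_cell(len(seqs))
print(f"lengths={lengths} cells={cells} cell updates={work}")
```

Every extra sequence multiplies the lattice by roughly its length and doubles the number of predecessors per cell, which is why the progressive heuristic is used in practice.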


Page 4: Multiple sequence alignment

Hierarchical Clustering

• A way to represent similarities graphically.
• Sums up a pairwise distance matrix as a dendrogram.
• Not all matrices can be embedded in a tree without error.

TGTTAAC
TGT-AAC
TGT--AC
ATGT--C
ATGTGGC
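Whether a distance matrix can be embedded in a tree without error can be tested with the four-point condition: for every four nodes, the two largest of the three pairwise sums must be equal. The sketch below is one possible check in Python, not part of the original slides; the helper names are hypothetical. It uses the five-node distance matrix from the Neighbor Joining example in this tutorial, plus a perturbed copy with one altered entry.

```python
from itertools import combinations

def symmetric(upper):
    """Expand an upper-triangle dict {(i, j): d} into a nested dict."""
    dist = {}
    for (i, j), d in upper.items():
        dist.setdefault(i, {})[j] = d
        dist.setdefault(j, {})[i] = d
    return dist

def is_additive(dist, labels, tol=1e-9):
    """Four-point condition: in every quartet, the two largest of the
    three pairwise sums must be equal (within a tolerance)."""
    for a, b, c, d in combinations(labels, 4):
        sums = sorted([
            dist[a][b] + dist[c][d],
            dist[a][c] + dist[b][d],
            dist[a][d] + dist[b][c],
        ])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

labels = ["A", "B", "C", "D", "E"]
upper = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20,
    ("D", "E"): 10,
}
tutorial = symmetric(upper)

# Hypothetical perturbation: change a single distance (A-C: 39 -> 45).
perturbed = symmetric({**upper, ("A", "C"): 45})

print(is_additive(tutorial, labels), is_additive(perturbed, labels))
```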

Page 5: Multiple sequence alignment

ClustalW

“CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J. D. Thompson et al., Nucleic Acids Research, 1994.

Page 6: Multiple sequence alignment

ClustalW

• Progressive (incremental)
  – At each step, align two existing alignments or sequences.
  – Gaps present in older alignments remain fixed.

• Uses the Neighbor Joining algorithm to build the guide tree that sets the order in which sequences are aligned.

Page 7: Multiple sequence alignment


Neighbor Joining Algorithm

An agglomerative hierarchical clustering method.

Constructs an unrooted tree.

Page 8: Multiple sequence alignment

Neighbor Joining (Not assuming equal divergence)

Step by step summary:

1. Calculate all pairwise distances.

2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).

3. Define a new node (x).

4. Calculate $D_{ix}$ and $D_{jx}$, the distances of the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.

5. Continue until two nodes remain, then connect them with an edge.
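The five steps above can be sketched in Python as follows. This is a minimal illustration of the procedure as summarized here, not ClustalW's actual code: the function name, the `new_names` parameter and the choice of placing each new node at the front of the node list (so that ties are broken the same way as in this tutorial) are assumptions.

```python
def neighbor_joining(labels, upper, new_names):
    """Return the tree as a list of (node, node, branch_length) edges.

    labels:    leaf names
    upper:     {(a, b): distance} for every pair of leaves
    new_names: names to give the internal nodes (hypothetical parameter)
    """
    d = {frozenset(pair): float(v) for pair, v in upper.items()}
    nodes = list(labels)
    names = iter(new_names)
    edges = []

    def D(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        L = len(nodes)
        # r_i: sum of distances from i to all other nodes, divided by L - 2.
        r = {i: sum(D(i, k) for k in nodes if k != i) / (L - 2) for i in nodes}
        # Step 2: pair with the lowest relative distance M_ij (first one wins ties).
        best = None
        for p in range(L):
            for q in range(p + 1, L):
                i, j = nodes[p], nodes[q]
                m = D(i, j) - (r[i] + r[j])
                if best is None or m < best[0]:
                    best = (m, i, j)
        _, i, j = best
        # Step 3: define a new node x.
        x = next(names)
        # Step 4: branch lengths to x, then distances from x to the rest.
        d_ix = D(i, j) / 2 + (r[i] - r[j]) / 2
        d_jx = D(i, j) / 2 + (r[j] - r[i]) / 2
        edges += [(i, x, d_ix), (j, x, d_jx)]
        rest = [k for k in nodes if k not in (i, j)]
        for k in rest:
            d[frozenset((x, k))] = (D(i, k) + D(j, k) - D(i, j)) / 2
        nodes = [x] + rest
    # Step 5: two nodes remain, connect them with an edge.
    a, b = nodes
    edges.append((a, b, D(a, b)))
    return edges

upper = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20,
    ("D", "E"): 10,
}
edges = neighbor_joining("ABCDE", upper, "XYZ")
for u, v, length in edges:
    print(u, v, length)
```

The input is the distance table used in the worked example that follows, so the printed branch lengths can be compared with the ones derived by hand on the later slides.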

Page 9: Multiple sequence alignment

Step 1. Calculate all pairwise distances.

     A    B    C    D    E
A    -   22   39   39   41
B    -    -   41   41   43
C    -    -    -   18   20
D    -    -    -    -   10
E    -    -    -    -    -

[Figure: the five leaves A, B, C, D and E, not yet connected]

Page 10: Multiple sequence alignment

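Measuring Distance

• Simplest measure: $D_{i,j} = f$, the fraction of sites where residues differ.

• Problem: for unrelated sequences, $f$ approaches the fraction of differences expected by chance, so the distance measure converges (saturates).

• Jukes-Cantor correction:

$$D_{i,j} = -\frac{3}{4}\,\ln\left(1 - \frac{4}{3} f\right)$$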
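As a small illustration of the two measures, the following Python sketch computes the fraction of differing sites for two aligned sequences and applies the Jukes-Cantor correction. The function names are hypothetical, and skipping columns that contain a gap is an assumption made here for simplicity; the slides do not say how gaps are treated.

```python
import math

def p_distance(a, b):
    """Fraction of aligned sites where the residues differ.
    Columns with a gap in either sequence are skipped (an assumption)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(f):
    """Jukes-Cantor distance D = -3/4 * ln(1 - 4/3 * f).
    At f >= 3/4 the correction is undefined and is reported as infinite."""
    if f >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * f / 3)

f = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(f, jukes_cantor(f))
```

For small `f` the corrected distance stays close to `f`; as `f` approaches 3/4 it grows without bound, which compensates for the saturation described above.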

Page 11: Multiple sequence alignment

Measuring Distance (cont)

• Euclidean Distance: given a multiple sequence alignment, calculate the square root of the sum of the squared scores at every position between two sequences:

$$d_{a,b} = \sqrt{\sum_{i=1}^{n} s(a_i, b_i)^2}$$

• Here $n$ is the number of aligned positions, and the score $s(a_i, b_i)$ increases proportionally to the extent of dissimilarity between the residues at position $i$.
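A minimal Python sketch of this distance is shown below. The slides do not specify the scoring function, so `mismatch_score` (0 for identical symbols, 1 otherwise) is a hypothetical stand-in; a real application would plug in a score derived from a substitution matrix.

```python
import math

def mismatch_score(x, y):
    """Hypothetical dissimilarity score: 0 if identical, 1 otherwise."""
    return 0.0 if x == y else 1.0

def euclidean_distance(a, b, score=mismatch_score):
    """Square root of the sum of squared per-position scores
    between two rows of a multiple sequence alignment."""
    if len(a) != len(b):
        raise ValueError("rows must come from the same alignment")
    return math.sqrt(sum(score(x, y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance("ACGT-A", "ACCTGA"))
```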

Page 12: Multiple sequence alignment

Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).

$$M_{i,j} = D_{i,j} - (r_i + r_j)$$

$$r_i = \frac{\sum_k D_{i,k}}{L - 2}, \qquad r_j = \frac{\sum_k D_{j,k}}{L - 2}$$

$M_{i,j}$: relative distance between i and j
$D_{i,j}$: distance between i and j from the distance table
$r_i$: distance of i from all other sequences (sum of its distances, divided by L - 2)
$L$: number of leaves (= sequences) left in the tree

Page 13: Multiple sequence alignment

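Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).

     A    B    C    D    E
A    -   22   39   39   41
B    -    -   41   41   43
C    -    -    -   18   20
D    -    -    -    -   10
E    -    -    -    -    -

$$r_i = \frac{\sum_{j \neq i} D_{i,j}}{L - 2}$$

$$r_A = \frac{22 + 39 + 39 + 41}{5 - 2} = \frac{141}{3} = 47$$

$$r_B = \frac{41 + 41 + 43 + 22}{3} = 49$$

$$r_C = \frac{18 + 20 + 41 + 39}{3} \approx 39.3$$

$$r_D = \frac{10 + 39 + 41 + 18}{3} = 36$$

$$r_E = \frac{41 + 43 + 20 + 10}{3} = 38$$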

Page 14: Multiple sequence alignment

Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).

$$M_{AB} = D_{AB} - (r_A + r_B) = 22 - (47 + 49) = -74$$

$$M_{AC} = D_{AC} - (r_A + r_C) = 39 - (47 + 39.3) = -47.3$$

Etc.

     A      B      C      D      E
A    -    -74  -47.3    -44    -44
B    -      -  -47.3    -44    -44
C    -      -      -  -57.3  -57.3
D    -      -      -      -    -64
E    -      -      -      -      -

A,B is the pair with the minimal $M_{i,j}$ distance.

The $M_{i,j}$ table is used only to choose the closest pair (lowest value), not for calculating the distances.
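The same bookkeeping for Step 2 can be reproduced with a few lines of Python. This is only a sketch for checking the numbers; the variable and function names are hypothetical.

```python
from itertools import combinations

labels = ["A", "B", "C", "D", "E"]
upper = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20,
    ("D", "E"): 10,
}

def dist(i, j):
    """Look up D_ij regardless of the order of i and j."""
    return upper[(i, j)] if (i, j) in upper else upper[(j, i)]

L = len(labels)
# r_i = (sum of distances from i to every other node) / (L - 2)
r = {i: sum(dist(i, k) for k in labels if k != i) / (L - 2) for i in labels}
# M_ij = D_ij - (r_i + r_j)
M = {(i, j): dist(i, j) - (r[i] + r[j]) for i, j in combinations(labels, 2)}
# The pair to join is the one with the lowest M value.
closest = min(M, key=M.get)

for pair, value in M.items():
    print(pair, round(value, 1))
print("join:", closest)
```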

Page 15: Multiple sequence alignment

Step 3. Define a new node (x)

[Figure: leaves A, B, C, D and E, with A and B attached to the new node X]

Page 16: Multiple sequence alignment

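Step 4. Calculate $D_{ix}$ and $D_{jx}$, the distances of the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.

$$D_{AX} = \frac{D_{AB}}{2} + \frac{r_A - r_B}{2} = \frac{22}{2} + \frac{47 - 49}{2} = 10$$

$$D_{BX} = \frac{D_{AB}}{2} + \frac{r_B - r_A}{2} = \frac{22}{2} + \frac{49 - 47}{2} = 12$$

Now we’ll calculate the distance from X to all other nodes.

$$D_{CX} = \frac{D_{AC} + D_{BC}}{2} - \frac{D_{AB}}{2} = \frac{39 + 41}{2} - \frac{22}{2} = 29$$

$$D_{DX} = \frac{D_{AD} + D_{BD}}{2} - \frac{D_{AB}}{2} = \frac{39 + 41}{2} - \frac{22}{2} = 29$$

$$D_{EX} = \frac{D_{AE} + D_{BE}}{2} - \frac{D_{AB}}{2} = \frac{41 + 43}{2} - \frac{22}{2} = 31$$

     X    C    D    E
X    -   29   29   31
C    -    -   18   20
D    -    -    -   10
E    -    -    -    -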
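The Step 4 formulas translate directly into a small helper. The sketch below is illustrative only; `join_pair` and its argument layout are hypothetical, and the values of `r` are the ones computed in Step 2.

```python
def join_pair(i, j, others, D, r):
    """Join nodes i and j at a new node x.

    Returns (D_ix, D_jx, {k: D_kx for every other node k}).
    D maps frozenset({a, b}) to the distance between a and b.
    """
    d_ij = D[frozenset((i, j))]
    d_ix = d_ij / 2 + (r[i] - r[j]) / 2
    d_jx = d_ij / 2 + (r[j] - r[i]) / 2
    to_x = {
        k: (D[frozenset((i, k))] + D[frozenset((j, k))]) / 2 - d_ij / 2
        for k in others
    }
    return d_ix, d_jx, to_x

# Distances involving A and B, taken from the Step 1 table.
D = {
    frozenset("AB"): 22,
    frozenset("AC"): 39, frozenset("AD"): 39, frozenset("AE"): 41,
    frozenset("BC"): 41, frozenset("BD"): 41, frozenset("BE"): 43,
}
r = {"A": 47, "B": 49}  # from Step 2

d_ax, d_bx, to_x = join_pair("A", "B", ["C", "D", "E"], D, r)
print(d_ax, d_bx, to_x)
```

Note that the two branch lengths always add up to the original distance between the joined nodes (here 10 + 12 = 22).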

Page 17: Multiple sequence alignment

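Step 5 - Continue until two nodes remain

New Mi,j table

     X     C     D     E
X    -   -49   -44   -44
C    -     -   -44   -44
D    -     -     -   -49
E    -     -     -     -

The pairs (X, C) and (D, E) tie at the lowest value, -49; here X and C are joined at a new node Y.

$$D_{YC} = 9, \qquad D_{XY} = 20$$

[Figure: leaves A, B, C, D and E with internal nodes X and Y]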

Page 18: Multiple sequence alignment

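New Di,j table

     Y    D    E
Y    -    9   11
D    -    -   10
E    -    -    -

Joining two of the three remaining nodes (Y, D, E) at a new node Z leaves only 2 nodes. Let’s calculate all the distances to Z:

$$D_{ZY} = 5, \qquad D_{ZE} = 6, \qquad D_{ZD} = 4$$

[Figure: leaves A, B, C, D and E with internal nodes X, Y and Z]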

Page 19: Multiple sequence alignment

The tree

Branch lengths:

A-X   10
B-X   12
X-Y   20
C-Y    9
Y-Z    5
D-Z    4
E-Z    6

And in Newick tree format (topology only):

((C,(D,E)),(A,B));
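A quick way to sanity-check the finished tree is to add up the branch lengths along the path between every pair of leaves and compare the sums with the original distance table. The Python sketch below does this with the branch lengths derived in Steps 4 and 5; the function name is hypothetical.

```python
from collections import defaultdict

# Branch lengths derived in Steps 4 and 5.
edges = [
    ("A", "X", 10), ("B", "X", 12), ("X", "Y", 20),
    ("C", "Y", 9), ("Y", "Z", 5), ("D", "Z", 4), ("E", "Z", 6),
]

# Original pairwise distances from Step 1.
original = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20,
    ("D", "E"): 10,
}

def path_lengths(edges, start):
    """Sum of branch lengths from `start` to every node in the tree."""
    adjacent = defaultdict(list)
    for u, v, w in edges:
        adjacent[u].append((v, w))
        adjacent[v].append((u, w))
    total = {start: 0}
    stack = [start]
    while stack:
        u = stack.pop()
        for v, w in adjacent[u]:
            if v not in total:  # a tree has exactly one path to each node
                total[v] = total[u] + w
                stack.append(v)
    return total

mismatches = [
    (i, j) for (i, j), d in original.items()
    if path_lengths(edges, i)[j] != d
]
print("pairs that disagree with the table:", mismatches)
```

For this example every path length matches the table, so the tree represents the distance matrix without error.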

Page 20: Multiple sequence alignment

ClustalW - Input

http://www.ebi.ac.uk/Tools/clustalw2/index.html

Input sequences

Gap scoring

Scoring matrix

Email address

Output format

Page 21: Multiple sequence alignment

ClustalW - Output

Match strength in decreasing order: * (identical residue in all sequences), : (conserved substitution), . (semi-conserved substitution)
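The strongest marker, `*`, is easy to reproduce by hand: it flags columns in which every sequence carries the same residue. The Python sketch below marks only those columns. It is an illustration, not ClustalW's implementation; the function name and the toy alignment are hypothetical, and the `:` and `.` markers are left out because they depend on ClustalW's residue groups.

```python
def identity_line(rows):
    """Return a string with '*' under every column in which all rows
    carry the same residue (gap columns never count), ' ' elsewhere."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return "".join(
        "*" if len(set(column)) == 1 and "-" not in column else " "
        for column in zip(*rows)
    )

# Toy alignment, invented for illustration.
alignment = ["ACGT", "A-GT", "ACGA"]
for row in alignment:
    print(row)
print(identity_line(alignment))
```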

Page 25: Multiple sequence alignment

ClustalW - Output

Pairwise alignment scores

Building alignment

Final score

Building tree

Page 27: Multiple sequence alignment

ClustalW - Output

Sequence names

Sequence positions

Match strength in decreasing order: * (identical residue in all sequences), : (conserved substitution), . (semi-conserved substitution)

Page 29: Multiple sequence alignment

ClustalW - Output

Branch length
