Matrix Alignment (Harr Plot, Matrix Plot, Dot Plot, Dot Matrix)

Upload: rodger-lane

Post on 29-Dec-2015


Page 1: Matrix Alignment (Harr Plot, Matrix Plot, Dot Plot, Dot Matrix)

Page 2

Dot-matrix Alignment

Mount, Bioinformatics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001).

Page 3

Similarity ≠ Homology

1) 25% similarity over ≥ 100 amino acids (AAs) likely indicates homology

2) Homology is an evolutionary statement meaning “descent from a common ancestor”:
– common 3D structure
– usually common function
– all or nothing: you cannot say “50% homologous”

Page 4

Similarity is Based on Dot Plots

1) place the two sequences on the vertical and horizontal axes of a graph

2) put a dot wherever there is a match

3) a diagonal line is a region of identity (a local alignment)

4) apply a window filter: look at a group of bases at a time; the group must meet a minimum % identity to get a dot
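Steps 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration, assuming identity scoring (a dot only where the two letters are the same) and a window of 1; the function names `dot_matrix` and `render` are hypothetical and not taken from any package. `M` marks a match and `.` a non-match, and the inputs are the two short DNA sequences used in the worked example in this deck.

```python
def dot_matrix(horizontal: str, vertical: str) -> list[list[bool]]:
    """Identity dot matrix: cell [r][c] is True when vertical[r] == horizontal[c]."""
    return [[h == v for h in horizontal] for v in vertical]


def render(horizontal: str, vertical: str, mark: str = "M", blank: str = ".") -> str:
    """Text rendering: horizontal sequence across the top, vertical sequence down the side."""
    lines = ["  " + " ".join(horizontal)]
    for v, row in zip(vertical, dot_matrix(horizontal, vertical)):
        lines.append(v + " " + " ".join(mark if hit else blank for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render("GCTTAGGCTGA", "AGGCTGAACTA"))
```

The diagonal of `M`s that starts at row 1, column 5 traces the shared segment AGGCTGA; the scattered single `M`s are chance matches.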

Page 5

Similarity of two sequences

ATGCTAGCACGCGTGCGCAA
| || ||| | |||||| ||
AGGCAAGCGCTCGTGCGTAA

% Identity = 15/20 = 75%

Are they similar? It depends on the cutoff chosen for % identity.
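The 15/20 figure can be checked with a short Python sketch. It assumes, as on the slide, two ungapped sequences of equal length compared position by position; `percent_identity` is a hypothetical helper name.

```python
def percent_identity(a: str, b: str) -> tuple[int, float]:
    """Count identical positions in two ungapped, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (no gaps assumed)")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)


seq1 = "ATGCTAGCACGCGTGCGCAA"
seq2 = "AGGCAAGCGCTCGTGCGTAA"
matches, pct = percent_identity(seq1, seq2)
print(f"{matches}/{len(seq1)} = {pct:.0f}%")  # 15/20 = 75%
```

Whether 75% counts as "similar" is still a separate decision about the cutoff.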

Page 6

Similarity of two sequences

ATGCTAGCACGCGTGCGCAA
AGGCAAGCGCTCGTGCGTAA
identity = 75%

Window size = 20

Stringency = ?

If stringency ≤ 15 (15/20 = 75%) ⇒ they are similar

If stringency > 15 (and ≤ 20) ⇒ they are not similar
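The stringency rule above is a threshold on the number of matches inside a window. A minimal sketch, assuming identity scoring and windows that slide along an ungapped pairing of the two sequences (with a window of 20 there is only one window here); `passing_windows` is a hypothetical name.

```python
def passing_windows(a: str, b: str, window: int, stringency: int) -> list[int]:
    """Return start positions of windows whose match count reaches the stringency."""
    n = min(len(a), len(b))
    starts = []
    for s in range(n - window + 1):
        matches = sum(a[s + k] == b[s + k] for k in range(window))
        if matches >= stringency:
            starts.append(s)
    return starts


seq1 = "ATGCTAGCACGCGTGCGCAA"
seq2 = "AGGCAAGCGCTCGTGCGTAA"
for stringency in (14, 15, 16, 20):
    verdict = "similar" if passing_windows(seq1, seq2, 20, stringency) else "not similar"
    print(f"stringency {stringency}: {verdict}")
```

With 15 matches in the single window, any stringency up to 15 reports "similar" and anything above it reports "not similar".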

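Page 7

Dot-matrix Alignment: GCTTAGGCTGA (horizontal) vs. AGGCTGAACTA (vertical)

  G C T T A G G C T G A
A . . . . M . . . . . M
G M . . . . M M . . M .
G M . . . . M M . . M .
C . M . . . . . M . . .
T . . M M . . . . M . .
G M . . . . M M . . M .
A . . . . M . . . . . M
A . . . . M . . . . . M
C . M . . . . . M . . .
T . . M M . . . . M . .
A . . . . M . . . . . M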

Window = 1

Stringency = 1


Page 10

[Dot plot: SEQUENCEANALYSISPRIMER compared with itself]

Since this is a comparison of a sequence with itself (an intrasequence comparison), the most obvious feature is the main identity diagonal. Two short perfect palindromes, “ANA” and “SIS,” can also be seen as small crosses straddling the main diagonal.

Window = 1

Stringency = 1
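A palindrome such as “ANA” produces a cross because, in a self-comparison, the letters on either side of its center match each other, which puts dots on the anti-diagonal through the center as well as on the main diagonal. The Python sketch below, with the hypothetical helper `palindrome_crosses`, finds the centers of odd-length perfect palindromes by extending outward while the mirrored letters agree. It illustrates the geometry only; it is not how any dot-plot program reports palindromes.

```python
def palindrome_crosses(seq: str, min_arm: int = 1) -> list[tuple[int, int, str]]:
    """Find odd-length perfect palindromes as (center index, arm length, text).

    In a self dot plot, a palindrome of arm length k centered at c adds dots at
    (c - t, c + t) and (c + t, c - t) for t = 1..k: a short anti-diagonal line
    crossing the main diagonal at (c, c).
    """
    hits = []
    for c in range(1, len(seq) - 1):
        arm = 0
        while (c - arm - 1 >= 0 and c + arm + 1 < len(seq)
               and seq[c - arm - 1] == seq[c + arm + 1]):
            arm += 1
        if arm >= min_arm:
            hits.append((c, arm, seq[c - arm:c + arm + 1]))
    return hits


print(palindrome_crosses("SEQUENCEANALYSISPRIMER"))
# [(9, 1, 'ANA'), (14, 1, 'SIS')]
```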


Page 12

Dot-matrix Alignment

[Dot matrix: GCTTAGGCTGA (horizontal) vs. AGGCTGAACTA (vertical)]

Window = 3

Stringency = 3
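The window/stringency filter used on this slide can be sketched in Python. Assumptions: identity scoring, windows that slide along each diagonal one position at a time, and a dot placed at the middle of each window that meets the stringency (the convention described later in this deck). The function names `filtered_dots` and `render` are hypothetical, and real programs may draw the whole window rather than only its center.

```python
def filtered_dots(horizontal: str, vertical: str, window: int, stringency: int) -> set[tuple[int, int]]:
    """Return (row, column) cells, 0-based, that receive a dot after window filtering."""
    half = window // 2
    dots = set()
    for r in range(len(vertical) - window + 1):
        for c in range(len(horizontal) - window + 1):
            # Count identities along the diagonal segment that starts at (r, c).
            score = sum(vertical[r + k] == horizontal[c + k] for k in range(window))
            if score >= stringency:
                dots.add((r + half, c + half))  # dot at the middle of the window
    return dots


def render(horizontal: str, vertical: str, dots: set[tuple[int, int]]) -> str:
    lines = ["  " + " ".join(horizontal)]
    for r, v in enumerate(vertical):
        cells = ("M" if (r, c) in dots else "." for c in range(len(horizontal)))
        lines.append(v + " " + " ".join(cells))
    return "\n".join(lines)


top, side = "GCTTAGGCTGA", "AGGCTGAACTA"
print(render(top, side, filtered_dots(top, side, window=3, stringency=3)))
```

Under these assumptions only six dots survive: five on the diagonal of the shared segment AGGCTGA and one from the shared triplet GCT, i.e. two runs of identity.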

Page 13

The only remaining dots indicate the two runs of identity between the two sequences; however, any indication of the palindrome “ANA” has been lost. This is because our filtering approach was too stringent to catch such a short element. In general, you need to make your window about the same size as the element you are attempting to locate. In the case of our palindrome, “AN” and “NA” are the inverted repeat sequences, and since our window was set to three, we will not be able to see an element only two letters long. Had we set our stringency filter to one in a window of two, these would be visible. The Wisconsin Package’s implementation of dot matrix analysis, the paired programs Compare and DotPlot, uses the window/stringency method by default.

Page 14

Dot plot of real data

[Dot plot: CVJB; Window Size = 8, Scoring Matrix: pam250, Min. % Score = 30, Hash Value = 2; both axes numbered 20 to 220]

Page 15

[Dot plot: SEQUENCEANALYSISPRIMER (horizontal) vs. SEQUENCESEQUENCESEQUENCE (vertical)]

Another phenomenon that is very easy to visualize with dot matrix analysis is duplication, or direct repeats.

The ‘duplication’ here is seen as a distinct column of diagonals; whenever you see either a row or a column of diagonals in a dot plot, you are looking at direct repeats.

Window = 1

Stringency = 1
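Each diagonal in such a column is a run of consecutive matches. The sketch below, assuming identity scoring and a hypothetical helper `diagonal_runs`, lists maximal diagonal runs of at least `min_len` matches. Three runs that share the same horizontal start but have different vertical starts form the column of diagonals that signals a direct repeat.

```python
def diagonal_runs(horizontal: str, vertical: str, min_len: int) -> list[tuple[int, int, int]]:
    """List maximal runs of identity as (horizontal start, vertical start, length)."""
    runs = []
    for i in range(len(horizontal)):
        for j in range(len(vertical)):
            # A run starts at a match whose up-left diagonal neighbour is not a match.
            if horizontal[i] == vertical[j] and (i == 0 or j == 0 or horizontal[i - 1] != vertical[j - 1]):
                k = 0
                while (i + k < len(horizontal) and j + k < len(vertical)
                       and horizontal[i + k] == vertical[j + k]):
                    k += 1
                if k >= min_len:
                    runs.append((i, j, k))
    return runs


print(diagonal_runs("SEQUENCEANALYSISPRIMER", "SEQUENCESEQUENCESEQUENCE", min_len=8))
# [(0, 0, 8), (0, 8, 8), (0, 16, 8)]
```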

Page 16

Now consider the more complicated ‘mutation’ in the following comparison:

[Dot plot: SEQUENCEANALYSISPRIMER (horizontal) vs. ANALYZESEQUENCES (vertical)]

Again, notice the diagonals. However, they have now been displaced off the center diagonal of the plot and, in this example, show the occurrence of a ‘transposition.’ Dot matrix analysis is one of the only sensible ways to locate such transpositions in sequences. Inverted repeats still show up as lines perpendicular to the diagonals; they are just no longer at the center of the plot. The ‘deletion’ of ‘PRIMER’ is shown by the lack of a corresponding diagonal.

Window = 1

Stringency = 1
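The displacement can be measured as a diagonal offset: for a run that starts at position i of the horizontal sequence and position j of the vertical one, the offset j - i is 0 on the center diagonal and nonzero for shifted segments. A minimal sketch, assuming identity scoring; `longest_run_by_offset` is a hypothetical name.

```python
from collections import defaultdict


def longest_run_by_offset(horizontal: str, vertical: str) -> dict[int, int]:
    """Map each diagonal offset (vertical index minus horizontal index) to its longest identity run."""
    best: dict[int, int] = defaultdict(int)
    for i in range(len(horizontal)):
        for j in range(len(vertical)):
            # Only measure a run from its first match on the diagonal.
            if horizontal[i] == vertical[j] and (i == 0 or j == 0 or horizontal[i - 1] != vertical[j - 1]):
                k = 0
                while (i + k < len(horizontal) and j + k < len(vertical)
                       and horizontal[i + k] == vertical[j + k]):
                    k += 1
                best[j - i] = max(best[j - i], k)
    return dict(best)


runs = longest_run_by_offset("SEQUENCEANALYSISPRIMER", "ANALYZESEQUENCES")
print({offset: length for offset, length in runs.items() if length >= 4})  # {7: 8, -8: 5}
print("center diagonal:", runs.get(0, 0))  # center diagonal: 1
```

SEQUENCE sits 7 positions below the center diagonal and ANALY 8 positions above it, while the center diagonal itself holds only a single-letter match; no run of four or more letters involves PRIMER.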

Page 17

Reconsider the same plot. Notice the extraneous dots that indicate neither runs of identity between the two sequences nor inverted repeats. These merely contribute ‘noise’ to the plot and are due to the ‘random’ occurrence of the letters in the sequences, that is, to the composition of the sequences themselves.

How can we ‘clean up’ the plots so that this noise does not detract from our interpretations? Consider a filtered windowing approach, in which a dot is placed only if some ‘stringency’ is met.

That is, within a window of some defined size, a dot is placed at the middle of that window if, and only if, some defined criterion is met. The window is then shifted one position and the entire process is repeated. This very successfully rids the plot of unwanted noise.

In the next plot, a window of size three and a stringency of two were used to considerably improve the signal-to-noise ratio (remember, I am using a 1:0 identity scoring function).

[Plot: Filtered Windowing]
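The effect of the filter on the noise can be counted directly. The Python sketch below compares the raw identity dots for this plot with the dots that survive a window of three and a stringency of two, assuming 1:0 identity scoring and a dot at the middle of each passing window, as described above; `raw_dots` and `filtered_dots` are hypothetical helper names.

```python
def raw_dots(horizontal: str, vertical: str) -> set[tuple[int, int]]:
    """Every (row, column) where the two letters are identical (window 1, stringency 1)."""
    return {(r, c) for r, v in enumerate(vertical) for c, h in enumerate(horizontal) if v == h}


def filtered_dots(horizontal: str, vertical: str, window: int, stringency: int) -> set[tuple[int, int]]:
    """Dots placed at the middle of each diagonal window with at least `stringency` identities."""
    half = window // 2
    dots = set()
    for r in range(len(vertical) - window + 1):
        for c in range(len(horizontal) - window + 1):
            score = sum(vertical[r + k] == horizontal[c + k] for k in range(window))
            if score >= stringency:
                dots.add((r + half, c + half))
    return dots


top, side = "SEQUENCEANALYSISPRIMER", "ANALYZESEQUENCES"
raw = raw_dots(top, side)
kept = filtered_dots(top, side, window=3, stringency=2)
print(len(raw), "raw dots ->", len(kept), "after filtering")  # 35 raw dots -> 11 after filtering
```

All of the surviving dots lie on the two displaced diagonals (SEQUENCE and ANALY); the isolated chance matches are gone.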

Page 18

Default RNA self-comparison (Phe tRNA), window of 21 and stringency of 14

Page 19

Window size set to 7, stringency value set to 5: several direct repeats that remained obscured in the previous analysis are now obvious.

Page 20

22 GAGCGCCAGACT 33
   || | ||||| |
48 CTGGAGGTCTAG 37

Bases 22 through 33 pair with (that is, are quite similar to the reverse complement of) bases 37 through 48 of the same molecule. MFold, Zuker’s RNA folding algorithm, uses base-pairing energies to find the family of optimal and suboptimal structures; the most stable structure it finds has a stem pairing positions 27 to 31 with 39 to 43. However, the region around position 38 is represented as a loop. The actual modeled structure, as seen in PDB entry 1TRA, shows that ‘reality’ lies somewhere in between.
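The match line between the two strands can be reproduced by checking, position by position, whether the nucleotides can base-pair. This sketch assumes Watson–Crick pairs plus G·U wobble pairs, treats T as U, and takes the lower strand as written (positions 48 down to 37) so that the strands are antiparallel; `pairing_line` is a hypothetical helper.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption about what counts as "pairing").
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_line(upper: str, lower: str) -> str:
    """Mark '|' where upper[k] can pair with lower[k]; both strands aligned antiparallel."""
    upper, lower = upper.replace("T", "U"), lower.replace("T", "U")
    return "".join("|" if (x, y) in PAIRS else " " for x, y in zip(upper, lower))


upper = "GAGCGCCAGACT"  # positions 22 to 33, read 5' to 3'
lower = "CTGGAGGTCTAG"  # positions 48 down to 37, i.e. read 3' to 5'
print(upper)
print(pairing_line(upper, lower))  # || | ||||| |
print(lower)
```

The unbroken block of five pairs corresponds to positions 27 to 31 pairing with 43 down to 39, the stem reported by MFold.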

Page 21

RNA comparisons of the reverse complement of a sequence to itself can often be very informative. Here the yeast tRNA sequence is compared to its reverse complement using the same stringency setting of 5 out of 7 as before. The stem-loop inverted repeats of the tRNA cloverleaf shape become obvious.
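Comparing a sequence with its reverse complement turns base-paired stems into ordinary diagonals, because a stretch that pairs with another part of the molecule reads identically to that part's reverse complement. The sketch below uses a short hypothetical hairpin written in DNA letters (T for U), identity scoring, and hypothetical helper names. A diagonal run starting at position i of the sequence and position j of the reverse complement pairs sequence position i + t with position n - 1 - (j + t).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def diagonal_runs(a: str, b: str, min_len: int) -> list[tuple[int, int, int]]:
    """Maximal identity runs as (start in a, start in b, length)."""
    runs = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] == b[j] and (i == 0 or j == 0 or a[i - 1] != b[j - 1]):
                k = 0
                while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                    k += 1
                if k >= min_len:
                    runs.append((i, j, k))
    return runs


hairpin = "GGACCTTTTGGTCC"  # hypothetical: stem GGACC, loop TTTT, stem GGTCC
rc = reverse_complement(hairpin)
n = len(hairpin)
for i, j, k in diagonal_runs(hairpin, rc, min_len=5):
    # Translate reverse-complement coordinates back into paired positions of the original.
    pairs = [(i + t, n - 1 - (j + t)) for t in range(k)]
    print(f"run at ({i}, {j}), length {k}: pairs {pairs}")
```

Each stem shows up twice, once from each strand of the stem, which is why reverse-complement plots of a hairpin are symmetric.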

Page 22

Zooming in on that same region reveals some small direct repeats when the sequence is compared against itself without reversal.

Page 23

But comparing the same region of the sequence against its reverse complement shows a wealth of potential stem-loop structure in the transfer RNA.

Page 24

Conclusion: Dot-matrix Alignment

• Strengths:
– Simple
– All possible matches are generated
– Can identify repeated sequence elements
– Often provides a starting point for other alignment algorithms

• Weaknesses:
– Noise level can be high
– Cannot discriminate optimal from suboptimal alignments
– Doesn’t handle gaps well