multiple sequence alignment


Upload: gada

Posted on 05-Jan-2016


Page 1: Multiple sequence alignment
Page 2: Multiple sequence alignment

Multiple sequence alignment

Conserved blocks are recognized

Different degrees of similarity are marked

Page 3: Multiple sequence alignment

Multiple Sequence Alignment

VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--

The purpose of multiple sequence alignments is to place homologous positions of homologous sequences into the same column.
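As a small illustration of how a program can recognize conserved positions, the sketch below marks every column of the eight-sequence alignment above in which all sequences carry the same residue, loosely in the spirit of the "*" line Clustal programs print under an alignment. The function name `conservation_line` is hypothetical, and the sketch is a simplification: it only marks identical, gap-free columns and does not reproduce Clustal's ":" and "." marks for similar residues.

```python
# Mark fully conserved columns of a gapped alignment with "*".
# Simplification (assumption): only identical, gap-free columns are marked;
# weaker degrees of similarity are not scored here.

ALIGNMENT = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWESNG--",
]


def conservation_line(rows: list[str]) -> str:
    """Return a string with '*' under every column where all rows share one residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*rows):  # iterate over alignment columns
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    for row in ALIGNMENT:
        print(row)
    print(conservation_line(ALIGNMENT))
```

For these eight sequences only two columns are marked: the cysteine at column 5 and the tryptophan at column 21.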

Page 4: Multiple sequence alignment

ClustalW

• Based on phylogenetic analysis

• A phylogenetic tree (the guide tree) is created from a pairwise distance matrix using the neighbor-joining algorithm

• The most closely related pairs of sequences are aligned using dynamic programming

• Each of the alignments is analyzed and a profile of it is created

• Alignment profiles are aligned progressively to produce the total alignment
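The guide-tree step can be sketched with a bare-bones neighbor-joining routine. This is a simplified illustration, not ClustalW's code: it returns only the tree topology as nested tuples (no branch lengths), and the five-taxon distance matrix in the usage block is made-up example data, not distances computed from the sequences on the slides. In ClustalW the distances would come from the pairwise alignments.

```python
# Sketch of neighbor-joining (NJ) on a distance matrix. Branch lengths are
# omitted; only the tree topology is returned as nested tuples.

Matrix = list[list[float]]


def nj_pick_pair(d: Matrix) -> tuple[int, int]:
    """Return indices (i, j), i < j, minimising the NJ criterion Q(i, j)."""
    n = len(d)
    if n < 3:
        raise ValueError("neighbor-joining needs at least three nodes")
    row_sums = [sum(row) for row in d]
    best_pair, best_q = (0, 1), float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            # Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k)
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:  # ties keep the first pair found
                best_q, best_pair = q, (i, j)
    return best_pair


def nj_join(d: Matrix, labels: list) -> tuple[Matrix, list]:
    """Join the best pair into a new node; return the reduced matrix and labels."""
    i, j = nj_pick_pair(d)
    keep = [k for k in range(len(d)) if k not in (i, j)]
    # Distance from the new internal node u to each remaining node k.
    d_u = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
    new_d = [[d[a][b] for b in keep] + [d_u[x]] for x, a in enumerate(keep)]
    new_d.append(d_u + [0.0])
    return new_d, [labels[k] for k in keep] + [(labels[i], labels[j])]


def nj_topology(d: Matrix, labels: list) -> tuple:
    """Repeat NJ joins until two nodes remain, then connect them."""
    while len(labels) > 2:
        d, labels = nj_join(d, labels)
    return (labels[0], labels[1])


if __name__ == "__main__":
    # Made-up distances for five taxa a-e (illustration only).
    names = ["a", "b", "c", "d", "e"]
    dist = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_pick_pair(dist))  # the first pair to be joined
    print(nj_topology(dist, names))
```

Neighbor-joining picks the pair that minimises Q rather than simply the smallest distance, which corrects for taxa that are far from everything else.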

Page 5: Multiple sequence alignment

Progressive multiple alignment

• Perform pairwise alignments for all pairs of sequences

Assume a match gives a score of 1, a mismatch -0.25, and an indel -0.5

Example column scores: 1, -0.25, 1, 1, 1, 1

Total score: 4.75 (five matches and one mismatch)
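The scoring scheme on this slide can be turned into a small pairwise aligner. The sketch below does two things under the slide's simplified scores (match 1, mismatch -0.25, indel -0.5): it scores an alignment that is already written out, and it finds the best global score with Needleman-Wunsch dynamic programming and a linear indel penalty. The example sequences are hypothetical, because the slide's own sequences are not in the transcript. Real ClustalW uses substitution matrices and affine gap penalties instead of these flat values.

```python
# Pairwise scoring with the slide's teaching scores (assumption: flat values,
# linear indel penalty).
MATCH, MISMATCH, INDEL = 1.0, -0.25, -0.5


def score_aligned(a: str, b: str) -> float:
    """Score two already-aligned (gapped) sequences column by column."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total = 0.0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # a gap-gap column carries no information
        if x == "-" or y == "-":
            total += INDEL
        elif x == y:
            total += MATCH
        else:
            total += MISMATCH
    return total


def global_alignment_score(a: str, b: str) -> float:
    """Needleman-Wunsch: best global alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    f = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        f[i][0] = i * INDEL  # a aligned against leading gaps
    for j in range(1, cols):
        f[0][j] = j * INDEL  # b aligned against leading gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = f[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            f[i][j] = max(diag, f[i - 1][j] + INDEL, f[i][j - 1] + INDEL)
    return f[-1][-1]


if __name__ == "__main__":
    # Hypothetical pair: five matches and one mismatch, like the slide's example.
    print(score_aligned("ACGTTA", "ACCTTA"))
    print(global_alignment_score("ACGTTA", "ACCTTA"))
```

Both calls give 4.75 here: for this pair, inserting gaps can only lower the score, so the ungapped alignment is optimal.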

Page 6: Multiple sequence alignment

Progressive multiple alignment

• Create guide tree from pairwise alignments

• Use tree to build multiple sequence alignment

• Align the most similar sequences first (they give the most reliable alignments)

• Align the profile to the next closest sequence

• Align profiles to each other

• The complete multiple sequence alignment is obtained at the root of the tree
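The order of operations described above follows directly from the guide tree: every internal node is one alignment step, and a node can only be processed after both of its children. The sketch below, assuming a hypothetical guide tree written as nested 2-tuples of sequence names, lists the steps in post-order and labels each as sequence-sequence, profile-sequence, or profile-profile. It only plans the steps; it does not perform the alignments.

```python
# Sketch: turn a guide tree into the order of progressive alignment steps.
# The nested-tuple tree format is a hypothetical representation.
from typing import Union

Tree = Union[str, tuple]


def leaves(tree: Tree) -> list[str]:
    """All sequence names under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def merge_order(tree: Tree) -> list[tuple[list[str], list[str]]]:
    """Post-order traversal: each internal node becomes one alignment step,
    so every pair is aligned before its parent and the root comes last."""
    if isinstance(tree, str):
        return []
    left, right = tree
    steps = merge_order(left) + merge_order(right)
    steps.append((leaves(left), leaves(right)))
    return steps


def step_kind(left: list[str], right: list[str]) -> str:
    """Name the kind of alignment performed at one step."""
    if len(left) == 1 and len(right) == 1:
        return "sequence-sequence"
    if len(left) == 1 or len(right) == 1:
        return "profile-sequence"
    return "profile-profile"


if __name__ == "__main__":
    guide_tree = ((("A", "B"), "C"), ("D", "E"))
    for left, right in merge_order(guide_tree):
        print(step_kind(left, right), left, right)
```

For this tree the plan is: align A with B, align the A-B profile with C, align D with E, and finally align the two profiles at the root.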

Page 7: Multiple sequence alignment

Progressive multiple alignment

Page 8: Multiple sequence alignment

Web ClustalW2 options:

Operational options

Output options

Output options, matrix choice, gap opening penalty

Gap penalties, output tree type

File input in GCG, FASTA, EMBL, GenBank, Phylip, or several other formats

Page 9: Multiple sequence alignment

Choose to run ClustalW interactively or to receive the results by email.

Interactive runs may take some time, so be patient.

Page 10: Multiple sequence alignment

Give your alignment a title.

Page 11: Multiple sequence alignment

You can choose between a fast and a full alignment. Full is more accurate and is what we will be using.

Page 13: Multiple sequence alignment

We will use this option

And this one

Page 18: Multiple sequence alignment

Alignment - considerations

• The programs simply try to maximize the alignment score (matches rewarded, mismatches and gaps penalized)

– The “best” alignment may not be the correct biological one

• Multiple alignments are done progressively

– Such alignments get progressively worse as you add sequences

– Mistakes that occur during the alignment process are frozen in

• You will sometimes have to correct the alignment manually

Page 19: Multiple sequence alignment

Problem: what to do

• Many sequences: Start with 10-15 sequences and avoid aligning more than 50 sequences.

• Very different sequences: Sequences that are less than 30% identical to more than half of the other sequences in the set often cause problems.

• Identical sequences: They never help. Unless you have a very good reason to do so, avoid including in your MSA any sequence that is more than 90% identical to another sequence in the set.

• Partial sequences: MSA programs prefer sequences of roughly the same length. Programs often have difficulty comparing a mixture of complete sequences and shorter fragments.

• Repeated domains: Sequences with repeated domains cause trouble for most MSA programs, especially if the number of domains differs.
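Two of these rules of thumb (drop sequences more than 90% identical to one already kept; flag sequences less than 30% identical to more than half of the others) are easy to automate. The sketch below assumes the sequences are already aligned to equal length and computes percent identity only over columns where neither sequence has a gap; in practice identity would usually come from pairwise alignments, and other identity definitions exist. Function names and the toy sequences are hypothetical.

```python
# Sketch of two dataset-cleaning rules for MSA input.
# Assumption: sequences are pre-aligned (equal length, '-' for gaps).


def percent_identity(a: str, b: str) -> float:
    """Identity (0-100) over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def drop_redundant(seqs: list[str], max_identity: float = 90.0) -> list[int]:
    """Greedily keep sequences that are at most max_identity % identical to all kept ones."""
    kept: list[int] = []
    for idx, s in enumerate(seqs):
        if all(percent_identity(s, seqs[k]) <= max_identity for k in kept):
            kept.append(idx)
    return kept


def flag_divergent(seqs: list[str], min_identity: float = 30.0) -> list[int]:
    """Indices of sequences below min_identity % identity to more than half of the others."""
    n = len(seqs)
    flagged = []
    for i, s in enumerate(seqs):
        low = sum(1 for j, t in enumerate(seqs)
                  if j != i and percent_identity(s, t) < min_identity)
        if low > (n - 1) / 2:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    toy = ["AAAAAAAAAA", "AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC"]
    print(drop_redundant(toy))  # indices of sequences to keep
    print(flag_divergent(toy))  # indices of likely troublemakers
```

In this toy set the exact duplicate is dropped, the 90%-identical sequence is kept (the rule says "more than 90%"), and the all-C sequence is flagged as divergent.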

Page 20: Multiple sequence alignment

Need more accuracy than ClustalW for low-identity sequences?

Page 23: Multiple sequence alignment

PSI-BLAST

Page 24: Multiple sequence alignment

Position Specific Iterated BLAST: PSI-BLAST

The purpose of PSI-BLAST is to look deeper into the database for matches to your query protein sequence by employing a scoring matrix that is customized to your query.

Page 25: Multiple sequence alignment

PSI-BLAST is performed in five steps

[1] Select a query and search it against a protein database – REGULAR BLAST

[2] PSI-BLAST constructs a multiple sequence alignment from the hits, then creates a “profile”, a position-specific scoring matrix (PSSM). This step is user-assisted: you can help choose the candidates.

[3] The PSSM is used as a query against the database

[4] PSI-BLAST estimates statistical significance (E values)

[5] Repeat steps [2] to [4] iteratively, typically 5 times. At each new search, a new profile built from the latest hits is used as the query.
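Steps [2] and [3] can be illustrated with a toy position-specific scoring matrix. The sketch below builds log-odds scores (in bits) from an ungapped alignment block and slides the matrix along a sequence to find the best-scoring segment. It is deliberately simplified and is not PSI-BLAST's actual procedure: it assumes a uniform background of 0.05 per amino acid, a flat pseudocount, no sequence weighting, and ungapped scoring. The alignment block and target sequence are hypothetical.

```python
# Simplified PSSM sketch. Assumptions (not PSI-BLAST's real procedure):
# uniform background, flat pseudocount, no sequence weighting, no gaps.
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # uniform background, 0.05 per residue


def build_pssm(block: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds scores (bits) for every residue at every column of an ungapped block."""
    if not block or len({len(s) for s in block}) != 1:
        raise ValueError("need a non-empty block of equal-length sequences")
    n = len(block)
    pssm = []
    for column in zip(*block):
        scores = {}
        for aa in AMINO_ACIDS:
            # Pseudocount keeps unseen residues from getting log(0).
            freq = (column.count(aa) + pseudocount) / (n + pseudocount * len(AMINO_ACIDS))
            scores[aa] = math.log2(freq / BACKGROUND)
        pssm.append(scores)
    return pssm


def score_window(pssm: list[dict[str, float]], segment: str) -> float:
    """Sum the position-specific scores of a segment as long as the PSSM."""
    return sum(pssm[i][aa] for i, aa in enumerate(segment))


def best_window(pssm: list[dict[str, float]], sequence: str) -> tuple[int, float]:
    """Slide the PSSM along a sequence; return (start, score) of the best segment."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    return max(
        ((start, score_window(pssm, sequence[start:start + width]))
         for start in range(len(sequence) - width + 1)),
        key=lambda hit: hit[1],
    )


if __name__ == "__main__":
    profile = build_pssm(["CWYQ", "CWYQ", "CWVR", "CWVR"])
    print(best_window(profile, "AAACWYQAAA"))
```

Because scores depend on the column, a C is rewarded in the first position and penalized elsewhere; that column-specific behavior is what lets a profile detect matches a single fixed substitution matrix would miss.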

Page 26: Multiple sequence alignment

PSSM

Page 27: Multiple sequence alignment

PSI-BLAST: false positives

PSI-BLAST is useful to detect weak but biologically meaningful relationships between proteins.

The main source of false positives is the erroneous amplification of sequences not related to the query. For instance, a query with a coiled-coil motif may detect thousands of other proteins with this motif that are not homologous.

Once even a single unrelated protein is included above the inclusion threshold of a PSI-BLAST search, it will not go away: it becomes part of the next profile and keeps pulling in further unrelated sequences.

Page 28: Multiple sequence alignment

One way to check the results: take the newly found sequences, run PSI-BLAST with each of them as the query, and examine whether the search ‘fishes out’ the original query sequence (reciprocal identification).
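The reciprocal check amounts to a simple loop. In the sketch below, `search` is a hypothetical stand-in for running a PSI-BLAST search with a given sequence as the query and returning the set of identifiers it finds; here it is replaced by a made-up lookup table so the example runs without BLAST installed. Hits that do not recover the original query are candidates for the false positives described above.

```python
# Sketch of reciprocal identification. `search` is a hypothetical callable
# standing in for a PSI-BLAST run; results below are made-up toy data.
from typing import Callable, Iterable


def reciprocal_check(query: str, new_hits: Iterable[str],
                     search: Callable[[str], set[str]]) -> dict[str, bool]:
    """For each new hit, search with it and report whether the query is found again."""
    return {hit: query in search(hit) for hit in new_hits}


if __name__ == "__main__":
    # Toy "search results" keyed by the sequence used as the query.
    toy_results = {
        "hitA": {"query", "hitB"},
        "hitB": {"query"},
        "hitC": {"coiled_coil_1", "coiled_coil_2"},
    }

    def toy_search(seq_id: str) -> set[str]:
        return toy_results.get(seq_id, set())

    report = reciprocal_check("query", ["hitA", "hitB", "hitC"], toy_search)
    suspects = [hit for hit, ok in report.items() if not ok]
    print(report)
    print("possible false positives:", suspects)
```

A failed reciprocal search does not prove a hit is unrelated, since distant homology is not always symmetric in practice, but it marks the hit for closer inspection.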