aligning sequences with t-coffee

Post on 05-Jan-2016

Aligning Sequences With T-Coffee

Cédric Notredame
Comparative Bioinformatics Group
Bioinformatics and Genomics Program

T-Coffee and Consistency…

SeqA GARFIELD THE LAST FAT CAT

SeqB GARFIELD THE FAST CAT

SeqC GARFIELD THE VERY FAST CAT

SeqD THE FAT CAT

SeqA GARFIELD THE LAST FA-T CAT
SeqB GARFIELD THE FAST CA-T ---
SeqC GARFIELD THE VERY FAST CAT
SeqD -------- THE ---- FA-T CAT

Consistency: Conflicts and Information

[Figure: diagram of pairwise alignments among W, X, Y and Z, with alternative combinations (joined by "OR") labeled "Consistent" and "Non Consistent"]

T-Coffee and Consistency…

SeqA GARFIELD THE LAST FAT CAT     Prim. Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT    Prim. Weight = 77
SeqC GARFIELD THE VERY FAST CAT

SeqA GARFIELD THE LAST FAT CAT     Prim. Weight = 100
SeqD -------- THE ---- FAT CAT

SeqB GARFIELD THE ---- FAST CAT    Prim. Weight = 100
SeqC GARFIELD THE VERY FAST CAT

SeqC GARFIELD THE VERY FAST CAT    Prim. Weight = 100
SeqD -------- THE ---- FA-T CAT
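
The primary weights above look like percent identities of each pairwise alignment. Here is a minimal sketch of one way to compute such a weight. It assumes that only columns where both rows hold a letter are counted (spaces and gaps are skipped) and that the result is truncated to an integer. The slides do not state the exact formula, so not every value shown need follow this convention. The function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> int:
    """Percent identity over columns where both rows hold a letter.

    Assumed convention (not stated in the slides): spaces and gap
    columns are skipped, and the result is truncated to an integer.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns in which both rows carry a residue.
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a.isalpha() and b.isalpha()]
    if not aligned:
        return 0
    matches = sum(a == b for a, b in aligned)
    return 100 * matches // len(aligned)


seq_a = "GARFIELD THE LAST FAT CAT"
seq_b = "GARFIELD THE FAST CAT ---"
seq_d = "-------- THE ---- FAT CAT"

print(percent_identity(seq_a, seq_b))  # 88
print(percent_identity(seq_a, seq_d))  # 100
```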

SeqA GARFIELD THE LAST FAT CAT     Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT    Weight = 77
SeqC GARFIELD THE VERY FAST CAT
SeqB GARFIELD THE ---- FAST CAT

SeqA GARFIELD THE LAST FA-T CAT    Weight = 100
SeqD -------- THE ---- FA-T CAT
SeqB GARFIELD THE ---- FAST CAT
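
The triplets above suggest how a pair of residues in SeqA and SeqB gains support through a third sequence: each intermediate contributes the smaller of the two weights on its path (77 through SeqC, 100 through SeqD). The sketch below illustrates that idea for the F of "FAT" in SeqA. The residue labels, the function name `extend_library`, and the SeqB/SeqD weight of 100 (implied by the A-D-B triplet but not listed among the primary alignments) are assumptions made for illustration.

```python
from collections import defaultdict


def extend_library(primary):
    """Extend a primary library of residue-pair weights through triplets.

    primary maps frozenset({res_x, res_y}) -> weight, where a residue is a
    (sequence_name, label) tuple. The extended weight of x~y is the direct
    weight plus, for every residue z aligned to both, min(w(x,z), w(z,y)).
    """
    neighbours = defaultdict(dict)
    for pair, weight in primary.items():
        x, y = tuple(pair)
        neighbours[x][y] = weight
        neighbours[y][x] = weight

    residues = list(neighbours)
    extended = {}
    for i, x in enumerate(residues):
        for y in residues[i + 1:]:
            if x[0] == y[0]:
                continue  # residues of the same sequence are never paired
            total = neighbours[x].get(y, 0)
            for z, w_xz in neighbours[x].items():
                w_zy = neighbours[z].get(y)
                if z != y and w_zy is not None:
                    total += min(w_xz, w_zy)
            if total:
                extended[frozenset((x, y))] = total
    return extended


a_f, b_f, b_c = ("A", "F of FAT"), ("B", "F of FAST"), ("B", "C of CAT")
c_f, d_f = ("C", "F of FAST"), ("D", "F of FAT")

primary = {
    frozenset((a_f, b_c)): 88,   # SeqA/SeqB alignment pairs FAT with CAT
    frozenset((a_f, c_f)): 77,
    frozenset((a_f, d_f)): 100,
    frozenset((b_f, c_f)): 100,
    frozenset((c_f, d_f)): 100,
    frozenset((b_f, d_f)): 100,  # assumed: implied by the A-D-B triplet
}

extended = extend_library(primary)
print(extended[frozenset((a_f, b_f))])  # 177 = 77 (via SeqC) + 100 (via SeqD)
print(extended[frozenset((a_f, b_c))])  # 88, no third sequence supports it
```

In this toy library the pairing of FAT with FAST, which the direct SeqA/SeqB alignment missed, ends up with more support than the pairing of FAT with CAT.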

Methods

Data

Scalability

Running T-Coffee over the Web

Available Servers and Flavors

Which MSA Method?

Combining Many MSAs into ONE

MUSCLE

MAFFT

ClustalW

???????

T-Coffee
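
One way to picture combining several MSAs into one is to count, for every pair of residues, how many of the input alignments place them in the same column. Pairs with high support can then guide the final alignment. The sketch below only does the counting step on toy data. The function names and the unweighted vote are assumptions, and the slides do not describe the actual weighting used by M-Coffee.

```python
from collections import Counter
from itertools import combinations


def aligned_pairs(msa):
    """Return the set of residue pairs sharing a column in one MSA.

    msa maps sequence name -> gapped row; all rows have the same length.
    A residue is identified as (sequence_name, index_in_ungapped_sequence).
    """
    position = {name: 0 for name in msa}
    pairs = set()
    length = len(next(iter(msa.values())))
    for col in range(length):
        present = []
        for name, row in msa.items():
            if row[col] != "-":
                present.append((name, position[name]))
                position[name] += 1
        # Sorting gives every pair a canonical order across alignments.
        pairs.update(combinations(sorted(present), 2))
    return pairs


def combine(msas):
    """Count how many alignments support each residue pair."""
    support = Counter()
    for msa in msas:
        support.update(aligned_pairs(msa))
    return support


# Two toy alignments of the same sequences, as two methods might produce.
msa_1 = {"s1": "AC-T", "s2": "ACGT"}
msa_2 = {"s1": "A-CT", "s2": "ACGT"}

support = combine([msa_1, msa_2])
for pair, votes in sorted(support.items()):
    print(pair, votes)
```

Here the two alignments agree on the first and last residues of `s1` (two votes each) and disagree on where its C belongs (one vote for each placement).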

Consistency and Accuracy

What To Do Without Structures

Using the M-Coffee Server

Integrating New Types of Data: Template Based Sequence Alignments

[Figure: workflow diagram. Labels: TARGET, Experimental Data, Template, Aligner, Template-Sequence Alignment, Template Alignment, Template based Alignment of the Sequences, Primary Library]

Exploring The Template World

Template           Generator   Alignment Method   Mode
RNA Structure      Prediction  RNA Aligner        R-Coffee
Protein Structure  BLAST/PDB   3D Aligner         3D-Coffee
Profile            BLAST/NR    Profile/Profile    PSI-Coffee
Gene Structure     ENSEMBL     Genome Aligner     Exoset
Promoter           Transfac    Meta-Aligner       Meta-Coffee
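
For readers running T-Coffee locally rather than over the web, some of these flavors can be selected with the `-mode` option of the `t_coffee` command. The sketch below only builds the command line and does not run it. The helper name `build_command` and the file name `seqs.fasta` are hypothetical, the mapping covers only the modes whose command-line names are assumed here (`rcoffee`, `expresso`, `psicoffee`, `mcoffee`), and the installed version's documentation should be checked before relying on them.

```python
# Mode names as passed to `t_coffee -mode`; assumed from the T-Coffee
# command-line documentation, not taken from the slides.
MODES = {
    "R-Coffee": "rcoffee",
    "Expresso": "expresso",
    "PSI-Coffee": "psicoffee",
    "M-Coffee": "mcoffee",
}


def build_command(fasta_path: str, flavor: str) -> list[str]:
    """Return the argument list for running one T-Coffee flavor."""
    if flavor not in MODES:
        raise KeyError(f"no command-line mode known for {flavor!r}")
    return ["t_coffee", fasta_path, "-mode", MODES[flavor]]


command = build_command("seqs.fasta", "PSI-Coffee")
print(" ".join(command))  # t_coffee seqs.fasta -mode psicoffee
```

The resulting list can be handed to `subprocess.run` on a machine where `t_coffee` is installed.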

3D-Coffee/Expresso: Incorporating Structural Information

Expresso: Finding the Right Structure

[Figure: Expresso workflow diagram. Labels: Sources, BLAST, Templates, SAP, Template Alignment, Source Template Alignment, Remove Templates, Library]

PSI-Coffee: Homology Extension

What is Homology Extension?

[Figure: a residue L facing two candidate L residues, marked "?"]

- Simple scoring schemes result in alignment ambiguities

What is Homology Extension?

[Figure: each L residue is extended into a profile column (LLLLLL, LLIVIL, LLLLLL); labels: Profile 1, Profile 2]
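
The point of the figure can be made with a small calculation: single residues L and L score the same whichever way they are paired, whereas the profile columns behind them need not. The sketch below scores two columns by the fraction of identical residue pairs between them. This identity-based score and the name `profile_column_score` are assumptions chosen for illustration, not the scoring scheme used by PSI-Coffee.

```python
def profile_column_score(col_a: str, col_b: str) -> float:
    """Fraction of residue pairs (one from each column) that are identical."""
    pairs = [(a, b) for a in col_a for b in col_b]
    return sum(a == b for a, b in pairs) / len(pairs)


# Single residues: both candidate pairings look equally good.
print(profile_column_score("L", "L"))            # 1.0

# Profile columns taken from the figure: the ambiguity disappears.
print(profile_column_score("LLLLLL", "LLLLLL"))  # 1.0
print(profile_column_score("LLLLLL", "LLIVIL"))  # 0.5
```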

PSI-Coffee: Homology Extension

[Figure: PSI-Coffee workflow diagram. Labels: Sources, BLAST, Templates, Profile Aligner, Template Alignment, Source Template Alignment, Remove Templates, Library]

Benchmarks

Do Benchmarks All Tell the same story?

Based on

Method      Type         Template  Score  Comment
ClustalW-2  Progressive  NO        22.74
PRANK       Gap          NO        26.18  Science2008
MAFFT       Iterative    NO        26.18
Muscle      Iterative    NO        31.37
ProbCons    Consistency  NO        40.80
ProbCons    MonoPhasic   NO        37.53
T-Coffee    Consistency  NO        42.30
M-Coffee4   Consistency  NO        43.60
PSI-Coffee  Consistency  Profile   53.71
PROMAL      Consistency  Profile   55.08
PROMAL-3D   Consistency  PDB       57.60
3D-Coffee   Consistency  PDB       61.00  Expresso

Score: fraction of correct columns when compared with a structure-based reference (BB11 of BaliBase).
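
To make the score definition concrete, here is a minimal sketch of a column score: the fraction of reference columns that appear unchanged in the alignment being tested. It is an illustration on toy data under simplifying assumptions: every reference column counts, whereas a benchmark may restrict scoring to selected regions. The function names are hypothetical.

```python
def columns(msa):
    """Describe each column by the residue index it holds in every sequence.

    msa maps sequence name -> gapped row. A gap is recorded as None.
    """
    names = sorted(msa)
    position = {name: 0 for name in names}
    result = []
    for col in range(len(msa[names[0]])):
        column = []
        for name in names:
            if msa[name][col] == "-":
                column.append(None)
            else:
                column.append(position[name])
                position[name] += 1
        result.append(tuple(column))
    return result


def fraction_correct_columns(test, reference):
    """Fraction of reference columns reproduced exactly in the test MSA."""
    test_columns = set(columns(test))
    reference_columns = columns(reference)
    hits = sum(col in test_columns for col in reference_columns)
    return hits / len(reference_columns)


reference = {"s1": "AC-T", "s2": "ACGT"}
test = {"s1": "A-CT", "s2": "ACGT"}

print(fraction_correct_columns(test, reference))       # 0.5
print(fraction_correct_columns(reference, reference))  # 1.0
```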

Consistency

Homology Extension

Structural Extension

T-Coffee and The World

BLAST/SOAP

- Some templates are obtained with a BLAST
- Queries can be sent to the EBI or the NCBI
- No need for a local BLAST installation

Users sequences
