3D-COFFEE Mixing Sequences and Structures Cédric Notredame

Upload: george-adams

Post on 25-Dec-2015


3D-COFFEE Mixing Sequences and Structures

Cédric Notredame

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
      ***. ::: .: .. . : . . * . *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM---------
mouse AKDDRIRYDNEMKSWEEQMAE
      * : .* . :

Potential Uses of A Multiple Sequence Alignment?

Extrapolation

Motifs/Patterns

Phylogeny

Profiles

Structure Prediction

Multiple Alignments Are CENTRAL to MOST Bioinformatics Techniques.

Why Is It Difficult To Compute A Multiple Sequence Alignment?

A CROSSROAD PROBLEM

BIOLOGY: What is A Good Alignment?

COMPUTATION: What is THE Good Alignment?

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
      ***. ::: .: .. . : . . * . *: *

Why Is It Difficult To Compute A Multiple Sequence Alignment?

BIOLOGY

CIRCULAR PROBLEM....

Good Sequences

Good Alignment

COMPUTATION

The T-Coffee Algorithm

Local Alignment Global Alignment

Extension

Multiple Sequence Alignment

Mixing Local and Global Alignments

What is a library?

Extension+T-Coffee

Library Based Multiple Sequence Alignment

2
Seq1 MySeq
Seq2 MyotherSeq
#1 2
1 1 25
3 8 70
….

3
Seq1 anotherseq
Seq2 atsecondone
Seq3 athirdone
#1 2
1 1 25
#1 3
3 8 70
….
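The library layout shown on this slide can be read with a few lines of code. The sketch below is illustrative only: it assumes the simplified layout of the slide (a sequence count, one `name sequence` line per sequence, then `#i j` headers followed by `residue_i residue_j weight` lines). The function name `parse_library` and the returned data structure are hypothetical, and the file format read by the real T-Coffee program carries extra fields that are not handled here.

```python
from typing import Dict, List, Tuple

# Hypothetical key type: ((sequence index, residue index), (sequence index, residue index))
Pair = Tuple[Tuple[int, int], Tuple[int, int]]


def parse_library(text: str) -> Tuple[List[Tuple[str, str]], Dict[Pair, int]]:
    """Parse the simplified library layout shown on the slide."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_seq = int(lines[0])

    # One "name sequence" line per sequence.
    sequences: List[Tuple[str, str]] = []
    for ln in lines[1:1 + n_seq]:
        name, seq = ln.split()
        sequences.append((name, seq))

    weights: Dict[Pair, int] = {}
    current = None  # the (i, j) sequence pair announced by the last "#i j" line
    for ln in lines[1 + n_seq:]:
        if ln.startswith("#"):
            i, j = ln[1:].split()
            current = (int(i), int(j))
        else:
            if current is None:
                raise ValueError("weight line found before any '#i j' header")
            res_i, res_j, weight = (int(tok) for tok in ln.split())
            weights[((current[0], res_i), (current[1], res_j))] = weight
    return sequences, weights


EXAMPLE = """\
3
Seq1 anotherseq
Seq2 atsecondone
Seq3 athirdone
#1 2
1 1 25
#1 3
3 8 70
"""

sequences, weights = parse_library(EXAMPLE)
```

Each entry of `weights` says that two residues from two different sequences were found aligned by some method, with the given weight.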

The Triplet Assumption

[Figure: residues X, Y and Z; sequences SEQ A and SEQ B]

Consistency Consensus

ClustalW T-Coffee

Dynamic Programming Using An Extended Library
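Library extension, as used by T-Coffee, can be sketched as follows: the weight of a residue pair (x, y) is its direct library weight plus, for every third residue z aligned to both, the smaller of the two weights W(x, z) and W(z, y). The code below is a minimal sketch of that idea, not the program's implementation. Residues are represented as hypothetical `(sequence_name, position)` tuples, and the weights in the example are invented.

```python
from collections import defaultdict
from typing import Dict, Tuple

Residue = Tuple[str, int]                      # (sequence name, position)
Library = Dict[Tuple[Residue, Residue], float]


def _key(x: Residue, y: Residue) -> Tuple[Residue, Residue]:
    """Store each pair once, in sorted order."""
    return (x, y) if x <= y else (y, x)


def extend_library(library: Library) -> Library:
    """Add triplet support to every residue pair of a primary library."""
    # Neighbour map: residue -> {aligned residue: weight}
    nbrs: Dict[Residue, Dict[Residue, float]] = defaultdict(dict)
    for (x, y), w in library.items():
        nbrs[x][y] = w
        nbrs[y][x] = w

    # Start from the direct weights.
    extended: Library = {}
    for x, linked in nbrs.items():
        for y, w in linked.items():
            if x < y:
                extended[(x, y)] = w

    # Every residue z contributes min(W(x, z), W(z, y)) to each pair (x, y)
    # of its neighbours that belong to two different sequences.
    for z, linked in nbrs.items():
        partners = sorted(linked)
        for i, x in enumerate(partners):
            for y in partners[i + 1:]:
                if x[0] == y[0]:
                    continue  # same sequence: never aligned to each other
                k = _key(x, y)
                extended[k] = extended.get(k, 0.0) + min(linked[x], linked[y])
    return extended


primary = {
    (("A", 1), ("B", 1)): 88.0,
    (("A", 1), ("C", 1)): 77.0,
    (("B", 1), ("C", 1)): 100.0,
}
extended = extend_library(primary)
```

In this toy case the A/B pair keeps its direct weight of 88 and gains min(77, 100) = 77 through sequence C. The extended weights then play the role of a position-specific substitution matrix in the dynamic programming step.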

Progressive Alignment

What Is BaliBase?

How Good is T-Coffee?

Best Performing Method on MSA benchmark Datasets

BaliBase

-Notredame

-Sonnhammer

Ribosomal RNA

-Katoh (Mafft)

Homstrad

-Notredame

OxBench

-Barton

Mixing Heterogeneous Data With T-Coffee

Local Alignment Global Alignment

Multiple Sequence Alignment

Multiple Alignment

Structural Specialist

Mixing Sequences and Structures

Why Do We Want To Mix Sequences and Structures?

1-Predicting Sequence Structures

STRUCTURE FUNCTION

Why Do We Want To Mix Sequences and Structures?

•Sequences are Cheap and Common.

•Structures are Expensive and Rare.

Why Do We Want To Mix Sequences and Structures?

Cheapest Structure determination:

Sequence-Structure Alignment

THREAD or ALIGN

ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN

Why Do We Want To Mix Sequences and Structures?

ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN

THREAD or ALIGN

Convincing Alignment

Same Fold

Why Do We Want To Mix Sequences and Structures?

Convincing Alignment

Same Fold

Distant sequences are hard to align

Why Do We Want To Mix Sequences and Structures?

chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
      ***. ::: .: .. . : . . * . *: *

Multiple Sequence Alignments Help

Exploring the Twilight Zone

Why Do We Want To Mix Sequences and Structures?

1-Predicting Sequence Structures

2-Produce Better Alignments

Why Do We Want To Mix Sequences and Structures?

ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN

ALIGN

Unreliable alignment if %ID < 30%
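The %ID figure used as a reliability threshold can be computed directly from two aligned rows. The sketch below assumes one common definition (identical residues divided by the number of columns where neither row has a gap); other definitions use a different denominator, and the slides do not say which one is meant. The function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where both rows carry a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns without a gap in either row.
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return 100.0 * matches / len(aligned)


# The toy pair shown on the slide.
pid = percent_identity("ADKPRRP---LS-YMLWLN", "ADKPKRPKPRLSAYMLWLN")
below_threshold = pid < 30.0  # the 30% limit quoted on the slide
```

The toy pair has 14 identical residues over 15 ungapped columns, so it sits well above the 30% limit. It illustrates the calculation, not a difficult case.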

Why Do We Want To Mix Sequences and Structures?

Alignment Insensitive to %ID

ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN

Structure Superposition

Folds evolve Slower than Sequences

Why Do We Want To Mix Sequences and Structures?

Why Do We Want To Mix Sequences and Structures?

Structure Superposition

Why Do We Want To Mix Sequences and Structures?

1-Predicting Sequence Structures

2-Produce Better Alignments

How To Mix Sequences and Structures

Mixing Heterogeneous Data With T-Coffee

Local Alignment Global Alignment

Multiple Sequence Alignment

Multiple Alignment

Structural Specialist

Struct Vs Struct: Superpose

Seq Vs Struct: Thread

Seq Vs Seq: Local, Global

Evaluation on HOMSTRAD

Mixing Sequences and Structures with T-Coffee

The 3D-Coffee Libraries

Methods

•Global: Needleman and Wunsch

•Local: Sim (lalign)

•Threading: Fugue

•Superposition: SAP
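Each method in the list above produces its own set of weighted residue pairs, and these sets have to be combined into one primary library. The sketch below assumes the combination rule described for T-Coffee, in which a pair reported by several methods receives the sum of their weights. The function name `merge_libraries`, the residue representation and all the weights are hypothetical.

```python
from typing import Dict, Tuple

Residue = Tuple[str, int]                      # (sequence name, position)
Library = Dict[Tuple[Residue, Residue], float]


def merge_libraries(*libraries: Library) -> Library:
    """Combine pairwise libraries; weights of duplicated pairs are summed."""
    merged: Library = {}
    for lib in libraries:
        for (x, y), weight in lib.items():
            # Store each pair once, whatever order the method reported it in.
            key = (x, y) if x <= y else (y, x)
            merged[key] = merged.get(key, 0.0) + weight
    return merged


# Invented weights standing in for the output of three of the listed methods.
global_lib = {(("seq", 1), ("struct", 1)): 40.0}
local_lib = {
    (("struct", 1), ("seq", 1)): 35.0,
    (("seq", 2), ("struct", 3)): 20.0,
}
thread_lib = {(("seq", 2), ("struct", 2)): 60.0}

primary = merge_libraries(global_lib, local_lib, thread_lib)
```

Under this rule, a pair supported by both a sequence method and a structural method ends up with more weight than a pair seen by only one of them.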

Fugue

•Threading: Fugue

1-Turn Sequence into a profile:

-lower penalties in loops

-Structure specific matrix

2-Align Profile with Sequence

Evaluating Fugue

•Threading: Fugue

1-Select 967 pairs of sequences in HOMSTRAD

2-Align each pair with T-Coffee and Fugue.

3-Compare the Two Alignments

[Plot regions: TCdef wins / Fugue wins]

TCdef: 58.81%

Fugue: 61.81%

Superposition: SAP

•Superposition: SAP

1-High Level Dynamic Programming

Substitution Matrix when doing regular Alignments

2-Low Level DP. Forcing the alignment of two residues

Evaluate Every Pair

3-Rigid Body Superposition

RMSD
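The RMSD reported at the rigid-body superposition step measures how far apart the matched atoms remain. The sketch below only computes that value for two coordinate sets that are assumed to be already superposed and listed in matching order. It does not search for the optimal rotation and translation, and it is not SAP's own code. The function name `rmsd` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean square deviation between two equally long lists of 3D points."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must contribute the same number of atoms")
    if not coords_a:
        raise ValueError("at least one pair of atoms is required")
    # Sum of squared distances between matched atoms.
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Two matched atoms; the second pair is 2 units apart along z.
value = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
             [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```

A low RMSD over the equivalenced residues indicates that the alignment implied by the superposition is structurally sound.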

Structure Based Sequence Alignment

Make a DP on the accumulated traces

Use Traces like a Substitution Matrix

SAP T-Coffee

Compare

•Superposition: SAP

1-Select 967 pairs of sequences in HOMSTRAD

2-Align each pair with T-Coffee and SAP.

3-Compare the Two Alignments

TCdef: 58.81%

SAP: 86.31%

•Fugue

TCdef: 58.81%

Fugue: 61.81%

•SAP

TCdef: 58.81%

SAP: 86.31%

Sequences and Structures:

How Good is The Mixture ???

Our Benchmark:

HOM39

-HOMSTRAD: Structure based MSAs that can be used as References.

-COMPACT and DEMANDING

-HOM39: The 39 Most difficult datasets (percent ID lower than 25).

Our Benchmark:

Using HOM39

BENCHMARKING Strategy:

-re-align HOM39 without using ALL the structures

-Compare the result with the reference
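The comparison against the reference can be sketched as a pair-recovery score: the percentage of residue pairs aligned in the reference that are also aligned in the test alignment. The slides do not state which measure was used, so this sum-of-pairs style score is an assumption, and the names `aligned_pairs` and `pair_accuracy` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Residue = Tuple[str, int]  # (sequence name, 1-based residue index)


def aligned_pairs(alignment: Dict[str, str], gap: str = "-") -> Set[Tuple[Residue, Residue]]:
    """Collect every pair of residues that share a column."""
    names = sorted(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all rows must have the same length")
    counters = {n: 0 for n in names}  # residues seen so far in each row
    pairs: Set[Tuple[Residue, Residue]] = set()
    for col in range(length):
        present = []
        for n in names:
            if alignment[n][col] != gap:
                counters[n] += 1
                present.append((n, counters[n]))
        pairs.update(combinations(present, 2))
    return pairs


def pair_accuracy(test: Dict[str, str], reference: Dict[str, str]) -> float:
    """Percentage of reference pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return 100.0 * len(ref & aligned_pairs(test)) / len(ref)


reference = {"a": "AC-GT", "b": "ACTGT"}
test = {"a": "ACG-T", "b": "ACTGT"}
score = pair_accuracy(test, reference)
```

In this toy example the test alignment misplaces one residue of sequence `a`, so it recovers three of the four reference pairs.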

Evaluating 3D-Coffee

1- Can a SINGLE structure Help ?

Seq Vs Struct: Thread

Seq Vs Seq: Local, Global

Evaluation on HOM39

Using ONE structure with 3D-Coffee

HOM39 with ONE Structure per MSA

Evaluating 3D-Coffee

1- Can a SINGLE structure Help ?

2- Does it benefit ALL the Sequences?

Is EVERYONE Happier if there is a STAR in the team…

BaliBase

HOM39 TC-Fugue

+

Remove Provided Structure(s)

Comparison

Evaluating 3D-Coffee

1- Can a SINGLE structure Help?

2- Does it benefit all the sequences?

3- Can We Use Two or More Structures?

Mixing Sequences and Structures with 3D-Coffee

Seq Vs Struct: Fugue

Struct Vs Struct: SAP, LSQ

Seq Vs Seq: Local, Global

Evaluation on HOMSTRAD

HOM39 with TWO Structures/MSA

Indirect Improvement

Direct Improvement

Evaluating 3D-Coffee

1- Can a SINGLE structure Help?

2- Does it benefit all the sequences?

3- Can we use Two Structures?

4- Relation Accuracy / N-structures?

Mixing Sequences and Structures with T-Coffee

Seq Vs Struct: Fugue

Struct Vs Struct: SAP

Seq Vs Seq: Local, Global

Evaluation on HOMSTRAD

HOM39 with 1-N Structures per MSA

Induced Improvement

Conclusion

-Structures Help

BUT NOT SO MUCH

The More Structures The Merrier


Credits

Orla O’Sullivan: University College, Cork, Ireland

Des Higgins: University College, Cork, Ireland

Karsten Suhre: IGS-CNRS, Marseille, France

Conclusion

The program is available on request from:

[email protected]