Evolution and Structure of Biomolecules

Julien Dutheil¹

<[email protected]>

¹ BiRC – Bioinformatics Research Center, University of Århus

http://birc.au.dk/~jdutheil/Teaching/

February 2008

Julien Dutheil (BiRC – University of Århus) Evolution and Structure of Biomolecules February 2008 1 / 23


Introduction

Which molecules are we interested in?

Evolving molecules: DNA, RNA, Proteins

Several structures are resolved: we have the set of coordinates for almost all atom positions

There are several levels of structure organization

Level  DNA                           RNA                                         Proteins
I      deoxyribonucleotide sequence  ribonucleotide sequence                     amino-acid sequence
II     double helix                  loops and stems (double-stranded regions)   loops, helices, turns, strands and sheets
III    -                             yes, for rRNA and tRNA                      domain organization
IV     -                             rRNA                                        a lot of protein complexes


Why should we study their structure?

The function of a molecule is tightly linked to its structure

Improve models of sequence evolution, for better inference of evolutionary processes (including phylogeny)

Use evolutionary information to study/predict the structure of molecules


Outline of the lecture

1 On the non-homogeneity of the substitution process
    Rate across site variation
    Rate across site co-variation
    Accounting for secondary structure

2 On the non-independence of substitution events
    The success story of RNA structure prediction
    Accounting for phylogeny: substitution mapping
    Detecting coevolution in proteins
    Non-independence and models of evolution


On the non-homogeneity of the substitution process: Rate across site variation

Not all positions are equal

The structure of a molecule determines a set of constraints acting on individual sites

The neutral theory states that the level of constraint determines the rate of substitution

⇒ Not all positions evolve at the same rate

Estimating site-specific rates

Site variability (entropy, information): does not account for phylogeny! May be very inaccurate...

Parsimony score

Empirical Bayesian estimation (work of Ziheng Yang)
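
As a minimal sketch of the simplest of these measures, the Shannon entropy of each alignment column can be computed as follows. The toy alignment and the function name `site_entropy` are illustrative assumptions, not part of the lecture; the measure treats sequences as independent samples and therefore ignores the phylogeny.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical toy alignment: four sequences, three sites.
alignment = ["GLS", "GLT", "GIA", "GVK"]

# Transpose the alignment so that each string is one column (site).
columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
entropies = [site_entropy(col) for col in columns]
```

A fully conserved column has entropy 0, and the entropy grows as residues become more evenly spread over more states. Two closely related sequences count as much as two distant ones, which is one reason the measure can be inaccurate.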


Example (A vertebrate myoglobin data set)

G L S D G E W Q L V L N A W G K V E A D V A G H G Q E V L I R L F T G H P E T L

E K F D K F K H L K T E A E M K A S E D L K K H G N T V L T A L G G I L K K K G

H H E A E V K H L A E S H A N K H K I P V K Y L E F I S D A I I H V L H A K H P

S D F G A D A Q A A M S K A L E L F R N D M A A Q Y K V L G F H G

[Figure: substitution rate along the sequence, with the positions of helices marked]


Purple (slow) → Cyan (fast)

Made with ConSurf http://consurf.tau.ac.il/ [Glaser et al., 2003]


On the non-homogeneity of the substitution process: Rate across site co-variation

Do constraints act on individual sites?

Several motifs cover whole regions of the sequence (helices, strands)

We then expect the rates of substitution to be correlated to some extent along the sequence

Models that allow consecutive sites to be correlated: Hidden Markov Models

Hidden Markov Models: the model of Felsenstein and Churchill [1996]

Several rate classes, as in Yang’s model

A model of transitions between rate classes along the sequence

Developed for DNA sequences; applied to hemoglobin: helices evolve more rapidly than coils, but with large variations


Hidden Markov Models: Felsenstein and Churchill [1996]

The rate of a specific site is unknown, but we can compute the likelihood of a given site according to a given rate

Each rate is a (hidden) state, and we have a (Markov) model of transitions between states along the alignment

We can compute the likelihood of the whole data set according to the transition model, estimate transition parameters, estimate the most likely hidden state for each site, etc.
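
The likelihood of the whole data set can be sketched with the standard forward recursion of hidden Markov models. This is a generic illustration under stated assumptions, not the authors' implementation: the per-site likelihoods `site_lik` are assumed to be precomputed on the tree for each rate class, and all numbers below are made up.

```python
def forward_likelihood(site_lik, trans, init):
    """Likelihood of the alignment, summed over all paths of hidden rate classes.

    site_lik[i][k]: likelihood of site i given rate class k (assumed precomputed).
    trans[k][l]:    probability of moving from class k to class l between
                    consecutive sites.
    init[k]:        probability of class k at the first site.
    """
    n_classes = len(init)
    # Forward variable at the first site.
    fwd = [init[k] * site_lik[0][k] for k in range(n_classes)]
    # Propagate along the alignment, one site at a time.
    for lik in site_lik[1:]:
        fwd = [
            lik[l] * sum(fwd[k] * trans[k][l] for k in range(n_classes))
            for l in range(n_classes)
        ]
    return sum(fwd)


# Hypothetical example: two sites, two rate classes (slow, fast).
site_lik = [[0.1, 0.4], [0.2, 0.3]]
trans = [[0.9, 0.1], [0.2, 0.8]]
init = [0.5, 0.5]
likelihood = forward_likelihood(site_lik, trans, init)
```

The cost is linear in the number of sites and quadratic in the number of rate classes. For long alignments the forward values should be rescaled or kept in log space to avoid underflow, which this sketch omits.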


On the non-homogeneity of the substitution process: Accounting for secondary structure

Distinct structural constraints, distinct substitution processes

Goldman, Thorne and Jones’s model [Thorne et al., 1996, Goldman et al., 1996, 1998]

Four types of secondary structure: Helix, Sheet, Turn and Coil

Position-dependent states (1, 2, 3, ..., n − 2, n − 1, n)

Two solvent accessibility classes: Exposed or Buried ⇒ 38 different states

A replacement matrix specific to each secondary structure category is estimated from a set of proteins with known structure

This model provides a significantly better fit than independent, homogeneous models

These matrices can be used with a protein of unknown structure, and allow prediction of the most likely motif for each residue
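
Predicting the most likely motif for each residue can be sketched as posterior decoding with the forward-backward algorithm. The sketch below is generic and makes several assumptions: only two hypothetical categories are used instead of the 38 states of the model, and the per-site likelihoods under each category are taken as given rather than computed from category-specific replacement matrices.

```python
def posterior_decoding(site_lik, trans, init):
    """Posterior probability of each hidden category at each site (forward-backward)."""
    n_sites, n_cat = len(site_lik), len(init)
    fwd = [[0.0] * n_cat for _ in range(n_sites)]
    bwd = [[1.0] * n_cat for _ in range(n_sites)]
    # Forward pass: joint probability of the data up to site i and category l.
    for k in range(n_cat):
        fwd[0][k] = init[k] * site_lik[0][k]
    for i in range(1, n_sites):
        for l in range(n_cat):
            fwd[i][l] = site_lik[i][l] * sum(
                fwd[i - 1][k] * trans[k][l] for k in range(n_cat)
            )
    # Backward pass: probability of the data after site i given category k.
    for i in range(n_sites - 2, -1, -1):
        for k in range(n_cat):
            bwd[i][k] = sum(
                trans[k][l] * site_lik[i + 1][l] * bwd[i + 1][l] for l in range(n_cat)
            )
    total = sum(fwd[-1])
    return [[fwd[i][k] * bwd[i][k] / total for k in range(n_cat)] for i in range(n_sites)]


# Hypothetical example with two categories and two sites.
categories = ["helix", "coil"]
site_lik = [[0.4, 0.1], [0.1, 0.4]]
trans = [[0.9, 0.1], [0.1, 0.9]]
init = [0.5, 0.5]
posteriors = posterior_decoding(site_lik, trans, init)
decoded = [categories[max(range(len(row)), key=row.__getitem__)] for row in posteriors]
```

Each row of `posteriors` sums to one, and `decoded` keeps the category with the highest posterior probability at each site.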


[Figure: example of posterior decoding]


On the non-independence of substitution events: The success story of RNA structure prediction

RNA structure

Two motifs in RNA: stems and loops

Constraints on Watson-Crick pairs within stems

⇒ Find the sequence folding that maximizes the number of W-C pairs (1980s)

Improvements:

Use a sequence alignment (1990s)

Take into account the underlying phylogeny (1990s onward)

Use complex models that account for spatial dependency (2000)
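
The first approach, maximizing the number of W-C pairs, can be sketched with a dynamic program in the style of the Nussinov algorithm. This is an illustrative assumption about which method is meant, not a statement from the lecture; the `min_loop` parameter (minimum number of unpaired bases enclosed by a pair) and the example sequence are also assumptions. Only nested pairs are considered, so pseudoknots are excluded.

```python
def max_pairs(seq, min_loop=3):
    """Maximum number of nested Watson-Crick pairs in an RNA sequence."""
    wc = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j]; 0 for short spans.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            # case 2: base i pairs with some base k, splitting the interval
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in wc:
                    left = best[i + 1][k - 1]
                    right = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, 1 + left + right)
            best[i][j] = score
    return best[0][n - 1] if n else 0


n_pairs = max_pairs("GGGAAAUCC")
```

The recursion runs in cubic time in the sequence length. It uses a single sequence and no evolutionary information, which is what the later improvements address.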


On the non-independence of substitution events: Accounting for phylogeny: substitution mapping

Using phylogenetics to detect coevolving pairs

Mapping the substitution events

Goal:

Locate the substitution events on a phylogenetic tree for each position of a sequence alignment

Reconstruct all ancestral states for each node

For each branch and for each site, count 1 substitution if the states are different, 0 otherwise

Improvements: take into account uncertainty on ancestral states and branch lengths
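
The naive counting scheme can be sketched as follows, assuming the ancestral states have already been reconstructed. The tree encoding (a child-to-parent dictionary), the node names and the sequences are hypothetical; the improvements mentioned above (uncertainty on ancestral states and branch lengths) are not handled.

```python
def map_substitutions(parent, states):
    """Return {branch: [0 or 1 per site]}, each branch being named after its child node."""
    mapping = {}
    for child, par in parent.items():
        # 1 if the state differs between the two ends of the branch, 0 otherwise.
        mapping[child] = [
            1 if a != b else 0 for a, b in zip(states[par], states[child])
        ]
    return mapping


# Hypothetical tree ((A,B)N1,C)root with reconstructed ancestral sequences.
parent = {"A": "N1", "B": "N1", "N1": "root", "C": "root"}
states = {"root": "ACG", "N1": "ACA", "A": "ACA", "B": "GCA", "C": "ACG"}

mapping = map_substitutions(parent, states)
n_sites = len(states["root"])
per_site_totals = [sum(vec[i] for vec in mapping.values()) for i in range(n_sites)]
```

Each site is then described by a vector of counts over branches. Two sites whose vectors are similar tend to change on the same branches, which is the raw material for a coevolution measure.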


Detecting non-independent positions

1 Define a measure of coevolution for a group of sites, based on the underlying substitution mapping

2 Repeat 1000 times:
    1 Simulate a group of sites under the hypothesis of independence
    2 Record the coevolution measure of the group obtained

3 Compare the value of the coevolution measure for real groups with the ones obtained from simulations


[Figure: histogram of the null distribution of the statistic (y-axis: frequency), with the observed value marked and a 3% annotation]
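
The comparison with simulations can be sketched as a Monte Carlo p-value. The simulator below is a placeholder: it draws from an arbitrary distribution instead of simulating sites under independence and measuring their coevolution. The function names, the observed value and the "+1" correction (a common convention that keeps the estimate away from zero) are all assumptions.

```python
import random


def monte_carlo_p_value(observed, simulate_statistic, n_sim=1000, seed=0):
    """Estimate the probability of a statistic at least as large as `observed` under the null."""
    rng = random.Random(seed)
    # Null distribution: the statistic recomputed on data simulated under independence.
    null = [simulate_statistic(rng) for _ in range(n_sim)]
    n_extreme = sum(1 for value in null if value >= observed)
    return (n_extreme + 1) / (n_sim + 1)


def toy_simulator(rng):
    """Placeholder for 'simulate a group of sites, then compute its coevolution measure'."""
    return rng.expovariate(1.0)


p_value = monte_carlo_p_value(observed=3.5, simulate_statistic=toy_simulator)
```

With 1000 simulations the smallest p-value that can be reported is about 0.001, so the number of replicates bounds the resolution of the test.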


A bacterial example

[Figure: bar chart of the significant pairs by type: stem pairs (225), other documented interactions (26), false positives (7)]

80 sequences of LSU, all possible pairs tested

258 pairs show significant coevolution

225 belong to secondary structure, 26 to tertiary structure


On the non-independence of substitution events: Detecting coevolution in proteins

The protein case

More challenging:

20 possible states instead of 4

Several biochemical properties to compensate for

Larger connectivity of residues

Main results:

Signal is scarcer, needs more data to detect

Positions in close proximity tend to coevolve more than distant positions

Relation with secondary structure is unclear

Coevolution within domains is higher than between domains


On the non-independence of substitution events: Non-independence and models of evolution

Non-independence and models of evolution

Phylogenetic inference:

Maximum likelihood (ML) is robust to departures from independence; neighbor joining (NJ) is more sensitive

Inter-dependence artificially increases the bootstrap support


A simple model of non-independent evolution: RNA

Tillier and Collins [1998]

Model with pairs of states: 4 × 4 = 16 states

Model simplification, Watson-Crick pairs and GU intermediates

Results: high substitution rate, compensating mutations appear almost simultaneously

[Figure: diagram of the substitution model between the pair states AU, GU, GC, UA, UG, CG and OT, with rate parameters αs, αd, β and γ]
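
The reduction of the state space can be illustrated by enumerating the 16 nucleotide pairs and collapsing them into the seven states named in the diagram. This sketch assumes that OT lumps together every pair that is neither a Watson-Crick pair nor a GU/UG intermediate; that reading is an assumption, and the rate parameters of the model are not represented.

```python
from itertools import product

WATSON_CRICK = {"AU", "UA", "GC", "CG"}
WOBBLE = {"GU", "UG"}


def pair_class(pair):
    """Collapse one of the 16 nucleotide pairs into one of the 7 model states."""
    if pair in WATSON_CRICK or pair in WOBBLE:
        return pair
    return "OT"  # assumed to mean: all other pairs


# All 4 x 4 ordered pairs of nucleotides.
all_pairs = ["".join(p) for p in product("ACGU", repeat=2)]
states = sorted({pair_class(p) for p in all_pairs})
n_other = sum(1 for p in all_pairs if pair_class(p) == "OT")
```

Under this assumption ten of the sixteen pairs fall into OT, so the substitution model only has to describe rates between seven states.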


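A simple model of non-independent evolution: proteins

Pollock et al. [1999]

Model with pairs of states: 20 × 20 = too many pairs of states!

Model simplification, using sub-alphabets, for instance:
    Large/Small
    Polar/Non-polar
    Charged/Non-charged

\[
Q =
\begin{array}{c|cccc}
   & AB & Ab & aB & ab \\ \hline
AB & -\sum_{AB} & \lambda_B \pi_{Ab}/\pi_A & \lambda_A \pi_{aB}/\pi_B & 0 \\
Ab & \lambda_B \pi_{AB}/\pi_A & -\sum_{Ab} & 0 & \lambda_A \pi_{ab}/\pi_b \\
aB & \lambda_A \pi_{AB}/\pi_B & 0 & -\sum_{aB} & \lambda_B \pi_{ab}/\pi_a \\
ab & 0 & \lambda_A \pi_{Ab}/\pi_b & \lambda_B \pi_{aB}/\pi_a & -\sum_{ab}
\end{array}
\]

Here \(-\sum_{X}\) denotes minus the sum of the off-diagonal entries of row X, so that each row sums to zero.

Computationally demanding: used only to detect significantly coevolving pairs by model comparison.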
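
A minimal sketch of how such a matrix can be assembled is given below. Each nonzero rate is taken to be the rate of the site that changes, multiplied by the frequency of the target pair and divided by the marginal frequency of the state that stays unchanged; the entry from ab to Ab is written by symmetry with the other rows, an assumption that the reversibility check in the usage supports. The frequencies and rates are made up.

```python
def build_q(pi, lam_a, lam_b):
    """4x4 rate matrix over the pair states AB, Ab, aB, ab.

    pi:           joint stationary frequencies of the four pairs (summing to 1).
    lam_a, lam_b: substitution rates at the first and at the second site.
    """
    order = ["AB", "Ab", "aB", "ab"]
    # Marginal frequency of each single-site state (case-sensitive membership).
    marg = {s: sum(f for pair, f in pi.items() if s in pair) for s in "AaBb"}
    q = {x: {y: 0.0 for y in order} for x in order}
    for x in order:
        for y in order:
            if x == y:
                continue
            if x[0] != y[0] and x[1] == y[1]:    # only the first site changes
                q[x][y] = lam_a * pi[y] / marg[x[1]]
            elif x[0] == y[0] and x[1] != y[1]:  # only the second site changes
                q[x][y] = lam_b * pi[y] / marg[x[0]]
            # simultaneous changes at both sites keep rate 0
        q[x][x] = -sum(q[x][y] for y in order if y != x)
    return q


# Hypothetical frequencies with an excess of AB and ab pairs (coevolution).
pi = {"AB": 0.4, "Ab": 0.1, "aB": 0.1, "ab": 0.4}
q = build_q(pi, lam_a=1.0, lam_b=2.0)
rows_sum_to_zero = all(abs(sum(row.values())) < 1e-12 for row in q.values())
reversible = all(
    abs(pi[x] * q[x][y] - pi[y] * q[y][x]) < 1e-12 for x in q for y in q
)
```

When the joint frequencies equal the product of the marginals, the two sites evolve independently, which gives the null model for the comparison.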


A more complex model of non-independent evolution

Rodrigue et al. [2005, 2006]

Sites are treated simultaneously: a sequence of size N ⇒ 20^N states!

\[
R_{xy} =
\begin{cases}
0 & \text{if sequences } x \text{ and } y \text{ differ by more than one position} \\
Q_{lm}\, e^{p \times (E(x) - E(y))} & \text{if sequences } x \text{ and } y \text{ differ at a single position, with states } l \text{ and } m \\
-\sum_{y \neq x} R_{xy} & \text{if } x = y
\end{cases}
\]

Add a fitness function E that takes into account the whole sequence. This function is computed using measures on real data: statistical potentials. p is a free parameter, estimated from the data.

Bayesian sampling procedure to estimate parameters.

Usage limited to small data sets due to the computational load, but provides a better description of the data.
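
The definition of the sequence-level rates can be sketched on a toy problem. Everything below is a hypothetical illustration: a two-letter alphabet instead of 20 amino acids, a made-up site-level matrix `toy_q`, and a made-up energy function standing in for a statistical potential.

```python
import math


def rate(x, y, q, energy, p):
    """Off-diagonal rate R_xy between two whole sequences x and y."""
    diffs = [i for i, (a, b) in enumerate(zip(x, y)) if a != b]
    if len(diffs) != 1:
        return 0.0  # identical sequences, or more than one difference
    i = diffs[0]
    # Site-level rate, modulated by the change in the sequence-wide energy.
    return q[x[i]][y[i]] * math.exp(p * (energy(x) - energy(y)))


def diagonal(x, alphabet, q, energy, p):
    """R_xx: minus the sum of the rates towards all single-site neighbours of x."""
    total = 0.0
    for i in range(len(x)):
        for s in alphabet:
            if s != x[i]:
                y = x[:i] + s + x[i + 1:]
                total += rate(x, y, q, energy, p)
    return -total


# Hypothetical ingredients.
toy_q = {"A": {"B": 1.0}, "B": {"A": 1.0}}


def toy_energy(seq):
    """Made-up potential: each pair of identical neighbours lowers the energy by 1."""
    return -float(sum(1 for a, b in zip(seq, seq[1:]) if a == b))


r_single = rate("AAB", "AAA", toy_q, toy_energy, p=1.0)
r_double = rate("AAB", "BBB", toy_q, toy_energy, p=1.0)
r_diag = diagonal("AAB", "AB", toy_q, toy_energy, p=1.0)
```

The full matrix over all sequences is never built. Only the single-site neighbours of the current sequence are visited, which is what makes a sampling approach feasible despite the size of the state space.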

Julien Dutheil (BiRC – University of Arhus) Evolution and Structure of Biomolecules February 2008 21 / 23

Page 40: Evolution and Structure of Biomolecules - AUusers-birc.au.dk/jdutheil/Teaching/Structure.pdf · almost all atom positions ... Improve models of sequence evolution, ... Evolution and

On the non-independence of substitution events Non-independence and models of evolution

A more complex model of non-independent evolutionRodrigue et al. [2005, 2006]

Sites are treated simultaneously: sequence of size N ⇒ 20N states!

Rxy =

0 if sequence x and y differ

by more than one position

Qklep×(E(x)−E(y)) if sequence x and y have states

l and m at positions i and j−

∑y 6=x Rxy i = j

Add a fitness function that takes into account the whole sequence.This function is computed using measures on real data: statisticalpotentials. p is a free parameter, estimated from the data.

Bayesian sampling procedure to estimate parameters.

Usage limited to small data sets due to the computational load, butprovides a better description of the data.

Conclusions

Large effort during the last 10 years

Methodological improvements are limited by the high complexity of structural constraints.

The large amount of sequence and structure data opens a way to characterize these constraints at a large scale, and may help the design of new tractable models.

These studies are also a good example of tight interactions between evolutionary biology, molecular biology and bioinformatics.

References

J. Felsenstein and G. A. Churchill. A Hidden Markov Model approach to variation among sites in rate of evolution. Molecular Biology and Evolution, 13:93–104, 1996.

F. Glaser, T. Pupko, I. Paz, R. E. Bell, D. Bechor-Shental, E. Martz, and N. Ben-Tal. ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics, 19:163–164, 2003.

N. Goldman, J. L. Thorne, and D. T. Jones. Using evolutionary trees in protein secondary structure prediction and other comparative sequence analyses. Journal of Molecular Biology, 263:196–208, 1996.

N. Goldman, J. L. Thorne, and D. T. Jones. Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics, 149:445–458, 1998.

D. D. Pollock, W. R. Taylor, and N. Goldman. Coevolving protein residues: maximum likelihood identification and relationship to structure. Journal of Molecular Biology, 287:187–198, 1999.

N. Rodrigue, N. Lartillot, D. Bryant, and H. Philippe. Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene, 347:207–217, 2005.

N. Rodrigue, H. Philippe, and N. Lartillot. Assessing site-interdependent phylogenetic models of sequence evolution. Molecular Biology and Evolution, 23:1762–1775, 2006.

J. L. Thorne, N. Goldman, and D. T. Jones. Combining protein evolution and secondary structure. Molecular Biology and Evolution, 13:666–673, 1996.

E. R. M. Tillier and R. A. Collins. High apparent rate of simultaneous compensatory base-pair substitutions in ribosomal RNA. Genetics, 148:1993–2002, 1998.
