Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Remote Homology Detection of Beta-Structural Motifs Using Random Fields. Matt Menke (Tufts), Bonnie Berger (MIT), Lenore Cowen (Tufts). ISMB 3Dsig 2010, July 10, 2010.


Page 1: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Matt Menke, Tufts

Bonnie Berger, MIT

Lenore Cowen, Tufts

ISMB 3Dsig 2010

July 10, 2010

Page 2: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Inferring structural similarity from homology is hard at the SCOP superfamily/fold level

Page 3: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Profile HMMs

Page 4: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

HMM is trained from Sequence Alignment of Known Structures

But: it cannot capture pairwise long-range beta-sheet interactions!

Page 5: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

HMMs cannot capture statistical preferences between residues that are close in space but far apart, at a variable distance, in sequence.

Pectate Lyase C (Yoder et al. 1993)

Page 6: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Look at Just Pairs or Generalize to Markov Random Fields

Only look at pairs:

[Bradley, Cowen, Menke, King, Berger (2001), PNAS, 98(26), 14819-14824; Cowen, Bradley, Menke, King, Berger (2002), J Comp Biol, 9, 261-276]

[Figure labels: B1, B2, B3, T2]

Generalize to Markov Random Fields:

Liu et al. 2009

Zhao et al. 2010

Menke et al. 2010 (this work)

Page 7: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Let’s look at what this would mean for propeller folds

Page 8: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Goal: capture HMM sequence information and pairwise information in beta-structural motifs at the same time!

SCOP (http://scop.mrc-lmb.cam.ac.uk/scop)

Page 9: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Structural Motifs Using Random Fields

SMURF

Page 10: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Structural Motifs Using Random Fields

Can we get the benefit of pairwise correlations without having to throw away all sequence info?

Page 11: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

The template is learned from solved structures in the PDB

Page 12: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

The template is learned from solved structures in the PDB:

Aligned with Matt

Page 13: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Digression: Matt structural alignment program

Menke, Berger, Cowen (PLoS Comput Biol 2008)

Specifically designed to align more distant homologs

AFP (aligned fragment pair) chaining using dynamic programming with “translations and twists” (flexibility)

Page 18: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

The template is learned from solved structures in the PDB:

Aligned with Matt

Page 19: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Two beta tables are learned from amphipathic beta sheets that are not propellers, taken from solved structures in the PDB.

Buried Residue

  A    C    D    E    F    G    H    I    K    L    M    N    P    Q    R    S    T    V    W    Y
A 0.78 0.18 0.14 0.15 0.59 0.70 0.06 1.06 0.07 1.19 0.17 0.12 0.05 0.11 0.08 0.22 0.25 1.53 0.17 0.27
C 0.18 0.24 0.03 0.06 0.12 0.14 0.05 0.28 0.03 0.34 0.07 0.02 0.01 0.03 0.02 0.05 0.08 0.39 0.10 0.10
D 0.14 0.03 0.03 0.06 0.10 0.15 0.02 0.11 0.01 0.16 0.05 0.07 0.01 0.05 0.08 0.07 0.11 0.16 0.03 0.03
E 0.15 0.06 0.06 0.05 0.26 0.18 0.14 0.40 0.10 0.57 0.08 0.10 0.02 0.08 0.15 0.19 0.25 0.57 0.05 0.18
F 0.59 0.12 0.10 0.26 0.66 0.61 0.10 1.06 0.05 1.19 0.24 0.08 0.05 0.15 0.08 0.13 0.22 1.35 0.13 0.43
G 0.70 0.14 0.15 0.18 0.61 0.58 0.10 0.77 0.07 1.13 0.11 0.23 0.07 0.17 0.09 0.24 0.31 1.27 0.18 0.48
H 0.06 0.05 0.02 0.14 0.10 0.10 0.04 0.13 0.02 0.13 0.04 0.05 0.01 0.01 0.02 0.06 0.09 0.23 0.03 0.07
I 1.06 0.28 0.11 0.40 1.06 0.77 0.13 2.27 0.10 2.21 0.38 0.14 0.05 0.29 0.13 0.26 0.45 2.56 0.18 0.42
K 0.07 0.03 0.01 0.10 0.05 0.07 0.02 0.10 0.03 0.16 0.03 0.04 0.00 0.05 0.01 0.05 0.05 0.17 0.02 0.10
L 1.19 0.34 0.16 0.57 1.19 1.13 0.13 2.21 0.16 2.96 0.48 0.18 0.06 0.33 0.18 0.29 0.36 2.64 0.25 0.50
M 0.17 0.07 0.05 0.08 0.24 0.11 0.04 0.38 0.03 0.48 0.10 0.01 0.01 0.03 0.04 0.06 0.07 0.49 0.08 0.06
N 0.12 0.02 0.07 0.10 0.08 0.23 0.05 0.14 0.04 0.18 0.01 0.05 0.01 0.05 0.06 0.12 0.16 0.18 0.04 0.08
P 0.05 0.01 0.01 0.02 0.05 0.07 0.01 0.05 0.00 0.06 0.01 0.01 0.01 0.01 0.01 0.02 0.02 0.09 0.02 0.04
Q 0.11 0.03 0.05 0.08 0.15 0.17 0.01 0.29 0.05 0.33 0.03 0.05 0.01 0.04 0.08 0.17 0.17 0.27 0.05 0.13
R 0.08 0.02 0.08 0.15 0.08 0.09 0.02 0.13 0.01 0.18 0.04 0.06 0.01 0.08 0.04 0.05 0.07 0.16 0.02 0.07
S 0.22 0.05 0.07 0.19 0.13 0.24 0.06 0.26 0.05 0.29 0.06 0.12 0.02 0.17 0.05 0.17 0.15 0.29 0.08 0.09
T 0.25 0.08 0.11 0.25 0.22 0.31 0.09 0.45 0.05 0.36 0.07 0.16 0.02 0.17 0.07 0.15 0.25 0.44 0.03 0.11
V 1.53 0.39 0.16 0.57 1.35 1.27 0.23 2.56 0.17 2.64 0.49 0.18 0.09 0.27 0.16 0.29 0.44 3.74 0.23 0.64
W 0.17 0.10 0.03 0.05 0.13 0.18 0.03 0.18 0.02 0.25 0.08 0.04 0.02 0.05 0.02 0.08 0.03 0.23 0.05 0.05
Y 0.27 0.10 0.03 0.18 0.43 0.48 0.07 0.42 0.10 0.50 0.06 0.08 0.04 0.13 0.07 0.09 0.11 0.64 0.05 0.10

Exposed Residue

  A    C    D    E    F    G    H    I    K    L    M    N    P    Q    R    S    T    V    W    Y
A 0.27 0.04 0.13 0.28 0.22 0.18 0.11 0.31 0.23 0.38 0.06 0.11 0.06 0.13 0.22 0.28 0.37 0.49 0.06 0.25
C 0.04 0.08 0.05 0.07 0.04 0.03 0.03 0.04 0.07 0.04 0.02 0.06 0.01 0.08 0.11 0.05 0.06 0.10 0.04 0.09
D 0.13 0.05 0.09 0.13 0.09 0.08 0.13 0.08 0.71 0.12 0.06 0.22 0.03 0.15 0.50 0.36 0.41 0.24 0.02 0.12
E 0.28 0.07 0.13 0.43 0.31 0.15 0.21 0.43 1.92 0.50 0.14 0.28 0.10 0.25 1.49 0.60 1.01 0.63 0.09 0.32
F 0.22 0.04 0.09 0.31 0.23 0.16 0.12 0.34 0.28 0.32 0.12 0.14 0.06 0.19 0.29 0.27 0.34 0.38 0.13 0.33
G 0.18 0.03 0.08 0.15 0.16 0.08 0.06 0.15 0.16 0.15 0.06 0.08 0.05 0.10 0.15 0.14 0.17 0.21 0.03 0.19
H 0.11 0.03 0.13 0.21 0.12 0.06 0.06 0.08 0.25 0.12 0.04 0.10 0.07 0.11 0.14 0.19 0.20 0.21 0.05 0.14
I 0.31 0.04 0.08 0.43 0.34 0.15 0.08 0.48 0.57 0.32 0.10 0.14 0.07 0.28 0.43 0.30 0.32 0.59 0.07 0.40
K 0.23 0.07 0.71 1.92 0.28 0.16 0.25 0.57 0.63 0.38 0.15 0.46 0.08 0.42 0.33 0.70 1.17 0.71 0.22 0.52
L 0.38 0.04 0.12 0.50 0.32 0.15 0.12 0.32 0.38 0.48 0.10 0.15 0.12 0.23 0.36 0.26 0.34 0.62 0.07 0.39
M 0.06 0.02 0.06 0.14 0.12 0.06 0.04 0.10 0.15 0.10 0.12 0.09 0.04 0.08 0.10 0.12 0.14 0.10 0.02 0.08
N 0.11 0.06 0.22 0.28 0.14 0.08 0.10 0.14 0.46 0.15 0.09 0.38 0.09 0.22 0.25 0.48 0.49 0.27 0.05 0.18
P 0.06 0.01 0.03 0.10 0.06 0.05 0.07 0.07 0.08 0.12 0.04 0.09 0.02 0.06 0.07 0.07 0.13 0.13 0.02 0.16
Q 0.13 0.08 0.15 0.25 0.19 0.10 0.11 0.28 0.42 0.23 0.08 0.22 0.06 0.24 0.32 0.28 0.48 0.26 0.03 0.16
R 0.22 0.11 0.50 1.49 0.29 0.15 0.14 0.43 0.33 0.36 0.10 0.25 0.07 0.32 0.36 0.47 0.68 0.72 0.11 0.30
S 0.28 0.05 0.36 0.60 0.27 0.14 0.19 0.30 0.70 0.26 0.12 0.48 0.07 0.28 0.47 0.91 0.88 0.50 0.06 0.27
T 0.37 0.06 0.41 1.01 0.34 0.17 0.20 0.32 1.17 0.34 0.14 0.49 0.13 0.48 0.68 0.88 1.60 0.82 0.07 0.27
V 0.49 0.10 0.24 0.63 0.38 0.21 0.21 0.59 0.71 0.62 0.10 0.27 0.13 0.26 0.72 0.50 0.82 0.87 0.21 0.64
W 0.06 0.04 0.02 0.09 0.13 0.03 0.05 0.07 0.22 0.07 0.02 0.05 0.02 0.03 0.11 0.06 0.07 0.21 0.02 0.13
Y 0.25 0.09 0.12 0.32 0.33 0.19 0.14 0.40 0.52 0.39 0.08 0.18 0.16 0.16 0.30 0.27 0.27 0.64 0.13 0.38

http://bcb.cs.tufts.edu/propellers/si/
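As a rough illustration of how a pairwise table like the buried-residue one could be used, the sketch below scores the residues that face each other across two paired strands by table lookup. Only a handful of entries are copied from the buried table; using the logarithm of a table entry as an additive score, and the names `pair_value` and `strand_pair_score`, are assumptions made for this example, not a description of how SMURF normalizes or applies its tables.

```python
import math

# A few entries copied from the buried-residue table (keys are sorted pairs,
# since the table is symmetric). This is a subset, not the full table.
BURIED = {
    ("V", "V"): 3.74, ("L", "L"): 2.96, ("L", "V"): 2.64,
    ("I", "V"): 2.56, ("I", "L"): 2.21, ("A", "V"): 1.53,
    ("K", "V"): 0.17, ("D", "K"): 0.01,
}

def pair_value(a, b, table=BURIED):
    """Look up a symmetric table entry regardless of residue order."""
    return table[tuple(sorted((a, b)))]

def strand_pair_score(buried1, buried2, table=BURIED):
    """Sum log table values over buried residues paired position by position.

    Assumption: log(table entry) is used as an additive pair score.
    """
    if len(buried1) != len(buried2):
        raise ValueError("paired strands must contribute equal numbers of residues")
    return sum(math.log(pair_value(a, b, table)) for a, b in zip(buried1, buried2))
```

With these entries, a V-V pairing contributes a positive term and a V-K pairing a negative one, which is the kind of preference a per-position HMM emission cannot express.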

Page 20: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Computing a Score

• Sequences are scored by computing their best “threading” or “parse” against the template as a sum of HMM(score) + pairwise(score)

• No longer polynomial time (multi-dimensional dynamic programming)

• Tractable on propellers because paired beta-strands don’t interleave too much
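The idea of scoring a parse as HMM score plus pairwise score can be sketched on a toy template with just two paired strands. The code below is a brute-force illustration, not SMURF's multi-dimensional dynamic program: the function names, the `emit` and `pair` callables, and the antiparallel pairing convention are all hypothetical choices made for this example.

```python
def best_threading(seq, emit, pair, strand_len, min_gap=1):
    """Exhaustively place two paired strands on seq; return (score, i, j) or None.

    emit(pos, residue) -> per-position HMM-style score (hypothetical).
    pair(res_a, res_b) -> pairwise score for residues facing each other.
    Strand 1 occupies seq[i:i+strand_len], strand 2 occupies seq[j:j+strand_len].
    The strands are assumed antiparallel, so position k of strand 1 faces
    position strand_len-1-k of strand 2.
    """
    best = None
    n = len(seq)
    for i in range(n - 2 * strand_len - min_gap + 1):
        for j in range(i + strand_len + min_gap, n - strand_len + 1):
            s1 = seq[i:i + strand_len]
            s2 = seq[j:j + strand_len]
            # Sequence (HMM-like) term: one score per template position.
            hmm_score = sum(emit(k, r) for k, r in enumerate(s1 + s2))
            # Pairwise term: one score per pair of facing residues.
            pair_score = sum(pair(s1[k], s2[strand_len - 1 - k])
                             for k in range(strand_len))
            total = hmm_score + pair_score
            if best is None or total > best[0]:
                best = (total, i, j)
    return best

# Toy scoring functions (made up for illustration only).
HYDROPHOBIC = set("VILFAM")

def toy_emit(pos, res):
    return 1.0 if res in HYDROPHOBIC else -0.5

def toy_pair(a, b):
    return 2.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else -1.0
```

Because the gap between the two strands is free, the pairwise term couples positions at a variable distance in sequence. With many strands, each pair adds another coupled placement, which is why the exact computation grows quickly unless the pairings do not interleave much.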

Page 21: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Let’s look at what this would mean for propeller folds

Page 22: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Let’s look at what this would mean for propeller folds

• Training set for HMM score: leave-superfamily-out cross validation

• Training set for pairwise score: amphipathic beta-sheets from NON-propellers
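Leave-superfamily-out cross validation can be sketched as follows: every protein from one superfamily is held out for testing while the template is trained on all the others. The input format (pairs of protein id and superfamily label) and the function name are assumptions for this example.

```python
from collections import defaultdict

def leave_superfamily_out(proteins):
    """Yield (held_out_superfamily, train_ids, test_ids) splits.

    proteins: iterable of (protein_id, superfamily) pairs (hypothetical labels).
    """
    by_sf = defaultdict(list)
    for pid, sf in proteins:
        by_sf[sf].append(pid)
    for held_out in sorted(by_sf):
        test = by_sf[held_out]
        # Train on every protein whose superfamily differs from the held-out one.
        train = [pid for sf in sorted(by_sf) if sf != held_out
                 for pid in by_sf[sf]]
        yield held_out, train, test
```

Holding out a whole superfamily, rather than single proteins, keeps close homologs of a test protein out of the training set, which is what makes the test a remote-homology test.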

Page 23: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Results on Propellers

True-positive rate (%) at each true-negative rate (TNeg):

        6-bladed         7-bladed
TNeg    HMMER   SMURF    HMMER   SMURF
97%     52      80       80      87
96%     56      80       80      87
95%     64      80       87      93
94%     68      84       90      93
93%     68      84       90      93
92%     68      88       90      97
91%     68      92       90      97
90%     68      92       93      100
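Assuming each entry in the results table is the percentage of true propellers that score above the threshold rejecting the stated fraction of non-propellers, one such entry could be computed from raw scores as in the sketch below. The function name and the integer-percent parameter are choices made for this example.

```python
def true_positive_rate_at(tn_percent, pos_scores, neg_scores):
    """Fraction of positives scoring above the threshold that rejects
    at least tn_percent percent of the negatives."""
    ranked = sorted(neg_scores)
    # Number of negatives that must fall at or below the threshold
    # (ceiling division, done in integers to avoid float rounding).
    n_reject = (tn_percent * len(ranked) + 99) // 100
    threshold = ranked[n_reject - 1] if n_reject > 0 else float("-inf")
    hits = sum(1 for s in pos_scores if s > threshold)
    return hits / len(pos_scores)
```

Sweeping `tn_percent` from 90 to 97 for two scoring methods on the same positive and negative sets would give one column pair of a table in this layout.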

Page 24: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Results on Propellers

• Note that this is “6 (or 7)”-bladed propeller versus non-propeller; distinguishing the number of blades in the propeller seems to be a much harder problem.

Page 25: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Different propeller closures

1jof 2trc

Page 26: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

So: what new sequences fold into propellers?

• We predict a double propeller motif in the N-terminal region of a hybrid 2-component sensor protein.

Page 27: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

What are these proteins?

• First found in a benign bacterium in the human gut.

• May be involved in adapting to changes in diet and efficiently processing different sugars.

• Found in other bacterial species: help sense and adapt to environmental changes.

• Big stretch (I am not a biologist): could this help to study the human obesity epidemic?

Page 28: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Popular Domains

• HisKA histidine kinase domain

• GGDEF adenylyl cyclase signalling domain

• SpoIIE sporulation domain

• Gaf domain

• PAS domain

• HATPase domain

Page 29: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Species distribution

Page 30: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Distinguishing Number of Blades

• The automatic SMURF consensus 7-bladed template only learns 6 blades.

• Sequence motifs are similar: the same Pfam motif occurs in propellers with different numbers of blades.

• The fix: throw out propellers with a “funky” 7th blade by hand and build a new template. Now 6-bladed propellers don’t like the 7-bladed template.

• Double propellers we found are probably 7-7 (but 7-6 is also plausible).

Page 31: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Predict propellers with SMURF!

• http://smurf.cs.tufts.edu
  - Accepts sequences in FASTA format
  - 6-, 7-, and 8-bladed templates, as well as all 9 double-propeller templates

• http://bcb.cs.tufts.edu/propellers/si
  - pairwise tables
  - long list of predicted propeller sequences

Page 32: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

What’s Next for SMURF?

• Long-range dependencies

• Deeply interleaved β-strand pairs

Page 33: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Conclusions

• Combining an HMM score with a pairwise score can help recognize beta-structures

• Computing this score exactly with a random field is highly computationally intensive

• We will begin to look at when it is feasible and when we should use heuristics.

• Also: add side-chain packing, other model refinements.

Page 34: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

More Questions

• When should we over-weight the HMM versus the pair portion of the score?

-- the case of 8-bladed propellers

• Are there other ways to incorporate pairwise dependencies into HMMs?

Page 35: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

An HMM is only as good as its training data

• An HMM is only as good as its training data. Or is it?

• Idea: we augment the training set, using the simplest model of evolution!

• See Kumar and Cowen’s ISMB proceedings paper!
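The slide does not spell out the evolutionary model, so the following is only a guess at what a "simplest model" could look like: independent point mutations, each replacing a residue with a different amino acid chosen uniformly at random. The mutation rate, the number of copies, and the function names are all assumptions for illustration; see Kumar and Cowen's paper for the method actually used.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, rate, rng):
    """Return a copy of seq in which each residue is replaced, with
    probability `rate`, by a different amino acid chosen uniformly."""
    out = []
    for res in seq:
        if rng.random() < rate:
            out.append(rng.choice([a for a in AMINO_ACIDS if a != res]))
        else:
            out.append(res)
    return "".join(out)

def augment(training_seqs, copies=3, rate=0.1, seed=0):
    """Return the original sequences followed by `copies` mutated
    variants of each one (assumed uniform point-mutation model)."""
    rng = random.Random(seed)
    augmented = list(training_seqs)
    for seq in training_seqs:
        augmented.extend(mutate(seq, rate, rng) for _ in range(copies))
    return augmented
```

Because only substitutions are applied, every variant keeps the length of its parent, so an existing alignment of the training set carries over to the augmented set unchanged.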

Page 36: Remote Homology Detection of Beta-Structural Motifs Using Random Fields

Acknowledgements

• National Institutes of Health

Thank you!