Some stories about computing and riboswitches

Walter L. (Larry) Ruzzo

Computer Science & Engineering and Genome Sciences, Univ. of Washington

Fred Hutchinson Cancer Research Center, Seattle, USA

http://www.cs.washington.edu/homes/ruzzo

CSE 486/586, EE/BioE 423/523, 11/3/2010


Page 2:

Fig. 2. The arrows show the situation as it seemed in 1958. Solid arrows represent probable transfers, dotted arrows possible transfers. The absent arrows (compare Fig. 1) represent the impossible transfers postulated by the central dogma. They are the three possible arrows starting from protein.

Page 3:

RNA Secondary Structure: RNA makes helices too

Usually single stranded

[Figure: an RNA strand drawn 5´ to 3´, folded so that stretches of it form base pairs.]

Page 4:

Met Pathways

… (SAH)

Page 5:

Gene Regulation: The MET Repressor

[Figure from Alberts, et al., 3e. Labels: SAM, DNA, Protein.]

Page 6:

Alberts, et al., 3e.

Protein way

Riboswitch alternative

SAM

Grundy & Henkin, Mol. Microbiol. 1998; Epshtein, et al., PNAS 2003; Winkler et al., Nat. Struct. Biol. 2003

Not the only way!

Page 7:

Alberts, et al., 3e.

Protein way

Riboswitch alternatives

SAM-II

SAM-I

Grundy, Epshtein, Winkler et al., 1998, 2003

Corbino et al., Genome Biol. 2005

Not the only way!

Page 8:

Alberts, et al., 3e.

Corbino et al., Genome Biol. 2005

Protein way

Riboswitch alternatives

SAM-III

SAM-II SAM-I

Fuchs et al., NSMB 2006

Grundy, Epshtein, Winkler et al., 1998, 2003

Not the only way!

Page 9:

Alberts, et al., 3e.

Corbino et al., Genome Biol. 2005

Protein way

Riboswitch alternatives

Weinberg et al., RNA 2008

SAM-III

SAM-II SAM-I

Fuchs et al., NSMB 2006

Grundy, Epshtein, Winkler et al., 1998, 2003

SAM-IV

Not the only way!

Page 10:

Alberts, et al., 3e.

Protein way

Riboswitch alternatives

Corbino et al., Genome Biol. 2005

Weinberg et al., RNA 2008

SAM-III

SAM-II SAM-I

Fuchs et al., NSMB 2006

Grundy, Epshtein, Winkler et al., 1998, 2003

SAM-IV

Not the only way!

Meyer, et al., BMC Genomics 2009

Page 11:

(SAH)

Page 12:

Page 13:

Source: R. Breaker

Page 14:

Translational Control

Transcriptional Control

Page 15:

Detail of Translational Control

Page 16:

The Glycine Riboswitch

[Figure labels: 5’, 3’, gcvT ORF]

(Mandal, Lee, Barrick, Weinberg, Emilsson, Ruzzo, Breaker, Science 2004)

Page 17:

Fig. 3. Cooperative binding of two glycine molecules by the VC I-II RNA. Plot depicts the fraction of VC II (open) and VC I-II (solid) bound to ligand versus the concentration of glycine. The constant, n, is the Hill coefficient for the lines as indicated that best fit the aggregate data from four different regions (fig. S3). Shaded boxes demark the dynamic range (DR) of glycine concentrations needed by the RNAs to progress from 10%- to 90%-bound states.

Cooperativity → 10x sharper switch
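The 10%-to-90% dynamic range in the caption can be worked out from the Hill equation. A minimal sketch, assuming the standard Hill binding model; the function names and the values n = 1 and n = 2 are illustrative assumptions, not the coefficients fitted in the figure:

```python
def fraction_bound(ligand: float, k_half: float, n: float) -> float:
    """Hill equation: fraction of RNA bound at a given ligand concentration."""
    return ligand**n / (k_half**n + ligand**n)


def dynamic_range(n: float) -> float:
    """Fold-change in ligand needed to go from 10% bound to 90% bound.

    Solving the Hill equation for a bound fraction f gives
    ligand = k_half * (f / (1 - f)) ** (1 / n), so the 90%/10% ratio
    is (9 / (1/9)) ** (1 / n) = 81 ** (1 / n), independent of k_half.
    """
    return 81.0 ** (1.0 / n)


# Illustrative Hill coefficients (assumed, not taken from the figure).
for n in (1.0, 2.0):
    print(f"n = {n}: {dynamic_range(n):.1f}-fold change in ligand")
```

Under this model a non-cooperative binder (n = 1) needs an 81-fold rise in ligand to go from 10% to 90% bound, while n = 2 needs only a 9-fold rise, which is roughly the order-of-magnitude sharpening named on the slide.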

Page 18:

Riboswitches

UTR structure that directly senses/binds small molecules & regulates mRNA

• widespread in prokaryotes

• some in eukaryotes & archaea

• ~ 20 ligands known; multiple nonhomologous solutions for some (e.g. SAM)

• dozens to hundreds of instances of each

• on/off; transcription/translation; splicing; combinatorial control

• all found since ~2003; most via bioinformatics

Page 19:

Why is RNA hard to deal with?

[Figure: the nucleotide sequence of an RNA laid out as a two-dimensional structure drawing.]

A: Structure often more important than sequence

Page 20:

RNA Secondary Structure: can be fixed while sequence evolves

[Figure: the same structure drawn for two different sequences; a G-U pair is labeled.]
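One way to make "structure fixed while sequence evolves" concrete is to check that two different sequences are both compatible with one dot-bracket structure. A minimal sketch; the structure and sequences below are hypothetical (not the ones drawn on the slide), and G-U wobble pairs are accepted alongside Watson-Crick pairs:

```python
# Pairs allowed in a helix: Watson-Crick plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure):
    """Return the (i, j) index pairs encoded by matching parentheses."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def is_compatible(seq, structure):
    """True if every paired position in the structure holds an allowed pair."""
    if len(seq) != len(structure):
        return False
    return all((seq[i], seq[j]) in CANONICAL
               for i, j in pairs_from_dot_bracket(structure))


structure = "(((((....)))))"      # hypothetical 5-bp hairpin
seq_a = "GACUGAAAACAGUC"          # hypothetical sequence
seq_b = "GCCAAAAAAUUGGU"          # different helix bases, ends in a G-U pair
seq_c = "GACUGAAAACAGUA"          # last base mutated without compensation

for name, seq in (("a", seq_a), ("b", seq_b), ("c", seq_c)):
    print(name, is_compatible(seq, structure))
```

Sequences a and b differ at most helix positions yet both fit the structure; sequence c differs from a by a single base and does not.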

Page 21:

Impact of RNA homology search

[Figure: glycine riboswitch and operon shown for each species.]

B. subtilis

L. innocua

A. tumefaciens

V. cholerae

M. tuberculosis

(and 19 more species)

(Barrick, et al., 2004)

Page 22:

Impact of RNA homology search

[Figure: glycine riboswitch and operon shown for each species.]

B. subtilis

L. innocua

A. tumefaciens

V. cholerae

M. tuberculosis

(and 19 more species)

(Barrick, et al., 2004)

Using our techniques, we found…

(and 42 more species)

Page 23:

mRNA leader

mRNA leader switch?

Page 24:

Covariation is strong evidence for base pairing
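Covariation between two alignment columns is commonly quantified as mutual information. A minimal sketch, assuming an ungapped alignment; the four-sequence alignment is hypothetical and the function name is illustrative:

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * log2((c / n) / ((f_i[a] / n) * (f_j[b] / n)))
        for (a, b), c in f_ij.items()
    )


# Hypothetical alignment: columns 0 and 4 change together (G-C, C-G, A-U, U-A),
# while column 1 is perfectly conserved.
alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
columns = ["".join(col) for col in zip(*alignment)]

print(mutual_information(columns[0], columns[4]))  # covarying pair of columns
print(mutual_information(columns[0], columns[1]))  # conserved partner column
```

The covarying columns reach the 2-bit maximum for a four-letter alphabet, while a perfectly conserved column carries no mutual information at all, so this signal only appears where the sequence actually varies.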

Page 25:

Alignment Matters

Structural conservation ≠ Sequence conservation

Alignment without structure information is unreliable

CLUSTALW alignment of SECIS elements with flanking regions

same-colored boxes should be aligned

Page 26:

How to model an RNA “Motif”?

Conceptually, start with a profile HMM:

• from a multiple alignment, estimate nucleotide/insert/delete preferences for each position

• given a new seq, estimate likelihood that it could be generated by the model, & align it to the model

[Figure column labels: all G, mostly G, del, ins]
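The per-position estimation step can be sketched as follows. This is a deliberately simplified illustration, assuming an ungapped alignment and omitting the insert and delete states, so it is a position-specific profile rather than a full profile HMM; the alignment, pseudocount, and function names are hypothetical:

```python
from collections import Counter
from math import log2

ALPHABET = "ACGU"


def column_probs(alignment, pseudocount=1.0):
    """Estimate nucleotide probabilities for each alignment column."""
    probs = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        probs.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return probs


def log_odds(seq, probs, background=0.25):
    """Log2 likelihood ratio of seq under the profile vs. a uniform background."""
    return sum(log2(p[b] / background) for b, p in zip(seq, probs))


# Hypothetical alignment: column 0 is all G, column 1 is mostly G.
alignment = ["GGAC", "GGAU", "GAAC", "GGCC"]
profile = column_probs(alignment)

print(profile[0]["G"], profile[1]["G"])
print(log_odds("GGAC", profile), log_odds("UUUU", profile))
```

The pseudocount keeps unseen nucleotides from getting probability zero, which matters when the model is built from only a handful of sequences.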

Page 27:

How to model an RNA “Motif”?

Add “column pairs” and pair emission probabilities for base-paired regions

paired columns <<<<<<< >>>>>>>
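Pair emission probabilities can be estimated by counting which base pairs occur at each pair of columns marked by matching `<` and `>`. A minimal sketch with no pseudocounts; the annotation line and alignment are hypothetical, and the function names are illustrative:

```python
from collections import Counter


def paired_columns(annotation):
    """Match each '<' with its closing '>' and return the column index pairs."""
    stack, pairs = [], []
    for i, ch in enumerate(annotation):
        if ch == "<":
            stack.append(i)
        elif ch == ">":
            pairs.append((stack.pop(), i))
    return pairs


def pair_emissions(alignment, annotation):
    """Relative frequency of each base pair at every pair of paired columns."""
    emissions = {}
    for i, j in paired_columns(annotation):
        counts = Counter(row[i] + row[j] for row in alignment)
        total = sum(counts.values())
        emissions[(i, j)] = {pair: c / total for pair, c in counts.items()}
    return emissions


# Hypothetical structure annotation and alignment (3-bp helix, 3-nt loop).
annotation = "<<<...>>>"
alignment = ["GCAAAAUGC", "GGAGAAUCC", "GCUAAAAGC", "CCAAAAUGG"]

for columns, dist in sorted(pair_emissions(alignment, annotation).items()):
    print(columns, dist)
```

Modeling the two columns jointly is the point: a column pair that emits GC three quarters of the time and CG otherwise looks variable column by column, but every observed combination is a valid base pair.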

Page 28:

Covariance Models (specialized stochastic CFGs)

CM:

S1 → cS2g | aS2u

S2 → a

Sequences: CAG or AAU

Example parse of CAG:

S1 → cS2g → cag

[Figure: a C-G pair and an A-U pair, each enclosing an A.]

Page 29:

Stochastic context-free grammar

CM:

S1 → cS2g (P=0.25) | aS2u (P=0.75)

S2 → a (P=1)

Example parse of CAG:

S1 → cS2g → cag

0.25 × 1 = 0.25

Classification: Is Pr(parse of CAG) ≥ threshold? (e.g., vs. Pr(CAG in null model))
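The toy grammar is small enough to score by brute force. A minimal sketch that enumerates every string the grammar derives and compares the parse probability with a uniform null model; the enumeration, the uniform null, and the function names are assumptions for illustration (real covariance model software uses dynamic programming rather than enumeration):

```python
from math import log2

# Toy grammar from the slide: each rule is (right-hand side, probability).
RULES = {
    "S1": [(("c", "S2", "g"), 0.25), (("a", "S2", "u"), 0.75)],
    "S2": [(("a",), 1.0)],
}


def derive(symbol):
    """Yield (string, probability) for every string the nonterminal derives."""
    for rhs, p in RULES[symbol]:
        partial = [("", p)]
        for part in rhs:
            options = list(derive(part)) if part in RULES else [(part, 1.0)]
            partial = [(s + t, q * r) for s, q in partial for t, r in options]
        yield from partial


def parse_probability(seq):
    """Total probability that the grammar generates seq."""
    return sum(p for s, p in derive("S1") if s == seq.lower())


def log_odds(seq):
    """Log2 ratio of the grammar probability to a uniform null model."""
    p = parse_probability(seq)
    null = 0.25 ** len(seq)
    return log2(p / null) if p > 0 else float("-inf")


for seq in ("CAG", "AAU", "CAU"):
    print(seq, parse_probability(seq), log_odds(seq))
```

CAG gets probability 0.25 × 1 = 0.25, matching the slide, and scores 4 bits above the uniform null; CAU cannot be derived at all, so any finite threshold rejects it.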

Page 30:

Application: cis-regulatory ncRNA discovery in prokaryotes

Key issue is exploiting prior knowledge to focus on promising data

Page 31:

CMfinder: Simultaneous alignment, folding & motif description

Yao, Weinberg & Ruzzo, Bioinformatics, 2006

[Flowchart labels: Folding predictions, Smart heuristics, Candidate alignment, CM, Realign, EM, Mutual Information]

Combines folding & mutual information in a principled way.

Page 32:

CMfinder Accuracy (on Rfam families with flanking sequence)

[Figure: accuracy plot; surviving labels: "/CW", "/CW"]

Page 33:

Use the Right Data; Do Genome Scale Search

[Flowchart labels: Dataset collection, CMfinder, RaveNnA, Search]

Right Data:

• 5-10 examples amidst 20 extraneous ones OK; (but not 5 in 200 or 2000)

• length 1k (not 100k)

How:

• Regulators near regulatees

• Get UTRs of homologs

Genome Scale Search:

• Many riboswitches are present in ~5 copies per genome

• More examples = better model + clues to function

Page 34:

Processing Times

Input from ~70 complete Firmicute genomes available in late 2005-early 2006, totaling ~200 megabases

[Flowchart. Steps with a stated CPU time:]

• Identify CDD group members: < 10 CPU days

• Footprinter ranking: < 10 CPU days

• CMfinder: 1 ~ 2 CPU months

• RaveNnA: 10 CPU months

• CMfinder refinement: < 1 CPU month

[Steps without a stated CPU time:]

• Retrieve upstream sequences

• Motif postprocessing (appears twice)

[Counts shown in the flowchart: 2946 CDD groups; 35975 motifs; 1740 motifs; 1466 motifs]

Page 35:

Weinberg, et al. Nucl. Acids Res., July 2007 35: 4809-4819.

boxed = confirmed riboswitch (+2 more: • S-adenosyl methionine • molybdenum cofactor)

queuosine precursor

S-adenosyl homocysteine

cyclic di-GMP

Page 36:

Riboswitch Summary

RNA elements that control (“switch”) gene expression, without involvement of (transcription factor) proteins

Varied mechanism: Transcriptional, translational, on, off, combinatorial... Aptamer/expression platform.

Large diversity: Dozens of ligands, multiple aptamers for some, many operons, hundreds of species

Computationally challenging search/discovery

Many open problems!