rna: function, secondary structure prediction, search ... · sequence/function constraint for rna...
TRANSCRIPT
![Page 1: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/1.jpg)
CSE 527���Computational Biology
RNA: Function, Secondary Structure Prediction, Search, Discovery
![Page 2: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/2.jpg)
The Message
Cells make lots of RNA
Functionally important, functionally diverse
Structurally complex
New tools required
alignment, discovery, search, scoring, etc.
2
noncoding RNA
![Page 3: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/3.jpg)
RNA
DNA: DeoxyriboNucleic Acid RNA: RiboNucleic Acid
Like DNA, except:
Lacks OH on ribose (backbone sugar) Uracil (U) in place of thymine (T) A, G, C as before
4
uracil thymine
CH3
pairs with A
![Page 4: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/4.jpg)
Fig. 2. The arrows show the situation as it seemed in 1958. Solid arrows represent probable transfers, dotted arrows possible transfers. The absent arrows (compare Fig. 1) represent the impossible transfers postulated by the central dogma. They are the three possible arrows starting from protein.
![Page 5: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/5.jpg)
Ribosomes
Watson, Gilman, Witkowski, & Zoller, 1992 7
![Page 6: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/6.jpg)
Ribosomes
Atomic structure of the 50S Subunit from Haloarcula marismortui. Proteins are shown in blue and the two RNA strands in orange and yellow. The small patch of green in the
center of the subunit is the active site.!- Wikipedia!
1974 Nobel prize to Romanian biologist George Palade (1912-2008) for discovery in mid 50’s
50-80 proteins
3-4 RNAs (half the mass)
Catalytic core is RNA
Of course, mRNAs and tRNAs (messenger & transfer RNAs) are ���critical too
8
![Page 7: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/7.jpg)
Transfer RNA The “adapter” coupling mRNA ���to protein synthesis. Discovered in the mid-1950s by ���Mahlon Hoagland (1921-2009,
left), Mary Stephenson, and Paul Zamecnik (1912-2009; Lasker award winner, right).
![Page 8: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/8.jpg)
“Classical” RNAs
rRNA - ribosomal RNA (~4 kinds, 120-5k nt)
tRNA - transfer RNA (~61 kinds, ~ 75 nt)
RNaseP - tRNA processing (~300 nt) snRNA - small nuclear RNA (splicing: U1, etc, 60-300nt)
a handful of others
![Page 9: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/9.jpg)
A!G!A!C!U!G!
A!C!G!
A!U!C!A!
C!G!C!A!G!U!C!
A!
Base pairs!
A! U!C! G!
A!C! A!U!G!U!
RNA Secondary Structure: ���RNA makes helices too
11
5´! 3´!
Usually single stranded
![Page 10: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/10.jpg)
Bacteria Triumph of proteins
80% of genome is coding DNA
Functionally diverse
receptors motors catalysts regulators (Monod & Jakob, Nobel prize 1965) …
13
![Page 11: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/11.jpg)
Proteins catalyze & regulate
biochemistry
14
![Page 12: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/12.jpg)
Met Pathways
…
![Page 13: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/13.jpg)
Alberts, et al, 3e.
Gene Regulation: The MET Repressor
SAM
DNA Protein 16
![Page 14: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/14.jpg)
17
Alb
erts
, et a
l, 3e
.
The protein way
Riboswitch alternative
SAM
Grundy & Henkin, Mol. Microbiol 1998 Epshtein, et al., PNAS 2003 Winkler et al., Nat. Struct. Biol. 2003
![Page 15: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/15.jpg)
18
Alb
erts
, et a
l, 3e
.
The protein way
Riboswitch alternatives
SAM-II
SAM-I
Grundy, Epshtein, Winkler et al., 1998, 2003
Corbino et al., Genome Biol. 2005
![Page 16: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/16.jpg)
19
Alb
erts
, et a
l, 3e
.
Corbino et al., Genome Biol. 2005
The protein way
Riboswitch alternatives
SAM-III
SAM-II SAM-I
Fuchs et al., NSMB 2006
Grundy, Epshtein, Winkler et al., 1998, 2003
![Page 17: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/17.jpg)
20
Alb
erts
, et a
l, 3e
.
Corbino et al., Genome Biol. 2005
The protein way
Riboswitch alternatives
Weinberg et al., RNA 2008
SAM-III
SAM-II SAM-I
Fuchs et al., NSMB 2006
Grundy, Epshtein, Winkler et al., 1998, 2003
SAM-IV
![Page 18: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/18.jpg)
21
![Page 19: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/19.jpg)
22
![Page 20: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/20.jpg)
Example: Glycine Regulation
How is glycine level regulated? Plausible answer:
glycine cleavage enzyme gene
g g
TF g TF
gce protein
g
g
DNA
transcription factors (proteins) bind to DNA to turn nearby genes on or off
23
![Page 21: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/21.jpg)
The Glycine Riboswitch
Actual answer (in many bacteria): ���
glycine cleavage enzyme gene
g g
g g
gce mRNA
gce protein
5ʹ′ 3ʹ′
DNA
Mandal et al. Science 2004 24
![Page 22: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/22.jpg)
6S mimics an ���open promoter
Barrick et al. RNA 2005 Trotochaud et al. NSMB 2005 Willkomm et al. NAR 2005
E.coli
Bacillus/���Clostridium
Actino-bacteria
25
![Page 23: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/23.jpg)
26 Weinberg, et al. Nucl. Acids Res., July 2007 35: 4809-4819.!
boxed = confirmed riboswitch (+2 more)
Widespread, deeply conserved, structurally sophisticated, functionally diverse, biologically important uses for ncRNA throughout prokaryotic world.
![Page 24: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/24.jpg)
Vertebrates Bigger, more complex genomes
<2% coding
But >5% conserved in sequence?
And 50-90% transcribed?
And structural conservation, if any, invisible (without proper alignments, etc.)
What’s going on?
![Page 25: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/25.jpg)
Vertebrate ncRNAs mRNA, tRNA, rRNA, … of course
PLUS:
snRNA, spliceosome, snoRNA, teleomerase, microRNA, RNAi, SECIS, IRE, piwi-RNA, XIST (X-inactivation), ribozymes, …
29
![Page 26: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/26.jpg)
MicroRNA
1st discovered 1992 in C. elegans 2nd discovered 2000, also C. elegans
and human, fly, everything between
21-23 nucleotides literally fell off ends of gels
Hundreds now known in human may regulate 1/3-1/2 of all genes
development, stem cells, cancer, infectious diseases,… 30
![Page 27: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/27.jpg)
siRNA
“Short Interfering RNA” Also discovered in C. elegans
Possibly an antiviral defense, shares machinery with miRNA pathways
Allows artificial repression of most genes in most higher organisms
Huge tool for biology & biotech
31
![Page 28: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/28.jpg)
ncRNA Characteristics
Often low levels Can come from anywhere
Sense, antisense, introns, intergenic
Often poorly conserved CDS : neutral ~ 10 : 1 vs ncRNA : neutral ~ 1.2 : 1
May suggest “transcriptional noise”
![Page 29: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/29.jpg)
Noise?
HOWEVER: Sometimes capped, spliced, polyA+
Some known ncRNAs are intronic ���(e.g. some miRNAs, all snoRNAs)
Sometimes very precisely localized ���to specific compartments, cell types, ���developmental stages, ���(esp. dev & neuronal …)
![Page 30: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/30.jpg)
Conservation?
Neutral rate underestimated?
Promoters also evolving rapidly
Sequence/function constraint for RNA ≠ CDS
Alignments are suspect away from CDS
Alignments are not optimized for RNA structure
Despite all this, there is evidence for purifying selection on ncRNA promoters, splice sites, tissue-specific expression patterns, indels, …
![Page 31: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/31.jpg)
Bottom line?
A significant number of “one-off” examples Extremely wise-spread ncRNA expression At a minimum, a vast evolutionary substrate
New technology (e.g. RNAseq) exposing more
How do you recognize an interesting one?
Conserved secondary structure
![Page 32: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/32.jpg)
Origin of Life?
Life needs
information carrier: DNA
molecular machines, like enzymes: Protein
making proteins needs DNA + RNA + proteins
making (duplicating) DNA needs proteins
Horrible circularities! How could it have arisen in an abiotic environment?
![Page 33: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/33.jpg)
Origin of Life? RNA can carry information, too
RNA double helix; RNA-directed RNA polymerase
RNA can form complex structures
RNA enzymes exist (ribozymes)
RNA can control, do logic (riboswitches)
The “RNA world” hypothesis: ���1st life was RNA-based
![Page 34: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/34.jpg)
RNA replicase���
Johnston et al., Science, 2001 39
![Page 35: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/35.jpg)
Why is RNA hard to deal with?
A!C!
U!
G!
C!
A!
G!
G!
G!
A!
G!
C!
A!A! G!
C!
G!A!
G!
G! C!C!
U!
C!
U!G!C! A!
A!
U!G!
A! C!
G!
G!U!
G!
C!A!
U!
G!A!
G!
A! G!
C!
G! U!C!U! U!U!
U!
C!
A!
A!
C!A!
C! U!G!
U!
U!A!
U!
G!
G!
A!
A!
G! U!
U!U!
G!
G!C!
U!A!
G!C!
G! U!U! C!
U!A!
G!
A!G!
C! U!
G!
U!G!
A!
C!A!
C!
U!G!
C!C!
G!
C!
G!A!
C!
G!
G! G!A!
A!
A!
G!U! A! A!
C!
G!G!
G!
C!G!G!
C!
G!
A!
G!U!
A!A!
A!
C! C!
C!
G!A!
U!C! C!C!
G!
G!U!
G!
A!
A!
U!
A!G!
C!C!
U! G!A!
A!
A!
A!
A!
C!A!
A!
A!
G!U!
A!
C!A! C!G!G!
G!
A!
U!A!C!
G!
A: Structure often more important than sequence 50
![Page 36: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/36.jpg)
The Glycine Riboswitch
Actual answer (in many bacteria): ���
glycine cleavage enzyme gene
g g
g g
gce mRNA
gce protein
5ʹ′ 3ʹ′
DNA
Mandal et al. Science 2004 51
![Page 37: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/37.jpg)
Wanted
Good structure prediction tools
Good motif descriptions/models Good, fast search tools
(“RNA BLAST”, etc.)
Good, fast motif discovery tools (“RNA MEME”, etc.)
Importance of structure makes last 3 hard
54
![Page 38: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/38.jpg)
Task 1: ���Structure Prediction���
![Page 39: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/39.jpg)
RNA Structure
Primary Structure: Sequence
Secondary Structure: Pairing
Tertiary Structure: 3D shape
56
![Page 40: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/40.jpg)
RNA Pairing
Watson-Crick Pairing
C - G ~ 3 kcal/mole
A - U ~ 2 kcal/mole
“Wobble Pair” G - U ~1 kcal/mole
Non-canonical Pairs (esp. if modified)
![Page 41: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/41.jpg)
tRNA 3d Structure
![Page 42: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/42.jpg)
tRNA - Alt. Representations
Anticodon loop
Anticodon���loop
3’ 5’
59
a.a. !
![Page 43: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/43.jpg)
tRNA - Alt. Representations
Anticodon loop
Anticodon���loop
3’ 5’ 5’
3’
60
![Page 44: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/44.jpg)
Definitions
Sequence 5’ r1 r2 r3 ... rn 3’ in {A, C, G, T}
A Secondary Structure is a set of pairs i•j s.t.
i < j-4, and no sharp turns
if i•j & i’•j’ are two different pairs with i ≤ i’, then
j < i’, or
i < i’ < j’ < j
2nd pair follows 1st, or is nested within it; ���no “pseudoknots.”
![Page 45: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/45.jpg)
RNA Secondary Structure: Examples
62
C
G G
C
A
G
U
U
U A
U A C C G G U G U A
G G
C
A
G
U
U A
C
G G
C
A
U
G
U
U A
sharp turn
crossing
ok
G
≤4 U A C C G G U U G A
base pair
C
G G
C
A
G
U
U
U A
C
A
U A C G G G G U A
U A C C G G U G U A A C
![Page 46: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/46.jpg)
Nested
Pseudoknot
Precedes
![Page 47: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/47.jpg)
Approaches to Structure Prediction
Maximum Pairing���+ works on single sequences���+ simple���- too inaccurate
Minimum Energy���+ works on single sequences���- ignores pseudoknots ���- only finds “optimal” fold
Partition Function���+ finds all folds���- ignores pseudoknots
![Page 48: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/48.jpg)
Nussinov: Max Pairing
B(i,j) = # pairs in optimal pairing of ri ... rj
B(i,j) = 0 for all i, j with i ≥ j-4; otherwise
B(i,j) = max of: B(i,j-1)
max { B(i,k-1)+1+B(k+1,j-1) | i ≤ k < j-4 and rk-rj may pair}
R Nussinov, AB Jacobson, "Fast algorithm for predicting the secondary structure of single-stranded RNA." PNAS 1980.!
![Page 49: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/49.jpg)
j Unpaired: ��� Find best pairing of ri ... rj-1
j Paired (with some k): ��� Find best ri ... rk-1 + ��� best rk+1 ... rj-1 plus 1
Why is it slow? ���Why do pseudoknots matter?
“Optimal pairing of ri ... rj”��� Two possibilities
j!
i!
j-1!
j!
k-1!
k!
i!
j-1! k+1!
![Page 50: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/50.jpg)
Nussinov: ���A Computation Order
B(i,j) = # pairs in optimal pairing of ri ... rj
B(i,j) = 0 for all i, j with i ≥ j-4; otherwise
B(i,j) = max of: B(i,j-1)
max { B(i,k-1)+1+B(k+1,j-1) | i ≤ k < j-4 and rk-rj may pair}
Time: O(n3)
K=2
3
4
5
![Page 51: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/51.jpg)
Which Pairs?
Usual dynamic programming “trace-back” tells you which base pairs are in the optimal solution, not just how many
![Page 52: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/52.jpg)
Pair-based Energy Minimization
E(i,j) = energy of pairs in optimal pairing of ri ... rj
E(i,j) = ∞ for all i, j with i ≥ j-4; otherwise
E(i,j) = min of:
E(i,j-1)
min { E(i,k-1) + e(rk, rj) + E(k+1,j-1) | i ≤ k < j-4 }
Time: O(n3)
energy of k-j pair
![Page 53: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/53.jpg)
Detailed experiments show it’s ���more accurate to model based ���on loops, rather than just pairs
Loop types 1. Hairpin loop
2. Stack
3. Bulge
4. Interior loop
5. Multiloop
Loop-based Energy Minimization
1
2
3
4
5
![Page 54: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/54.jpg)
Zuker: Loop-based Energy, I
W(i,j) = energy of optimal pairing of ri ... rj
V(i,j) = as above, but forcing pair i•j
W(i,j) = V(i,j) = ∞ for all i, j with i ≥ j-4
W(i,j) = min(W(i,j-1), ��� min { W(i,k-1)+V(k,j) | i ≤ k < j-4 } )
![Page 55: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/55.jpg)
V(i,j) != min(eh(i,j), es(i,j)+V(i+1,j-1), VBI(i,j), VM(i,j))!
VM(i,j) = min { W(i,k)+W(k+1,j) | i < k < j } !
VBI(i,j) = min { ebi(i,j,iʼ,jʼ) + V(iʼ, jʼ) | i < iʼ < jʼ < j & iʼ-i+j-jʼ > 2 }!
Time: O(n4) O(n3) possible if ebi(.) is “nice”
Zuker: Loop-based Energy, II
hairpin stack bulge/
interior multi- loop
bulge/ interior
![Page 56: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/56.jpg)
Energy Parameters
Q. Where do they come from?
A1. Experiments with carefully selected synthetic RNAs
A2. Learned algorithmically from trusted alignments/structures���[Andronescu et al., 2007]
![Page 57: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/57.jpg)
Accuracy
Latest estimates suggest ~50-75% of base pairs predicted correctly in sequences of up to ~300nt
Definitely useful, but obviously imperfect
![Page 58: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/58.jpg)
Approaches to Structure Prediction
Maximum Pairing��� + works on single sequences��� + simple��� - too inaccurate
Minimum Energy��� + works on single sequences��� - ignores pseudoknots ��� - only finds “optimal” fold
Partition Function��� + finds all folds��� - ignores pseudoknots
![Page 59: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/59.jpg)
Approaches, II
Comparative sequence analysis��� + handles all pairings (potentially incl. pseudoknots) ��� - requires several (many?) aligned, ��� appropriately diverged sequences
Stochastic Context-free Grammars���Roughly combines min energy & comparative, but no pseudoknots
Physical experiments (x-ray crystalography, NMR)
![Page 60: RNA: Function, Secondary Structure Prediction, Search ... · Sequence/function constraint for RNA ≠ CDS" Alignments are suspect away from CDS" Alignments are not optimized for RNA](https://reader033.vdocuments.net/reader033/viewer/2022060309/5f0a56847e708231d42b28bd/html5/thumbnails/60.jpg)
Summary
RNA has important roles beyond mRNA Many unexpected recent discoveries
Structure is critical to function True of proteins, too, but they’re easier to find from sequence alone due, e.g., to codon structure, which RNAs lack
RNA secondary structure can be predicted (to useful accuracy) by dynamic programming
Next: RNA “motifs” (seq + 2-ary struct) well-captured by “covariance models”
81