rna-rna interaction a biological crash course and introduction to prediction methods

Post on 19-Dec-2015

RNA-RNA interaction

A biological crash course and introduction to prediction methods

Part I – Biological crash course

Bacteria
  Plasmid copy control
  Post-segregational killing systems
  Trans-encoded chromosomal RNAs
RNA interference (gene silencing)
Translation regulation
  C. elegans developmental regulation
  miRNA-miRNA interactions
Human telomerase

DNA vs. RNA

       Bases     #Strands   Structure
DNA    A,C,G,T   2          Double helix
RNA    A,C,G,U   1 or 2     Stem-loop, pseudoknots, etc.

Gene expression

Central dogma of molecular biology

Translation

mRNA -> protein via triplet code
What happens if mRNA is destroyed or otherwise can’t be translated?
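
As a minimal sketch of the triplet code, the function below reads an mRNA three bases at a time from the first AUG. The codon table is deliberately partial (only a handful of standard codons), and the function name and the "X" placeholder are illustrative assumptions, not part of the slides.

```python
# Partial standard codon table: only the codons used in this sketch.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "GGC": "G", "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}

def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = codon missing from this partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

If the mRNA is degraded or the start codon is masked, `translate` has nothing to read and no protein is produced, which is the lever the antisense systems in the following slides pull.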

Bacteria backgrounder

Single-celled organisms
Prokaryotes = no nucleus
Multi-cistronic transcripts -> multiple genes transcribed at one time, often with overlapping reading frames

Bacterial genetic information

Bacterial chromosome (1)
  Genome of organism
  Required for life
Plasmids (2)
  Circular DNA molecules
  Double-stranded
  Independently self-replicating
  Not required for life, often confer selective advantage such as antibiotic resistance

Plasmid replication

(1), (2) – Genes encoded on plasmid
(3) – Origin of replication (ORI)

Plasmid copy control

Recall independent self-replication
Copy number fluctuations are unavoidable
Too many -> “runaway”, host dies
Too few -> increased risk of plasmid loss

Problem: How to control copy count?

Solution: negative feedback loop mediated by RNA-RNA interaction

R1 copy control

Genes:
  oriR1 – origin of replication
  repA – lots of this protein product is required for replication initiation
  tap – translation of this protein product is required for translation of repA protein
  copA – product is antisense RNA
  copB – product is a repressor protein (not covered here)

R1 copy control (2)

copA – RNA with stem-loop structure
copT – target segment of repA/tap mRNA, also forms a stem-loop structure
Single loop-loop interaction

R1 copy control (3)

R1 copy control (4)

copA RNA is unstable and degrades quickly, so its level tracks the current copy number
If not enough plasmids are producing copA antisense RNA (copy number is too low), more repA protein can be produced
Therefore the plasmid can replicate
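
The negative feedback loop can be illustrated with a toy model. This is only a sketch under loudly stated assumptions: copA level is taken to be proportional to copy number, replication per plasmid falls as copA rises, and the cell halves its plasmids at division. The functional form and both parameter values are hypothetical and are not taken from the slides or from measurements of R1.

```python
def next_copy_number(n, max_rounds=3.0, half_inhibition=5.0):
    """Advance a toy copy-number model by one cell generation.

    Assumptions (hypothetical): copA is proportional to the copy number n and
    scales down repA-driven replication; the cell then divides and each
    daughter inherits half of the plasmids.
    """
    replications_per_plasmid = max_rounds / (1.0 + n / half_inhibition)
    return n * (1.0 + replications_per_plasmid) / 2.0

def settle(n, generations=200):
    """Iterate the model and return the copy number it settles at."""
    for _ in range(generations):
        n = next_copy_number(n)
    return n
```

With these made-up parameters the model settles at the same copy number whether it starts too low or too high, which is the qualitative behaviour the slide describes.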

Post-segregational killing systems

Plasmid self-preservation mechanism
A bacterial host that loses the plasmid dies
R1 plasmid hok/sok system is the prototype
All such systems work similarly

R1 hok/sok system

hok/sok locus encodes:
  hok protein – “host killing”
  Overlapping reading frame – mok – “modulator of killing”
  sok RNA – “suppressor of killer”
mok must be translated for hok to be expressed
mok cannot be translated if sok is present

R1 hok/sok system (2)

hok mRNA is extremely compact
  Many stem-loop structures
  Flush 5’ – 3’ pairing
  Highly stable -> long half-life
  Translationally inert
mok segment is both:
  Translationally active
  Able to bind sok inhibitor RNA

R1 hok/sok system (3)

sok RNA is highly unstable
Bacteria with R1 have lots of sok produced
sok binds mok, hok is not translated
Bacteria which lose R1 have:
  Lots of stable hok mRNA
  Quickly degrading sok RNA (low stability)
  No new sok RNA being produced
  hok is translated -> bacterium dies
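
The timing argument (stable hok mRNA outlasting unstable sok RNA) can be sketched with simple exponential decay. The half-lives and starting amounts below are placeholders chosen for illustration, and "sok falls below hok" is an assumed stand-in for the point at which hok can be translated.

```python
def remaining(initial, half_life, t):
    """Amount left after time t under exponential decay with the given half-life."""
    return initial * 0.5 ** (t / half_life)

def minutes_until_unprotected(sok0, sok_half_life, hok0, hok_half_life, limit=1000):
    """First whole minute after plasmid loss at which sok RNA drops below hok mRNA.

    No new RNA is produced once the plasmid is gone, so both pools only decay.
    Returns None if sok never drops below hok within `limit` minutes.
    """
    for t in range(limit + 1):
        if remaining(sok0, sok_half_life, t) < remaining(hok0, hok_half_life, t):
            return t
    return None
```

Even when sok starts in large excess, a short half-life means the excess disappears quickly, while the long-lived hok mRNA is still present.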

Bacterial chromosomes

Plasmid antisense RNAs are generally cis-encoded
Implies complete Watson-Crick complementarity
Bacterial chromosomes contain trans-encoded antisense RNAs
  Not necessarily complete complementarity
Often stress-related control systems

oxyS/fhlA in E. coli

oxyS – RNA transcript induced by oxidative stress
fhlA – mRNA encoding a transcriptional activator
oxyS/fhlA complex forms via two loop-loop interactions

RNA interference (RNAi)

a.k.a. post-transcriptional gene silencing
Double-stranded RNAs are introduced into the cell
  Complementary to mRNA for a gene
  Directly introduced in a wet lab, or
  Produced by the cell itself

RNA interference (2)

dsRNAs are cleaved into 21-23 nt segments (“small interfering RNAs”, or siRNAs) by an enzyme called Dicer

RNA interference (3)

siRNAs are incorporated into RNA-induced silencing complex (RISC)

RNA interference (4)

Guided by base complementarity of the siRNA, the RISC targets mRNA for degradation
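
The RNAi steps above can be sketched as string operations. This is a simplification under stated assumptions: `dice` cuts a single strand into consecutive fixed-size pieces (21 nt here, from the 21-23 nt range on the slide), and `target_sites` requires perfect complementarity. Real Dicer and RISC behaviour is more involved; the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def dice(rna, size=21):
    """Cut one strand into consecutive `size`-nt segments, dropping a short remainder."""
    return [rna[i:i + size] for i in range(0, len(rna) - size + 1, size)]

def target_sites(guide, mrna):
    """0-based start positions in `mrna` that are perfectly complementary to `guide`."""
    site = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(site) + 1) if mrna.startswith(site, i)]
```

An mRNA with at least one hit from `target_sites` would be the one marked for degradation in this simplified picture.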

RNA interference – why?

Studying gene function
  Knock out or inhibit a gene’s normal function
  Can the organism survive?
  What phenotypic changes are observed?
Therapeutic suppression
  E.g. cancer treatment

micro RNA (miRNA)

Gene expression regulation
Created by similar process to siRNA
Generally prevents binding of ribosome

Ex: C. elegans development

lin-4 and let-7 antisense RNAs
Regulate larval development in C. elegans
One of the two binding sites for lin-41 and let-7 interaction:

Human telomerase

Telomerase = ribonucleoprotein complex
  Ribonucleo = contains ribonucleic acid (RNA)
  Protein = contains a protein
Responsible for maintaining telomere length in eukaryotic chromosomes
Main components:
  Telomerase reverse transcriptase
  Human telomerase RNA (hTR)

Human telomerase (2)

Reverse transcriptase
  Transcribes RNA to DNA (rather than the usual DNA to RNA)
Telomeres – repeated regions at the end of eukaryotic chromosomes
hTR is the template for the repeated region

Human telomerase (3)

hTR 11-nt templating region consists of:
  Repeat template: CUAACCC
  Alignment domain: UAAC
Positions telomerase on the DNA strand
Provides template for repeat region
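
A short worked example shows how the templating region relates to the telomere repeat. It assumes the 11-nt region reads 5'-CUAACCC-UAAC-3' (the two parts from the slide, joined in that order) and that reverse transcription produces the reverse complement as DNA; the function name is illustrative.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna_template):
    """Return the DNA strand (5'->3') copied from an RNA template (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna_template))

# Repeat template followed by alignment domain (joining order is an assumption).
TEMPLATE = "CUAACCC" + "UAAC"
TELOMERE_DNA = reverse_transcribe(TEMPLATE)
```

The resulting DNA contains TTAGGG, the repeat unit of human telomeres.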

Human telomerase (4)

Loop-loop interaction

Sometimes referred to as “kissing loops”
Recall that all of the RNA-RNA interactions discussed so far (except RNAi) involve loop-loop interaction
Predicting miRNA transcripts and targets involves loop structure prediction

References

Couzin, J. (2002) “Breakthrough of the year – Small RNAs make big splash.” Science 298(5602):2296-2297.

Lai, E.C., Wiel, C., and Rubin, G.M. (2004) “Complementary miRNA pairs suggest a regulatory role for miRNA:miRNA duplexes.” RNA 10(2):171-175.

Moss, E.G. (2001) “RNA interference – It’s a small RNA world.” Current Biology 11(19):R772-R775.

Sharp, P.A. (2001) “RNA interference – 2001.” Genes and Development 15(5):485-490.

Shi, Y. (2003) “Mammalian RNAi for the masses.” TRENDS in Genetics 19(1):9-12.

References (2)

Ueda, C.T., and Roberts, R.W. (2004) “Analysis of a long-range interaction between conserved domains of human telomerase RNA.” RNA 10(1):139-147.

Wagner, E.G.H. and Flärdh, K. (2002) “Antisense RNAs everywhere?” TRENDS in Genetics 18(5):223-226.

Wagner, E.G.H., Altuvia, S., and Romby, P. (2002) “Antisense RNAs in bacteria and their genetic elements.” Advances in Genetics 45:361-398.

Part II – Prediction

Identifying effective siRNAs
  Neural network approach
Identifying targets
  Mammalian miRNA target prediction

Prediction of siRNAs

Sequence properties that make an antisense RNA an effective gene inhibitor are not well understood
Most computational models consider only:
  RNA structure prediction
  Motif searches

Neural net approach

Training set: 490 known siRNA molecules
Input parameters:
  Base composition
  mRNA:siRNA binding energy properties
  3’ and 5’ binding energy
  Structure of siRNA (hairpin energy and quality)
Target function: efficacy

Neural net approach (2)

Neural net results

14 inputs, 11 hidden units, 1 output
Success rate of 92%
Average prediction of 12 effective siRNAs per 1000 base pairs
Stringent (high specificity)
Good for designing siRNAs for RNAi
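
To make the 14-11-1 shape concrete, here is a minimal forward pass for a network of that size. Only the layer sizes come from the slide. The sigmoid activation, the bias terms, and the random placeholder weights are assumptions, so the output is meaningless as an efficacy prediction until the weights are trained.

```python
import math
import random

N_INPUTS, N_HIDDEN = 14, 11  # layer sizes from the slide; single output unit

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_network(seed=0):
    """Build untrained placeholder weights; the last entry of each row is a bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-1, 1) for _ in range(N_INPUTS + 1)] for _ in range(N_HIDDEN)]
    output = [rng.uniform(-1, 1) for _ in range(N_HIDDEN + 1)]
    return hidden, output

def predict(network, features):
    """Forward pass: 14 input features -> 11 hidden units -> 1 score in (0, 1)."""
    hidden_weights, output_weights = network
    if len(features) != N_INPUTS:
        raise ValueError("expected 14 input features")
    hidden = [
        sigmoid(row[-1] + sum(w * x for w, x in zip(row, features)))
        for row in hidden_weights
    ]
    return sigmoid(output_weights[-1] + sum(w * h for w, h in zip(output_weights, hidden)))
```

In use, the 14 features would be the base composition, binding energy, and hairpin terms listed on the earlier slide, scaled to comparable ranges.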

Prediction of miRNA targets

Mammals/vertebrates
  Lots of known miRNAs
  Mostly unknown target genes
Initial method outline
  Look at conserved miRNAs
  Look for conserved target sites

micro RNAs in animals

0.5-1.0% of predicted genes encode miRNA
  One of the more abundant regulatory classes

Tissue-specific or developmental stage-specific expression

High evolutionary conservation

micro RNAs in plants

Finding targets in plants is relatively easy
Look for mRNA transcripts with near-perfect complementarity to known miRNAs
Signal-to-noise ratio exceeds 10:1 for Arabidopsis (model plant organism)
Naïve approach in C. elegans and D. melanogaster?
No more hits than expected by random chance!
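
The naive plant-style search can be sketched as a sliding window that counts mismatches against the full miRNA. The mismatch threshold of 3 is an arbitrary assumption for illustration (the slides only say "near-perfect"), only Watson-Crick pairs count as matches, and the function names are hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def mismatches(mirna, site):
    """Count non-Watson-Crick positions when `mirna` pairs antiparallel with `site`.

    Both sequences are written 5'->3', so miRNA position j faces the site
    position counted from the other end.
    """
    return sum(PAIR[m] != s for m, s in zip(mirna, reversed(site)))

def near_perfect_sites(mirna, transcript, max_mismatches=3):
    """Return (start, mismatch_count) for every window within the mismatch limit."""
    n = len(mirna)
    hits = []
    for i in range(len(transcript) - n + 1):
        count = mismatches(mirna, transcript[i:i + n])
        if count <= max_mismatches:
            hits.append((i, count))
    return hits
```

This works when true sites are nearly perfect, as in plants. With the partial pairing typical of animal miRNAs, a threshold loose enough to catch real sites also admits chance matches, which is the problem the slide points out.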

So what can we use?

Pairing to nucleotides 2-8 at the 5’ end of the miRNA
  Target recognition
Target regions enriched for genes involved in transcriptional regulation

Goals for algorithm

Predict 100s of miRNA targets
Estimate false-positive rates
Provide computational and experimental evidence of authenticity
Identify common functionality classes other than transcriptional regulator genes

TargetScan

Algorithm developed by Lewis et al. 2003
Input:
  miRNA that is known to be conserved across multiple organisms
  Orthologous 3’ UTR sequences
  Cut-off values for two parameters
  Value for one free parameter
Output: Ranked list of candidate target genes

TargetScan (1)

Search UTRs in one organism
  Bases 2-8 from miRNA = “miRNA seed”
  Perfect Watson-Crick complementarity
  No wobble pairs (G-U)
  7nt matches = “seed matches”
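
The seed search step can be sketched as an exact substring search. This is an illustrative reading of the slide, not the published implementation: the seed is taken as miRNA bases 2-8 (1-based), and a seed match is any 7-nt stretch of the UTR equal to the seed's reverse complement. Function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def seed_of(mirna):
    """Bases 2-8 of the miRNA, counted from the 5' end (1-based)."""
    return mirna[1:8]

def seed_matches(mirna, utr):
    """0-based UTR start positions of perfect Watson-Crick 7-nt seed matches."""
    site = reverse_complement(seed_of(mirna))
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]
```

Because G-U wobble pairs are excluded at this stage, plain string equality against the reverse complement is enough.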

TargetScan (2)

Extend seed matches
  Allow G-U (wobble) pairs
  Both directions
  Stop at mismatches
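
Seed extension can be sketched as walking outward from the 7-nt match one base pair at a time. The indexing is an assumption made for this example: the seed match starts at UTR index `i`, the duplex is antiparallel, and so miRNA index `m` (0-based) faces UTR index `i + 7 - m`. The function returns the range of miRNA positions covered by the extended, ungapped duplex.

```python
# Allowed pairs during extension: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def extend_seed_match(mirna, utr, i):
    """Extend a seed match at utr[i:i+7] in both directions until a mismatch.

    Returns (lo, hi): inclusive 0-based miRNA indices of the extended duplex.
    The seed itself covers miRNA indices 1..7.
    """
    def paired(m):
        u = i + 7 - m  # UTR index facing miRNA index m
        return 0 <= m < len(mirna) and 0 <= u < len(utr) and (mirna[m], utr[u]) in PAIRS

    lo, hi = 1, 7
    while paired(lo - 1):
        lo -= 1
    while paired(hi + 1):
        hi += 1
    return lo, hi
```

The remaining 3' part of the miRNA, beyond `hi`, is what the next step hands to a folding program.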

TargetScan (3)

Optimize basepairing
  Remaining 3’ region of miRNA
  35 bases of UTR 5’ to each seed match
  RNAfold program (Hofacker et al. 1994)

TargetScan (4)

Folding free energy (G) assigned to each putative miRNA:target interaction
Ignores initiation free energy
RNAeval (Hofacker et al. 1994)
Z score for each UTR (no match -> Z=1.0)

TargetScan (5)

$$Z = \sum_{k=1}^{n} e^{-G_k/T}$$

n = number of seed matches in UTR (may be more than one)

G_k = free energy of miRNA:target site interaction of kth seed match

T = parameter influencing relative weighting of UTRs with few high affinity target sites against UTRs with lots of low affinity target sites (experimentally determined)
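
The Z score is a direct sum over seed matches and can be computed as below. The function name is illustrative, the no-match value of 1.0 follows the earlier slide, and the energies and T values used in any call are up to the caller (the slides do not give the fitted T).

```python
import math

def z_score(free_energies, T):
    """Z = sum over seed matches of exp(-G_k / T); 1.0 when the UTR has no match.

    `free_energies` holds one G_k per seed match; more negative means tighter binding.
    """
    if not free_energies:
        return 1.0
    return sum(math.exp(-g / T) for g in free_energies)
```

The role of T shows up when comparing one strong site with several weak ones: with a small T the single strong site dominates, while with a large T the UTR with more sites scores higher.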

TargetScan (6)

Order UTRs by Z score
Assign rank to each UTR
Repeat this process for each of the other organisms with UTR datasets

TargetScan (7)

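UTR i is a predicted target if for all organisms:

$$Z_i \ge Z_C \quad \text{and} \quad R_i \le R_C$$

where Z_i and R_i are the Z score and rank of UTR i, and Z_C and R_C are the cut-off values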
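
The ranking and cut-off steps can be sketched as follows. The data layout is an assumption: one dictionary of Z scores per organism, with orthologous UTRs sharing the same gene key. Rank 1 is taken to be the highest Z score, and ties are broken by sort order rather than by any rule from the source.

```python
def rank_by_z(z_scores):
    """Map each gene to its rank, where rank 1 is the highest Z score."""
    ordered = sorted(z_scores, key=z_scores.get, reverse=True)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}

def predicted_targets(z_by_organism, z_cutoff, rank_cutoff):
    """Genes whose UTR passes both cut-offs (Z >= Z_C, rank <= R_C) in every organism."""
    passing = None
    for z_scores in z_by_organism.values():
        ranks = rank_by_z(z_scores)
        ok = {
            gene for gene, z in z_scores.items()
            if z >= z_cutoff and ranks[gene] <= rank_cutoff
        }
        passing = ok if passing is None else passing & ok
    return passing or set()
```

Requiring the condition in every organism is what turns conservation into a filter: a gene that scores well in one genome by chance is unlikely to do so in all of them.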

Datasets

nrMamm (mammalian – 79 sequences)
  Homologs in human, mouse, and pufferfish
  Identical between human and mouse, not necessarily pufferfish (fugu)
nrVert (vertebrate – 55 sequences)
  Identical between human, mouse, and fugu
Non-redundant: if multiple miRNAs had the same seed, one representative chosen

Sample program flow

Results for nrMamm

nrMamm searched against human, mouse, and rat orthologous 3’ UTRs

451 miRNA:target interactions predicted for 400 unique genes

Average 5.7 targets per miRNA
Signal:noise ratio of 3.2:1

Results for nrVert

Additional search against fugu UTRs
Signal:noise ratio improves to 4.6:1
Relaxed cut-off values
115 predicted miRNA:target interactions for 107 unique genes
2.1 putative targets per miRNA

Signal:noise ratio calculation

Signal = number of predicted targets from nrMamm dataset

Noise = number of predicted targets from randomly shuffled miRNAs

Shuffled control sequences screened to ensure preservation of relevant features – don’t underestimate the noise!
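
The signal:noise calculation can be sketched in two helpers. Note the loud caveat: the shuffle here only preserves base composition and does not perform the feature screening that the slides require of control sequences, so used as-is it could underestimate the noise. Function names and the seed parameter are illustrative.

```python
import random
from statistics import mean

def shuffled_controls(mirna, n_controls, seed=0):
    """Return `n_controls` random permutations of the miRNA (same base composition).

    Unscreened: a real control set would be filtered to preserve the features
    listed on the following slide.
    """
    rng = random.Random(seed)
    controls = []
    for _ in range(n_controls):
        bases = list(mirna)
        rng.shuffle(bases)
        controls.append("".join(bases))
    return controls

def signal_to_noise(signal_count, control_counts):
    """Targets predicted for the real miRNAs divided by the mean count for the controls."""
    return signal_count / mean(control_counts)
```

A ratio of 3.2:1 implies that roughly 1 in 3.2 predictions, about 31%, is expected to be noise, which is where a predicted false-positive rate near 30% comes from.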

Screening control sequences

Features to consider:
  Expected frequency of seed matches
  Expected frequency of matching to 3’ end of miRNA (after seed extension)
  Observed count of seed matches in UTR datasets
  Predicted free energies for seed:match interactions

Signal:noise results

Filled bars are for authentic miRNAs
Open bars show the mean and standard deviation for shuffled sequences

nrMamm set used for first two, nrVert used for set including fugu

Biological relevance

Hypothesis: 5’ conservation of miRNAs is important for mRNA target recognition
  Highest signal:noise ratio observed when seed positioned close to 5’ end
Hypothesis: highly conserved miRNAs are more involved in regulation
  High degree of conservation -> more predicted targets
  Membership in large miRNA family -> more predicted targets

Experimental verification

15 predicted target sites chosen
  All with known biological function
  Representative of the entire list of candidates
11 target sites confirmed
  Expression of upstream ORF influenced
  27% false positives (4 of 15) – close correspondence to predicted 30% false positives

References

Chalk, A.M. and Sonnhammer, E.L.L. (2002) “Computational antisense oligo prediction with a neural network model.” Bioinformatics 18(12):1567-1575.

Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, S., Tacker, M., and Schuster, P. (1994) “Fast folding and comparison of RNA secondary structures.” Monatshefte für Chemie 125:167-188.

Lewis, B.P., Shih, I., Jones-Rhoades, M.W., and Bartel, D.P. (2003) “Prediction of mammalian microRNA targets.” Cell 115(7):787-798.
