rna structure

Upload: lordniklaus

Post on 14-Oct-2015

18 views

Category:

Documents


0 download

DESCRIPTION

RNA Structure

TRANSCRIPT

  • Basics of RNA structure and modelingRui Alves

  • RNA functions

    Storage/transfer of genetic information

    Genomes many viruses have RNA genomessingle-stranded (ssRNA)e.g., retroviruses (HIV)double-stranded (dsRNA)

    Transfer of genetic information mRNA = "coding RNA" - encodes proteins

  • RNA functions

    Structural e.g., rRNA, which is a major structural component of ribosomes BUT - its role is not just structural, also:

    Catalytic RNA in the ribosome has peptidyltransferase activity Enzymatic activity responsible for peptide bond formation between amino acids in growing peptide chain Also, many small RNAs are enzymes "ribozymes"

  • RNA functions

    Regulatory

    Recently discovered important new roles for RNAs In normal cells: in "defense" - esp. in plants in normal developmente.g., siRNAs, miRNAAs tools: for gene therapy or to modify gene expression RNAi RNA aptamers

  • RNA types & functions

    L Samaraweera 2005

    Types of RNAsPrimary Function(s)mRNA - messenger translation (protein synthesis) regulatoryrRNA - ribosomal translation (protein synthesis) t-RNA - transfer translation (protein synthesis)hnRNA - heterogeneous nuclearprecursors & intermediates of mature mRNAs & other RNAsscRNA - small cytoplasmic signal recognition particle (SRP)tRNA processing snRNA - small nuclear snoRNA - small nucleolarmRNA processing, poly A addition rRNA processing/maturation/methylationregulatoryRNAs (siRNA, miRNA, etc.)regulation of transcription and translation, other??

  • miRNA Challenges for Computational BiologyFind the genes encoding microRNAsPredict their regulatory targets Integrate miRNAs into gene regulatory pathways & networks

    Predict RNA structureComputational Prediction of MicroRNA Genes & TargetsNeed to modify traditional paradigm of "transcriptional control" primarily by protein-DNA interactions to include miRNA regulatory mechanisms!

  • OutlineRNA primary structure

    RNA secondary structure & prediction

    RNA tertiary structure & prediction

  • Hierarchical organization of RNA moleculesPrimary structure:5 to 3 list of covalently linked nucleotides, named by the attached base

    Commonly represented by a string S over the alphabet ={A,C,G,U}

  • OutlineRNA primary structure

    RNA secondary structure & prediction

    RNA tertiary structure & prediction

  • Hierarchical organization of RNA moleculesPrimary structure:Secondary Structure5 to 3 list of covalently linked nucleotides, named by the attached baseCommonly represented by a string S over the alphabet ={A,C,G,U}List of base pairs, denoted by ij for a pairing between the i-th and j-thNucleotides, ri and rj, where i
  • RNA synthesis and foldWatson-Crick Base PairingWobble Base PairingRNA immediately starts to fold when it is synthesized

  • RNA secondary structuresSingle stranded bases within a stem are called a bulge of bulge loop ifthe single stranded bases are on only one side of the stem.

    If single stranded bases interrupt both sides of a stem, they are called an internal (interior) loop.

  • RNA secondary structure representation..(((.(((......))).((((((....)))).))....)))AGCUACGGAGCGAUCUCCGAGCUUUCGAGAAAGCCUCUAUUAGC

  • Circular representation of RNA

  • Why predicting RNA secondary structures ?Virus RNA

  • Existing computational methods for RNA structure prediction

    Comparative methods using sequence homologyBy examining a set of homologous sequence along with their covarying position, we can predict interactions between non adjacent positions in the sequence, such as base pairs, triples, etc.

  • Existing computational methods for RNA structure prediction

    Minimum energy predictive methodsTry to compute the RNA structure solely based on its nucleotide contents by minimizing the free energy of the predicted structure.

  • Existing computational methods for RNA structure prediction

    Structural Inference MethodsGiven a sequence with a known structure, we infer the structure of another sequence known to be similar to the first one by maximizing some similarity function

  • RNA structure predictionTwo primary methods for ab initio RNA secondarystructure prediction:

    Co-variation analysis (comparative sequence analysis). Takes into account conserved patterns of basepairs during evolution (more than 2 sequences)

    Minimum free-energy method. Determine structure of complementary regions that are energetically stable

  • Quantitative Measure of Co-variationMutual Information Content:fij(N1,N2) : joint frequency of the 2 nucleotides, N1 from the i-th column, and N2 from the j-th column

    fi(N): frequency in the i-th column of the nucleic acid NMaximize some function of covariation of nucleotides in a multiple alignment of RNAs Why?If two nucleotides change together from AU to GC they are likely to be a pair and the pair should be important for the RNA function

  • Co-variationG C U Ai 5/7 1/7 0 1/7

    j 1/7 5/7 1/7 0G C U AG 0 0.6 -0.4 0C 0.6 0 0 0U -0.4 0 0 0.4A 0 0 0.4 0

  • Computing RNA secondary structure: Minimum free-energy methodWorking hypothesis:

    The native secondary structure of a RNA molecule is the one with the minimum free energy

    Restrictions:No knotsNo close base pairsBase pairs: A-U, C-G and G-U

  • Tinoco-Uhlenbeck postulate:

    Assumption: the free energy of each base pair is independent of all the other pairs and the loop structures

    Consequence: the total free energy of an RNA is the sum of all of the base pair free energies

    Computing RNA secondary structure: Minimum free-energy method

  • Independent Base Pairs ApproachUse solution for smaller strings to find solutions for larger strings

    This is precisely the basic principle behind dynamic programming algorithms!

  • RNA folding: Dynamic ProgrammingThere are only four possible ways that a secondary structure of nested base pair can be constructed on a RNA strand from position i to j:i is unpaired, added on toa structure for i+1jS(i,j) = S(i+1,j)j is unpaired, added on toa structure for ij-1S(i,j) = S(i,j-1)

  • RNA folding: Dynamic Programmingi j paired, added on toa structure for i+1j-1S(i,j) = S(i+1,j-1)+e(ri,rj)i j paired, but not to each other;the structure for ij adds together structures for 2 sub regions, ik and k+1jS(i,j) = max {S(i,k)+S(k+1,j)}i
  • RNA folding: Dynamic ProgrammingSince there are only four cases, the optimal score S(i,j) is just the maximum of the four possibilities:To compute this efficiently, we need to make sure that the scores for the smaller sub-regions have already been calculated

  • RNA folding: Dynamic ProgrammingNotes:

    S(i,j) = 0 if j-i < 4: do not allow close base pairs

    Reasonable values of e are -3, -2, and -1 kcal/molefor GC, AU and GU, respectively. In the DP procedure,we use 3, 2, 1

    Build upper triangular part of DP matrix:- start with diagonal all 0- works outward on larger and larger regions- ends with S(1,n)

    Traceback starts with S(1,n), and finds optimal path thatleads there.

  • Initialisation:

    No close basepairsij1 2 3 4 5 6 7 8 9 10 11 12 13 141 234567891011121314

    AUACCCUGUGGUAUA0000U0000A0000C0000C0000C0000U0000G0000U0000G0000G0000U000A00U0

  • Initialisation:

    No close basepairsijGC 3AU 2GU 11 2 3 4 5 6 7 8 9 10 11 12 13 141 234567891011121314

    AUACCCUGUGGUAUA00000U00000A00002C00003C00000C00003U00001G00001U00002G00001G0000U000A00U0

  • Propagation:jC5.G11 :

    C5 unpaired:S(6,11) = 3

    G11 unpaired:S(5,10)=3

    C5-G11 pairedS(6,10)+e(C,G)=6

    C5 paired, G11 paired:S(5,6)+S(7,11)=1S(5,7)+S(8,11)=0S(5,8)+S(9,11)=0S(5,9)+S(10,11)=0

    1 2 3 4 5 6 7 8 9 10 11 12 13 141 234567891011121314GC 3AU 2GU 1

    AUACCCUGUGGUAUA0000002U0000023A0000235C0000333C0000036C000033U000011G000012U000022G00001G0000U000A00U0

  • Propagation:ij1 2 3 4 5 6 7 8 9 10 11 12 13 141 234567891011121314GC 3AU 2GU 1

    AUACCCUGUGGUAUA0000002356681012U000002356681010A000023556888C00003336666C0000036666C000033333U00001133G0000122U000022G00001G0000U000A00U0

  • Traceback:ij

    AUACCCUGUGGUAUA0000002356681012U000002356681010A000023556888C00003336666C0000036666C000033333U00001133G0000122U000022G00001G0000U000A00U0

  • AUACCCUGUGGUAUFinal predictionTotal free energy: -12 kcal/mol

  • Some notesComputational complexity: N3

    Does not work with tertiary structure features (would invalidate DP algorithm)

  • Other methodsBase pair partition functionsCalculate energy of all configurationsLowest energy is the predictionStatistical samplingRandomly generating structure with probability distribution = energy function distributionThis makes it more likely that lowest energy structure is foundSub-optimal sampling

  • RNA homology structure predictionMolecules with similar functions and different nucleotide sequences will form similar structures

    Correctly identifies high percentage of secondary structure pairings and a smaller number of tertiary interactions

    Primarily a manual method

  • How well do these methods perform?Energy minimization (via dynamic programming) 73% avg. prediction accuracy - single sequence2) Comparative sequence analysis97% avg. prediction accuracy - multiple sequences (e.g., highly conserved rRNAs)much lower if sequence conservation is lower &/or fewer sequences are available for alignment3) Combined - recent developments: combine thermodynamics & co-variation& experimental constraints? IMPROVED RESULTS

  • OutlineRNA primary structure

    RNA secondary structure & prediction

    RNA tertiary structure & prediction

  • Hierarchical organization of RNA moleculesPrimary structure:Secondary StructureTertiary structure:5 to 3 list of covalently linked nucleotides, named by the attached baseList of base pairs, denoted by ij for a pairing between the i-th and j-thNucleotides, ri and rj, where i
  • RNA tertiary interactionsPseudoknotKissing hairpinsHairpin-bulgeIn addition to secondary structural interactions in RNA, there are alsotertiary interactions, including: (A) pseudoknots, (B) kissing hairpins and(C) hairpin-bulge contact.Do not obey parentheses rule

  • RNA structure prediction strategies

    Requires "craft" & significant user input & insight

    Extensive comparative sequence analysis to predict tertiary contacts (co-variation) e.g., MANIP - WesthofUse experimental data to constrain model building e.g., MC-CYM - MajorHomology modeling using sequence alignment & reference tertiary structure (not many of these!)Low resolution molecular mechanics e.g., yammp - HarveyTertiary structure prediction

  • SummaryRNA primary structure

    RNA secondary structure & prediction

    RNA tertiary structure & prediction

  • Prediction programsMFOLD (Zuker) (web server)http://www.bioinfo.rpi.edu/applications/mfold/old/rna/form1.cgi

    Genebee (both comparative + energy model) (web server)http://www.genebee.msu.edu/services/rna2_reduced.html

    Vienna RNA packagehttp://www.tbi.univie.ac.at/~ivo/RNA/

    Mc-Sym (Computer Science approach)http://www-lbit.iro.umontreal.ca/mcsym

  • Useful web sites on RNAComparative RNA web sitehttp://www.rna.icmb.utexas.edu/ RNA worldhttp://www.imb-jena.de/RNA.html RNA page by Michael Sukerhttp://www.bioinfo.rpi.edu/~zukerm/rna/ RNA structure database http://www.rnabase.org/ http://ndbserver.rutgers.edu/ (nucleic acid database)http://prion.bchs.uh.edu/bp_type/ (non canonical bases)RNA structure classificationhttp://scor.berkeley.edu/ RNA visualisationhttp://ndbserver.rutgers.edu/services/download/index.html#rnaview http://rutchem.rutgers.edu/~xiangjun/3DNA/

  • Prediction programsMethods that include pseudo knots:Rivas and Eddy, JMB 285, 2053 (1999)Orland and Zee, Nucl. Phys. B 620, 456 (2002)These methods are at least N6

  • Propagation:jC5.U9 :

    C5 unpaired:S(6,9) = 0

    U10 unpaired:S(5,8)=0

    C5-U10 pairedS(6,8) +e(C,U)=0

    C5 paired, U10 paired:S(5,6)+S(7,9)=0S(5,7)+S(8,9)=01 2 3 4 5 6 7 8 9 10 11 12 13 141 234567891011121314GC 3AU 2GU 1

    AUACCCUGUUGUAUA00000U00000A00002C00003C00000C00003U00001G00001U00002U00001G0000U000A00U0

  • Only three ways to pair four segments1234123412341234

  • 12344321Only three ways to pair four segmentsPseudo-knot : usuallynot considered a secondarystructureas it is difficult topredict !!

  • Circular representation of a pseudo-knot

  • Basis for secondary structure predictionuRNA stRNA1K1stRNA2K2

  • RNA folding: Dynamic ProgrammingNotation:e(ri,rj) : free energy of a base pair joining ri and rj

    Bij: secondary structure of the RNA strand from base ri to base rj. Its energy is E(Bij)

    S(i,j) : optimal free energy associated with segment rirj S(i,j) = max E(Bij)