computational prediction of efficient splice sites for trans-splicing...

13
METHOD Computational prediction of efficient splice sites for trans-splicing ribozymes DARIO MELUZZI, 1,2 KAREN E. OLSON, 1 GREGORY F. DOLAN, 1 GAURAV ARYA, 2,3 and ULRICH F. MU ¨ LLER 1,3 1 Department of Chemistry and Biochemistry, University of California, San Diego, California 92093, USA 2 Department of NanoEngineering, University of California, San Diego, California 92093, USA ABSTRACT Group I introns have been engineered into trans-splicing ribozymes capable of replacing the 39-terminal portion of an external mRNA with their own 39-exon. Although this design makes trans-splicing ribozymes potentially useful for therapeutic application, their trans-splicing efficiency is usually too low for medical use. One factor that strongly influences trans-splicing efficiency is the position of the target splice site on the mRNA substrate. Viable splice sites are currently determined using a biochemical trans-tagging assay. Here, we propose a rapid and inexpensive alternative approach to identify efficient splice sites. This approach involves the computation of the binding free energies between ribozyme and mRNA substrate. We found that the computed binding free energies correlate well with the trans-splicing efficiency experimentally determined at 18 different splice sites on the mRNA of chloramphenicol acetyl transferase. In contrast, our results from the trans-tagging assay correlate less well with measured trans-splicing efficiency. The computed free energy components suggest that splice site efficiency depends on the following secondary structure rearrangements: hybridization of the ribozyme’s internal guide sequence (IGS) with mRNA substrate (most important), unfolding of substrate proximal to the splice site, and release of the IGS from the 39-exon (least important). The proposed computational approach can also be extended to fulfill additional design requirements of efficient trans-splicing ribozymes, such as the optimization of 39-exon and extended guide sequences. Keywords: group I intron; ribozyme; trans-splicing; secondary structure; free energy INTRODUCTION Group I introns are catalytic RNAs (ribozymes) that excise themselves from primary transcripts (Kruger et al. 1982). These ribozymes have been well-characterized both bio- chemically and structurally, with five high-resolution crys- tal structures solved to date (Adams et al. 2004; Guo et al. 2004; Golden et al. 2005; Lipchock and Strobel 2008). Furthermore, it has been possible to convert the Tetrahy- mena cis-splicing group I intron into an artificial trans- splicing ribozyme that can modify the sequence of an arbitrary mRNA substrate. This was achieved by removing the 59 portion of the wild-type Tetrahymena group I intron, thus exposing the intron’s internal guide sequence (IGS) at the 59 terminus (Inoue et al. 1985; Sullenger and Cech 1994). The resulting ribozyme can hybridize its IGS to a complementary target site on a substrate mRNA, forming a helix equivalent to the P1 duplex in the wild- type intron (Been and Cech 1986; Waring et al. 1986). The ribozyme then catalyzes cleavage of the substrate at the target site and transfer of the ribozyme 39-exon to the remaining 59 portion of the substrate (Zarrinkar and Sullenger 1998). The ability to replace a portion of a substrate mRNA with the 39-exon carried by the ribozyme has created potential roles for trans-splicing ribozymes in therapeutic applications (Fiskaa and Birgisdottir 2010). These ribo- zymes could be used to treat genetic disorders by repairing the sequence of a mutated mRNA (Sullenger and Cech 1994; Lan et al. 1998; Phylactou et al. 1998), to selectively kill cancer cells by splicing a sequence that encodes a toxic peptide into cancer-specific mRNAs (Ayre et al. 1999; Kwon et al. 2005; Lee et al. 2010), and to kill virally infected cells with the same strategy (Ayre et al. 1999; Ryu et al. 2003; Carter et al. 2010). However, technical difficulties have so far prevented the medical use of trans-splicing ribozymes. One problem is the localized and efficient delivery of therapeutic RNAs to the affected tissues (Guo et al. 2010; Gao et al. 2011), though solutions involving viral vectors and modified Salmonella strains appear to be promising (Thomas et al. 2003; Bai et al. 2011). Another problem of trans-splicing ribozymes is 3 Corresponding authors. E-mail [email protected]. E-mail [email protected]. Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.029884.111. 590 RNA (2012), 18:590–602. Published by Cold Spring Harbor Laboratory Press. Copyright Ó 2012 RNA Society. Cold Spring Harbor Laboratory Press on February 15, 2012 - Published by rnajournal.cshlp.org Downloaded from

Upload: others

Post on 02-Jan-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

METHOD

Computational prediction of efficient splice sitesfor trans-splicing ribozymes

DARIO MELUZZI12 KAREN E OLSON1 GREGORY F DOLAN1 GAURAV ARYA23 and ULRICH F MULLER13

1Department of Chemistry and Biochemistry University of California San Diego California 92093 USA2Department of NanoEngineering University of California San Diego California 92093 USA

ABSTRACT

Group I introns have been engineered into trans-splicing ribozymes capable of replacing the 39-terminal portion of an externalmRNA with their own 39-exon Although this design makes trans-splicing ribozymes potentially useful for therapeuticapplication their trans-splicing efficiency is usually too low for medical use One factor that strongly influences trans-splicingefficiency is the position of the target splice site on the mRNA substrate Viable splice sites are currently determined usinga biochemical trans-tagging assay Here we propose a rapid and inexpensive alternative approach to identify efficient splicesites This approach involves the computation of the binding free energies between ribozyme and mRNA substrate We foundthat the computed binding free energies correlate well with the trans-splicing efficiency experimentally determined at 18different splice sites on the mRNA of chloramphenicol acetyl transferase In contrast our results from the trans-tagging assaycorrelate less well with measured trans-splicing efficiency The computed free energy components suggest that splice siteefficiency depends on the following secondary structure rearrangements hybridization of the ribozymersquos internal guidesequence (IGS) with mRNA substrate (most important) unfolding of substrate proximal to the splice site and release of the IGSfrom the 39-exon (least important) The proposed computational approach can also be extended to fulfill additional designrequirements of efficient trans-splicing ribozymes such as the optimization of 39-exon and extended guide sequences

Keywords group I intron ribozyme trans-splicing secondary structure free energy

INTRODUCTION

Group I introns are catalytic RNAs (ribozymes) that excisethemselves from primary transcripts (Kruger et al 1982)These ribozymes have been well-characterized both bio-chemically and structurally with five high-resolution crys-tal structures solved to date (Adams et al 2004 Guo et al2004 Golden et al 2005 Lipchock and Strobel 2008)Furthermore it has been possible to convert the Tetrahy-mena cis-splicing group I intron into an artificial trans-splicing ribozyme that can modify the sequence of anarbitrary mRNA substrate This was achieved by removingthe 59 portion of the wild-type Tetrahymena group Iintron thus exposing the intronrsquos internal guide sequence(IGS) at the 59 terminus (Inoue et al 1985 Sullenger andCech 1994) The resulting ribozyme can hybridize its IGSto a complementary target site on a substrate mRNAforming a helix equivalent to the P1 duplex in the wild-

type intron (Been and Cech 1986 Waring et al 1986)The ribozyme then catalyzes cleavage of the substrate atthe target site and transfer of the ribozyme 39-exon to theremaining 59 portion of the substrate (Zarrinkar andSullenger 1998)

The ability to replace a portion of a substrate mRNAwith the 39-exon carried by the ribozyme has createdpotential roles for trans-splicing ribozymes in therapeuticapplications (Fiskaa and Birgisdottir 2010) These ribo-zymes could be used to treat genetic disorders by repairingthe sequence of a mutated mRNA (Sullenger and Cech1994 Lan et al 1998 Phylactou et al 1998) to selectivelykill cancer cells by splicing a sequence that encodes a toxicpeptide into cancer-specific mRNAs (Ayre et al 1999 Kwonet al 2005 Lee et al 2010) and to kill virally infected cellswith the same strategy (Ayre et al 1999 Ryu et al 2003Carter et al 2010)

However technical difficulties have so far prevented themedical use of trans-splicing ribozymes One problem isthe localized and efficient delivery of therapeutic RNAs tothe affected tissues (Guo et al 2010 Gao et al 2011) thoughsolutions involving viral vectors and modified Salmonellastrains appear to be promising (Thomas et al 2003 Baiet al 2011) Another problem of trans-splicing ribozymes is

3Corresponding authorsE-mail ufmullerucsdeduE-mail garyaucsdeduArticle published online ahead of print Article and publication date are

at httpwwwrnajournalorgcgidoi101261rna029884111

590 RNA (2012) 18590ndash602 Published by Cold Spring Harbor Laboratory Press Copyright 2012 RNA Society

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

their low efficiency in cells Here trans-splicing efficiencydenotes the fraction of target mRNA that is converted toproduct by the ribozymes Trans-splicing efficiency is typi-cally 10 or less in cells (Fiskaa and Birgisdottir 2010)Although efficiencies of up to 50 have been reported (Byunet al 2003) the necessary intracellular ribozyme concentra-tions appear too high to be acceptable in a clinical settingTherefore an outstanding task for the development of trans-splicing ribozymes is to increase their efficiency allowinga sufficient fraction of mRNA substrate to be trans-spliced atlow ribozyme concentrations

The trans-splicing efficiency varies strongly with thelocation of the splice site within the mRNA substrate(Campbell and Cech 1995 Lan et al 2000) In particularsecondary structures within the substrate can render poten-tial target sites inaccessible to the ribozyme The trans-splicing efficiency may also vary between splice sites owingto different stabilities of the P1 duplex formed betweenribozyme and substrate and to different base-pairing in-teractions between the ribozyme IGS and the ribozyme 39-exon Thus the identification of efficient splice sites iethose sites that mediate a high trans-splicing efficiencyis a nontrivial problem in the design of trans-splicingribozymes

An elegant experimental method known as the trans-tagging assay has been developed to identify efficient splicesites (Jones et al 1996) In this assay the mRNA substrate isincubated in vitro with a pool of trans-splicing ribozymeswhose IGS is randomized This pool of ribozymes is thusable to recognize any potential splice site on the substrateRibozymes targeting efficient splice sites are able to trans-splice at those sites The resulting products are detectedby reverse transcription subcloning and sequencing Thesequences from several clones are then used to deduce thepositions of efficient splice sites (Jones et al 1996 Lan et al1998 Einvik et al 2004) This trans-tagging assay has beensuccessfully appliedmdashwithout explicitly reported limita-tionsmdashto uncover efficient splice sites on various mRNAsubstrates (Lan et al 1998 Watanabe and Sullenger 2000Rogers et al 2002 Park et al 2003 Ryu and Lee 2003 Jungand Lee 2005 Fiskaa et al 2006 Jung and Lee 2006 Kimet al 2007)

The problem of identifying efficient splice sites for trans-splicing ribozymes is in concept similar to that of findingaccessible or optimal binding sites on mRNA substratestargeted by antisense DNA oligonucleotides (Chan et al2006) siRNAs (Shao et al 2007a Walton et al 2010) andother non-protein-coding RNAs (Pichon and Felden 2008)In these cases several computational methods have beenproposed and found to be effective at predicting the rel-evant target sites (Ding and Lawrence 2001 Wang andDrlica 2004 Far et al 2005 Shao et al 2006 Tafer et al2008 Backofen and Hess 2010 Thomas et al 2010) Onerecurring theme is the use of RNA secondary structureprediction algorithms to calculate the free energy changes

DGbind associated with binding an oligonucleotide to eachpossible target site on the mRNA substrate Target siteswith strongly negative values of DGbind are then predictedto be the most accessible sites on the substrate (Mathewset al 1999 Walton et al 1999 Busch et al 2008 Lu andMathews 2008 Tafer et al 2008)

Here we have adapted this approach of computingDGbind to the problem of finding efficient splice sites fortrans-splicing ribozymes The mRNA of chloramphenicolacetyl transferase (CAT) was used as a model substratefor both computations and experiments In particular wecomputed DGbind for all 186 splice sites on this substrateand we experimentally determined the trans-splicing effi-ciency in vitro for 18 of those splice sites Furthermore tomap the accessible splice sites on CAT mRNA we carriedout the trans-tagging assay which differs substantially fromdirect measurements of trans-splicing efficiency Surpris-ingly the experimental trans-splicing efficiencies at the 18tested splice sites were found to correlate better with thecomputed DGbind values than with the results of the trans-tagging assay These results suggest that the proposedcalculation of DGbind could be used to predict efficient splicesites for trans-splicing ribozymes more quickly cheaplyand accurately than with the presently used experimentalmethods

RESULTS

Calculation of DGbind for trans-splicing ribozymes

We defined DGbind as the free energy change associatedwith the binding of the ribozymersquos IGS to a given splice siteon the mRNA substrate Our hypothesis was that a verynegative DGbind corresponds to a high trans-splicing effi-ciency To calculate DGbind the binding process was mod-eled as consisting of three idealized molecular events thatinvolve only local changes in RNA secondary structure theunfolding of the target site on the substrate the release ofthe IGS on the ribozyme and the hybridization betweentarget site and IGS to form the P1 duplex (Fig 1A) Thecorresponding component free energy changes were denotedby DGunfold-target DGrelease-IGS and DGhybrid respectively Bylsquolsquoreleasersquorsquo of the IGS we mean the breaking up of any basepairs that may form between the IGS and the 39-exon Thisrelease is necessary to make the IGS available for binding tothe target site We thus computed DGbind by summing theabove components ie DGbind = DGunfold-target + DGrelease-IGS

+ DGhybrid These components can in turn be computedusing several available RNA folding algorithms whichpredict the possible secondary structures for a given RNAsequence as well as the free energy changes of those struc-tures relative to the unfolded state (Condon and Jabbari2009)

To compute the components of DGbind for trans-splicingribozymes that interact with a long mRNA substrate we

Computational prediction of splice sites

wwwrnajournalorg 591

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

adapted the ribozyme and mRNA sequences before sub-mission to RNA folding algorithms The 3D model of theTetrahymena ribozyme (Lehnert et al 1996) suggests thatbefore pairing to an mRNA target site the IGS is stericallyallowed to base-pair with the 39-exon but does not base-pair with the body of the folded intron (Fig 1B) Thus tocalculate DGrelease-IGS we used a ribozyme sequence inwhich the intron portion downstream from the IGS wasreplaced by a 5-nt linker region (Fig 1C) This shortenedsequence allowed the IGS to interact with the 39-exonwhile permitting rapid computation of DGrelease-IGS byRNA folding algorithms To choose an appropriate linkersequence we noted that in the Tetrahymena group Iintron the IGS is followed by three unpaired adenosinesand the 39-exon is preceded by one unpaired guanosine(Cech et al 1994) Although these nucleotides seem naturalcandidates for inclusion in the linker the 3D structure ofthe ribozyme shows that these nucleotides are conforma-tionally prohibited from pairing with the substrate andfrom engaging in p-stacking interactions with the IGS or

the 39-exon (Fig 1B) Therefore we chose the linker sequenceto consist of five unspecified nucleotides which are effectivelyignored by RNA folding algorithms in the calculation ofDGrelease-IGS and DGhybrid Varying the length of the linkerfrom four to seven unspecified nucleotides did not signifi-cantly change the computed DG values (data not shown)

To predict DGunfold-target from the local secondarystructure around each splice site on a long mRNA sub-strate we directed the RNA folding algorithms to analyzesubsets or windows of the entire substrate sequence Thusonly the sequence within each window containing a givensplice site was used to calculate DGunfold-target for that siteFor the 678-nt sequence of our model mRNA (see below)the resulting values of DGunfold-target were found to vary bygt1 kcalmol over different window sizes (data not shown)This variation may be due to the somewhat arbitraryexclusion of base pairs involving nucleotides outside ofa given window To minimize the influence of windowsize on the predicted DGbind we computed DGunfold-target

using window sizes of 100 200 300 400 500 and 600 ntand then we averaged the resulting values of DGbind

Splice sites chosen on a model mRNA substrateCAT mRNA

As a model mRNA substrate for our computational andexperimental investigations of trans-splicing efficiency wechose the mRNA of chloramphenicol acetyl transferaseThis mRNA sequence is 678 nt long and contains 186uridines downstream from the AUG translation start codonEach of these uridines represents a potential splice site fortrans-splicing ribozymes (Doudna et al 1989) We thuscomputed DGbind for each of these splice sites Figure 2A

FIGURE 1 Interactions within and between the mRNA substrate andthe ribozyme and their treatment during computation (A) Schematicof the substrate binding step in trans-splicing The mRNA substrate isillustrated as a hairpin structure and the body of the ribozyme isillustrated as a gray oval Unfolding of the target site on the mRNA(first step) and release of the internal guide sequence (IGS) (secondstep) allow the mRNA target site and the IGS to hybridize (third step)(B) The 3D structure of the Tetrahymena ribozyme (Lehnert et al1996) indicates that the AAA linker (black) is not stacked on the P1duplex Additionally the 39-exon (gray) is close enough to the IGS(black) for interactions if the mRNA (gray) is absent (C) To facilitatecomputational treatment of the binding process the body of theribozyme was replaced by a linker sequence joining the IGS to the 39-exon This linker does not include the natural AAA linker but consistsof five unspecified nucleotides (N) which are ignored in thecalculation of base-pairing and p-stacking interactions

FIGURE 2 Computed values of DGbind for splice sites on CATmRNA (A) Plot of DGbind against splice site position relative to theadenosine (position 1) of the AUG transcription start codon Onlysplice sites downstream from this codon were mapped by the trans-tagging assay in this study Circles with or without a diamond insideindicate splice sites chosen for experimental determination of trans-splicing efficiency Diamonds indicate splice sites found among 66product sequences obtained by trans-tagging assay (B) Distributionof the 186 splice sites (gray bars) of CAT mRNA over the computedvalues of DGbind Black bars represent the 18 splice sites that werechosen for experimental testing The topmost bar represents 61 splicesites and is shown truncated

Meluzzi et al

592 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

plots the resulting DGbind values as a function of splice siteposition relative to the adenine of the AUG start codonThese values are distributed over the range from 0 to 65kcalmol with most values closer to 0 kcalmol (Fig 2B) Inparticular only seven out of the 186 available splice sites hadDGbind lt 40 kcalmol suggesting that this value of DGbind

could be used as a threshold to predict efficient splice sitesTo determine experimentally whether the computed

values of DGbind can be used to identify efficient splice siteswe chose a set of 18 different target splice sites on CATmRNA To reveal possible correlations between trans-splic-ing efficiency and DGbind as well as between trans-splicingefficiency and position on mRNA substrate these 18 siteswere chosen to cover the full range of DGbind values (Fig 2B)and were distributed over the length of the CAT mRNA (Fig2A) Two splice sites with DGbind slightly below 40 kcalmol were omitted in order to cover the wide range of thepredicted values for DGbind with the available resources

Experimental trans-splicing efficiencieson CAT mRNA

To measure experimentally the trans-splicing efficiency ofthe 18 chosen splice sites we generated 18 trans-splicingribozymes that contained different IGSs but were otherwiseidentical Each IGS was designed to target a particular oneof the chosen splice sites (Supplemental Table S1) Each ofthe 18 ribozymes was incubated with 59-radiolabeled CATmRNA and the reaction products were analyzed by poly-acrylamide gel electrophoresis and autoradiography (Fig3A) As a measure of trans-splicing efficiency we quantifiedthe fraction of radiolabeled substrates converted to trans-splicing products after 4 h of incubation Bands consistentwith the expected trans-splicing products were observed fornine of the 18 tested splice sites namely for splice sitesat uridines 83 87 97 197 222 258 350 405 and 448Product fractions for these splice sites ranged from 04 ofthe substrate at uridine 222 to 143 at uridine 258(Supplemental Table S1) Nine of the 18 splice sites didnot yield quantifiable products

The experimental trans-splicing efficiencies were foundto agree with the computed DGbind values (Fig 3B) Inparticular the efficiency of splice sites with DGbind gt 4kcalmol was relatively low reaching at most 22 On theother hand the efficiency of splice sites with DGbind lt 4kcalmol increased steadily with increasingly more negativevalues of DGbind reaching 143 at DGbind = 61 kcalmolOnly the five most efficient splice sites had a computedDGbind lt 40 kcalmol again suggesting that efficient splicesites could be predicted by comparing the computed DGbind

to the empirical threshold of DGbind 40 kcalmolTo test the identity of the observed trans-splicing

products we reverse-transcribed cloned and sequencedthe products of all nine reactions that had resulted inquantifiable bands on polyacrylamide gels (sequence data

not shown) The expected product sequences were obtainedfor eight of the nine tested splice sites Product sequences forsplice site 350 which did not yield the expected productindicated that the corresponding ribozyme had splicedprimarily at uridine 307 Therefore the actual trans-splicingefficiency at splice site 350 may be lower than its measuredtrans-splicing efficiency (22) possibly improving furtherthe correlation between computed DGbind and trans-splicingefficiency seen in Figure 3B In summary the sequencing

FIGURE 3 Trans-splicing efficiency measured on 18 splice sites onCAT mRNA (A) Autoradiogram of reaction products from trans-splicing reactions with radiolabeled substrate Lanes are labeled withthe targeted splice sites or with () where ribozyme was absent RNAmarker sizes in nucleotides are indicated The unreacted substrate hada length of 678 nucleotides whereas the length of reaction interme-diates (i) and trans-splicing products () varied with the splice sitesNote that the ribozymes targeting splice sites 405 and 448 also splicedat each otherrsquos splice site (B) Correlation between computed bindingfree energies and experimental trans-splicing efficiencies Each di-amond represents a specific splice site Splice sites 131 240 and 369have DGbind $ 0 kcalmol and are represented by a white diamond atthe origin The data points near or over the DGbind-axis around thevalue of 3 kcalmol correspond from left to right to splice sites 222518 273 346 325 378 and 551 The thick gray line represents a least-mean-squares exponential fit y = A exp(B x) to the data points witha coefficient of determination R2 = 087 Alternatively a linear fit(data not shown) to the same data points yields a correlationcoefficient R = 075 with a probability p = 000033 that the valuesare not correlated Horizontal error bars are standard deviations overthree to six different window sizes (100ndash600 nt) Vertical error barsare standard deviations from three independent experiments

Computational prediction of splice sites

wwwrnajournalorg 593

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

results confirmed all but one of the nine products observedin the trans-splicing reactions

Experimental trans-splicing efficiency with unstructuredRNA substrates

To confirm that all of the 18 ribozymes employed in ex-periments with CAT mRNA were catalytically active wemeasured their trans-splicing efficiency on short substratesthat were designed to prevent formation of secondary struc-ture Hence this experiment also allowed us to examine theeffects of substrate secondary structure on trans-splicingefficiency Each short substrate consisted of a 13-nt sequencecontaining one of the 18 splice sites already targeted on CATmRNA The single-stranded conformation predicted byRNA folding algorithms for these substrates effectively abol-ishes the hypothesized DGunfold-target component of DGbindThe reaction conditions were the same as with the CATmRNA except that the incubation time was 30 timesshorter and the ribozyme concentration was 15 times lowerThe reaction products were then analyzed by polyacryl-amide gel electrophoresis and autoradiography (Fig 4A)The trans-splicing product fractions ranged from 016 6007 for splice site 369 to 29 6 7 for splice site 240(Supplemental Table S1) confirming that all ribozymeswere indeed active in the absence of substrate secondarystructure Because the short substrates required considerablyshorter reaction times and lower ribozyme concentrations toyield product fractions comparable to those obtained withCAT mRNA our results reinforce the notion that substratefolding can significantly reduce trans-splicing efficiency

For almost all ribozymes the expected trans-splicingproduct of 85 nt (8 nt of cleaved substrate plus 77 nt of 39-exon) was accompanied by two other products whoselengths were z80 and 90 nt (Fig 4A) The additionalproducts most likely resulted from one or more of thepossible side reactions in which group I intron ribozymesare known to engage (Kay and Inoue 1987 Johnson et al2005 Dotson et al 2008) These side reactions depend onthe same process modeled in this study ie the bindingof ribozymersquos IGS to substrate target site Neverthelessbecause the identities of the side products were not de-termined experimentally and because our focus was on thecorrect trans-splicing product we took the fraction of the85-nt product alone as a measure of trans-splicing efficiencyfor splice sites on the short substrates The ensuing qualita-tive correlation of computed DGbind with experimentallymeasured efficiency (see below) was not strongly affectedwhen the sum of all three product fractions was used asa measure of trans-splicing efficiency (data not shown)

To determine whether the two remaining energetic con-tributions P1 duplex stability modeled by DGhybrid andbase-pairing interactions between IGS and 39-exon modeledby DGrelease-IGS were able to explain the observed variation intrans-splicing efficiency for splice sites on the short substrates

we plotted the trans-splicing efficiency measured for eachsplice site as a function of DGbind computed for that site (Fig4B) This plot reveals a correlation between calculations andexperiments that is less tight than the correlation for the full-length CAT mRNA suggesting that factors other than P1duplex stability and IGS39-exon interactions might influencetrans-splicing efficiency One reason why these factors be-come important for unstructured substrates (Fig 4B) but notfor structured substrates (Fig 3B) may be the faster trans-splicing kinetics with short substrates Several processes mayplay a stronger role at faster reaction kinetics such as theinitial folding of the ribozyme (Pan et al 1997 Shi et al2009) and later steps in the trans-splicing process (Karbsteinet al 2002 Bell et al 2004) which are not captured by thethermodynamic model used in this study

Computed energetic contributions to trans-splicingefficiency

The good correlation observed between computed values ofDGbind and experimental trans-splicing efficiencies for CAT

FIGURE 4 Trans-splicing efficiency with short 13-nt substrates (A)Representative autoradiograms of products from trans-splicing re-actions Each 13-mer substrate contains one of the 18 splice sites thatwere targeted on CAT mRNA Each group of lanes shows samplestaken after 1 8 16 and 31 min of reaction and is labeled with theposition of the targeted splice site The positions of the unreactedsubstrate (s 13 nt) reaction intermediate (i 8 nt) and trans-splicingproduct (p 85 nt) are indicated The size of the product wasconfirmed by comparison with an 85-nt RNA marker in separateexperiments (data not shown) (B) Plot of experimental trans-splicingefficiency as a function of computed DGbind The thick gray linerepresents a least-mean-squares exponential fit to the data pointswith a coefficient of determination R2 = 027 Only splice site 240(39 kcalmol 29 reacted) results in a strong deviation from thistrend Alternatively a linear fit (data not shown) to the same datapoints yields a correlation coefficient R = 051 with a probabilityp = 0030 that the values are not correlated Error bars are standarddeviations from three independent experiments

Meluzzi et al

594 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

mRNA supports the notion that the ribozyme-substratebinding process can be approximately modeled by assum-ing three idealized molecular events unfolding of targetsite release of the IGS and IGS-target site hybridizationHere we examine the relative energetic contributions ofthese hypothetical molecular events to the observed effi-ciency of a given splice site

The thermodynamic model (Fig 1A) suggests thata very negative hybridization energy DGhybrid wouldfavor trans-splicing whereas very positive energies of sub-strate unfolding DGunfold-target and of IGS releaseDGrelease-IGS would disfavor trans-splicing Figure 5Aand Supplemental Table S2 show the three energy compo-nents of DGbind for each of the experimentally tested splicesites on CAT mRNA The average of DGunfold-target overall splice sites is 18 6 08 kcalmol the average ofDGrelease-IGS is 18 6 09 kcalmol and the average ofDGhybrid is 72 6 14 kcalmol Thus both the strongestenergetic contribution (72 kcalmol) and the largestvariation among splice sites (614 kcalmol) come fromthe hybridization of IGS with the target sites Theseresults indicate that the free energy of IGS-target sitehybridization is more important on average than the freeenergies of target unfolding and IGS release in modulatingDGbind

To assess the importance of the energetic componentsDGunfold-target DGrelease-IGS and DGhybrid toward the corre-lation observed between experimental trans-splicing effi-ciency and DGbind (Fig 3B) we recomputed DGbind byomitting each component in turn and we calculated thecoefficient of determination associated with a least-mean-squares exponential fit to the experimental trans-splicingefficiency as a function of computed DGbind For splice siteson CAT mRNA omitting any of the energetic componentsresulted in a poorer correlation and a consequent loss ofa reliable threshold bounding values of DGbind associatedwith efficient splice sites (Fig 5BndashD) In particular omissionof DGhybrid resulted in a nonnegative DGbind for all splicesites (Fig 5D) whereas omission of either DGunfold-target orDGrelease-IGS resulted in the appearance of up to six false-positives ie splice sites with DGbind lt 4 kcalmol butwithout any experimentally detected trans-splicing product(Fig 5BC) A poor correlation also ensued when omittingpairs of DG components from DGbind (Supplemental Fig S2Supplemental Table S2) Comparing the coefficients ofdetermination (Figs 3B 5BndashD) indicates that the impor-tance of individual energetic contributions toward theobserved correlation between trans-splicing efficiency andDGbind follows the order DGhybrid gt DGunfold-target gtDGrelease-IGS but these energetic contributions are all

necessary to achieve a good correlationWhen the same analysis was performed

for splice sites on the short substrateswe found that omitting DGunfold-target

made no significant difference becausethis component was always lt04 kcalmolas expected (Supplemental Fig S1AB)Omitting either of the other compo-nents resulted in a worse correlation(Supplemental Fig S1CD) The valuesof DGrelease-IGS for the 13-mers wereidentical to the values for correspond-ing splice sites on CAT mRNA whichwas expected because DGrelease-IGS doesnot depend on the substrate Valuesof DGhybrid however differed by up toz1 kcalmol between correspondingsplices sites on 13-mers and CATmRNAThese small energetic differences arosefrom the different nucleotides flankingthe target sites in the two contexts Suchnucleotides vary with splice site on CATmRNA (Supplemental Table S1) but donot vary with splice site on the shortsubstrates (see Materials and Methods)

Trans-tagging assay on CAT mRNA

To determine whether the results of thetrans-tagging assay also generate a good

FIGURE 5 Energetic contributions to the computed binding free energy on CAT mRNA fromthe three molecular events described in Figure 1A (A) The computed energetic contributionsfrom target site unfolding (DGunfold-target gray) ribozyme IGS release (DGrelease-IGS white) andIGS-target site hybridization (DGhybrid black) are shown for each tested splice site Three of the 18tested splice sites were omitted (131 240 369) because their computed energetic values were notamong the strongest 10000 interactions reported by INTARNA Error bars are standard deviationsof the energies calculated with three to six different window sizes as in Figure 3 (BndashD) Plots oftrans-splicing efficiency as a function of computed DGbind values when the contribution of (B)DGunfold-target (C) DGrelease-IGS and (D) DGhybrid is omitted in turn The thick gray lines in BndashDrepresent exponential fits as in Figures 3B and 4B with coefficients of determination R2 = 038057 and 0076 for B C and D respectively For additional details see Figure 3

Computational prediction of splice sites

wwwrnajournalorg 595

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

correlation with the experimentally determined trans-splic-ing efficiencies we performed this assay using CAT mRNAas the substrate The output of the trans-tagging assay is thenumber of times a given splice site is identified in a set ofproduct sequences obtained from in vitro trans-splicingreactions with randomized IGSs (Jones et al 1996) Weobtained a total of 66 product sequences which wereconsistent with trans-splicing at 25 different uridines amongthe 186 uridines that follow the start codon in the CATmRNA sequence (Table 1) The splice sites detected mostfrequently were at uridines 97 and 33 with 14 and 12occurrences respectively On the other hand 15 of the 25detected splice sites were found only once The splice sitesdetected by the trans-tagging assay are represented bydiamonds in Figure 2A

When the experimentally determined trans-splicing ef-ficiencies were plotted as a function of the splice site countsin the trans-tagging assay no correlation was evident (Fig6) More strikingly the assay did not find splice sites 258and 405 even though these splice sites were more efficientthan splice site 97 which was detected 14 times by thetrans-splicing assay Similarly the assay found each of thesplice sites 448 and 197 only once in 66 product sequenceseven though these splice sites were two of the five mostefficient ones among the 18 tested splice sites Thereforethe number of times that a given splice site is found by thetrans-tagging assay does not necessarily reflect the trans-splicing efficiency measured for that splice site

Possible biases of the trans-tagging assay

Why did the trans-tagging results only partially reflect themeasured splice site efficiency One possible explanation isthat the PCR step in the assay disfavors long RT-PCRproducts a well-known phenomenon in quantitative PCR(Bustin 2000) In our trans-tagging experiments this phe-nomenon may have caused the pattern seen in Figure 2AHere the forward PCR primer was designed to identifysplice sites between positions 14 and 643 on CAT mRNAbut only four out of the 25 identified splice sites were found

at positions beyond uridine 197 (Table 1) In contrast thefive most efficient splice sites (also with DGbind below thethreshold of4 kcalmol) were more evenly distributed overthe CAT mRNA (positions 97 197 258 405 and 448) Wereasoned that if the PCR step indeed caused more splicesites to be identified near the forward PCR primer thanelsewhere then moving this primer farther downstreamfrom the mRNA 59 terminus should reveal a differentpattern of detected splice sites now crowded near the newposition of the forward primer Therefore we repeated thetrans-tagging experiment on CAT mRNA this time usinga forward primer designed to identify only splice sites atpositions 207 to 643 The resulting nine product sequencesyielded eight new splice sites those at uridines 248 271273 321 350 378 384 and 405 (data not shown in Fig2A) None of these splice sites had been found among theprevious set of 66 trans-tagging product sequences The newsplice sites were again crowded near the forward primerThese results suggest that the PCR step may cause thetrans-tagging assay to miss efficient splice sites located 200nt or more downstream from the forward PCR primer

A second possible explanation for a bias in the trans-tagging assay lies in the different rates at which ribozymeswith different IGSs are transcribed by T7 RNA polymerasein vitro The assay employs a pool of trans-splicing ribo-zymes whose IGS differs between ribozymes Because the IGSrepresents the 59 terminus of the transcript and because thetranscription efficiency of T7 RNA polymerase is stronglydependent on the sequence of the first 6 nt in the transcript(Milligan et al 1987 Milligan and Uhlenbeck 1989) wehypothesized that the difference in transcription efficiencyachieved for different ribozymes may bias the results of thetrans-tagging assay

To test this second hypothesis we measured the tran-scription efficiency for the 18 studied ribozymes Specifi-cally the ribozymes were individually transcribed in thepresence of a-[32P]-GTP the transcription products were

TABLE 1 Results of the trans-tagging assay

Counta Position of the splice siteb

14 9712 338 544 993 322 18 83 87 166 1771 14 65 73 84 110 131 153 175

179 180 197 356 448 499 573

aNumber of times that a splice site was identified among 66 productsequencesbPosition of a uridine relative to the adenine of the start codon inCAT mRNA

FIGURE 6 Comparison between trans-splicing efficiency and trans-tagging results The measured trans-splicing efficiency for each splicesite is plotted against the number of times that the splice site wasfound in 66 product sequences obtained with the trans-tagging assay(filled diamonds) Open diamonds denote splice sites that were notfound by the trans-tagging assay Note that eight of the 18 sequencesare clustered at the origin of the plot

Meluzzi et al

596 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

separated by denaturing polyacrylamide gel electrophoresisand the transcription efficiency was measured by quanti-tating the relevant bands on autoradiograms The measuredtranscription efficiency varied eightfold among the differ-ent ribozymes (Fig 7 Supplemental Table S1) Notablyribozyme 97 had the highest transcription efficiency five-fold higher than that of ribozyme 258 These measureddifferences in ribozyme amounts represent lower boundsfor the actual differences affecting the trans-tagging assaySpecifically in the individual transcription reactions par-tial saturation may have been reached for the more efficienttranscriptions but not for the less efficient ones This partialsaturation would have reduced the observed difference be-tween efficient and less efficient transcriptions On the otherhand in the trans-tagging assay all ribozymes were tran-scribed together from equimolar amounts of templatesunder the same reaction conditions leading to equal extentsof product saturation for all ribozymes Hence the differencein the amounts of ribozymes 97 and 258 was likely greaterthan fivefold in the trans-tagging assay Therefore theobserved bias in transcription efficiency for ribozymeswith different IGSs could explain why splice site 97 wasfound most frequently in the trans-tagging assay whilesplice site 258 which was the most efficient in trans-splicingexperiments with CAT mRNA was absent in all 75 productsequences that we obtained with the trans-tagging assay

A third potential source of bias for the outcome of thetrans-tagging assay is the tendency of group I intron variantsto lose their 39-exon via side reactions known as 39-specifichydrolysis and G-exchange (Inoue et al 1986 Thompsonand Herrin 1991 van der Horst and Inoue 1993 Haugenet al 2004) Indeed a significant loss of 39-exons wasrevealed during the separation of in vitro transcribed ribo-zymes on denaturing polyacrylamide gels (above) Specificallyfor each band of a full-length ribozyme we detected a fastermigrating band which corresponded in size to ribozymes thathad lost their 39-exon If the loss of 39-exons duringtranscription and during trans-splicing depends on the

IGS then the pool of transcribed ribozymes used in thetrans-tagging assay will be biased against specific splicesites

To determine whether the loss of 39-exons varied withIGS under in vitro transcription conditions we quantitatedthe relevant bands on the same autoradiograms that wereused to measure transcription efficiency The resulting frac-tions of lost 39-exons varied from 12 6 4 to 396 4 withan average of 28 6 7 over all 18 ribozymes (SupplementalTable S1) This means that among the studied ribozymesbetween 88 and 61 of the transcribed ribozymes retainedtheir 39-exon generating only a 14-fold bias for the trans-tagging assay Therefore 39-exon loss during transcription wasnot a major contributor to the large bias observed in thetrans-tagging assay

The loss of 39-exons could also have taken place duringthe trans-splicing reaction Because these conditions weredifferent from the transcription conditions we also mea-sured the loss of 39-exons under trans-splicing conditionsInternally radiolabeled ribozymes with a 39-exon were size-purified by denaturing PAGE then incubated under trans-splicing conditions and the fraction of ribozymes that losttheir 39-exon was determined by a second denaturing PAGEand quantitation by phosphorimaging The fraction of 39-exons lost after four hours varied from 226 4 to 656 7with an average of 40 6 10 over all ribozymes (Supple-mental Table S1) Therefore we estimate that during the 1-hincubation under trans-tagging conditions between z80and z95 of the ribozymes retained their 39-exon suggest-ing that the small proportion of ribozymes that lost their 39-exon should not constitute a major bias in the trans-taggingassay

In summary we found that the results of the trans-taggingassay were skewed by experimental biases The strongestinfluences appeared to originate from a product-size biasin the PCR step and from different efficiencies in thetranscription of ribozymes whereas a smaller influencecame from the loss of 39-exons by ribozymes duringtranscription and trans-splicing

DISCUSSION

We have developed and experimentally tested a com-putational approach to identify efficient splice sites fortrans-splicing ribozymes This approach is based on thecomputation of the free energy change DGbind for thehybridization and secondary structure rearrangements ac-companying the first step of trans-splicing ie the bindingof IGS to substrate The computed values of DGbind werefound to correlate well with experimentally determinedtrans-splicing efficiencies at 18 different splice sites on CATmRNA The correlation was significantly better than thecorrelation of experimental trans-splicing efficiencies withthe results of the trans-tagging assay suggesting that theproposed computational approach could provide an alter-

FIGURE 7 Transcription efficiencies of ribozymes analyzed in thisstudy All values are relative to the transcription efficiency of theribozyme targeting splice site 97 Transcription efficiencies were de-termined from band intensities of internally [32P]-labeled ribozymeswhich had been transcribed in vitro and separated by denaturingpolyacrylamide gel electrophoresis The targeted splice site is in-dicated at the bottom of each bar Error bars are standard deviationsfrom seven independent transcriptions

Computational prediction of splice sites

wwwrnajournalorg 597

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

native solution to the identification of efficient splice sites fortrans-splicing ribozymes

In particular our results suggest that a set of candidateefficient splice sites for trans-splicing ribozymes could bedetermined by selecting those sites with DGbind lt 4 kcalmol Although other mRNAs may show different values forDGbind that better discriminate between efficient and in-efficient splice sites a threshold of DGbind lt 4 kcalmolmay provide a rough guideline for choosing candidate splicesites on other mRNA substrates Our results also suggest thatthe unfolding of substrate mRNA at the target site thehybridization of the substrate to the ribozyme and therelease of IGS from its secondary interactions with the 39-exon all contribute toward the efficiency of a given splicesite on long mRNAs albeit to varying degrees

The above results may seem unsurprising at first becausecomputations similar to ours have already been appliedto identify target sites for different types of RNA-bindingmolecules For example a thermodynamic cycle equivalentto our calculation of DGbind predicted antisense oligonu-cleotides with high binding affinity for rabbit b-globin andmouse tumor necrosis factor-a mRNAs achieving 60accuracy and a significant correlation with experimentaldata (Walton et al 1999) Analogous calculations includedthe concentration-dependent effects of oligonucleotide di-merization predicting oligonucleotide-target affinities con-sistent with results from an RNase-H mapping assay on theAT1 receptor mRNA and from a trans-tagging assay onsickle b-globin mRNA (Mathews et al 1999) More re-cently a target site accessibility measure provided by theprogram RNAPLFOLD and related to our DGunfold-target wasused to improve the prediction of effective siRNAs (Taferet al 2008) A similar calculation also improved andsimplified the prediction of effective microRNA targets inDrosophila melanogaster tissue culture cells (Kertesz et al2007) Another study used a statistical sampling techniqueto calculate DG values analogous to our DGbind (Shao et al2007b) These values were found to correlate significantlywith experimentally measured trans-cleavage activities of 15hammerhead ribozymes targeting transcripts of humanbreast cancer resistance protein Contrary to our resultsthis study found that DGunfold-target contributes more thanDGhybrid toward the observed correlation underscoring thestructural and mechanistic differences between trans-cleav-ing and trans-splicing ribozymes (Scott 2007) Lastly theprogram INTARNA which we used to calculate DGbind wasmore effective than other software in predicting 18 targetsof small bacterial regulatory RNAs (Busch et al 2008)

Despite the previous achievements by similar methodsit was not obvious at the outset whether our proposedcomputations of DGbind would correlate well with theefficiency of trans-splicing ribozymes because these mol-ecules are more complex than those investigated in mostof the previous reports Although Mathews et al (1999)found some agreement between their DGbind calculations

and the counts of accessible splice sites reported by Lanet al (1998) this comparison considered only a short70-nt region of the mRNA substrate and did not directlymeasure trans-splicing efficiency In contrast our studyindividually tested the trans-splicing efficiency on 18 differ-ent splice sites distributed over the full length of the CATmRNA and compared the experimental results to the com-puted values of DGbind Our study also included theribozyme 39-exon into the calculations The good correla-tion we observed between experiments and computationconfirms that a calculation of DGbind based on establishedRNA folding algorithms can be used to identify efficientsplice sites for trans-splicing ribozymes

The scope of the present study was intentionally limited toproviding a first assessment of the proposed calculation ofDGbind as a tool for predicting efficient splice sites Thereforewe carried out experiments using a single model mRNAsubstrate and we targeted a subset of the possible splice siteson this substrate On the other hand the trans-tagging assayhas already been successfully applied to map efficient splicesites on at least eight different mRNA substrates (Lan et al1998 Watanabe and Sullenger 2000 Rogers et al 2002 Parket al 2003 Ryu and Lee 2003 Jung and Lee 2005 Fiskaa et al2006 Kim et al 2007) Moreover the proposed calculation ofDGbind relies on secondary structure prediction algorithmsthat are very fast but not always accurate because they do nottake into account all possible RNA interactions and they aresensitive to errors in experimental energy parameters (Laytonand Bundschuh 2005 Condon and Jabbari 2009) Thusfuture experimental studies on additional mRNA substrateswill show whether calculations of DGbind provide a reliablemeans of predicting efficient splice sites

The proposed method is attractive because the calcula-tions of DGbind do not require bench-work and can beperformed in one afternoon In contrast the experimentalassay from the preparation of ribozymes and substratemRNA to the analysis of trans-splicing product sequencesrequires at least one week of work Thus a computationalapproach once proven reliable will likely be cheaper andquicker than the experimental route Moreover the com-putational approach can be developed further to satisfyadditional design requirements For example if the aimof the ribozymes is to repair a mutated mRNA sequencethen the sequence of the 39-exon must be adjusted accord-ing to the targeted splice site (Long and Sullenger 1999)This adjustment which is likely important based on ouranalysis of energetic contributions toward trans-splicing effi-ciency would be difficult to achieve with the trans-taggingassay whose pool of ribozymes presently carries the same39-exon sequence for all targeted splice sites On the otherhand the proposed computational approach can easily bemodified to adjust the 39-exon sequence in accordance withthe splice site

Another design requirement might be to avoid trans-splicing ribozymes that react at multiple splice sites This

Meluzzi et al

598 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

behavior was observed in the present study for ribozymestargeting splice sites 405 and 448 (Fig 3A) Such cross-reactivity was likely caused by the similarity of the corre-sponding IGSs which differ in only one out of 6 nt (GGGGAAfor splice site 405 versus GGGGAU for splice site 448) Acomputational approach can easily overcome this prob-lem as a simple comparison of the predicted IGSs sufficesto detect and discard potentially cross-reactive ribozymes

The computational approach could also accommodatethe design of 59-extended guide sequences (EGSs) whichhave been shown to strongly increase trans-splicing effi-ciency (Kohler et al 1999 Fiskaa and Birgisdottir 2010)The design of these EGSs depends critically on the mRNAsequence immediately downstream from the splice site Ifthe trans-tagging assay were to be used for the simultaneousoptimization of IGSs and EGSs then the sequence of theEGSs would have to covary with the randomized IGS Thiswould be a very laborious task because each sequence wouldhave to be synthesized individually Therefore EGS optimi-zation cannot be easily achieved with the trans-tagging assayIn contrast our computational approach can be modified toinclude a splice site-specific EGS optimization which wouldallow one not only to identify efficient splice sites on a givenmRNA but also to maximize their efficiency

In conclusion the proposed computational approachtogether with future algorithmic extensions aimed at opti-mizing 39-exon and EGS could facilitate the development ofefficient trans-splicing ribozymes while elucidating the in-teractions between trans-splicing ribozymes and their mRNAsubstrates

MATERIALS AND METHODS

Prediction of DGbind

The values of the binding free energy change DGbind and itscomponents were initially computed using the Vienna RNA package(Hofacker et al 1994) Specifically we used RNAEVAL to calculateDGhybrid and RNAFOLD to calculate DGunfold-target and DGrelease-IGSusing the partition function approach which yields an ensembleaverage of DG values associated with all possible pseudo knot-freesecondary structures achievable with a given RNA sequence(McCaskill 1990) We then found that the same calculations canbe carried out more conveniently using the INTARNA software((Busch et al 2008) httprnainformatikuni-freiburgde8080IntaRNAjsp) which was developed to find putative target sitesfor bacterial small regulatory RNAs (sRNAs typically 50ndash250 nt inlength) on long mRNA sequences This software takes as input twosequences a short sRNA and a long target mRNA The softwareoutputs a list of possible base-pairing interactions between the twoRNAs together with the corresponding DGbind = DGhybrid +DGunfold-mRNA + DGunfold-sRNA where DGhybrid is the hybridizationfree energy component of the interaction and DGunfold-mRNAsRNA

is the cost in free energy for locally unfolding the region of mRNAsRNA involved in the base-pairing interaction To calculateDGbind for the possible splice sites on CAT mRNA the 678-nt

sequence of this substrate was specified as the input mRNA se-quence and the shortened ribozyme sequence GXXXXXNNNNNACGCACGTCAATTGGCCGCTGGATGGGGCCCCTGTGAAGTGTTGCTGAGCAACGCGCTGGCGCGGCTCAGAGGCTTC wasspecified as the input sRNA sequence where GXXXXX denotesa specific IGS NNNNN is a linker replacing the body of theribozyme and the rest of the sequence is the 77-nt 39-exon usedfor the trans-tagging and trans-splicing experiments in this studyTo calculate DGbind for the short 13-mer substrates the inputsRNA sequence was the shortened ribozyme sequence shownabove and the input mRNA sequence was the 13-mer substratesequence GGYYYYYUAAAAA where YYYYY is the reverse com-plement of the five IGS bases XXXXX For all calculations themaximum length of the hybridized region was seven the upperenergy threshold was 10 kcalmol the exact number of seed basepairs was three the maximum number of reported suboptimalinteractions was 10000 while all other parameters of the softwarehad default values Requesting 10000 suboptimal interactions wasnecessary to obtain DGbind values for as many as possible of theleast efficient splice sites tested in this study However requesting40 suboptimal interactions was sufficient to predict the five mostefficient of the tested splice sites Only interactions involving eachIGS and the corresponding target site on the substrate wereextracted from the programrsquos output The above procedure wasautomated using a short Perl script which is available from GAupon request Values of DGbind for CAT mRNA were computedusing window sizes of 100 200 300 400 500 and 600 nt whichcover uniformly the length of the substrate Because each windowsize produced different outliers in the plot of experimental trans-splicing efficiency versus DGbind (data not shown) we averagedDGbind over all six window sizes thus obtaining a more consistenttrend (Fig 3B) Error bars of DGbind in Figure 3B are standarddeviations over the six window sizes Splice sites whose DGbind

value was reported for less than three window sizes were notassessed for energetic contributions and are not shown in Figure 3BWhen no DGbind values were returned for specific combinations ofwindow size and splice site the value zero was used for DGbind inFigure 2 All calculations for the 13-mer substrates were done witha window size of 13 nt

Ribozymes

To prepare DNA templates for transcription of trans-splicingribozymes with specific IGSs (Supplemental Table S1) a DNAfragment for each such ribozyme was amplified by PCR usingforward primer GCGTAATACGACTCACTATAGXXXXXAAAAGTTATCAGGCATGCACC and reverse primer GAAGCCTCTGAGCCGCGCCAG (PR1) from a plasmid containing the Tetra-hymena ribozyme gene linked to a 77-nt segment of the alpha-mannosidase gene as the 39-exon (Einvik et al 2004) The aboveforward primer contains the promoter for T7 RNA polymerasethe desired IGS nucleotides GXXXXX and the first 21 nucleotidesdownstream from the IGS in the Tetrahymena ribozyme sequenceAll PCR products were cloned into the EcoRI BamHI sites ofpUC19 using appropriate PCR primers and the ribozyme se-quences were confirmed by sequencing DNA templates for ribo-zyme transcription were obtained from these ribozyme-encodingplasmids by PCR amplification using forward primer GCGTAATACGACTCACTATAG and reverse primer PR1 To minimize theloss of 39-exons ribozyme transcriptions were carried out at 30degC for

Computational prediction of splice sites

wwwrnajournalorg 599

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

20ndash30 min The transcripts were purified by denaturing poly-acrylamide gel electrophoresis (PAGE) and eluted from gel slicesRibozyme concentrations were determined from absorbancemeasurements at 260 nm using an extinction coefficient of40 mM1cm1

Substrates

The template for run-off transcription of CAT mRNA was am-plified by PCR from plasmid pLysS (Novagen) in two consecutivePCRs The first PCR used forward primer CAGGAGCTAAGGAAGCTAAAATG and reverse primer CGCCCCGCCCTGCCACTCATC the second PCR used forward primer GCGTAATACGACTCACTATAGCAGGAGCTAAGGAAGCTAAAATG and the same re-verse primer After transcription the full-length CAT mRNA waspurified on Micro Bio-Spin 6 columns (Bio-Rad) dephosphory-lated with Antarctic phosphatase (NEB) and radiolabeled at the 59terminus using g-[32P]-ATP (Perkin Elmer) and polynucleotidekinase (NEB) The radiolabeled substrate was purified by de-naturing 5 PAGE and eluted from gel slices using RNA elutionsolution (20ndash 50 mM MOPSNaOH pH 70 02 SDS 300 mMNaCl) The 13-mer substrates were prepared as described previously(Milligan and Uhlenbeck 1989) to obtain the sequences GGYYYYYUAAAAA where YYYYY is the reverse complement of the fiveIGS bases XXXXX adjacent to the 59-terminal G of the ribozymeTemplates for transcription were obtained by annealing the senseoligonucleotide GCTAATACGACTCACTATAG which containsa T7 RNA polymerase promoter with antisense DNA oligonucle-otides TTTTTAXXXXXCCTATAGTGAGTCGTATTAGC In vitrotranscription was carried out at 37degC for 1 h in the presence ofa-[32P]-UTP (Perkin Elmer) using 500 nM DNA template andz1unitmL of T7 RNA polymerase The radiolabeled transcripts werepurified by denaturing 20 PAGE

Trans-splicing reactions

Ribozymes and substrates were mixed in a buffer containing 1mM MgCl2 135 mM KCl 50 mM MOPSNaOH pH 70 20 mMGTP and 2 mM spermidine and incubated at 37degC Before mixingwith the substrate the ribozymes were preincubated for 10 min at37degC in the absence of magnesium In reactions with CAT mRNAsubstrate ribozyme concentrations were 15 mM substrate con-centration was z50 nM and the incubation time was 4 h Thisreaction time was chosen as a measure for trans-splicing efficiencybecause after this time the signal for the most efficient splice site(258) began saturating while the signals for most other splice siteswere relatively weak or absent In reactions with 13-mer substratesribozyme concentrations were 100 nM substrate concentrationswere z10 nM and the incubation time was 8 min At this timenone of the reaction products appeared saturated and all ribozyme-substrate pairs yielded bands sufficiently strong for quantitationThe samples were separated by denaturing 20 PAGE and visualizedby phosphorimaging (Bio-Rad PMI) Band intensities in theresulting digital images were quantified using custom computersoftware that employs previously described curve-fitting pro-cedures (Shadle et al 1997 Mitov et al 2009) Trans-splicingefficiencies were calculated using the formula (trans-splicingefficiency) = (trans-splicing product) (trans-splicing product +59-cleavage product + unreacted substrate) where the amount ofeach species was assumed proportional to the corresponding

band intensity Error bars were calculated as standard deviationsover three independent experiments

Confirmation of specific trans-splicing products

Samples from trans-splicing reactions that yielded visible bands onautoradiograms of polyacrylamide gels were reverse-transcribedusing AMV reverse transcriptase (NEB) with reverse primer PR2The RT products were amplified by PCR using the nested reverseprimer PR3 and one of the forward primers PF1 CGGAATTCGATATGGGATAGTGTTCACCC CGGAATTCCCGTTGATATATCCCAATGGC CGGAATTCTGGCCTATTTCCCTAAAGGG andGCCTTTATTCACATTCTTGCC which anneal at different posi-tions on the CAT mRNA sequence The PCR products were clonedinto the plasmid pUC19 and sequenced

Trans-tagging assay

The trans-tagging assay was carried out essentially as described(Einvik et al 2004) All ribozymes were synthesized by run-off invitro transcription using T7 RNA polymerase from DNA tem-plates that were generated by PCR Preparation of the ribozymepool was as described above for ribozymes with specific IGSsexcept that here the forward primer encoding the T7 promotercontained five randomized nucleotides at the positions denotedwith X The trans-tagging reactions contained 100 nM substrate10 nM ribozyme 02 mM GTP 5 mM MgCl2 50 mM MOPSKOH pH 70 135 mM KCl and 2 mM spermidine After thesubstrate and the ribozyme were preincubated in separate tubes inthe absence of MgCl2 for 10 min at 37degC all reagents were mixedand incubated for 1 h at 37degC Reaction products were reversetranscribed with AMV reverse transcriptase (NEB) using primerGAAGCCTCTGAGCCGCG (PR2) The template RNA was hy-drolyzed by incubation in 200 mM NaOH at 90degC for 10 min RTproducts were amplified by PCR using nested reverse primer GCGGATCCTGCTCAGCAACACTTCACAGG (PR3) and forwardprimers CGAATTCCAGGAGCTAAGGAAGCTAAAATG (PF1)or CGGAATTCCATTCTTGCCCGCCTGATGAATGC cloned intopUC19 and sequenced The resulting sequences were comparedwith the ribozyme 39-exon sequence and the CAT mRNA sequenceto determine the positions of the splice sites on CAT mRNA

Ribozyme transcription efficiency and 39-exon loss

Ribozymes with specific IGSs (Supplemental Table S1) wereindividually transcribed by T7 RNA polymerase in 20-mL reactionscontaining a-[32P]-GTP (Perkin Elmer) from DNA templates thatwere prepared as described above The reactions were carried out at30degC for 20 min and were stopped by the addition of formamideloading buffer The ribozymes were heat-denatured at 80degC for2 min and separated by denaturing 5 PAGE The resulting gels wereanalyzed as described above for the trans-splicing reactions A totalof seven independent reactions per ribozyme were carried out Tocalculate the relative transcription efficiencies the absolute intensityof the band corresponding to each intact ribozyme was divided bythe absolute intensity of the band corresponding to ribozyme 97Error bars are standard deviations of the relative transcriptionefficiencies To calculate the fraction of 39-exons lost during trans-cription for each ribozyme the quantitated intensity of the fastermigrating band was divided by the sum of the intensities of this

Meluzzi et al

600 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 2: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

their low efficiency in cells Here trans-splicing efficiencydenotes the fraction of target mRNA that is converted toproduct by the ribozymes Trans-splicing efficiency is typi-cally 10 or less in cells (Fiskaa and Birgisdottir 2010)Although efficiencies of up to 50 have been reported (Byunet al 2003) the necessary intracellular ribozyme concentra-tions appear too high to be acceptable in a clinical settingTherefore an outstanding task for the development of trans-splicing ribozymes is to increase their efficiency allowinga sufficient fraction of mRNA substrate to be trans-spliced atlow ribozyme concentrations

The trans-splicing efficiency varies strongly with thelocation of the splice site within the mRNA substrate(Campbell and Cech 1995 Lan et al 2000) In particularsecondary structures within the substrate can render poten-tial target sites inaccessible to the ribozyme The trans-splicing efficiency may also vary between splice sites owingto different stabilities of the P1 duplex formed betweenribozyme and substrate and to different base-pairing in-teractions between the ribozyme IGS and the ribozyme 39-exon Thus the identification of efficient splice sites iethose sites that mediate a high trans-splicing efficiencyis a nontrivial problem in the design of trans-splicingribozymes

An elegant experimental method known as the trans-tagging assay has been developed to identify efficient splicesites (Jones et al 1996) In this assay the mRNA substrate isincubated in vitro with a pool of trans-splicing ribozymeswhose IGS is randomized This pool of ribozymes is thusable to recognize any potential splice site on the substrateRibozymes targeting efficient splice sites are able to trans-splice at those sites The resulting products are detectedby reverse transcription subcloning and sequencing Thesequences from several clones are then used to deduce thepositions of efficient splice sites (Jones et al 1996 Lan et al1998 Einvik et al 2004) This trans-tagging assay has beensuccessfully appliedmdashwithout explicitly reported limita-tionsmdashto uncover efficient splice sites on various mRNAsubstrates (Lan et al 1998 Watanabe and Sullenger 2000Rogers et al 2002 Park et al 2003 Ryu and Lee 2003 Jungand Lee 2005 Fiskaa et al 2006 Jung and Lee 2006 Kimet al 2007)

The problem of identifying efficient splice sites for trans-splicing ribozymes is in concept similar to that of findingaccessible or optimal binding sites on mRNA substratestargeted by antisense DNA oligonucleotides (Chan et al2006) siRNAs (Shao et al 2007a Walton et al 2010) andother non-protein-coding RNAs (Pichon and Felden 2008)In these cases several computational methods have beenproposed and found to be effective at predicting the rel-evant target sites (Ding and Lawrence 2001 Wang andDrlica 2004 Far et al 2005 Shao et al 2006 Tafer et al2008 Backofen and Hess 2010 Thomas et al 2010) Onerecurring theme is the use of RNA secondary structureprediction algorithms to calculate the free energy changes

DGbind associated with binding an oligonucleotide to eachpossible target site on the mRNA substrate Target siteswith strongly negative values of DGbind are then predictedto be the most accessible sites on the substrate (Mathewset al 1999 Walton et al 1999 Busch et al 2008 Lu andMathews 2008 Tafer et al 2008)

Here we have adapted this approach of computingDGbind to the problem of finding efficient splice sites fortrans-splicing ribozymes The mRNA of chloramphenicolacetyl transferase (CAT) was used as a model substratefor both computations and experiments In particular wecomputed DGbind for all 186 splice sites on this substrateand we experimentally determined the trans-splicing effi-ciency in vitro for 18 of those splice sites Furthermore tomap the accessible splice sites on CAT mRNA we carriedout the trans-tagging assay which differs substantially fromdirect measurements of trans-splicing efficiency Surpris-ingly the experimental trans-splicing efficiencies at the 18tested splice sites were found to correlate better with thecomputed DGbind values than with the results of the trans-tagging assay These results suggest that the proposedcalculation of DGbind could be used to predict efficient splicesites for trans-splicing ribozymes more quickly cheaplyand accurately than with the presently used experimentalmethods

RESULTS

Calculation of DGbind for trans-splicing ribozymes

We defined DGbind as the free energy change associatedwith the binding of the ribozymersquos IGS to a given splice siteon the mRNA substrate Our hypothesis was that a verynegative DGbind corresponds to a high trans-splicing effi-ciency To calculate DGbind the binding process was mod-eled as consisting of three idealized molecular events thatinvolve only local changes in RNA secondary structure theunfolding of the target site on the substrate the release ofthe IGS on the ribozyme and the hybridization betweentarget site and IGS to form the P1 duplex (Fig 1A) Thecorresponding component free energy changes were denotedby DGunfold-target DGrelease-IGS and DGhybrid respectively Bylsquolsquoreleasersquorsquo of the IGS we mean the breaking up of any basepairs that may form between the IGS and the 39-exon Thisrelease is necessary to make the IGS available for binding tothe target site We thus computed DGbind by summing theabove components ie DGbind = DGunfold-target + DGrelease-IGS

+ DGhybrid These components can in turn be computedusing several available RNA folding algorithms whichpredict the possible secondary structures for a given RNAsequence as well as the free energy changes of those struc-tures relative to the unfolded state (Condon and Jabbari2009)

To compute the components of DGbind for trans-splicingribozymes that interact with a long mRNA substrate we

Computational prediction of splice sites

wwwrnajournalorg 591

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

adapted the ribozyme and mRNA sequences before sub-mission to RNA folding algorithms The 3D model of theTetrahymena ribozyme (Lehnert et al 1996) suggests thatbefore pairing to an mRNA target site the IGS is stericallyallowed to base-pair with the 39-exon but does not base-pair with the body of the folded intron (Fig 1B) Thus tocalculate DGrelease-IGS we used a ribozyme sequence inwhich the intron portion downstream from the IGS wasreplaced by a 5-nt linker region (Fig 1C) This shortenedsequence allowed the IGS to interact with the 39-exonwhile permitting rapid computation of DGrelease-IGS byRNA folding algorithms To choose an appropriate linkersequence we noted that in the Tetrahymena group Iintron the IGS is followed by three unpaired adenosinesand the 39-exon is preceded by one unpaired guanosine(Cech et al 1994) Although these nucleotides seem naturalcandidates for inclusion in the linker the 3D structure ofthe ribozyme shows that these nucleotides are conforma-tionally prohibited from pairing with the substrate andfrom engaging in p-stacking interactions with the IGS or

the 39-exon (Fig 1B) Therefore we chose the linker sequenceto consist of five unspecified nucleotides which are effectivelyignored by RNA folding algorithms in the calculation ofDGrelease-IGS and DGhybrid Varying the length of the linkerfrom four to seven unspecified nucleotides did not signifi-cantly change the computed DG values (data not shown)

To predict DGunfold-target from the local secondarystructure around each splice site on a long mRNA sub-strate we directed the RNA folding algorithms to analyzesubsets or windows of the entire substrate sequence Thusonly the sequence within each window containing a givensplice site was used to calculate DGunfold-target for that siteFor the 678-nt sequence of our model mRNA (see below)the resulting values of DGunfold-target were found to vary bygt1 kcalmol over different window sizes (data not shown)This variation may be due to the somewhat arbitraryexclusion of base pairs involving nucleotides outside ofa given window To minimize the influence of windowsize on the predicted DGbind we computed DGunfold-target

using window sizes of 100 200 300 400 500 and 600 ntand then we averaged the resulting values of DGbind

Splice sites chosen on a model mRNA substrateCAT mRNA

As a model mRNA substrate for our computational andexperimental investigations of trans-splicing efficiency wechose the mRNA of chloramphenicol acetyl transferaseThis mRNA sequence is 678 nt long and contains 186uridines downstream from the AUG translation start codonEach of these uridines represents a potential splice site fortrans-splicing ribozymes (Doudna et al 1989) We thuscomputed DGbind for each of these splice sites Figure 2A

FIGURE 1 Interactions within and between the mRNA substrate andthe ribozyme and their treatment during computation (A) Schematicof the substrate binding step in trans-splicing The mRNA substrate isillustrated as a hairpin structure and the body of the ribozyme isillustrated as a gray oval Unfolding of the target site on the mRNA(first step) and release of the internal guide sequence (IGS) (secondstep) allow the mRNA target site and the IGS to hybridize (third step)(B) The 3D structure of the Tetrahymena ribozyme (Lehnert et al1996) indicates that the AAA linker (black) is not stacked on the P1duplex Additionally the 39-exon (gray) is close enough to the IGS(black) for interactions if the mRNA (gray) is absent (C) To facilitatecomputational treatment of the binding process the body of theribozyme was replaced by a linker sequence joining the IGS to the 39-exon This linker does not include the natural AAA linker but consistsof five unspecified nucleotides (N) which are ignored in thecalculation of base-pairing and p-stacking interactions

FIGURE 2 Computed values of DGbind for splice sites on CATmRNA (A) Plot of DGbind against splice site position relative to theadenosine (position 1) of the AUG transcription start codon Onlysplice sites downstream from this codon were mapped by the trans-tagging assay in this study Circles with or without a diamond insideindicate splice sites chosen for experimental determination of trans-splicing efficiency Diamonds indicate splice sites found among 66product sequences obtained by trans-tagging assay (B) Distributionof the 186 splice sites (gray bars) of CAT mRNA over the computedvalues of DGbind Black bars represent the 18 splice sites that werechosen for experimental testing The topmost bar represents 61 splicesites and is shown truncated

Meluzzi et al

592 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

plots the resulting DGbind values as a function of splice siteposition relative to the adenine of the AUG start codonThese values are distributed over the range from 0 to 65kcalmol with most values closer to 0 kcalmol (Fig 2B) Inparticular only seven out of the 186 available splice sites hadDGbind lt 40 kcalmol suggesting that this value of DGbind

could be used as a threshold to predict efficient splice sitesTo determine experimentally whether the computed

values of DGbind can be used to identify efficient splice siteswe chose a set of 18 different target splice sites on CATmRNA To reveal possible correlations between trans-splic-ing efficiency and DGbind as well as between trans-splicingefficiency and position on mRNA substrate these 18 siteswere chosen to cover the full range of DGbind values (Fig 2B)and were distributed over the length of the CAT mRNA (Fig2A) Two splice sites with DGbind slightly below 40 kcalmol were omitted in order to cover the wide range of thepredicted values for DGbind with the available resources

Experimental trans-splicing efficiencieson CAT mRNA

To measure experimentally the trans-splicing efficiency ofthe 18 chosen splice sites we generated 18 trans-splicingribozymes that contained different IGSs but were otherwiseidentical Each IGS was designed to target a particular oneof the chosen splice sites (Supplemental Table S1) Each ofthe 18 ribozymes was incubated with 59-radiolabeled CATmRNA and the reaction products were analyzed by poly-acrylamide gel electrophoresis and autoradiography (Fig3A) As a measure of trans-splicing efficiency we quantifiedthe fraction of radiolabeled substrates converted to trans-splicing products after 4 h of incubation Bands consistentwith the expected trans-splicing products were observed fornine of the 18 tested splice sites namely for splice sitesat uridines 83 87 97 197 222 258 350 405 and 448Product fractions for these splice sites ranged from 04 ofthe substrate at uridine 222 to 143 at uridine 258(Supplemental Table S1) Nine of the 18 splice sites didnot yield quantifiable products

The experimental trans-splicing efficiencies were foundto agree with the computed DGbind values (Fig 3B) Inparticular the efficiency of splice sites with DGbind gt 4kcalmol was relatively low reaching at most 22 On theother hand the efficiency of splice sites with DGbind lt 4kcalmol increased steadily with increasingly more negativevalues of DGbind reaching 143 at DGbind = 61 kcalmolOnly the five most efficient splice sites had a computedDGbind lt 40 kcalmol again suggesting that efficient splicesites could be predicted by comparing the computed DGbind

to the empirical threshold of DGbind 40 kcalmolTo test the identity of the observed trans-splicing

products we reverse-transcribed cloned and sequencedthe products of all nine reactions that had resulted inquantifiable bands on polyacrylamide gels (sequence data

not shown) The expected product sequences were obtainedfor eight of the nine tested splice sites Product sequences forsplice site 350 which did not yield the expected productindicated that the corresponding ribozyme had splicedprimarily at uridine 307 Therefore the actual trans-splicingefficiency at splice site 350 may be lower than its measuredtrans-splicing efficiency (22) possibly improving furtherthe correlation between computed DGbind and trans-splicingefficiency seen in Figure 3B In summary the sequencing

FIGURE 3 Trans-splicing efficiency measured on 18 splice sites onCAT mRNA (A) Autoradiogram of reaction products from trans-splicing reactions with radiolabeled substrate Lanes are labeled withthe targeted splice sites or with () where ribozyme was absent RNAmarker sizes in nucleotides are indicated The unreacted substrate hada length of 678 nucleotides whereas the length of reaction interme-diates (i) and trans-splicing products () varied with the splice sitesNote that the ribozymes targeting splice sites 405 and 448 also splicedat each otherrsquos splice site (B) Correlation between computed bindingfree energies and experimental trans-splicing efficiencies Each di-amond represents a specific splice site Splice sites 131 240 and 369have DGbind $ 0 kcalmol and are represented by a white diamond atthe origin The data points near or over the DGbind-axis around thevalue of 3 kcalmol correspond from left to right to splice sites 222518 273 346 325 378 and 551 The thick gray line represents a least-mean-squares exponential fit y = A exp(B x) to the data points witha coefficient of determination R2 = 087 Alternatively a linear fit(data not shown) to the same data points yields a correlationcoefficient R = 075 with a probability p = 000033 that the valuesare not correlated Horizontal error bars are standard deviations overthree to six different window sizes (100ndash600 nt) Vertical error barsare standard deviations from three independent experiments

Computational prediction of splice sites

wwwrnajournalorg 593

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

results confirmed all but one of the nine products observedin the trans-splicing reactions

Experimental trans-splicing efficiency with unstructuredRNA substrates

To confirm that all of the 18 ribozymes employed in ex-periments with CAT mRNA were catalytically active wemeasured their trans-splicing efficiency on short substratesthat were designed to prevent formation of secondary struc-ture Hence this experiment also allowed us to examine theeffects of substrate secondary structure on trans-splicingefficiency Each short substrate consisted of a 13-nt sequencecontaining one of the 18 splice sites already targeted on CATmRNA The single-stranded conformation predicted byRNA folding algorithms for these substrates effectively abol-ishes the hypothesized DGunfold-target component of DGbindThe reaction conditions were the same as with the CATmRNA except that the incubation time was 30 timesshorter and the ribozyme concentration was 15 times lowerThe reaction products were then analyzed by polyacryl-amide gel electrophoresis and autoradiography (Fig 4A)The trans-splicing product fractions ranged from 016 6007 for splice site 369 to 29 6 7 for splice site 240(Supplemental Table S1) confirming that all ribozymeswere indeed active in the absence of substrate secondarystructure Because the short substrates required considerablyshorter reaction times and lower ribozyme concentrations toyield product fractions comparable to those obtained withCAT mRNA our results reinforce the notion that substratefolding can significantly reduce trans-splicing efficiency

For almost all ribozymes the expected trans-splicingproduct of 85 nt (8 nt of cleaved substrate plus 77 nt of 39-exon) was accompanied by two other products whoselengths were z80 and 90 nt (Fig 4A) The additionalproducts most likely resulted from one or more of thepossible side reactions in which group I intron ribozymesare known to engage (Kay and Inoue 1987 Johnson et al2005 Dotson et al 2008) These side reactions depend onthe same process modeled in this study ie the bindingof ribozymersquos IGS to substrate target site Neverthelessbecause the identities of the side products were not de-termined experimentally and because our focus was on thecorrect trans-splicing product we took the fraction of the85-nt product alone as a measure of trans-splicing efficiencyfor splice sites on the short substrates The ensuing qualita-tive correlation of computed DGbind with experimentallymeasured efficiency (see below) was not strongly affectedwhen the sum of all three product fractions was used asa measure of trans-splicing efficiency (data not shown)

To determine whether the two remaining energetic con-tributions P1 duplex stability modeled by DGhybrid andbase-pairing interactions between IGS and 39-exon modeledby DGrelease-IGS were able to explain the observed variation intrans-splicing efficiency for splice sites on the short substrates

we plotted the trans-splicing efficiency measured for eachsplice site as a function of DGbind computed for that site (Fig4B) This plot reveals a correlation between calculations andexperiments that is less tight than the correlation for the full-length CAT mRNA suggesting that factors other than P1duplex stability and IGS39-exon interactions might influencetrans-splicing efficiency One reason why these factors be-come important for unstructured substrates (Fig 4B) but notfor structured substrates (Fig 3B) may be the faster trans-splicing kinetics with short substrates Several processes mayplay a stronger role at faster reaction kinetics such as theinitial folding of the ribozyme (Pan et al 1997 Shi et al2009) and later steps in the trans-splicing process (Karbsteinet al 2002 Bell et al 2004) which are not captured by thethermodynamic model used in this study

Computed energetic contributions to trans-splicingefficiency

The good correlation observed between computed values ofDGbind and experimental trans-splicing efficiencies for CAT

FIGURE 4 Trans-splicing efficiency with short 13-nt substrates (A)Representative autoradiograms of products from trans-splicing re-actions Each 13-mer substrate contains one of the 18 splice sites thatwere targeted on CAT mRNA Each group of lanes shows samplestaken after 1 8 16 and 31 min of reaction and is labeled with theposition of the targeted splice site The positions of the unreactedsubstrate (s 13 nt) reaction intermediate (i 8 nt) and trans-splicingproduct (p 85 nt) are indicated The size of the product wasconfirmed by comparison with an 85-nt RNA marker in separateexperiments (data not shown) (B) Plot of experimental trans-splicingefficiency as a function of computed DGbind The thick gray linerepresents a least-mean-squares exponential fit to the data pointswith a coefficient of determination R2 = 027 Only splice site 240(39 kcalmol 29 reacted) results in a strong deviation from thistrend Alternatively a linear fit (data not shown) to the same datapoints yields a correlation coefficient R = 051 with a probabilityp = 0030 that the values are not correlated Error bars are standarddeviations from three independent experiments

Meluzzi et al

594 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

mRNA supports the notion that the ribozyme-substratebinding process can be approximately modeled by assum-ing three idealized molecular events unfolding of targetsite release of the IGS and IGS-target site hybridizationHere we examine the relative energetic contributions ofthese hypothetical molecular events to the observed effi-ciency of a given splice site

The thermodynamic model (Fig 1A) suggests thata very negative hybridization energy DGhybrid wouldfavor trans-splicing whereas very positive energies of sub-strate unfolding DGunfold-target and of IGS releaseDGrelease-IGS would disfavor trans-splicing Figure 5Aand Supplemental Table S2 show the three energy compo-nents of DGbind for each of the experimentally tested splicesites on CAT mRNA The average of DGunfold-target overall splice sites is 18 6 08 kcalmol the average ofDGrelease-IGS is 18 6 09 kcalmol and the average ofDGhybrid is 72 6 14 kcalmol Thus both the strongestenergetic contribution (72 kcalmol) and the largestvariation among splice sites (614 kcalmol) come fromthe hybridization of IGS with the target sites Theseresults indicate that the free energy of IGS-target sitehybridization is more important on average than the freeenergies of target unfolding and IGS release in modulatingDGbind

To assess the importance of the energetic componentsDGunfold-target DGrelease-IGS and DGhybrid toward the corre-lation observed between experimental trans-splicing effi-ciency and DGbind (Fig 3B) we recomputed DGbind byomitting each component in turn and we calculated thecoefficient of determination associated with a least-mean-squares exponential fit to the experimental trans-splicingefficiency as a function of computed DGbind For splice siteson CAT mRNA omitting any of the energetic componentsresulted in a poorer correlation and a consequent loss ofa reliable threshold bounding values of DGbind associatedwith efficient splice sites (Fig 5BndashD) In particular omissionof DGhybrid resulted in a nonnegative DGbind for all splicesites (Fig 5D) whereas omission of either DGunfold-target orDGrelease-IGS resulted in the appearance of up to six false-positives ie splice sites with DGbind lt 4 kcalmol butwithout any experimentally detected trans-splicing product(Fig 5BC) A poor correlation also ensued when omittingpairs of DG components from DGbind (Supplemental Fig S2Supplemental Table S2) Comparing the coefficients ofdetermination (Figs 3B 5BndashD) indicates that the impor-tance of individual energetic contributions toward theobserved correlation between trans-splicing efficiency andDGbind follows the order DGhybrid gt DGunfold-target gtDGrelease-IGS but these energetic contributions are all

necessary to achieve a good correlationWhen the same analysis was performed

for splice sites on the short substrateswe found that omitting DGunfold-target

made no significant difference becausethis component was always lt04 kcalmolas expected (Supplemental Fig S1AB)Omitting either of the other compo-nents resulted in a worse correlation(Supplemental Fig S1CD) The valuesof DGrelease-IGS for the 13-mers wereidentical to the values for correspond-ing splice sites on CAT mRNA whichwas expected because DGrelease-IGS doesnot depend on the substrate Valuesof DGhybrid however differed by up toz1 kcalmol between correspondingsplices sites on 13-mers and CATmRNAThese small energetic differences arosefrom the different nucleotides flankingthe target sites in the two contexts Suchnucleotides vary with splice site on CATmRNA (Supplemental Table S1) but donot vary with splice site on the shortsubstrates (see Materials and Methods)

Trans-tagging assay on CAT mRNA

To determine whether the results of thetrans-tagging assay also generate a good

FIGURE 5 Energetic contributions to the computed binding free energy on CAT mRNA fromthe three molecular events described in Figure 1A (A) The computed energetic contributionsfrom target site unfolding (DGunfold-target gray) ribozyme IGS release (DGrelease-IGS white) andIGS-target site hybridization (DGhybrid black) are shown for each tested splice site Three of the 18tested splice sites were omitted (131 240 369) because their computed energetic values were notamong the strongest 10000 interactions reported by INTARNA Error bars are standard deviationsof the energies calculated with three to six different window sizes as in Figure 3 (BndashD) Plots oftrans-splicing efficiency as a function of computed DGbind values when the contribution of (B)DGunfold-target (C) DGrelease-IGS and (D) DGhybrid is omitted in turn The thick gray lines in BndashDrepresent exponential fits as in Figures 3B and 4B with coefficients of determination R2 = 038057 and 0076 for B C and D respectively For additional details see Figure 3

Computational prediction of splice sites

wwwrnajournalorg 595

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

correlation with the experimentally determined trans-splic-ing efficiencies we performed this assay using CAT mRNAas the substrate The output of the trans-tagging assay is thenumber of times a given splice site is identified in a set ofproduct sequences obtained from in vitro trans-splicingreactions with randomized IGSs (Jones et al 1996) Weobtained a total of 66 product sequences which wereconsistent with trans-splicing at 25 different uridines amongthe 186 uridines that follow the start codon in the CATmRNA sequence (Table 1) The splice sites detected mostfrequently were at uridines 97 and 33 with 14 and 12occurrences respectively On the other hand 15 of the 25detected splice sites were found only once The splice sitesdetected by the trans-tagging assay are represented bydiamonds in Figure 2A

When the experimentally determined trans-splicing ef-ficiencies were plotted as a function of the splice site countsin the trans-tagging assay no correlation was evident (Fig6) More strikingly the assay did not find splice sites 258and 405 even though these splice sites were more efficientthan splice site 97 which was detected 14 times by thetrans-splicing assay Similarly the assay found each of thesplice sites 448 and 197 only once in 66 product sequenceseven though these splice sites were two of the five mostefficient ones among the 18 tested splice sites Thereforethe number of times that a given splice site is found by thetrans-tagging assay does not necessarily reflect the trans-splicing efficiency measured for that splice site

Possible biases of the trans-tagging assay

Why did the trans-tagging results only partially reflect themeasured splice site efficiency One possible explanation isthat the PCR step in the assay disfavors long RT-PCRproducts a well-known phenomenon in quantitative PCR(Bustin 2000) In our trans-tagging experiments this phe-nomenon may have caused the pattern seen in Figure 2AHere the forward PCR primer was designed to identifysplice sites between positions 14 and 643 on CAT mRNAbut only four out of the 25 identified splice sites were found

at positions beyond uridine 197 (Table 1) In contrast thefive most efficient splice sites (also with DGbind below thethreshold of4 kcalmol) were more evenly distributed overthe CAT mRNA (positions 97 197 258 405 and 448) Wereasoned that if the PCR step indeed caused more splicesites to be identified near the forward PCR primer thanelsewhere then moving this primer farther downstreamfrom the mRNA 59 terminus should reveal a differentpattern of detected splice sites now crowded near the newposition of the forward primer Therefore we repeated thetrans-tagging experiment on CAT mRNA this time usinga forward primer designed to identify only splice sites atpositions 207 to 643 The resulting nine product sequencesyielded eight new splice sites those at uridines 248 271273 321 350 378 384 and 405 (data not shown in Fig2A) None of these splice sites had been found among theprevious set of 66 trans-tagging product sequences The newsplice sites were again crowded near the forward primerThese results suggest that the PCR step may cause thetrans-tagging assay to miss efficient splice sites located 200nt or more downstream from the forward PCR primer

A second possible explanation for a bias in the trans-tagging assay lies in the different rates at which ribozymeswith different IGSs are transcribed by T7 RNA polymerasein vitro The assay employs a pool of trans-splicing ribo-zymes whose IGS differs between ribozymes Because the IGSrepresents the 59 terminus of the transcript and because thetranscription efficiency of T7 RNA polymerase is stronglydependent on the sequence of the first 6 nt in the transcript(Milligan et al 1987 Milligan and Uhlenbeck 1989) wehypothesized that the difference in transcription efficiencyachieved for different ribozymes may bias the results of thetrans-tagging assay

To test this second hypothesis we measured the tran-scription efficiency for the 18 studied ribozymes Specifi-cally the ribozymes were individually transcribed in thepresence of a-[32P]-GTP the transcription products were

TABLE 1 Results of the trans-tagging assay

Counta Position of the splice siteb

14 9712 338 544 993 322 18 83 87 166 1771 14 65 73 84 110 131 153 175

179 180 197 356 448 499 573

aNumber of times that a splice site was identified among 66 productsequencesbPosition of a uridine relative to the adenine of the start codon inCAT mRNA

FIGURE 6 Comparison between trans-splicing efficiency and trans-tagging results The measured trans-splicing efficiency for each splicesite is plotted against the number of times that the splice site wasfound in 66 product sequences obtained with the trans-tagging assay(filled diamonds) Open diamonds denote splice sites that were notfound by the trans-tagging assay Note that eight of the 18 sequencesare clustered at the origin of the plot

Meluzzi et al

596 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

separated by denaturing polyacrylamide gel electrophoresisand the transcription efficiency was measured by quanti-tating the relevant bands on autoradiograms The measuredtranscription efficiency varied eightfold among the differ-ent ribozymes (Fig 7 Supplemental Table S1) Notablyribozyme 97 had the highest transcription efficiency five-fold higher than that of ribozyme 258 These measureddifferences in ribozyme amounts represent lower boundsfor the actual differences affecting the trans-tagging assaySpecifically in the individual transcription reactions par-tial saturation may have been reached for the more efficienttranscriptions but not for the less efficient ones This partialsaturation would have reduced the observed difference be-tween efficient and less efficient transcriptions On the otherhand in the trans-tagging assay all ribozymes were tran-scribed together from equimolar amounts of templatesunder the same reaction conditions leading to equal extentsof product saturation for all ribozymes Hence the differencein the amounts of ribozymes 97 and 258 was likely greaterthan fivefold in the trans-tagging assay Therefore theobserved bias in transcription efficiency for ribozymeswith different IGSs could explain why splice site 97 wasfound most frequently in the trans-tagging assay whilesplice site 258 which was the most efficient in trans-splicingexperiments with CAT mRNA was absent in all 75 productsequences that we obtained with the trans-tagging assay

A third potential source of bias for the outcome of thetrans-tagging assay is the tendency of group I intron variantsto lose their 39-exon via side reactions known as 39-specifichydrolysis and G-exchange (Inoue et al 1986 Thompsonand Herrin 1991 van der Horst and Inoue 1993 Haugenet al 2004) Indeed a significant loss of 39-exons wasrevealed during the separation of in vitro transcribed ribo-zymes on denaturing polyacrylamide gels (above) Specificallyfor each band of a full-length ribozyme we detected a fastermigrating band which corresponded in size to ribozymes thathad lost their 39-exon If the loss of 39-exons duringtranscription and during trans-splicing depends on the

IGS then the pool of transcribed ribozymes used in thetrans-tagging assay will be biased against specific splicesites

To determine whether the loss of 39-exons varied withIGS under in vitro transcription conditions we quantitatedthe relevant bands on the same autoradiograms that wereused to measure transcription efficiency The resulting frac-tions of lost 39-exons varied from 12 6 4 to 396 4 withan average of 28 6 7 over all 18 ribozymes (SupplementalTable S1) This means that among the studied ribozymesbetween 88 and 61 of the transcribed ribozymes retainedtheir 39-exon generating only a 14-fold bias for the trans-tagging assay Therefore 39-exon loss during transcription wasnot a major contributor to the large bias observed in thetrans-tagging assay

The loss of 39-exons could also have taken place duringthe trans-splicing reaction Because these conditions weredifferent from the transcription conditions we also mea-sured the loss of 39-exons under trans-splicing conditionsInternally radiolabeled ribozymes with a 39-exon were size-purified by denaturing PAGE then incubated under trans-splicing conditions and the fraction of ribozymes that losttheir 39-exon was determined by a second denaturing PAGEand quantitation by phosphorimaging The fraction of 39-exons lost after four hours varied from 226 4 to 656 7with an average of 40 6 10 over all ribozymes (Supple-mental Table S1) Therefore we estimate that during the 1-hincubation under trans-tagging conditions between z80and z95 of the ribozymes retained their 39-exon suggest-ing that the small proportion of ribozymes that lost their 39-exon should not constitute a major bias in the trans-taggingassay

In summary we found that the results of the trans-taggingassay were skewed by experimental biases The strongestinfluences appeared to originate from a product-size biasin the PCR step and from different efficiencies in thetranscription of ribozymes whereas a smaller influencecame from the loss of 39-exons by ribozymes duringtranscription and trans-splicing

DISCUSSION

We have developed and experimentally tested a com-putational approach to identify efficient splice sites fortrans-splicing ribozymes This approach is based on thecomputation of the free energy change DGbind for thehybridization and secondary structure rearrangements ac-companying the first step of trans-splicing ie the bindingof IGS to substrate The computed values of DGbind werefound to correlate well with experimentally determinedtrans-splicing efficiencies at 18 different splice sites on CATmRNA The correlation was significantly better than thecorrelation of experimental trans-splicing efficiencies withthe results of the trans-tagging assay suggesting that theproposed computational approach could provide an alter-

FIGURE 7 Transcription efficiencies of ribozymes analyzed in thisstudy All values are relative to the transcription efficiency of theribozyme targeting splice site 97 Transcription efficiencies were de-termined from band intensities of internally [32P]-labeled ribozymeswhich had been transcribed in vitro and separated by denaturingpolyacrylamide gel electrophoresis The targeted splice site is in-dicated at the bottom of each bar Error bars are standard deviationsfrom seven independent transcriptions

Computational prediction of splice sites

wwwrnajournalorg 597

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

native solution to the identification of efficient splice sites fortrans-splicing ribozymes

In particular our results suggest that a set of candidateefficient splice sites for trans-splicing ribozymes could bedetermined by selecting those sites with DGbind lt 4 kcalmol Although other mRNAs may show different values forDGbind that better discriminate between efficient and in-efficient splice sites a threshold of DGbind lt 4 kcalmolmay provide a rough guideline for choosing candidate splicesites on other mRNA substrates Our results also suggest thatthe unfolding of substrate mRNA at the target site thehybridization of the substrate to the ribozyme and therelease of IGS from its secondary interactions with the 39-exon all contribute toward the efficiency of a given splicesite on long mRNAs albeit to varying degrees

The above results may seem unsurprising at first becausecomputations similar to ours have already been appliedto identify target sites for different types of RNA-bindingmolecules For example a thermodynamic cycle equivalentto our calculation of DGbind predicted antisense oligonu-cleotides with high binding affinity for rabbit b-globin andmouse tumor necrosis factor-a mRNAs achieving 60accuracy and a significant correlation with experimentaldata (Walton et al 1999) Analogous calculations includedthe concentration-dependent effects of oligonucleotide di-merization predicting oligonucleotide-target affinities con-sistent with results from an RNase-H mapping assay on theAT1 receptor mRNA and from a trans-tagging assay onsickle b-globin mRNA (Mathews et al 1999) More re-cently a target site accessibility measure provided by theprogram RNAPLFOLD and related to our DGunfold-target wasused to improve the prediction of effective siRNAs (Taferet al 2008) A similar calculation also improved andsimplified the prediction of effective microRNA targets inDrosophila melanogaster tissue culture cells (Kertesz et al2007) Another study used a statistical sampling techniqueto calculate DG values analogous to our DGbind (Shao et al2007b) These values were found to correlate significantlywith experimentally measured trans-cleavage activities of 15hammerhead ribozymes targeting transcripts of humanbreast cancer resistance protein Contrary to our resultsthis study found that DGunfold-target contributes more thanDGhybrid toward the observed correlation underscoring thestructural and mechanistic differences between trans-cleav-ing and trans-splicing ribozymes (Scott 2007) Lastly theprogram INTARNA which we used to calculate DGbind wasmore effective than other software in predicting 18 targetsof small bacterial regulatory RNAs (Busch et al 2008)

Despite the previous achievements by similar methodsit was not obvious at the outset whether our proposedcomputations of DGbind would correlate well with theefficiency of trans-splicing ribozymes because these mol-ecules are more complex than those investigated in mostof the previous reports Although Mathews et al (1999)found some agreement between their DGbind calculations

and the counts of accessible splice sites reported by Lanet al (1998) this comparison considered only a short70-nt region of the mRNA substrate and did not directlymeasure trans-splicing efficiency In contrast our studyindividually tested the trans-splicing efficiency on 18 differ-ent splice sites distributed over the full length of the CATmRNA and compared the experimental results to the com-puted values of DGbind Our study also included theribozyme 39-exon into the calculations The good correla-tion we observed between experiments and computationconfirms that a calculation of DGbind based on establishedRNA folding algorithms can be used to identify efficientsplice sites for trans-splicing ribozymes

The scope of the present study was intentionally limited toproviding a first assessment of the proposed calculation ofDGbind as a tool for predicting efficient splice sites Thereforewe carried out experiments using a single model mRNAsubstrate and we targeted a subset of the possible splice siteson this substrate On the other hand the trans-tagging assayhas already been successfully applied to map efficient splicesites on at least eight different mRNA substrates (Lan et al1998 Watanabe and Sullenger 2000 Rogers et al 2002 Parket al 2003 Ryu and Lee 2003 Jung and Lee 2005 Fiskaa et al2006 Kim et al 2007) Moreover the proposed calculation ofDGbind relies on secondary structure prediction algorithmsthat are very fast but not always accurate because they do nottake into account all possible RNA interactions and they aresensitive to errors in experimental energy parameters (Laytonand Bundschuh 2005 Condon and Jabbari 2009) Thusfuture experimental studies on additional mRNA substrateswill show whether calculations of DGbind provide a reliablemeans of predicting efficient splice sites

The proposed method is attractive because the calcula-tions of DGbind do not require bench-work and can beperformed in one afternoon In contrast the experimentalassay from the preparation of ribozymes and substratemRNA to the analysis of trans-splicing product sequencesrequires at least one week of work Thus a computationalapproach once proven reliable will likely be cheaper andquicker than the experimental route Moreover the com-putational approach can be developed further to satisfyadditional design requirements For example if the aimof the ribozymes is to repair a mutated mRNA sequencethen the sequence of the 39-exon must be adjusted accord-ing to the targeted splice site (Long and Sullenger 1999)This adjustment which is likely important based on ouranalysis of energetic contributions toward trans-splicing effi-ciency would be difficult to achieve with the trans-taggingassay whose pool of ribozymes presently carries the same39-exon sequence for all targeted splice sites On the otherhand the proposed computational approach can easily bemodified to adjust the 39-exon sequence in accordance withthe splice site

Another design requirement might be to avoid trans-splicing ribozymes that react at multiple splice sites This

Meluzzi et al

598 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

behavior was observed in the present study for ribozymestargeting splice sites 405 and 448 (Fig 3A) Such cross-reactivity was likely caused by the similarity of the corre-sponding IGSs which differ in only one out of 6 nt (GGGGAAfor splice site 405 versus GGGGAU for splice site 448) Acomputational approach can easily overcome this prob-lem as a simple comparison of the predicted IGSs sufficesto detect and discard potentially cross-reactive ribozymes

The computational approach could also accommodatethe design of 59-extended guide sequences (EGSs) whichhave been shown to strongly increase trans-splicing effi-ciency (Kohler et al 1999 Fiskaa and Birgisdottir 2010)The design of these EGSs depends critically on the mRNAsequence immediately downstream from the splice site Ifthe trans-tagging assay were to be used for the simultaneousoptimization of IGSs and EGSs then the sequence of theEGSs would have to covary with the randomized IGS Thiswould be a very laborious task because each sequence wouldhave to be synthesized individually Therefore EGS optimi-zation cannot be easily achieved with the trans-tagging assayIn contrast our computational approach can be modified toinclude a splice site-specific EGS optimization which wouldallow one not only to identify efficient splice sites on a givenmRNA but also to maximize their efficiency

In conclusion the proposed computational approachtogether with future algorithmic extensions aimed at opti-mizing 39-exon and EGS could facilitate the development ofefficient trans-splicing ribozymes while elucidating the in-teractions between trans-splicing ribozymes and their mRNAsubstrates

MATERIALS AND METHODS

Prediction of DGbind

The values of the binding free energy change DGbind and itscomponents were initially computed using the Vienna RNA package(Hofacker et al 1994) Specifically we used RNAEVAL to calculateDGhybrid and RNAFOLD to calculate DGunfold-target and DGrelease-IGSusing the partition function approach which yields an ensembleaverage of DG values associated with all possible pseudo knot-freesecondary structures achievable with a given RNA sequence(McCaskill 1990) We then found that the same calculations canbe carried out more conveniently using the INTARNA software((Busch et al 2008) httprnainformatikuni-freiburgde8080IntaRNAjsp) which was developed to find putative target sitesfor bacterial small regulatory RNAs (sRNAs typically 50ndash250 nt inlength) on long mRNA sequences This software takes as input twosequences a short sRNA and a long target mRNA The softwareoutputs a list of possible base-pairing interactions between the twoRNAs together with the corresponding DGbind = DGhybrid +DGunfold-mRNA + DGunfold-sRNA where DGhybrid is the hybridizationfree energy component of the interaction and DGunfold-mRNAsRNA

is the cost in free energy for locally unfolding the region of mRNAsRNA involved in the base-pairing interaction To calculateDGbind for the possible splice sites on CAT mRNA the 678-nt

sequence of this substrate was specified as the input mRNA se-quence and the shortened ribozyme sequence GXXXXXNNNNNACGCACGTCAATTGGCCGCTGGATGGGGCCCCTGTGAAGTGTTGCTGAGCAACGCGCTGGCGCGGCTCAGAGGCTTC wasspecified as the input sRNA sequence where GXXXXX denotesa specific IGS NNNNN is a linker replacing the body of theribozyme and the rest of the sequence is the 77-nt 39-exon usedfor the trans-tagging and trans-splicing experiments in this studyTo calculate DGbind for the short 13-mer substrates the inputsRNA sequence was the shortened ribozyme sequence shownabove and the input mRNA sequence was the 13-mer substratesequence GGYYYYYUAAAAA where YYYYY is the reverse com-plement of the five IGS bases XXXXX For all calculations themaximum length of the hybridized region was seven the upperenergy threshold was 10 kcalmol the exact number of seed basepairs was three the maximum number of reported suboptimalinteractions was 10000 while all other parameters of the softwarehad default values Requesting 10000 suboptimal interactions wasnecessary to obtain DGbind values for as many as possible of theleast efficient splice sites tested in this study However requesting40 suboptimal interactions was sufficient to predict the five mostefficient of the tested splice sites Only interactions involving eachIGS and the corresponding target site on the substrate wereextracted from the programrsquos output The above procedure wasautomated using a short Perl script which is available from GAupon request Values of DGbind for CAT mRNA were computedusing window sizes of 100 200 300 400 500 and 600 nt whichcover uniformly the length of the substrate Because each windowsize produced different outliers in the plot of experimental trans-splicing efficiency versus DGbind (data not shown) we averagedDGbind over all six window sizes thus obtaining a more consistenttrend (Fig 3B) Error bars of DGbind in Figure 3B are standarddeviations over the six window sizes Splice sites whose DGbind

value was reported for less than three window sizes were notassessed for energetic contributions and are not shown in Figure 3BWhen no DGbind values were returned for specific combinations ofwindow size and splice site the value zero was used for DGbind inFigure 2 All calculations for the 13-mer substrates were done witha window size of 13 nt

Ribozymes

To prepare DNA templates for transcription of trans-splicingribozymes with specific IGSs (Supplemental Table S1) a DNAfragment for each such ribozyme was amplified by PCR usingforward primer GCGTAATACGACTCACTATAGXXXXXAAAAGTTATCAGGCATGCACC and reverse primer GAAGCCTCTGAGCCGCGCCAG (PR1) from a plasmid containing the Tetra-hymena ribozyme gene linked to a 77-nt segment of the alpha-mannosidase gene as the 39-exon (Einvik et al 2004) The aboveforward primer contains the promoter for T7 RNA polymerasethe desired IGS nucleotides GXXXXX and the first 21 nucleotidesdownstream from the IGS in the Tetrahymena ribozyme sequenceAll PCR products were cloned into the EcoRI BamHI sites ofpUC19 using appropriate PCR primers and the ribozyme se-quences were confirmed by sequencing DNA templates for ribo-zyme transcription were obtained from these ribozyme-encodingplasmids by PCR amplification using forward primer GCGTAATACGACTCACTATAG and reverse primer PR1 To minimize theloss of 39-exons ribozyme transcriptions were carried out at 30degC for

Computational prediction of splice sites

wwwrnajournalorg 599

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

20ndash30 min The transcripts were purified by denaturing poly-acrylamide gel electrophoresis (PAGE) and eluted from gel slicesRibozyme concentrations were determined from absorbancemeasurements at 260 nm using an extinction coefficient of40 mM1cm1

Substrates

The template for run-off transcription of CAT mRNA was am-plified by PCR from plasmid pLysS (Novagen) in two consecutivePCRs The first PCR used forward primer CAGGAGCTAAGGAAGCTAAAATG and reverse primer CGCCCCGCCCTGCCACTCATC the second PCR used forward primer GCGTAATACGACTCACTATAGCAGGAGCTAAGGAAGCTAAAATG and the same re-verse primer After transcription the full-length CAT mRNA waspurified on Micro Bio-Spin 6 columns (Bio-Rad) dephosphory-lated with Antarctic phosphatase (NEB) and radiolabeled at the 59terminus using g-[32P]-ATP (Perkin Elmer) and polynucleotidekinase (NEB) The radiolabeled substrate was purified by de-naturing 5 PAGE and eluted from gel slices using RNA elutionsolution (20ndash 50 mM MOPSNaOH pH 70 02 SDS 300 mMNaCl) The 13-mer substrates were prepared as described previously(Milligan and Uhlenbeck 1989) to obtain the sequences GGYYYYYUAAAAA where YYYYY is the reverse complement of the fiveIGS bases XXXXX adjacent to the 59-terminal G of the ribozymeTemplates for transcription were obtained by annealing the senseoligonucleotide GCTAATACGACTCACTATAG which containsa T7 RNA polymerase promoter with antisense DNA oligonucle-otides TTTTTAXXXXXCCTATAGTGAGTCGTATTAGC In vitrotranscription was carried out at 37degC for 1 h in the presence ofa-[32P]-UTP (Perkin Elmer) using 500 nM DNA template andz1unitmL of T7 RNA polymerase The radiolabeled transcripts werepurified by denaturing 20 PAGE

Trans-splicing reactions

Ribozymes and substrates were mixed in a buffer containing 1mM MgCl2 135 mM KCl 50 mM MOPSNaOH pH 70 20 mMGTP and 2 mM spermidine and incubated at 37degC Before mixingwith the substrate the ribozymes were preincubated for 10 min at37degC in the absence of magnesium In reactions with CAT mRNAsubstrate ribozyme concentrations were 15 mM substrate con-centration was z50 nM and the incubation time was 4 h Thisreaction time was chosen as a measure for trans-splicing efficiencybecause after this time the signal for the most efficient splice site(258) began saturating while the signals for most other splice siteswere relatively weak or absent In reactions with 13-mer substratesribozyme concentrations were 100 nM substrate concentrationswere z10 nM and the incubation time was 8 min At this timenone of the reaction products appeared saturated and all ribozyme-substrate pairs yielded bands sufficiently strong for quantitationThe samples were separated by denaturing 20 PAGE and visualizedby phosphorimaging (Bio-Rad PMI) Band intensities in theresulting digital images were quantified using custom computersoftware that employs previously described curve-fitting pro-cedures (Shadle et al 1997 Mitov et al 2009) Trans-splicingefficiencies were calculated using the formula (trans-splicingefficiency) = (trans-splicing product) (trans-splicing product +59-cleavage product + unreacted substrate) where the amount ofeach species was assumed proportional to the corresponding

band intensity Error bars were calculated as standard deviationsover three independent experiments

Confirmation of specific trans-splicing products

Samples from trans-splicing reactions that yielded visible bands onautoradiograms of polyacrylamide gels were reverse-transcribedusing AMV reverse transcriptase (NEB) with reverse primer PR2The RT products were amplified by PCR using the nested reverseprimer PR3 and one of the forward primers PF1 CGGAATTCGATATGGGATAGTGTTCACCC CGGAATTCCCGTTGATATATCCCAATGGC CGGAATTCTGGCCTATTTCCCTAAAGGG andGCCTTTATTCACATTCTTGCC which anneal at different posi-tions on the CAT mRNA sequence The PCR products were clonedinto the plasmid pUC19 and sequenced

Trans-tagging assay

The trans-tagging assay was carried out essentially as described(Einvik et al 2004) All ribozymes were synthesized by run-off invitro transcription using T7 RNA polymerase from DNA tem-plates that were generated by PCR Preparation of the ribozymepool was as described above for ribozymes with specific IGSsexcept that here the forward primer encoding the T7 promotercontained five randomized nucleotides at the positions denotedwith X The trans-tagging reactions contained 100 nM substrate10 nM ribozyme 02 mM GTP 5 mM MgCl2 50 mM MOPSKOH pH 70 135 mM KCl and 2 mM spermidine After thesubstrate and the ribozyme were preincubated in separate tubes inthe absence of MgCl2 for 10 min at 37degC all reagents were mixedand incubated for 1 h at 37degC Reaction products were reversetranscribed with AMV reverse transcriptase (NEB) using primerGAAGCCTCTGAGCCGCG (PR2) The template RNA was hy-drolyzed by incubation in 200 mM NaOH at 90degC for 10 min RTproducts were amplified by PCR using nested reverse primer GCGGATCCTGCTCAGCAACACTTCACAGG (PR3) and forwardprimers CGAATTCCAGGAGCTAAGGAAGCTAAAATG (PF1)or CGGAATTCCATTCTTGCCCGCCTGATGAATGC cloned intopUC19 and sequenced The resulting sequences were comparedwith the ribozyme 39-exon sequence and the CAT mRNA sequenceto determine the positions of the splice sites on CAT mRNA

Ribozyme transcription efficiency and 39-exon loss

Ribozymes with specific IGSs (Supplemental Table S1) wereindividually transcribed by T7 RNA polymerase in 20-mL reactionscontaining a-[32P]-GTP (Perkin Elmer) from DNA templates thatwere prepared as described above The reactions were carried out at30degC for 20 min and were stopped by the addition of formamideloading buffer The ribozymes were heat-denatured at 80degC for2 min and separated by denaturing 5 PAGE The resulting gels wereanalyzed as described above for the trans-splicing reactions A totalof seven independent reactions per ribozyme were carried out Tocalculate the relative transcription efficiencies the absolute intensityof the band corresponding to each intact ribozyme was divided bythe absolute intensity of the band corresponding to ribozyme 97Error bars are standard deviations of the relative transcriptionefficiencies To calculate the fraction of 39-exons lost during trans-cription for each ribozyme the quantitated intensity of the fastermigrating band was divided by the sum of the intensities of this

Meluzzi et al

600 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 3: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

adapted the ribozyme and mRNA sequences before sub-mission to RNA folding algorithms The 3D model of theTetrahymena ribozyme (Lehnert et al 1996) suggests thatbefore pairing to an mRNA target site the IGS is stericallyallowed to base-pair with the 39-exon but does not base-pair with the body of the folded intron (Fig 1B) Thus tocalculate DGrelease-IGS we used a ribozyme sequence inwhich the intron portion downstream from the IGS wasreplaced by a 5-nt linker region (Fig 1C) This shortenedsequence allowed the IGS to interact with the 39-exonwhile permitting rapid computation of DGrelease-IGS byRNA folding algorithms To choose an appropriate linkersequence we noted that in the Tetrahymena group Iintron the IGS is followed by three unpaired adenosinesand the 39-exon is preceded by one unpaired guanosine(Cech et al 1994) Although these nucleotides seem naturalcandidates for inclusion in the linker the 3D structure ofthe ribozyme shows that these nucleotides are conforma-tionally prohibited from pairing with the substrate andfrom engaging in p-stacking interactions with the IGS or

the 39-exon (Fig 1B) Therefore we chose the linker sequenceto consist of five unspecified nucleotides which are effectivelyignored by RNA folding algorithms in the calculation ofDGrelease-IGS and DGhybrid Varying the length of the linkerfrom four to seven unspecified nucleotides did not signifi-cantly change the computed DG values (data not shown)

To predict DGunfold-target from the local secondarystructure around each splice site on a long mRNA sub-strate we directed the RNA folding algorithms to analyzesubsets or windows of the entire substrate sequence Thusonly the sequence within each window containing a givensplice site was used to calculate DGunfold-target for that siteFor the 678-nt sequence of our model mRNA (see below)the resulting values of DGunfold-target were found to vary bygt1 kcalmol over different window sizes (data not shown)This variation may be due to the somewhat arbitraryexclusion of base pairs involving nucleotides outside ofa given window To minimize the influence of windowsize on the predicted DGbind we computed DGunfold-target

using window sizes of 100 200 300 400 500 and 600 ntand then we averaged the resulting values of DGbind

Splice sites chosen on a model mRNA substrateCAT mRNA

As a model mRNA substrate for our computational andexperimental investigations of trans-splicing efficiency wechose the mRNA of chloramphenicol acetyl transferaseThis mRNA sequence is 678 nt long and contains 186uridines downstream from the AUG translation start codonEach of these uridines represents a potential splice site fortrans-splicing ribozymes (Doudna et al 1989) We thuscomputed DGbind for each of these splice sites Figure 2A

FIGURE 1 Interactions within and between the mRNA substrate andthe ribozyme and their treatment during computation (A) Schematicof the substrate binding step in trans-splicing The mRNA substrate isillustrated as a hairpin structure and the body of the ribozyme isillustrated as a gray oval Unfolding of the target site on the mRNA(first step) and release of the internal guide sequence (IGS) (secondstep) allow the mRNA target site and the IGS to hybridize (third step)(B) The 3D structure of the Tetrahymena ribozyme (Lehnert et al1996) indicates that the AAA linker (black) is not stacked on the P1duplex Additionally the 39-exon (gray) is close enough to the IGS(black) for interactions if the mRNA (gray) is absent (C) To facilitatecomputational treatment of the binding process the body of theribozyme was replaced by a linker sequence joining the IGS to the 39-exon This linker does not include the natural AAA linker but consistsof five unspecified nucleotides (N) which are ignored in thecalculation of base-pairing and p-stacking interactions

FIGURE 2 Computed values of DGbind for splice sites on CATmRNA (A) Plot of DGbind against splice site position relative to theadenosine (position 1) of the AUG transcription start codon Onlysplice sites downstream from this codon were mapped by the trans-tagging assay in this study Circles with or without a diamond insideindicate splice sites chosen for experimental determination of trans-splicing efficiency Diamonds indicate splice sites found among 66product sequences obtained by trans-tagging assay (B) Distributionof the 186 splice sites (gray bars) of CAT mRNA over the computedvalues of DGbind Black bars represent the 18 splice sites that werechosen for experimental testing The topmost bar represents 61 splicesites and is shown truncated

Meluzzi et al

592 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

plots the resulting DGbind values as a function of splice siteposition relative to the adenine of the AUG start codonThese values are distributed over the range from 0 to 65kcalmol with most values closer to 0 kcalmol (Fig 2B) Inparticular only seven out of the 186 available splice sites hadDGbind lt 40 kcalmol suggesting that this value of DGbind

could be used as a threshold to predict efficient splice sitesTo determine experimentally whether the computed

values of DGbind can be used to identify efficient splice siteswe chose a set of 18 different target splice sites on CATmRNA To reveal possible correlations between trans-splic-ing efficiency and DGbind as well as between trans-splicingefficiency and position on mRNA substrate these 18 siteswere chosen to cover the full range of DGbind values (Fig 2B)and were distributed over the length of the CAT mRNA (Fig2A) Two splice sites with DGbind slightly below 40 kcalmol were omitted in order to cover the wide range of thepredicted values for DGbind with the available resources

Experimental trans-splicing efficiencieson CAT mRNA

To measure experimentally the trans-splicing efficiency ofthe 18 chosen splice sites we generated 18 trans-splicingribozymes that contained different IGSs but were otherwiseidentical Each IGS was designed to target a particular oneof the chosen splice sites (Supplemental Table S1) Each ofthe 18 ribozymes was incubated with 59-radiolabeled CATmRNA and the reaction products were analyzed by poly-acrylamide gel electrophoresis and autoradiography (Fig3A) As a measure of trans-splicing efficiency we quantifiedthe fraction of radiolabeled substrates converted to trans-splicing products after 4 h of incubation Bands consistentwith the expected trans-splicing products were observed fornine of the 18 tested splice sites namely for splice sitesat uridines 83 87 97 197 222 258 350 405 and 448Product fractions for these splice sites ranged from 04 ofthe substrate at uridine 222 to 143 at uridine 258(Supplemental Table S1) Nine of the 18 splice sites didnot yield quantifiable products

The experimental trans-splicing efficiencies were foundto agree with the computed DGbind values (Fig 3B) Inparticular the efficiency of splice sites with DGbind gt 4kcalmol was relatively low reaching at most 22 On theother hand the efficiency of splice sites with DGbind lt 4kcalmol increased steadily with increasingly more negativevalues of DGbind reaching 143 at DGbind = 61 kcalmolOnly the five most efficient splice sites had a computedDGbind lt 40 kcalmol again suggesting that efficient splicesites could be predicted by comparing the computed DGbind

to the empirical threshold of DGbind 40 kcalmolTo test the identity of the observed trans-splicing

products we reverse-transcribed cloned and sequencedthe products of all nine reactions that had resulted inquantifiable bands on polyacrylamide gels (sequence data

not shown) The expected product sequences were obtainedfor eight of the nine tested splice sites Product sequences forsplice site 350 which did not yield the expected productindicated that the corresponding ribozyme had splicedprimarily at uridine 307 Therefore the actual trans-splicingefficiency at splice site 350 may be lower than its measuredtrans-splicing efficiency (22) possibly improving furtherthe correlation between computed DGbind and trans-splicingefficiency seen in Figure 3B In summary the sequencing

FIGURE 3 Trans-splicing efficiency measured on 18 splice sites onCAT mRNA (A) Autoradiogram of reaction products from trans-splicing reactions with radiolabeled substrate Lanes are labeled withthe targeted splice sites or with () where ribozyme was absent RNAmarker sizes in nucleotides are indicated The unreacted substrate hada length of 678 nucleotides whereas the length of reaction interme-diates (i) and trans-splicing products () varied with the splice sitesNote that the ribozymes targeting splice sites 405 and 448 also splicedat each otherrsquos splice site (B) Correlation between computed bindingfree energies and experimental trans-splicing efficiencies Each di-amond represents a specific splice site Splice sites 131 240 and 369have DGbind $ 0 kcalmol and are represented by a white diamond atthe origin The data points near or over the DGbind-axis around thevalue of 3 kcalmol correspond from left to right to splice sites 222518 273 346 325 378 and 551 The thick gray line represents a least-mean-squares exponential fit y = A exp(B x) to the data points witha coefficient of determination R2 = 087 Alternatively a linear fit(data not shown) to the same data points yields a correlationcoefficient R = 075 with a probability p = 000033 that the valuesare not correlated Horizontal error bars are standard deviations overthree to six different window sizes (100ndash600 nt) Vertical error barsare standard deviations from three independent experiments

Computational prediction of splice sites

wwwrnajournalorg 593

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

results confirmed all but one of the nine products observedin the trans-splicing reactions

Experimental trans-splicing efficiency with unstructuredRNA substrates

To confirm that all of the 18 ribozymes employed in ex-periments with CAT mRNA were catalytically active wemeasured their trans-splicing efficiency on short substratesthat were designed to prevent formation of secondary struc-ture Hence this experiment also allowed us to examine theeffects of substrate secondary structure on trans-splicingefficiency Each short substrate consisted of a 13-nt sequencecontaining one of the 18 splice sites already targeted on CATmRNA The single-stranded conformation predicted byRNA folding algorithms for these substrates effectively abol-ishes the hypothesized DGunfold-target component of DGbindThe reaction conditions were the same as with the CATmRNA except that the incubation time was 30 timesshorter and the ribozyme concentration was 15 times lowerThe reaction products were then analyzed by polyacryl-amide gel electrophoresis and autoradiography (Fig 4A)The trans-splicing product fractions ranged from 016 6007 for splice site 369 to 29 6 7 for splice site 240(Supplemental Table S1) confirming that all ribozymeswere indeed active in the absence of substrate secondarystructure Because the short substrates required considerablyshorter reaction times and lower ribozyme concentrations toyield product fractions comparable to those obtained withCAT mRNA our results reinforce the notion that substratefolding can significantly reduce trans-splicing efficiency

For almost all ribozymes the expected trans-splicingproduct of 85 nt (8 nt of cleaved substrate plus 77 nt of 39-exon) was accompanied by two other products whoselengths were z80 and 90 nt (Fig 4A) The additionalproducts most likely resulted from one or more of thepossible side reactions in which group I intron ribozymesare known to engage (Kay and Inoue 1987 Johnson et al2005 Dotson et al 2008) These side reactions depend onthe same process modeled in this study ie the bindingof ribozymersquos IGS to substrate target site Neverthelessbecause the identities of the side products were not de-termined experimentally and because our focus was on thecorrect trans-splicing product we took the fraction of the85-nt product alone as a measure of trans-splicing efficiencyfor splice sites on the short substrates The ensuing qualita-tive correlation of computed DGbind with experimentallymeasured efficiency (see below) was not strongly affectedwhen the sum of all three product fractions was used asa measure of trans-splicing efficiency (data not shown)

To determine whether the two remaining energetic con-tributions P1 duplex stability modeled by DGhybrid andbase-pairing interactions between IGS and 39-exon modeledby DGrelease-IGS were able to explain the observed variation intrans-splicing efficiency for splice sites on the short substrates

we plotted the trans-splicing efficiency measured for eachsplice site as a function of DGbind computed for that site (Fig4B) This plot reveals a correlation between calculations andexperiments that is less tight than the correlation for the full-length CAT mRNA suggesting that factors other than P1duplex stability and IGS39-exon interactions might influencetrans-splicing efficiency One reason why these factors be-come important for unstructured substrates (Fig 4B) but notfor structured substrates (Fig 3B) may be the faster trans-splicing kinetics with short substrates Several processes mayplay a stronger role at faster reaction kinetics such as theinitial folding of the ribozyme (Pan et al 1997 Shi et al2009) and later steps in the trans-splicing process (Karbsteinet al 2002 Bell et al 2004) which are not captured by thethermodynamic model used in this study

Computed energetic contributions to trans-splicingefficiency

The good correlation observed between computed values ofDGbind and experimental trans-splicing efficiencies for CAT

FIGURE 4 Trans-splicing efficiency with short 13-nt substrates (A)Representative autoradiograms of products from trans-splicing re-actions Each 13-mer substrate contains one of the 18 splice sites thatwere targeted on CAT mRNA Each group of lanes shows samplestaken after 1 8 16 and 31 min of reaction and is labeled with theposition of the targeted splice site The positions of the unreactedsubstrate (s 13 nt) reaction intermediate (i 8 nt) and trans-splicingproduct (p 85 nt) are indicated The size of the product wasconfirmed by comparison with an 85-nt RNA marker in separateexperiments (data not shown) (B) Plot of experimental trans-splicingefficiency as a function of computed DGbind The thick gray linerepresents a least-mean-squares exponential fit to the data pointswith a coefficient of determination R2 = 027 Only splice site 240(39 kcalmol 29 reacted) results in a strong deviation from thistrend Alternatively a linear fit (data not shown) to the same datapoints yields a correlation coefficient R = 051 with a probabilityp = 0030 that the values are not correlated Error bars are standarddeviations from three independent experiments

Meluzzi et al

594 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

mRNA supports the notion that the ribozyme-substratebinding process can be approximately modeled by assum-ing three idealized molecular events unfolding of targetsite release of the IGS and IGS-target site hybridizationHere we examine the relative energetic contributions ofthese hypothetical molecular events to the observed effi-ciency of a given splice site

The thermodynamic model (Fig 1A) suggests thata very negative hybridization energy DGhybrid wouldfavor trans-splicing whereas very positive energies of sub-strate unfolding DGunfold-target and of IGS releaseDGrelease-IGS would disfavor trans-splicing Figure 5Aand Supplemental Table S2 show the three energy compo-nents of DGbind for each of the experimentally tested splicesites on CAT mRNA The average of DGunfold-target overall splice sites is 18 6 08 kcalmol the average ofDGrelease-IGS is 18 6 09 kcalmol and the average ofDGhybrid is 72 6 14 kcalmol Thus both the strongestenergetic contribution (72 kcalmol) and the largestvariation among splice sites (614 kcalmol) come fromthe hybridization of IGS with the target sites Theseresults indicate that the free energy of IGS-target sitehybridization is more important on average than the freeenergies of target unfolding and IGS release in modulatingDGbind

To assess the importance of the energetic componentsDGunfold-target DGrelease-IGS and DGhybrid toward the corre-lation observed between experimental trans-splicing effi-ciency and DGbind (Fig 3B) we recomputed DGbind byomitting each component in turn and we calculated thecoefficient of determination associated with a least-mean-squares exponential fit to the experimental trans-splicingefficiency as a function of computed DGbind For splice siteson CAT mRNA omitting any of the energetic componentsresulted in a poorer correlation and a consequent loss ofa reliable threshold bounding values of DGbind associatedwith efficient splice sites (Fig 5BndashD) In particular omissionof DGhybrid resulted in a nonnegative DGbind for all splicesites (Fig 5D) whereas omission of either DGunfold-target orDGrelease-IGS resulted in the appearance of up to six false-positives ie splice sites with DGbind lt 4 kcalmol butwithout any experimentally detected trans-splicing product(Fig 5BC) A poor correlation also ensued when omittingpairs of DG components from DGbind (Supplemental Fig S2Supplemental Table S2) Comparing the coefficients ofdetermination (Figs 3B 5BndashD) indicates that the impor-tance of individual energetic contributions toward theobserved correlation between trans-splicing efficiency andDGbind follows the order DGhybrid gt DGunfold-target gtDGrelease-IGS but these energetic contributions are all

necessary to achieve a good correlationWhen the same analysis was performed

for splice sites on the short substrateswe found that omitting DGunfold-target

made no significant difference becausethis component was always lt04 kcalmolas expected (Supplemental Fig S1AB)Omitting either of the other compo-nents resulted in a worse correlation(Supplemental Fig S1CD) The valuesof DGrelease-IGS for the 13-mers wereidentical to the values for correspond-ing splice sites on CAT mRNA whichwas expected because DGrelease-IGS doesnot depend on the substrate Valuesof DGhybrid however differed by up toz1 kcalmol between correspondingsplices sites on 13-mers and CATmRNAThese small energetic differences arosefrom the different nucleotides flankingthe target sites in the two contexts Suchnucleotides vary with splice site on CATmRNA (Supplemental Table S1) but donot vary with splice site on the shortsubstrates (see Materials and Methods)

Trans-tagging assay on CAT mRNA

To determine whether the results of thetrans-tagging assay also generate a good

FIGURE 5 Energetic contributions to the computed binding free energy on CAT mRNA fromthe three molecular events described in Figure 1A (A) The computed energetic contributionsfrom target site unfolding (DGunfold-target gray) ribozyme IGS release (DGrelease-IGS white) andIGS-target site hybridization (DGhybrid black) are shown for each tested splice site Three of the 18tested splice sites were omitted (131 240 369) because their computed energetic values were notamong the strongest 10000 interactions reported by INTARNA Error bars are standard deviationsof the energies calculated with three to six different window sizes as in Figure 3 (BndashD) Plots oftrans-splicing efficiency as a function of computed DGbind values when the contribution of (B)DGunfold-target (C) DGrelease-IGS and (D) DGhybrid is omitted in turn The thick gray lines in BndashDrepresent exponential fits as in Figures 3B and 4B with coefficients of determination R2 = 038057 and 0076 for B C and D respectively For additional details see Figure 3

Computational prediction of splice sites

wwwrnajournalorg 595

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

correlation with the experimentally determined trans-splic-ing efficiencies we performed this assay using CAT mRNAas the substrate The output of the trans-tagging assay is thenumber of times a given splice site is identified in a set ofproduct sequences obtained from in vitro trans-splicingreactions with randomized IGSs (Jones et al 1996) Weobtained a total of 66 product sequences which wereconsistent with trans-splicing at 25 different uridines amongthe 186 uridines that follow the start codon in the CATmRNA sequence (Table 1) The splice sites detected mostfrequently were at uridines 97 and 33 with 14 and 12occurrences respectively On the other hand 15 of the 25detected splice sites were found only once The splice sitesdetected by the trans-tagging assay are represented bydiamonds in Figure 2A

When the experimentally determined trans-splicing ef-ficiencies were plotted as a function of the splice site countsin the trans-tagging assay no correlation was evident (Fig6) More strikingly the assay did not find splice sites 258and 405 even though these splice sites were more efficientthan splice site 97 which was detected 14 times by thetrans-splicing assay Similarly the assay found each of thesplice sites 448 and 197 only once in 66 product sequenceseven though these splice sites were two of the five mostefficient ones among the 18 tested splice sites Thereforethe number of times that a given splice site is found by thetrans-tagging assay does not necessarily reflect the trans-splicing efficiency measured for that splice site

Possible biases of the trans-tagging assay

Why did the trans-tagging results only partially reflect themeasured splice site efficiency One possible explanation isthat the PCR step in the assay disfavors long RT-PCRproducts a well-known phenomenon in quantitative PCR(Bustin 2000) In our trans-tagging experiments this phe-nomenon may have caused the pattern seen in Figure 2AHere the forward PCR primer was designed to identifysplice sites between positions 14 and 643 on CAT mRNAbut only four out of the 25 identified splice sites were found

at positions beyond uridine 197 (Table 1) In contrast thefive most efficient splice sites (also with DGbind below thethreshold of4 kcalmol) were more evenly distributed overthe CAT mRNA (positions 97 197 258 405 and 448) Wereasoned that if the PCR step indeed caused more splicesites to be identified near the forward PCR primer thanelsewhere then moving this primer farther downstreamfrom the mRNA 59 terminus should reveal a differentpattern of detected splice sites now crowded near the newposition of the forward primer Therefore we repeated thetrans-tagging experiment on CAT mRNA this time usinga forward primer designed to identify only splice sites atpositions 207 to 643 The resulting nine product sequencesyielded eight new splice sites those at uridines 248 271273 321 350 378 384 and 405 (data not shown in Fig2A) None of these splice sites had been found among theprevious set of 66 trans-tagging product sequences The newsplice sites were again crowded near the forward primerThese results suggest that the PCR step may cause thetrans-tagging assay to miss efficient splice sites located 200nt or more downstream from the forward PCR primer

A second possible explanation for a bias in the trans-tagging assay lies in the different rates at which ribozymeswith different IGSs are transcribed by T7 RNA polymerasein vitro The assay employs a pool of trans-splicing ribo-zymes whose IGS differs between ribozymes Because the IGSrepresents the 59 terminus of the transcript and because thetranscription efficiency of T7 RNA polymerase is stronglydependent on the sequence of the first 6 nt in the transcript(Milligan et al 1987 Milligan and Uhlenbeck 1989) wehypothesized that the difference in transcription efficiencyachieved for different ribozymes may bias the results of thetrans-tagging assay

To test this second hypothesis we measured the tran-scription efficiency for the 18 studied ribozymes Specifi-cally the ribozymes were individually transcribed in thepresence of a-[32P]-GTP the transcription products were

TABLE 1 Results of the trans-tagging assay

Counta Position of the splice siteb

14 9712 338 544 993 322 18 83 87 166 1771 14 65 73 84 110 131 153 175

179 180 197 356 448 499 573

aNumber of times that a splice site was identified among 66 productsequencesbPosition of a uridine relative to the adenine of the start codon inCAT mRNA

FIGURE 6 Comparison between trans-splicing efficiency and trans-tagging results The measured trans-splicing efficiency for each splicesite is plotted against the number of times that the splice site wasfound in 66 product sequences obtained with the trans-tagging assay(filled diamonds) Open diamonds denote splice sites that were notfound by the trans-tagging assay Note that eight of the 18 sequencesare clustered at the origin of the plot

Meluzzi et al

596 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

separated by denaturing polyacrylamide gel electrophoresisand the transcription efficiency was measured by quanti-tating the relevant bands on autoradiograms The measuredtranscription efficiency varied eightfold among the differ-ent ribozymes (Fig 7 Supplemental Table S1) Notablyribozyme 97 had the highest transcription efficiency five-fold higher than that of ribozyme 258 These measureddifferences in ribozyme amounts represent lower boundsfor the actual differences affecting the trans-tagging assaySpecifically in the individual transcription reactions par-tial saturation may have been reached for the more efficienttranscriptions but not for the less efficient ones This partialsaturation would have reduced the observed difference be-tween efficient and less efficient transcriptions On the otherhand in the trans-tagging assay all ribozymes were tran-scribed together from equimolar amounts of templatesunder the same reaction conditions leading to equal extentsof product saturation for all ribozymes Hence the differencein the amounts of ribozymes 97 and 258 was likely greaterthan fivefold in the trans-tagging assay Therefore theobserved bias in transcription efficiency for ribozymeswith different IGSs could explain why splice site 97 wasfound most frequently in the trans-tagging assay whilesplice site 258 which was the most efficient in trans-splicingexperiments with CAT mRNA was absent in all 75 productsequences that we obtained with the trans-tagging assay

A third potential source of bias for the outcome of thetrans-tagging assay is the tendency of group I intron variantsto lose their 39-exon via side reactions known as 39-specifichydrolysis and G-exchange (Inoue et al 1986 Thompsonand Herrin 1991 van der Horst and Inoue 1993 Haugenet al 2004) Indeed a significant loss of 39-exons wasrevealed during the separation of in vitro transcribed ribo-zymes on denaturing polyacrylamide gels (above) Specificallyfor each band of a full-length ribozyme we detected a fastermigrating band which corresponded in size to ribozymes thathad lost their 39-exon If the loss of 39-exons duringtranscription and during trans-splicing depends on the

IGS then the pool of transcribed ribozymes used in thetrans-tagging assay will be biased against specific splicesites

To determine whether the loss of 39-exons varied withIGS under in vitro transcription conditions we quantitatedthe relevant bands on the same autoradiograms that wereused to measure transcription efficiency The resulting frac-tions of lost 39-exons varied from 12 6 4 to 396 4 withan average of 28 6 7 over all 18 ribozymes (SupplementalTable S1) This means that among the studied ribozymesbetween 88 and 61 of the transcribed ribozymes retainedtheir 39-exon generating only a 14-fold bias for the trans-tagging assay Therefore 39-exon loss during transcription wasnot a major contributor to the large bias observed in thetrans-tagging assay

The loss of 39-exons could also have taken place duringthe trans-splicing reaction Because these conditions weredifferent from the transcription conditions we also mea-sured the loss of 39-exons under trans-splicing conditionsInternally radiolabeled ribozymes with a 39-exon were size-purified by denaturing PAGE then incubated under trans-splicing conditions and the fraction of ribozymes that losttheir 39-exon was determined by a second denaturing PAGEand quantitation by phosphorimaging The fraction of 39-exons lost after four hours varied from 226 4 to 656 7with an average of 40 6 10 over all ribozymes (Supple-mental Table S1) Therefore we estimate that during the 1-hincubation under trans-tagging conditions between z80and z95 of the ribozymes retained their 39-exon suggest-ing that the small proportion of ribozymes that lost their 39-exon should not constitute a major bias in the trans-taggingassay

In summary we found that the results of the trans-taggingassay were skewed by experimental biases The strongestinfluences appeared to originate from a product-size biasin the PCR step and from different efficiencies in thetranscription of ribozymes whereas a smaller influencecame from the loss of 39-exons by ribozymes duringtranscription and trans-splicing

DISCUSSION

We have developed and experimentally tested a com-putational approach to identify efficient splice sites fortrans-splicing ribozymes This approach is based on thecomputation of the free energy change DGbind for thehybridization and secondary structure rearrangements ac-companying the first step of trans-splicing ie the bindingof IGS to substrate The computed values of DGbind werefound to correlate well with experimentally determinedtrans-splicing efficiencies at 18 different splice sites on CATmRNA The correlation was significantly better than thecorrelation of experimental trans-splicing efficiencies withthe results of the trans-tagging assay suggesting that theproposed computational approach could provide an alter-

FIGURE 7 Transcription efficiencies of ribozymes analyzed in thisstudy All values are relative to the transcription efficiency of theribozyme targeting splice site 97 Transcription efficiencies were de-termined from band intensities of internally [32P]-labeled ribozymeswhich had been transcribed in vitro and separated by denaturingpolyacrylamide gel electrophoresis The targeted splice site is in-dicated at the bottom of each bar Error bars are standard deviationsfrom seven independent transcriptions

Computational prediction of splice sites

wwwrnajournalorg 597

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

native solution to the identification of efficient splice sites fortrans-splicing ribozymes

In particular our results suggest that a set of candidateefficient splice sites for trans-splicing ribozymes could bedetermined by selecting those sites with DGbind lt 4 kcalmol Although other mRNAs may show different values forDGbind that better discriminate between efficient and in-efficient splice sites a threshold of DGbind lt 4 kcalmolmay provide a rough guideline for choosing candidate splicesites on other mRNA substrates Our results also suggest thatthe unfolding of substrate mRNA at the target site thehybridization of the substrate to the ribozyme and therelease of IGS from its secondary interactions with the 39-exon all contribute toward the efficiency of a given splicesite on long mRNAs albeit to varying degrees

The above results may seem unsurprising at first becausecomputations similar to ours have already been appliedto identify target sites for different types of RNA-bindingmolecules For example a thermodynamic cycle equivalentto our calculation of DGbind predicted antisense oligonu-cleotides with high binding affinity for rabbit b-globin andmouse tumor necrosis factor-a mRNAs achieving 60accuracy and a significant correlation with experimentaldata (Walton et al 1999) Analogous calculations includedthe concentration-dependent effects of oligonucleotide di-merization predicting oligonucleotide-target affinities con-sistent with results from an RNase-H mapping assay on theAT1 receptor mRNA and from a trans-tagging assay onsickle b-globin mRNA (Mathews et al 1999) More re-cently a target site accessibility measure provided by theprogram RNAPLFOLD and related to our DGunfold-target wasused to improve the prediction of effective siRNAs (Taferet al 2008) A similar calculation also improved andsimplified the prediction of effective microRNA targets inDrosophila melanogaster tissue culture cells (Kertesz et al2007) Another study used a statistical sampling techniqueto calculate DG values analogous to our DGbind (Shao et al2007b) These values were found to correlate significantlywith experimentally measured trans-cleavage activities of 15hammerhead ribozymes targeting transcripts of humanbreast cancer resistance protein Contrary to our resultsthis study found that DGunfold-target contributes more thanDGhybrid toward the observed correlation underscoring thestructural and mechanistic differences between trans-cleav-ing and trans-splicing ribozymes (Scott 2007) Lastly theprogram INTARNA which we used to calculate DGbind wasmore effective than other software in predicting 18 targetsof small bacterial regulatory RNAs (Busch et al 2008)

Despite the previous achievements by similar methodsit was not obvious at the outset whether our proposedcomputations of DGbind would correlate well with theefficiency of trans-splicing ribozymes because these mol-ecules are more complex than those investigated in mostof the previous reports Although Mathews et al (1999)found some agreement between their DGbind calculations

and the counts of accessible splice sites reported by Lanet al (1998) this comparison considered only a short70-nt region of the mRNA substrate and did not directlymeasure trans-splicing efficiency In contrast our studyindividually tested the trans-splicing efficiency on 18 differ-ent splice sites distributed over the full length of the CATmRNA and compared the experimental results to the com-puted values of DGbind Our study also included theribozyme 39-exon into the calculations The good correla-tion we observed between experiments and computationconfirms that a calculation of DGbind based on establishedRNA folding algorithms can be used to identify efficientsplice sites for trans-splicing ribozymes

The scope of the present study was intentionally limited toproviding a first assessment of the proposed calculation ofDGbind as a tool for predicting efficient splice sites Thereforewe carried out experiments using a single model mRNAsubstrate and we targeted a subset of the possible splice siteson this substrate On the other hand the trans-tagging assayhas already been successfully applied to map efficient splicesites on at least eight different mRNA substrates (Lan et al1998 Watanabe and Sullenger 2000 Rogers et al 2002 Parket al 2003 Ryu and Lee 2003 Jung and Lee 2005 Fiskaa et al2006 Kim et al 2007) Moreover the proposed calculation ofDGbind relies on secondary structure prediction algorithmsthat are very fast but not always accurate because they do nottake into account all possible RNA interactions and they aresensitive to errors in experimental energy parameters (Laytonand Bundschuh 2005 Condon and Jabbari 2009) Thusfuture experimental studies on additional mRNA substrateswill show whether calculations of DGbind provide a reliablemeans of predicting efficient splice sites

The proposed method is attractive because the calcula-tions of DGbind do not require bench-work and can beperformed in one afternoon In contrast the experimentalassay from the preparation of ribozymes and substratemRNA to the analysis of trans-splicing product sequencesrequires at least one week of work Thus a computationalapproach once proven reliable will likely be cheaper andquicker than the experimental route Moreover the com-putational approach can be developed further to satisfyadditional design requirements For example if the aimof the ribozymes is to repair a mutated mRNA sequencethen the sequence of the 39-exon must be adjusted accord-ing to the targeted splice site (Long and Sullenger 1999)This adjustment which is likely important based on ouranalysis of energetic contributions toward trans-splicing effi-ciency would be difficult to achieve with the trans-taggingassay whose pool of ribozymes presently carries the same39-exon sequence for all targeted splice sites On the otherhand the proposed computational approach can easily bemodified to adjust the 39-exon sequence in accordance withthe splice site

Another design requirement might be to avoid trans-splicing ribozymes that react at multiple splice sites This

Meluzzi et al

598 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

behavior was observed in the present study for ribozymestargeting splice sites 405 and 448 (Fig 3A) Such cross-reactivity was likely caused by the similarity of the corre-sponding IGSs which differ in only one out of 6 nt (GGGGAAfor splice site 405 versus GGGGAU for splice site 448) Acomputational approach can easily overcome this prob-lem as a simple comparison of the predicted IGSs sufficesto detect and discard potentially cross-reactive ribozymes

The computational approach could also accommodatethe design of 59-extended guide sequences (EGSs) whichhave been shown to strongly increase trans-splicing effi-ciency (Kohler et al 1999 Fiskaa and Birgisdottir 2010)The design of these EGSs depends critically on the mRNAsequence immediately downstream from the splice site Ifthe trans-tagging assay were to be used for the simultaneousoptimization of IGSs and EGSs then the sequence of theEGSs would have to covary with the randomized IGS Thiswould be a very laborious task because each sequence wouldhave to be synthesized individually Therefore EGS optimi-zation cannot be easily achieved with the trans-tagging assayIn contrast our computational approach can be modified toinclude a splice site-specific EGS optimization which wouldallow one not only to identify efficient splice sites on a givenmRNA but also to maximize their efficiency

In conclusion the proposed computational approachtogether with future algorithmic extensions aimed at opti-mizing 39-exon and EGS could facilitate the development ofefficient trans-splicing ribozymes while elucidating the in-teractions between trans-splicing ribozymes and their mRNAsubstrates

MATERIALS AND METHODS

Prediction of DGbind

The values of the binding free energy change DGbind and itscomponents were initially computed using the Vienna RNA package(Hofacker et al 1994) Specifically we used RNAEVAL to calculateDGhybrid and RNAFOLD to calculate DGunfold-target and DGrelease-IGSusing the partition function approach which yields an ensembleaverage of DG values associated with all possible pseudo knot-freesecondary structures achievable with a given RNA sequence(McCaskill 1990) We then found that the same calculations canbe carried out more conveniently using the INTARNA software((Busch et al 2008) httprnainformatikuni-freiburgde8080IntaRNAjsp) which was developed to find putative target sitesfor bacterial small regulatory RNAs (sRNAs typically 50ndash250 nt inlength) on long mRNA sequences This software takes as input twosequences a short sRNA and a long target mRNA The softwareoutputs a list of possible base-pairing interactions between the twoRNAs together with the corresponding DGbind = DGhybrid +DGunfold-mRNA + DGunfold-sRNA where DGhybrid is the hybridizationfree energy component of the interaction and DGunfold-mRNAsRNA

is the cost in free energy for locally unfolding the region of mRNAsRNA involved in the base-pairing interaction To calculateDGbind for the possible splice sites on CAT mRNA the 678-nt

sequence of this substrate was specified as the input mRNA se-quence and the shortened ribozyme sequence GXXXXXNNNNNACGCACGTCAATTGGCCGCTGGATGGGGCCCCTGTGAAGTGTTGCTGAGCAACGCGCTGGCGCGGCTCAGAGGCTTC wasspecified as the input sRNA sequence where GXXXXX denotesa specific IGS NNNNN is a linker replacing the body of theribozyme and the rest of the sequence is the 77-nt 39-exon usedfor the trans-tagging and trans-splicing experiments in this studyTo calculate DGbind for the short 13-mer substrates the inputsRNA sequence was the shortened ribozyme sequence shownabove and the input mRNA sequence was the 13-mer substratesequence GGYYYYYUAAAAA where YYYYY is the reverse com-plement of the five IGS bases XXXXX For all calculations themaximum length of the hybridized region was seven the upperenergy threshold was 10 kcalmol the exact number of seed basepairs was three the maximum number of reported suboptimalinteractions was 10000 while all other parameters of the softwarehad default values Requesting 10000 suboptimal interactions wasnecessary to obtain DGbind values for as many as possible of theleast efficient splice sites tested in this study However requesting40 suboptimal interactions was sufficient to predict the five mostefficient of the tested splice sites Only interactions involving eachIGS and the corresponding target site on the substrate wereextracted from the programrsquos output The above procedure wasautomated using a short Perl script which is available from GAupon request Values of DGbind for CAT mRNA were computedusing window sizes of 100 200 300 400 500 and 600 nt whichcover uniformly the length of the substrate Because each windowsize produced different outliers in the plot of experimental trans-splicing efficiency versus DGbind (data not shown) we averagedDGbind over all six window sizes thus obtaining a more consistenttrend (Fig 3B) Error bars of DGbind in Figure 3B are standarddeviations over the six window sizes Splice sites whose DGbind

value was reported for less than three window sizes were notassessed for energetic contributions and are not shown in Figure 3BWhen no DGbind values were returned for specific combinations ofwindow size and splice site the value zero was used for DGbind inFigure 2 All calculations for the 13-mer substrates were done witha window size of 13 nt

Ribozymes

To prepare DNA templates for transcription of trans-splicingribozymes with specific IGSs (Supplemental Table S1) a DNAfragment for each such ribozyme was amplified by PCR usingforward primer GCGTAATACGACTCACTATAGXXXXXAAAAGTTATCAGGCATGCACC and reverse primer GAAGCCTCTGAGCCGCGCCAG (PR1) from a plasmid containing the Tetra-hymena ribozyme gene linked to a 77-nt segment of the alpha-mannosidase gene as the 39-exon (Einvik et al 2004) The aboveforward primer contains the promoter for T7 RNA polymerasethe desired IGS nucleotides GXXXXX and the first 21 nucleotidesdownstream from the IGS in the Tetrahymena ribozyme sequenceAll PCR products were cloned into the EcoRI BamHI sites ofpUC19 using appropriate PCR primers and the ribozyme se-quences were confirmed by sequencing DNA templates for ribo-zyme transcription were obtained from these ribozyme-encodingplasmids by PCR amplification using forward primer GCGTAATACGACTCACTATAG and reverse primer PR1 To minimize theloss of 39-exons ribozyme transcriptions were carried out at 30degC for

Computational prediction of splice sites

wwwrnajournalorg 599

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

20ndash30 min The transcripts were purified by denaturing poly-acrylamide gel electrophoresis (PAGE) and eluted from gel slicesRibozyme concentrations were determined from absorbancemeasurements at 260 nm using an extinction coefficient of40 mM1cm1

Substrates

The template for run-off transcription of CAT mRNA was am-plified by PCR from plasmid pLysS (Novagen) in two consecutivePCRs The first PCR used forward primer CAGGAGCTAAGGAAGCTAAAATG and reverse primer CGCCCCGCCCTGCCACTCATC the second PCR used forward primer GCGTAATACGACTCACTATAGCAGGAGCTAAGGAAGCTAAAATG and the same re-verse primer After transcription the full-length CAT mRNA waspurified on Micro Bio-Spin 6 columns (Bio-Rad) dephosphory-lated with Antarctic phosphatase (NEB) and radiolabeled at the 59terminus using g-[32P]-ATP (Perkin Elmer) and polynucleotidekinase (NEB) The radiolabeled substrate was purified by de-naturing 5 PAGE and eluted from gel slices using RNA elutionsolution (20ndash 50 mM MOPSNaOH pH 70 02 SDS 300 mMNaCl) The 13-mer substrates were prepared as described previously(Milligan and Uhlenbeck 1989) to obtain the sequences GGYYYYYUAAAAA where YYYYY is the reverse complement of the fiveIGS bases XXXXX adjacent to the 59-terminal G of the ribozymeTemplates for transcription were obtained by annealing the senseoligonucleotide GCTAATACGACTCACTATAG which containsa T7 RNA polymerase promoter with antisense DNA oligonucle-otides TTTTTAXXXXXCCTATAGTGAGTCGTATTAGC In vitrotranscription was carried out at 37degC for 1 h in the presence ofa-[32P]-UTP (Perkin Elmer) using 500 nM DNA template andz1unitmL of T7 RNA polymerase The radiolabeled transcripts werepurified by denaturing 20 PAGE

Trans-splicing reactions

Ribozymes and substrates were mixed in a buffer containing 1mM MgCl2 135 mM KCl 50 mM MOPSNaOH pH 70 20 mMGTP and 2 mM spermidine and incubated at 37degC Before mixingwith the substrate the ribozymes were preincubated for 10 min at37degC in the absence of magnesium In reactions with CAT mRNAsubstrate ribozyme concentrations were 15 mM substrate con-centration was z50 nM and the incubation time was 4 h Thisreaction time was chosen as a measure for trans-splicing efficiencybecause after this time the signal for the most efficient splice site(258) began saturating while the signals for most other splice siteswere relatively weak or absent In reactions with 13-mer substratesribozyme concentrations were 100 nM substrate concentrationswere z10 nM and the incubation time was 8 min At this timenone of the reaction products appeared saturated and all ribozyme-substrate pairs yielded bands sufficiently strong for quantitationThe samples were separated by denaturing 20 PAGE and visualizedby phosphorimaging (Bio-Rad PMI) Band intensities in theresulting digital images were quantified using custom computersoftware that employs previously described curve-fitting pro-cedures (Shadle et al 1997 Mitov et al 2009) Trans-splicingefficiencies were calculated using the formula (trans-splicingefficiency) = (trans-splicing product) (trans-splicing product +59-cleavage product + unreacted substrate) where the amount ofeach species was assumed proportional to the corresponding

band intensity Error bars were calculated as standard deviationsover three independent experiments

Confirmation of specific trans-splicing products

Samples from trans-splicing reactions that yielded visible bands onautoradiograms of polyacrylamide gels were reverse-transcribedusing AMV reverse transcriptase (NEB) with reverse primer PR2The RT products were amplified by PCR using the nested reverseprimer PR3 and one of the forward primers PF1 CGGAATTCGATATGGGATAGTGTTCACCC CGGAATTCCCGTTGATATATCCCAATGGC CGGAATTCTGGCCTATTTCCCTAAAGGG andGCCTTTATTCACATTCTTGCC which anneal at different posi-tions on the CAT mRNA sequence The PCR products were clonedinto the plasmid pUC19 and sequenced

Trans-tagging assay

The trans-tagging assay was carried out essentially as described(Einvik et al 2004) All ribozymes were synthesized by run-off invitro transcription using T7 RNA polymerase from DNA tem-plates that were generated by PCR Preparation of the ribozymepool was as described above for ribozymes with specific IGSsexcept that here the forward primer encoding the T7 promotercontained five randomized nucleotides at the positions denotedwith X The trans-tagging reactions contained 100 nM substrate10 nM ribozyme 02 mM GTP 5 mM MgCl2 50 mM MOPSKOH pH 70 135 mM KCl and 2 mM spermidine After thesubstrate and the ribozyme were preincubated in separate tubes inthe absence of MgCl2 for 10 min at 37degC all reagents were mixedand incubated for 1 h at 37degC Reaction products were reversetranscribed with AMV reverse transcriptase (NEB) using primerGAAGCCTCTGAGCCGCG (PR2) The template RNA was hy-drolyzed by incubation in 200 mM NaOH at 90degC for 10 min RTproducts were amplified by PCR using nested reverse primer GCGGATCCTGCTCAGCAACACTTCACAGG (PR3) and forwardprimers CGAATTCCAGGAGCTAAGGAAGCTAAAATG (PF1)or CGGAATTCCATTCTTGCCCGCCTGATGAATGC cloned intopUC19 and sequenced The resulting sequences were comparedwith the ribozyme 39-exon sequence and the CAT mRNA sequenceto determine the positions of the splice sites on CAT mRNA

Ribozyme transcription efficiency and 39-exon loss

Ribozymes with specific IGSs (Supplemental Table S1) wereindividually transcribed by T7 RNA polymerase in 20-mL reactionscontaining a-[32P]-GTP (Perkin Elmer) from DNA templates thatwere prepared as described above The reactions were carried out at30degC for 20 min and were stopped by the addition of formamideloading buffer The ribozymes were heat-denatured at 80degC for2 min and separated by denaturing 5 PAGE The resulting gels wereanalyzed as described above for the trans-splicing reactions A totalof seven independent reactions per ribozyme were carried out Tocalculate the relative transcription efficiencies the absolute intensityof the band corresponding to each intact ribozyme was divided bythe absolute intensity of the band corresponding to ribozyme 97Error bars are standard deviations of the relative transcriptionefficiencies To calculate the fraction of 39-exons lost during trans-cription for each ribozyme the quantitated intensity of the fastermigrating band was divided by the sum of the intensities of this

Meluzzi et al

600 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 4: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

plots the resulting DGbind values as a function of splice siteposition relative to the adenine of the AUG start codonThese values are distributed over the range from 0 to 65kcalmol with most values closer to 0 kcalmol (Fig 2B) Inparticular only seven out of the 186 available splice sites hadDGbind lt 40 kcalmol suggesting that this value of DGbind

could be used as a threshold to predict efficient splice sitesTo determine experimentally whether the computed

values of DGbind can be used to identify efficient splice siteswe chose a set of 18 different target splice sites on CATmRNA To reveal possible correlations between trans-splic-ing efficiency and DGbind as well as between trans-splicingefficiency and position on mRNA substrate these 18 siteswere chosen to cover the full range of DGbind values (Fig 2B)and were distributed over the length of the CAT mRNA (Fig2A) Two splice sites with DGbind slightly below 40 kcalmol were omitted in order to cover the wide range of thepredicted values for DGbind with the available resources

Experimental trans-splicing efficiencieson CAT mRNA

To measure experimentally the trans-splicing efficiency ofthe 18 chosen splice sites we generated 18 trans-splicingribozymes that contained different IGSs but were otherwiseidentical Each IGS was designed to target a particular oneof the chosen splice sites (Supplemental Table S1) Each ofthe 18 ribozymes was incubated with 59-radiolabeled CATmRNA and the reaction products were analyzed by poly-acrylamide gel electrophoresis and autoradiography (Fig3A) As a measure of trans-splicing efficiency we quantifiedthe fraction of radiolabeled substrates converted to trans-splicing products after 4 h of incubation Bands consistentwith the expected trans-splicing products were observed fornine of the 18 tested splice sites namely for splice sitesat uridines 83 87 97 197 222 258 350 405 and 448Product fractions for these splice sites ranged from 04 ofthe substrate at uridine 222 to 143 at uridine 258(Supplemental Table S1) Nine of the 18 splice sites didnot yield quantifiable products

The experimental trans-splicing efficiencies were foundto agree with the computed DGbind values (Fig 3B) Inparticular the efficiency of splice sites with DGbind gt 4kcalmol was relatively low reaching at most 22 On theother hand the efficiency of splice sites with DGbind lt 4kcalmol increased steadily with increasingly more negativevalues of DGbind reaching 143 at DGbind = 61 kcalmolOnly the five most efficient splice sites had a computedDGbind lt 40 kcalmol again suggesting that efficient splicesites could be predicted by comparing the computed DGbind

to the empirical threshold of DGbind 40 kcalmolTo test the identity of the observed trans-splicing

products we reverse-transcribed cloned and sequencedthe products of all nine reactions that had resulted inquantifiable bands on polyacrylamide gels (sequence data

not shown) The expected product sequences were obtainedfor eight of the nine tested splice sites Product sequences forsplice site 350 which did not yield the expected productindicated that the corresponding ribozyme had splicedprimarily at uridine 307 Therefore the actual trans-splicingefficiency at splice site 350 may be lower than its measuredtrans-splicing efficiency (22) possibly improving furtherthe correlation between computed DGbind and trans-splicingefficiency seen in Figure 3B In summary the sequencing

FIGURE 3 Trans-splicing efficiency measured on 18 splice sites onCAT mRNA (A) Autoradiogram of reaction products from trans-splicing reactions with radiolabeled substrate Lanes are labeled withthe targeted splice sites or with () where ribozyme was absent RNAmarker sizes in nucleotides are indicated The unreacted substrate hada length of 678 nucleotides whereas the length of reaction interme-diates (i) and trans-splicing products () varied with the splice sitesNote that the ribozymes targeting splice sites 405 and 448 also splicedat each otherrsquos splice site (B) Correlation between computed bindingfree energies and experimental trans-splicing efficiencies Each di-amond represents a specific splice site Splice sites 131 240 and 369have DGbind $ 0 kcalmol and are represented by a white diamond atthe origin The data points near or over the DGbind-axis around thevalue of 3 kcalmol correspond from left to right to splice sites 222518 273 346 325 378 and 551 The thick gray line represents a least-mean-squares exponential fit y = A exp(B x) to the data points witha coefficient of determination R2 = 087 Alternatively a linear fit(data not shown) to the same data points yields a correlationcoefficient R = 075 with a probability p = 000033 that the valuesare not correlated Horizontal error bars are standard deviations overthree to six different window sizes (100ndash600 nt) Vertical error barsare standard deviations from three independent experiments

Computational prediction of splice sites

wwwrnajournalorg 593

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

results confirmed all but one of the nine products observedin the trans-splicing reactions

Experimental trans-splicing efficiency with unstructuredRNA substrates

To confirm that all of the 18 ribozymes employed in ex-periments with CAT mRNA were catalytically active wemeasured their trans-splicing efficiency on short substratesthat were designed to prevent formation of secondary struc-ture Hence this experiment also allowed us to examine theeffects of substrate secondary structure on trans-splicingefficiency Each short substrate consisted of a 13-nt sequencecontaining one of the 18 splice sites already targeted on CATmRNA The single-stranded conformation predicted byRNA folding algorithms for these substrates effectively abol-ishes the hypothesized DGunfold-target component of DGbindThe reaction conditions were the same as with the CATmRNA except that the incubation time was 30 timesshorter and the ribozyme concentration was 15 times lowerThe reaction products were then analyzed by polyacryl-amide gel electrophoresis and autoradiography (Fig 4A)The trans-splicing product fractions ranged from 016 6007 for splice site 369 to 29 6 7 for splice site 240(Supplemental Table S1) confirming that all ribozymeswere indeed active in the absence of substrate secondarystructure Because the short substrates required considerablyshorter reaction times and lower ribozyme concentrations toyield product fractions comparable to those obtained withCAT mRNA our results reinforce the notion that substratefolding can significantly reduce trans-splicing efficiency

For almost all ribozymes the expected trans-splicingproduct of 85 nt (8 nt of cleaved substrate plus 77 nt of 39-exon) was accompanied by two other products whoselengths were z80 and 90 nt (Fig 4A) The additionalproducts most likely resulted from one or more of thepossible side reactions in which group I intron ribozymesare known to engage (Kay and Inoue 1987 Johnson et al2005 Dotson et al 2008) These side reactions depend onthe same process modeled in this study ie the bindingof ribozymersquos IGS to substrate target site Neverthelessbecause the identities of the side products were not de-termined experimentally and because our focus was on thecorrect trans-splicing product we took the fraction of the85-nt product alone as a measure of trans-splicing efficiencyfor splice sites on the short substrates The ensuing qualita-tive correlation of computed DGbind with experimentallymeasured efficiency (see below) was not strongly affectedwhen the sum of all three product fractions was used asa measure of trans-splicing efficiency (data not shown)

To determine whether the two remaining energetic con-tributions P1 duplex stability modeled by DGhybrid andbase-pairing interactions between IGS and 39-exon modeledby DGrelease-IGS were able to explain the observed variation intrans-splicing efficiency for splice sites on the short substrates

we plotted the trans-splicing efficiency measured for eachsplice site as a function of DGbind computed for that site (Fig4B) This plot reveals a correlation between calculations andexperiments that is less tight than the correlation for the full-length CAT mRNA suggesting that factors other than P1duplex stability and IGS39-exon interactions might influencetrans-splicing efficiency One reason why these factors be-come important for unstructured substrates (Fig 4B) but notfor structured substrates (Fig 3B) may be the faster trans-splicing kinetics with short substrates Several processes mayplay a stronger role at faster reaction kinetics such as theinitial folding of the ribozyme (Pan et al 1997 Shi et al2009) and later steps in the trans-splicing process (Karbsteinet al 2002 Bell et al 2004) which are not captured by thethermodynamic model used in this study

Computed energetic contributions to trans-splicingefficiency

The good correlation observed between computed values ofDGbind and experimental trans-splicing efficiencies for CAT

FIGURE 4 Trans-splicing efficiency with short 13-nt substrates (A)Representative autoradiograms of products from trans-splicing re-actions Each 13-mer substrate contains one of the 18 splice sites thatwere targeted on CAT mRNA Each group of lanes shows samplestaken after 1 8 16 and 31 min of reaction and is labeled with theposition of the targeted splice site The positions of the unreactedsubstrate (s 13 nt) reaction intermediate (i 8 nt) and trans-splicingproduct (p 85 nt) are indicated The size of the product wasconfirmed by comparison with an 85-nt RNA marker in separateexperiments (data not shown) (B) Plot of experimental trans-splicingefficiency as a function of computed DGbind The thick gray linerepresents a least-mean-squares exponential fit to the data pointswith a coefficient of determination R2 = 027 Only splice site 240(39 kcalmol 29 reacted) results in a strong deviation from thistrend Alternatively a linear fit (data not shown) to the same datapoints yields a correlation coefficient R = 051 with a probabilityp = 0030 that the values are not correlated Error bars are standarddeviations from three independent experiments

Meluzzi et al

594 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

mRNA supports the notion that the ribozyme-substratebinding process can be approximately modeled by assum-ing three idealized molecular events unfolding of targetsite release of the IGS and IGS-target site hybridizationHere we examine the relative energetic contributions ofthese hypothetical molecular events to the observed effi-ciency of a given splice site

The thermodynamic model (Fig 1A) suggests thata very negative hybridization energy DGhybrid wouldfavor trans-splicing whereas very positive energies of sub-strate unfolding DGunfold-target and of IGS releaseDGrelease-IGS would disfavor trans-splicing Figure 5Aand Supplemental Table S2 show the three energy compo-nents of DGbind for each of the experimentally tested splicesites on CAT mRNA The average of DGunfold-target overall splice sites is 18 6 08 kcalmol the average ofDGrelease-IGS is 18 6 09 kcalmol and the average ofDGhybrid is 72 6 14 kcalmol Thus both the strongestenergetic contribution (72 kcalmol) and the largestvariation among splice sites (614 kcalmol) come fromthe hybridization of IGS with the target sites Theseresults indicate that the free energy of IGS-target sitehybridization is more important on average than the freeenergies of target unfolding and IGS release in modulatingDGbind

To assess the importance of the energetic componentsDGunfold-target DGrelease-IGS and DGhybrid toward the corre-lation observed between experimental trans-splicing effi-ciency and DGbind (Fig 3B) we recomputed DGbind byomitting each component in turn and we calculated thecoefficient of determination associated with a least-mean-squares exponential fit to the experimental trans-splicingefficiency as a function of computed DGbind For splice siteson CAT mRNA omitting any of the energetic componentsresulted in a poorer correlation and a consequent loss ofa reliable threshold bounding values of DGbind associatedwith efficient splice sites (Fig 5BndashD) In particular omissionof DGhybrid resulted in a nonnegative DGbind for all splicesites (Fig 5D) whereas omission of either DGunfold-target orDGrelease-IGS resulted in the appearance of up to six false-positives ie splice sites with DGbind lt 4 kcalmol butwithout any experimentally detected trans-splicing product(Fig 5BC) A poor correlation also ensued when omittingpairs of DG components from DGbind (Supplemental Fig S2Supplemental Table S2) Comparing the coefficients ofdetermination (Figs 3B 5BndashD) indicates that the impor-tance of individual energetic contributions toward theobserved correlation between trans-splicing efficiency andDGbind follows the order DGhybrid gt DGunfold-target gtDGrelease-IGS but these energetic contributions are all

necessary to achieve a good correlationWhen the same analysis was performed

for splice sites on the short substrateswe found that omitting DGunfold-target

made no significant difference becausethis component was always lt04 kcalmolas expected (Supplemental Fig S1AB)Omitting either of the other compo-nents resulted in a worse correlation(Supplemental Fig S1CD) The valuesof DGrelease-IGS for the 13-mers wereidentical to the values for correspond-ing splice sites on CAT mRNA whichwas expected because DGrelease-IGS doesnot depend on the substrate Valuesof DGhybrid however differed by up toz1 kcalmol between correspondingsplices sites on 13-mers and CATmRNAThese small energetic differences arosefrom the different nucleotides flankingthe target sites in the two contexts Suchnucleotides vary with splice site on CATmRNA (Supplemental Table S1) but donot vary with splice site on the shortsubstrates (see Materials and Methods)

Trans-tagging assay on CAT mRNA

To determine whether the results of thetrans-tagging assay also generate a good

FIGURE 5 Energetic contributions to the computed binding free energy on CAT mRNA fromthe three molecular events described in Figure 1A (A) The computed energetic contributionsfrom target site unfolding (DGunfold-target gray) ribozyme IGS release (DGrelease-IGS white) andIGS-target site hybridization (DGhybrid black) are shown for each tested splice site Three of the 18tested splice sites were omitted (131 240 369) because their computed energetic values were notamong the strongest 10000 interactions reported by INTARNA Error bars are standard deviationsof the energies calculated with three to six different window sizes as in Figure 3 (BndashD) Plots oftrans-splicing efficiency as a function of computed DGbind values when the contribution of (B)DGunfold-target (C) DGrelease-IGS and (D) DGhybrid is omitted in turn The thick gray lines in BndashDrepresent exponential fits as in Figures 3B and 4B with coefficients of determination R2 = 038057 and 0076 for B C and D respectively For additional details see Figure 3

Computational prediction of splice sites

wwwrnajournalorg 595

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

correlation with the experimentally determined trans-splic-ing efficiencies we performed this assay using CAT mRNAas the substrate The output of the trans-tagging assay is thenumber of times a given splice site is identified in a set ofproduct sequences obtained from in vitro trans-splicingreactions with randomized IGSs (Jones et al 1996) Weobtained a total of 66 product sequences which wereconsistent with trans-splicing at 25 different uridines amongthe 186 uridines that follow the start codon in the CATmRNA sequence (Table 1) The splice sites detected mostfrequently were at uridines 97 and 33 with 14 and 12occurrences respectively On the other hand 15 of the 25detected splice sites were found only once The splice sitesdetected by the trans-tagging assay are represented bydiamonds in Figure 2A

When the experimentally determined trans-splicing ef-ficiencies were plotted as a function of the splice site countsin the trans-tagging assay no correlation was evident (Fig6) More strikingly the assay did not find splice sites 258and 405 even though these splice sites were more efficientthan splice site 97 which was detected 14 times by thetrans-splicing assay Similarly the assay found each of thesplice sites 448 and 197 only once in 66 product sequenceseven though these splice sites were two of the five mostefficient ones among the 18 tested splice sites Thereforethe number of times that a given splice site is found by thetrans-tagging assay does not necessarily reflect the trans-splicing efficiency measured for that splice site

Possible biases of the trans-tagging assay

Why did the trans-tagging results only partially reflect themeasured splice site efficiency One possible explanation isthat the PCR step in the assay disfavors long RT-PCRproducts a well-known phenomenon in quantitative PCR(Bustin 2000) In our trans-tagging experiments this phe-nomenon may have caused the pattern seen in Figure 2AHere the forward PCR primer was designed to identifysplice sites between positions 14 and 643 on CAT mRNAbut only four out of the 25 identified splice sites were found

at positions beyond uridine 197 (Table 1) In contrast thefive most efficient splice sites (also with DGbind below thethreshold of4 kcalmol) were more evenly distributed overthe CAT mRNA (positions 97 197 258 405 and 448) Wereasoned that if the PCR step indeed caused more splicesites to be identified near the forward PCR primer thanelsewhere then moving this primer farther downstreamfrom the mRNA 59 terminus should reveal a differentpattern of detected splice sites now crowded near the newposition of the forward primer Therefore we repeated thetrans-tagging experiment on CAT mRNA this time usinga forward primer designed to identify only splice sites atpositions 207 to 643 The resulting nine product sequencesyielded eight new splice sites those at uridines 248 271273 321 350 378 384 and 405 (data not shown in Fig2A) None of these splice sites had been found among theprevious set of 66 trans-tagging product sequences The newsplice sites were again crowded near the forward primerThese results suggest that the PCR step may cause thetrans-tagging assay to miss efficient splice sites located 200nt or more downstream from the forward PCR primer

A second possible explanation for a bias in the trans-tagging assay lies in the different rates at which ribozymeswith different IGSs are transcribed by T7 RNA polymerasein vitro The assay employs a pool of trans-splicing ribo-zymes whose IGS differs between ribozymes Because the IGSrepresents the 59 terminus of the transcript and because thetranscription efficiency of T7 RNA polymerase is stronglydependent on the sequence of the first 6 nt in the transcript(Milligan et al 1987 Milligan and Uhlenbeck 1989) wehypothesized that the difference in transcription efficiencyachieved for different ribozymes may bias the results of thetrans-tagging assay

To test this second hypothesis we measured the tran-scription efficiency for the 18 studied ribozymes Specifi-cally the ribozymes were individually transcribed in thepresence of a-[32P]-GTP the transcription products were

TABLE 1 Results of the trans-tagging assay

Counta Position of the splice siteb

14 9712 338 544 993 322 18 83 87 166 1771 14 65 73 84 110 131 153 175

179 180 197 356 448 499 573

aNumber of times that a splice site was identified among 66 productsequencesbPosition of a uridine relative to the adenine of the start codon inCAT mRNA

FIGURE 6 Comparison between trans-splicing efficiency and trans-tagging results The measured trans-splicing efficiency for each splicesite is plotted against the number of times that the splice site wasfound in 66 product sequences obtained with the trans-tagging assay(filled diamonds) Open diamonds denote splice sites that were notfound by the trans-tagging assay Note that eight of the 18 sequencesare clustered at the origin of the plot

Meluzzi et al

596 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

separated by denaturing polyacrylamide gel electrophoresisand the transcription efficiency was measured by quanti-tating the relevant bands on autoradiograms The measuredtranscription efficiency varied eightfold among the differ-ent ribozymes (Fig 7 Supplemental Table S1) Notablyribozyme 97 had the highest transcription efficiency five-fold higher than that of ribozyme 258 These measureddifferences in ribozyme amounts represent lower boundsfor the actual differences affecting the trans-tagging assaySpecifically in the individual transcription reactions par-tial saturation may have been reached for the more efficienttranscriptions but not for the less efficient ones This partialsaturation would have reduced the observed difference be-tween efficient and less efficient transcriptions On the otherhand in the trans-tagging assay all ribozymes were tran-scribed together from equimolar amounts of templatesunder the same reaction conditions leading to equal extentsof product saturation for all ribozymes Hence the differencein the amounts of ribozymes 97 and 258 was likely greaterthan fivefold in the trans-tagging assay Therefore theobserved bias in transcription efficiency for ribozymeswith different IGSs could explain why splice site 97 wasfound most frequently in the trans-tagging assay whilesplice site 258 which was the most efficient in trans-splicingexperiments with CAT mRNA was absent in all 75 productsequences that we obtained with the trans-tagging assay

A third potential source of bias for the outcome of thetrans-tagging assay is the tendency of group I intron variantsto lose their 39-exon via side reactions known as 39-specifichydrolysis and G-exchange (Inoue et al 1986 Thompsonand Herrin 1991 van der Horst and Inoue 1993 Haugenet al 2004) Indeed a significant loss of 39-exons wasrevealed during the separation of in vitro transcribed ribo-zymes on denaturing polyacrylamide gels (above) Specificallyfor each band of a full-length ribozyme we detected a fastermigrating band which corresponded in size to ribozymes thathad lost their 39-exon If the loss of 39-exons duringtranscription and during trans-splicing depends on the

IGS then the pool of transcribed ribozymes used in thetrans-tagging assay will be biased against specific splicesites

To determine whether the loss of 39-exons varied withIGS under in vitro transcription conditions we quantitatedthe relevant bands on the same autoradiograms that wereused to measure transcription efficiency The resulting frac-tions of lost 39-exons varied from 12 6 4 to 396 4 withan average of 28 6 7 over all 18 ribozymes (SupplementalTable S1) This means that among the studied ribozymesbetween 88 and 61 of the transcribed ribozymes retainedtheir 39-exon generating only a 14-fold bias for the trans-tagging assay Therefore 39-exon loss during transcription wasnot a major contributor to the large bias observed in thetrans-tagging assay

The loss of 39-exons could also have taken place duringthe trans-splicing reaction Because these conditions weredifferent from the transcription conditions we also mea-sured the loss of 39-exons under trans-splicing conditionsInternally radiolabeled ribozymes with a 39-exon were size-purified by denaturing PAGE then incubated under trans-splicing conditions and the fraction of ribozymes that losttheir 39-exon was determined by a second denaturing PAGEand quantitation by phosphorimaging The fraction of 39-exons lost after four hours varied from 226 4 to 656 7with an average of 40 6 10 over all ribozymes (Supple-mental Table S1) Therefore we estimate that during the 1-hincubation under trans-tagging conditions between z80and z95 of the ribozymes retained their 39-exon suggest-ing that the small proportion of ribozymes that lost their 39-exon should not constitute a major bias in the trans-taggingassay

In summary we found that the results of the trans-taggingassay were skewed by experimental biases The strongestinfluences appeared to originate from a product-size biasin the PCR step and from different efficiencies in thetranscription of ribozymes whereas a smaller influencecame from the loss of 39-exons by ribozymes duringtranscription and trans-splicing

DISCUSSION

We have developed and experimentally tested a com-putational approach to identify efficient splice sites fortrans-splicing ribozymes This approach is based on thecomputation of the free energy change DGbind for thehybridization and secondary structure rearrangements ac-companying the first step of trans-splicing ie the bindingof IGS to substrate The computed values of DGbind werefound to correlate well with experimentally determinedtrans-splicing efficiencies at 18 different splice sites on CATmRNA The correlation was significantly better than thecorrelation of experimental trans-splicing efficiencies withthe results of the trans-tagging assay suggesting that theproposed computational approach could provide an alter-

FIGURE 7 Transcription efficiencies of ribozymes analyzed in thisstudy All values are relative to the transcription efficiency of theribozyme targeting splice site 97 Transcription efficiencies were de-termined from band intensities of internally [32P]-labeled ribozymeswhich had been transcribed in vitro and separated by denaturingpolyacrylamide gel electrophoresis The targeted splice site is in-dicated at the bottom of each bar Error bars are standard deviationsfrom seven independent transcriptions

Computational prediction of splice sites

wwwrnajournalorg 597

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

native solution to the identification of efficient splice sites fortrans-splicing ribozymes

In particular our results suggest that a set of candidateefficient splice sites for trans-splicing ribozymes could bedetermined by selecting those sites with DGbind lt 4 kcalmol Although other mRNAs may show different values forDGbind that better discriminate between efficient and in-efficient splice sites a threshold of DGbind lt 4 kcalmolmay provide a rough guideline for choosing candidate splicesites on other mRNA substrates Our results also suggest thatthe unfolding of substrate mRNA at the target site thehybridization of the substrate to the ribozyme and therelease of IGS from its secondary interactions with the 39-exon all contribute toward the efficiency of a given splicesite on long mRNAs albeit to varying degrees

The above results may seem unsurprising at first becausecomputations similar to ours have already been appliedto identify target sites for different types of RNA-bindingmolecules For example a thermodynamic cycle equivalentto our calculation of DGbind predicted antisense oligonu-cleotides with high binding affinity for rabbit b-globin andmouse tumor necrosis factor-a mRNAs achieving 60accuracy and a significant correlation with experimentaldata (Walton et al 1999) Analogous calculations includedthe concentration-dependent effects of oligonucleotide di-merization predicting oligonucleotide-target affinities con-sistent with results from an RNase-H mapping assay on theAT1 receptor mRNA and from a trans-tagging assay onsickle b-globin mRNA (Mathews et al 1999) More re-cently a target site accessibility measure provided by theprogram RNAPLFOLD and related to our DGunfold-target wasused to improve the prediction of effective siRNAs (Taferet al 2008) A similar calculation also improved andsimplified the prediction of effective microRNA targets inDrosophila melanogaster tissue culture cells (Kertesz et al2007) Another study used a statistical sampling techniqueto calculate DG values analogous to our DGbind (Shao et al2007b) These values were found to correlate significantlywith experimentally measured trans-cleavage activities of 15hammerhead ribozymes targeting transcripts of humanbreast cancer resistance protein Contrary to our resultsthis study found that DGunfold-target contributes more thanDGhybrid toward the observed correlation underscoring thestructural and mechanistic differences between trans-cleav-ing and trans-splicing ribozymes (Scott 2007) Lastly theprogram INTARNA which we used to calculate DGbind wasmore effective than other software in predicting 18 targetsof small bacterial regulatory RNAs (Busch et al 2008)

Despite the previous achievements by similar methodsit was not obvious at the outset whether our proposedcomputations of DGbind would correlate well with theefficiency of trans-splicing ribozymes because these mol-ecules are more complex than those investigated in mostof the previous reports Although Mathews et al (1999)found some agreement between their DGbind calculations

and the counts of accessible splice sites reported by Lanet al (1998) this comparison considered only a short70-nt region of the mRNA substrate and did not directlymeasure trans-splicing efficiency In contrast our studyindividually tested the trans-splicing efficiency on 18 differ-ent splice sites distributed over the full length of the CATmRNA and compared the experimental results to the com-puted values of DGbind Our study also included theribozyme 39-exon into the calculations The good correla-tion we observed between experiments and computationconfirms that a calculation of DGbind based on establishedRNA folding algorithms can be used to identify efficientsplice sites for trans-splicing ribozymes

The scope of the present study was intentionally limited toproviding a first assessment of the proposed calculation ofDGbind as a tool for predicting efficient splice sites Thereforewe carried out experiments using a single model mRNAsubstrate and we targeted a subset of the possible splice siteson this substrate On the other hand the trans-tagging assayhas already been successfully applied to map efficient splicesites on at least eight different mRNA substrates (Lan et al1998 Watanabe and Sullenger 2000 Rogers et al 2002 Parket al 2003 Ryu and Lee 2003 Jung and Lee 2005 Fiskaa et al2006 Kim et al 2007) Moreover the proposed calculation ofDGbind relies on secondary structure prediction algorithmsthat are very fast but not always accurate because they do nottake into account all possible RNA interactions and they aresensitive to errors in experimental energy parameters (Laytonand Bundschuh 2005 Condon and Jabbari 2009) Thusfuture experimental studies on additional mRNA substrateswill show whether calculations of DGbind provide a reliablemeans of predicting efficient splice sites

The proposed method is attractive because the calcula-tions of DGbind do not require bench-work and can beperformed in one afternoon In contrast the experimentalassay from the preparation of ribozymes and substratemRNA to the analysis of trans-splicing product sequencesrequires at least one week of work Thus a computationalapproach once proven reliable will likely be cheaper andquicker than the experimental route Moreover the com-putational approach can be developed further to satisfyadditional design requirements For example if the aimof the ribozymes is to repair a mutated mRNA sequencethen the sequence of the 39-exon must be adjusted accord-ing to the targeted splice site (Long and Sullenger 1999)This adjustment which is likely important based on ouranalysis of energetic contributions toward trans-splicing effi-ciency would be difficult to achieve with the trans-taggingassay whose pool of ribozymes presently carries the same39-exon sequence for all targeted splice sites On the otherhand the proposed computational approach can easily bemodified to adjust the 39-exon sequence in accordance withthe splice site

Another design requirement might be to avoid trans-splicing ribozymes that react at multiple splice sites This

Meluzzi et al

598 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

behavior was observed in the present study for ribozymestargeting splice sites 405 and 448 (Fig 3A) Such cross-reactivity was likely caused by the similarity of the corre-sponding IGSs which differ in only one out of 6 nt (GGGGAAfor splice site 405 versus GGGGAU for splice site 448) Acomputational approach can easily overcome this prob-lem as a simple comparison of the predicted IGSs sufficesto detect and discard potentially cross-reactive ribozymes

The computational approach could also accommodatethe design of 59-extended guide sequences (EGSs) whichhave been shown to strongly increase trans-splicing effi-ciency (Kohler et al 1999 Fiskaa and Birgisdottir 2010)The design of these EGSs depends critically on the mRNAsequence immediately downstream from the splice site Ifthe trans-tagging assay were to be used for the simultaneousoptimization of IGSs and EGSs then the sequence of theEGSs would have to covary with the randomized IGS Thiswould be a very laborious task because each sequence wouldhave to be synthesized individually Therefore EGS optimi-zation cannot be easily achieved with the trans-tagging assayIn contrast our computational approach can be modified toinclude a splice site-specific EGS optimization which wouldallow one not only to identify efficient splice sites on a givenmRNA but also to maximize their efficiency

In conclusion the proposed computational approachtogether with future algorithmic extensions aimed at opti-mizing 39-exon and EGS could facilitate the development ofefficient trans-splicing ribozymes while elucidating the in-teractions between trans-splicing ribozymes and their mRNAsubstrates

MATERIALS AND METHODS

Prediction of DGbind

The values of the binding free energy change DGbind and itscomponents were initially computed using the Vienna RNA package(Hofacker et al 1994) Specifically we used RNAEVAL to calculateDGhybrid and RNAFOLD to calculate DGunfold-target and DGrelease-IGSusing the partition function approach which yields an ensembleaverage of DG values associated with all possible pseudo knot-freesecondary structures achievable with a given RNA sequence(McCaskill 1990) We then found that the same calculations canbe carried out more conveniently using the INTARNA software((Busch et al 2008) httprnainformatikuni-freiburgde8080IntaRNAjsp) which was developed to find putative target sitesfor bacterial small regulatory RNAs (sRNAs typically 50ndash250 nt inlength) on long mRNA sequences This software takes as input twosequences a short sRNA and a long target mRNA The softwareoutputs a list of possible base-pairing interactions between the twoRNAs together with the corresponding DGbind = DGhybrid +DGunfold-mRNA + DGunfold-sRNA where DGhybrid is the hybridizationfree energy component of the interaction and DGunfold-mRNAsRNA

is the cost in free energy for locally unfolding the region of mRNAsRNA involved in the base-pairing interaction To calculateDGbind for the possible splice sites on CAT mRNA the 678-nt

sequence of this substrate was specified as the input mRNA se-quence and the shortened ribozyme sequence GXXXXXNNNNNACGCACGTCAATTGGCCGCTGGATGGGGCCCCTGTGAAGTGTTGCTGAGCAACGCGCTGGCGCGGCTCAGAGGCTTC wasspecified as the input sRNA sequence where GXXXXX denotesa specific IGS NNNNN is a linker replacing the body of theribozyme and the rest of the sequence is the 77-nt 39-exon usedfor the trans-tagging and trans-splicing experiments in this studyTo calculate DGbind for the short 13-mer substrates the inputsRNA sequence was the shortened ribozyme sequence shownabove and the input mRNA sequence was the 13-mer substratesequence GGYYYYYUAAAAA where YYYYY is the reverse com-plement of the five IGS bases XXXXX For all calculations themaximum length of the hybridized region was seven the upperenergy threshold was 10 kcalmol the exact number of seed basepairs was three the maximum number of reported suboptimalinteractions was 10000 while all other parameters of the softwarehad default values Requesting 10000 suboptimal interactions wasnecessary to obtain DGbind values for as many as possible of theleast efficient splice sites tested in this study However requesting40 suboptimal interactions was sufficient to predict the five mostefficient of the tested splice sites Only interactions involving eachIGS and the corresponding target site on the substrate wereextracted from the programrsquos output The above procedure wasautomated using a short Perl script which is available from GAupon request Values of DGbind for CAT mRNA were computedusing window sizes of 100 200 300 400 500 and 600 nt whichcover uniformly the length of the substrate Because each windowsize produced different outliers in the plot of experimental trans-splicing efficiency versus DGbind (data not shown) we averagedDGbind over all six window sizes thus obtaining a more consistenttrend (Fig 3B) Error bars of DGbind in Figure 3B are standarddeviations over the six window sizes Splice sites whose DGbind

value was reported for less than three window sizes were notassessed for energetic contributions and are not shown in Figure 3BWhen no DGbind values were returned for specific combinations ofwindow size and splice site the value zero was used for DGbind inFigure 2 All calculations for the 13-mer substrates were done witha window size of 13 nt

Ribozymes

To prepare DNA templates for transcription of trans-splicingribozymes with specific IGSs (Supplemental Table S1) a DNAfragment for each such ribozyme was amplified by PCR usingforward primer GCGTAATACGACTCACTATAGXXXXXAAAAGTTATCAGGCATGCACC and reverse primer GAAGCCTCTGAGCCGCGCCAG (PR1) from a plasmid containing the Tetra-hymena ribozyme gene linked to a 77-nt segment of the alpha-mannosidase gene as the 39-exon (Einvik et al 2004) The aboveforward primer contains the promoter for T7 RNA polymerasethe desired IGS nucleotides GXXXXX and the first 21 nucleotidesdownstream from the IGS in the Tetrahymena ribozyme sequenceAll PCR products were cloned into the EcoRI BamHI sites ofpUC19 using appropriate PCR primers and the ribozyme se-quences were confirmed by sequencing DNA templates for ribo-zyme transcription were obtained from these ribozyme-encodingplasmids by PCR amplification using forward primer GCGTAATACGACTCACTATAG and reverse primer PR1 To minimize theloss of 39-exons ribozyme transcriptions were carried out at 30degC for

Computational prediction of splice sites

wwwrnajournalorg 599

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

20ndash30 min The transcripts were purified by denaturing poly-acrylamide gel electrophoresis (PAGE) and eluted from gel slicesRibozyme concentrations were determined from absorbancemeasurements at 260 nm using an extinction coefficient of40 mM1cm1

Substrates

The template for run-off transcription of CAT mRNA was am-plified by PCR from plasmid pLysS (Novagen) in two consecutivePCRs The first PCR used forward primer CAGGAGCTAAGGAAGCTAAAATG and reverse primer CGCCCCGCCCTGCCACTCATC the second PCR used forward primer GCGTAATACGACTCACTATAGCAGGAGCTAAGGAAGCTAAAATG and the same re-verse primer After transcription the full-length CAT mRNA waspurified on Micro Bio-Spin 6 columns (Bio-Rad) dephosphory-lated with Antarctic phosphatase (NEB) and radiolabeled at the 59terminus using g-[32P]-ATP (Perkin Elmer) and polynucleotidekinase (NEB) The radiolabeled substrate was purified by de-naturing 5 PAGE and eluted from gel slices using RNA elutionsolution (20ndash 50 mM MOPSNaOH pH 70 02 SDS 300 mMNaCl) The 13-mer substrates were prepared as described previously(Milligan and Uhlenbeck 1989) to obtain the sequences GGYYYYYUAAAAA where YYYYY is the reverse complement of the fiveIGS bases XXXXX adjacent to the 59-terminal G of the ribozymeTemplates for transcription were obtained by annealing the senseoligonucleotide GCTAATACGACTCACTATAG which containsa T7 RNA polymerase promoter with antisense DNA oligonucle-otides TTTTTAXXXXXCCTATAGTGAGTCGTATTAGC In vitrotranscription was carried out at 37degC for 1 h in the presence ofa-[32P]-UTP (Perkin Elmer) using 500 nM DNA template andz1unitmL of T7 RNA polymerase The radiolabeled transcripts werepurified by denaturing 20 PAGE

Trans-splicing reactions

Ribozymes and substrates were mixed in a buffer containing 1mM MgCl2 135 mM KCl 50 mM MOPSNaOH pH 70 20 mMGTP and 2 mM spermidine and incubated at 37degC Before mixingwith the substrate the ribozymes were preincubated for 10 min at37degC in the absence of magnesium In reactions with CAT mRNAsubstrate ribozyme concentrations were 15 mM substrate con-centration was z50 nM and the incubation time was 4 h Thisreaction time was chosen as a measure for trans-splicing efficiencybecause after this time the signal for the most efficient splice site(258) began saturating while the signals for most other splice siteswere relatively weak or absent In reactions with 13-mer substratesribozyme concentrations were 100 nM substrate concentrationswere z10 nM and the incubation time was 8 min At this timenone of the reaction products appeared saturated and all ribozyme-substrate pairs yielded bands sufficiently strong for quantitationThe samples were separated by denaturing 20 PAGE and visualizedby phosphorimaging (Bio-Rad PMI) Band intensities in theresulting digital images were quantified using custom computersoftware that employs previously described curve-fitting pro-cedures (Shadle et al 1997 Mitov et al 2009) Trans-splicingefficiencies were calculated using the formula (trans-splicingefficiency) = (trans-splicing product) (trans-splicing product +59-cleavage product + unreacted substrate) where the amount ofeach species was assumed proportional to the corresponding

band intensity Error bars were calculated as standard deviationsover three independent experiments

Confirmation of specific trans-splicing products

Samples from trans-splicing reactions that yielded visible bands onautoradiograms of polyacrylamide gels were reverse-transcribedusing AMV reverse transcriptase (NEB) with reverse primer PR2The RT products were amplified by PCR using the nested reverseprimer PR3 and one of the forward primers PF1 CGGAATTCGATATGGGATAGTGTTCACCC CGGAATTCCCGTTGATATATCCCAATGGC CGGAATTCTGGCCTATTTCCCTAAAGGG andGCCTTTATTCACATTCTTGCC which anneal at different posi-tions on the CAT mRNA sequence The PCR products were clonedinto the plasmid pUC19 and sequenced

Trans-tagging assay

The trans-tagging assay was carried out essentially as described(Einvik et al 2004) All ribozymes were synthesized by run-off invitro transcription using T7 RNA polymerase from DNA tem-plates that were generated by PCR Preparation of the ribozymepool was as described above for ribozymes with specific IGSsexcept that here the forward primer encoding the T7 promotercontained five randomized nucleotides at the positions denotedwith X The trans-tagging reactions contained 100 nM substrate10 nM ribozyme 02 mM GTP 5 mM MgCl2 50 mM MOPSKOH pH 70 135 mM KCl and 2 mM spermidine After thesubstrate and the ribozyme were preincubated in separate tubes inthe absence of MgCl2 for 10 min at 37degC all reagents were mixedand incubated for 1 h at 37degC Reaction products were reversetranscribed with AMV reverse transcriptase (NEB) using primerGAAGCCTCTGAGCCGCG (PR2) The template RNA was hy-drolyzed by incubation in 200 mM NaOH at 90degC for 10 min RTproducts were amplified by PCR using nested reverse primer GCGGATCCTGCTCAGCAACACTTCACAGG (PR3) and forwardprimers CGAATTCCAGGAGCTAAGGAAGCTAAAATG (PF1)or CGGAATTCCATTCTTGCCCGCCTGATGAATGC cloned intopUC19 and sequenced The resulting sequences were comparedwith the ribozyme 39-exon sequence and the CAT mRNA sequenceto determine the positions of the splice sites on CAT mRNA

Ribozyme transcription efficiency and 39-exon loss

Ribozymes with specific IGSs (Supplemental Table S1) wereindividually transcribed by T7 RNA polymerase in 20-mL reactionscontaining a-[32P]-GTP (Perkin Elmer) from DNA templates thatwere prepared as described above The reactions were carried out at30degC for 20 min and were stopped by the addition of formamideloading buffer The ribozymes were heat-denatured at 80degC for2 min and separated by denaturing 5 PAGE The resulting gels wereanalyzed as described above for the trans-splicing reactions A totalof seven independent reactions per ribozyme were carried out Tocalculate the relative transcription efficiencies the absolute intensityof the band corresponding to each intact ribozyme was divided bythe absolute intensity of the band corresponding to ribozyme 97Error bars are standard deviations of the relative transcriptionefficiencies To calculate the fraction of 39-exons lost during trans-cription for each ribozyme the quantitated intensity of the fastermigrating band was divided by the sum of the intensities of this

Meluzzi et al

600 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 5: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

results confirmed all but one of the nine products observedin the trans-splicing reactions

Experimental trans-splicing efficiency with unstructuredRNA substrates

To confirm that all of the 18 ribozymes employed in ex-periments with CAT mRNA were catalytically active wemeasured their trans-splicing efficiency on short substratesthat were designed to prevent formation of secondary struc-ture Hence this experiment also allowed us to examine theeffects of substrate secondary structure on trans-splicingefficiency Each short substrate consisted of a 13-nt sequencecontaining one of the 18 splice sites already targeted on CATmRNA The single-stranded conformation predicted byRNA folding algorithms for these substrates effectively abol-ishes the hypothesized DGunfold-target component of DGbindThe reaction conditions were the same as with the CATmRNA except that the incubation time was 30 timesshorter and the ribozyme concentration was 15 times lowerThe reaction products were then analyzed by polyacryl-amide gel electrophoresis and autoradiography (Fig 4A)The trans-splicing product fractions ranged from 016 6007 for splice site 369 to 29 6 7 for splice site 240(Supplemental Table S1) confirming that all ribozymeswere indeed active in the absence of substrate secondarystructure Because the short substrates required considerablyshorter reaction times and lower ribozyme concentrations toyield product fractions comparable to those obtained withCAT mRNA our results reinforce the notion that substratefolding can significantly reduce trans-splicing efficiency

For almost all ribozymes the expected trans-splicingproduct of 85 nt (8 nt of cleaved substrate plus 77 nt of 39-exon) was accompanied by two other products whoselengths were z80 and 90 nt (Fig 4A) The additionalproducts most likely resulted from one or more of thepossible side reactions in which group I intron ribozymesare known to engage (Kay and Inoue 1987 Johnson et al2005 Dotson et al 2008) These side reactions depend onthe same process modeled in this study ie the bindingof ribozymersquos IGS to substrate target site Neverthelessbecause the identities of the side products were not de-termined experimentally and because our focus was on thecorrect trans-splicing product we took the fraction of the85-nt product alone as a measure of trans-splicing efficiencyfor splice sites on the short substrates The ensuing qualita-tive correlation of computed DGbind with experimentallymeasured efficiency (see below) was not strongly affectedwhen the sum of all three product fractions was used asa measure of trans-splicing efficiency (data not shown)

To determine whether the two remaining energetic con-tributions P1 duplex stability modeled by DGhybrid andbase-pairing interactions between IGS and 39-exon modeledby DGrelease-IGS were able to explain the observed variation intrans-splicing efficiency for splice sites on the short substrates

we plotted the trans-splicing efficiency measured for eachsplice site as a function of DGbind computed for that site (Fig4B) This plot reveals a correlation between calculations andexperiments that is less tight than the correlation for the full-length CAT mRNA suggesting that factors other than P1duplex stability and IGS39-exon interactions might influencetrans-splicing efficiency One reason why these factors be-come important for unstructured substrates (Fig 4B) but notfor structured substrates (Fig 3B) may be the faster trans-splicing kinetics with short substrates Several processes mayplay a stronger role at faster reaction kinetics such as theinitial folding of the ribozyme (Pan et al 1997 Shi et al2009) and later steps in the trans-splicing process (Karbsteinet al 2002 Bell et al 2004) which are not captured by thethermodynamic model used in this study

Computed energetic contributions to trans-splicingefficiency

The good correlation observed between computed values ofDGbind and experimental trans-splicing efficiencies for CAT

FIGURE 4 Trans-splicing efficiency with short 13-nt substrates (A)Representative autoradiograms of products from trans-splicing re-actions Each 13-mer substrate contains one of the 18 splice sites thatwere targeted on CAT mRNA Each group of lanes shows samplestaken after 1 8 16 and 31 min of reaction and is labeled with theposition of the targeted splice site The positions of the unreactedsubstrate (s 13 nt) reaction intermediate (i 8 nt) and trans-splicingproduct (p 85 nt) are indicated The size of the product wasconfirmed by comparison with an 85-nt RNA marker in separateexperiments (data not shown) (B) Plot of experimental trans-splicingefficiency as a function of computed DGbind The thick gray linerepresents a least-mean-squares exponential fit to the data pointswith a coefficient of determination R2 = 027 Only splice site 240(39 kcalmol 29 reacted) results in a strong deviation from thistrend Alternatively a linear fit (data not shown) to the same datapoints yields a correlation coefficient R = 051 with a probabilityp = 0030 that the values are not correlated Error bars are standarddeviations from three independent experiments

Meluzzi et al

594 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

mRNA supports the notion that the ribozyme-substratebinding process can be approximately modeled by assum-ing three idealized molecular events unfolding of targetsite release of the IGS and IGS-target site hybridizationHere we examine the relative energetic contributions ofthese hypothetical molecular events to the observed effi-ciency of a given splice site

The thermodynamic model (Fig 1A) suggests thata very negative hybridization energy DGhybrid wouldfavor trans-splicing whereas very positive energies of sub-strate unfolding DGunfold-target and of IGS releaseDGrelease-IGS would disfavor trans-splicing Figure 5Aand Supplemental Table S2 show the three energy compo-nents of DGbind for each of the experimentally tested splicesites on CAT mRNA The average of DGunfold-target overall splice sites is 18 6 08 kcalmol the average ofDGrelease-IGS is 18 6 09 kcalmol and the average ofDGhybrid is 72 6 14 kcalmol Thus both the strongestenergetic contribution (72 kcalmol) and the largestvariation among splice sites (614 kcalmol) come fromthe hybridization of IGS with the target sites Theseresults indicate that the free energy of IGS-target sitehybridization is more important on average than the freeenergies of target unfolding and IGS release in modulatingDGbind

To assess the importance of the energetic componentsDGunfold-target DGrelease-IGS and DGhybrid toward the corre-lation observed between experimental trans-splicing effi-ciency and DGbind (Fig 3B) we recomputed DGbind byomitting each component in turn and we calculated thecoefficient of determination associated with a least-mean-squares exponential fit to the experimental trans-splicingefficiency as a function of computed DGbind For splice siteson CAT mRNA omitting any of the energetic componentsresulted in a poorer correlation and a consequent loss ofa reliable threshold bounding values of DGbind associatedwith efficient splice sites (Fig 5BndashD) In particular omissionof DGhybrid resulted in a nonnegative DGbind for all splicesites (Fig 5D) whereas omission of either DGunfold-target orDGrelease-IGS resulted in the appearance of up to six false-positives ie splice sites with DGbind lt 4 kcalmol butwithout any experimentally detected trans-splicing product(Fig 5BC) A poor correlation also ensued when omittingpairs of DG components from DGbind (Supplemental Fig S2Supplemental Table S2) Comparing the coefficients ofdetermination (Figs 3B 5BndashD) indicates that the impor-tance of individual energetic contributions toward theobserved correlation between trans-splicing efficiency andDGbind follows the order DGhybrid gt DGunfold-target gtDGrelease-IGS but these energetic contributions are all

necessary to achieve a good correlationWhen the same analysis was performed

for splice sites on the short substrateswe found that omitting DGunfold-target

made no significant difference becausethis component was always lt04 kcalmolas expected (Supplemental Fig S1AB)Omitting either of the other compo-nents resulted in a worse correlation(Supplemental Fig S1CD) The valuesof DGrelease-IGS for the 13-mers wereidentical to the values for correspond-ing splice sites on CAT mRNA whichwas expected because DGrelease-IGS doesnot depend on the substrate Valuesof DGhybrid however differed by up toz1 kcalmol between correspondingsplices sites on 13-mers and CATmRNAThese small energetic differences arosefrom the different nucleotides flankingthe target sites in the two contexts Suchnucleotides vary with splice site on CATmRNA (Supplemental Table S1) but donot vary with splice site on the shortsubstrates (see Materials and Methods)

Trans-tagging assay on CAT mRNA

To determine whether the results of thetrans-tagging assay also generate a good

FIGURE 5 Energetic contributions to the computed binding free energy on CAT mRNA fromthe three molecular events described in Figure 1A (A) The computed energetic contributionsfrom target site unfolding (DGunfold-target gray) ribozyme IGS release (DGrelease-IGS white) andIGS-target site hybridization (DGhybrid black) are shown for each tested splice site Three of the 18tested splice sites were omitted (131 240 369) because their computed energetic values were notamong the strongest 10000 interactions reported by INTARNA Error bars are standard deviationsof the energies calculated with three to six different window sizes as in Figure 3 (BndashD) Plots oftrans-splicing efficiency as a function of computed DGbind values when the contribution of (B)DGunfold-target (C) DGrelease-IGS and (D) DGhybrid is omitted in turn The thick gray lines in BndashDrepresent exponential fits as in Figures 3B and 4B with coefficients of determination R2 = 038057 and 0076 for B C and D respectively For additional details see Figure 3

Computational prediction of splice sites

wwwrnajournalorg 595

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

correlation with the experimentally determined trans-splic-ing efficiencies we performed this assay using CAT mRNAas the substrate The output of the trans-tagging assay is thenumber of times a given splice site is identified in a set ofproduct sequences obtained from in vitro trans-splicingreactions with randomized IGSs (Jones et al 1996) Weobtained a total of 66 product sequences which wereconsistent with trans-splicing at 25 different uridines amongthe 186 uridines that follow the start codon in the CATmRNA sequence (Table 1) The splice sites detected mostfrequently were at uridines 97 and 33 with 14 and 12occurrences respectively On the other hand 15 of the 25detected splice sites were found only once The splice sitesdetected by the trans-tagging assay are represented bydiamonds in Figure 2A

When the experimentally determined trans-splicing ef-ficiencies were plotted as a function of the splice site countsin the trans-tagging assay no correlation was evident (Fig6) More strikingly the assay did not find splice sites 258and 405 even though these splice sites were more efficientthan splice site 97 which was detected 14 times by thetrans-splicing assay Similarly the assay found each of thesplice sites 448 and 197 only once in 66 product sequenceseven though these splice sites were two of the five mostefficient ones among the 18 tested splice sites Thereforethe number of times that a given splice site is found by thetrans-tagging assay does not necessarily reflect the trans-splicing efficiency measured for that splice site

Possible biases of the trans-tagging assay

Why did the trans-tagging results only partially reflect themeasured splice site efficiency One possible explanation isthat the PCR step in the assay disfavors long RT-PCRproducts a well-known phenomenon in quantitative PCR(Bustin 2000) In our trans-tagging experiments this phe-nomenon may have caused the pattern seen in Figure 2AHere the forward PCR primer was designed to identifysplice sites between positions 14 and 643 on CAT mRNAbut only four out of the 25 identified splice sites were found

at positions beyond uridine 197 (Table 1) In contrast thefive most efficient splice sites (also with DGbind below thethreshold of4 kcalmol) were more evenly distributed overthe CAT mRNA (positions 97 197 258 405 and 448) Wereasoned that if the PCR step indeed caused more splicesites to be identified near the forward PCR primer thanelsewhere then moving this primer farther downstreamfrom the mRNA 59 terminus should reveal a differentpattern of detected splice sites now crowded near the newposition of the forward primer Therefore we repeated thetrans-tagging experiment on CAT mRNA this time usinga forward primer designed to identify only splice sites atpositions 207 to 643 The resulting nine product sequencesyielded eight new splice sites those at uridines 248 271273 321 350 378 384 and 405 (data not shown in Fig2A) None of these splice sites had been found among theprevious set of 66 trans-tagging product sequences The newsplice sites were again crowded near the forward primerThese results suggest that the PCR step may cause thetrans-tagging assay to miss efficient splice sites located 200nt or more downstream from the forward PCR primer

A second possible explanation for a bias in the trans-tagging assay lies in the different rates at which ribozymeswith different IGSs are transcribed by T7 RNA polymerasein vitro The assay employs a pool of trans-splicing ribo-zymes whose IGS differs between ribozymes Because the IGSrepresents the 59 terminus of the transcript and because thetranscription efficiency of T7 RNA polymerase is stronglydependent on the sequence of the first 6 nt in the transcript(Milligan et al 1987 Milligan and Uhlenbeck 1989) wehypothesized that the difference in transcription efficiencyachieved for different ribozymes may bias the results of thetrans-tagging assay

To test this second hypothesis we measured the tran-scription efficiency for the 18 studied ribozymes Specifi-cally the ribozymes were individually transcribed in thepresence of a-[32P]-GTP the transcription products were

TABLE 1 Results of the trans-tagging assay

Counta Position of the splice siteb

14 9712 338 544 993 322 18 83 87 166 1771 14 65 73 84 110 131 153 175

179 180 197 356 448 499 573

aNumber of times that a splice site was identified among 66 productsequencesbPosition of a uridine relative to the adenine of the start codon inCAT mRNA

FIGURE 6 Comparison between trans-splicing efficiency and trans-tagging results The measured trans-splicing efficiency for each splicesite is plotted against the number of times that the splice site wasfound in 66 product sequences obtained with the trans-tagging assay(filled diamonds) Open diamonds denote splice sites that were notfound by the trans-tagging assay Note that eight of the 18 sequencesare clustered at the origin of the plot

Meluzzi et al

596 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

separated by denaturing polyacrylamide gel electrophoresisand the transcription efficiency was measured by quanti-tating the relevant bands on autoradiograms The measuredtranscription efficiency varied eightfold among the differ-ent ribozymes (Fig 7 Supplemental Table S1) Notablyribozyme 97 had the highest transcription efficiency five-fold higher than that of ribozyme 258 These measureddifferences in ribozyme amounts represent lower boundsfor the actual differences affecting the trans-tagging assaySpecifically in the individual transcription reactions par-tial saturation may have been reached for the more efficienttranscriptions but not for the less efficient ones This partialsaturation would have reduced the observed difference be-tween efficient and less efficient transcriptions On the otherhand in the trans-tagging assay all ribozymes were tran-scribed together from equimolar amounts of templatesunder the same reaction conditions leading to equal extentsof product saturation for all ribozymes Hence the differencein the amounts of ribozymes 97 and 258 was likely greaterthan fivefold in the trans-tagging assay Therefore theobserved bias in transcription efficiency for ribozymeswith different IGSs could explain why splice site 97 wasfound most frequently in the trans-tagging assay whilesplice site 258 which was the most efficient in trans-splicingexperiments with CAT mRNA was absent in all 75 productsequences that we obtained with the trans-tagging assay

A third potential source of bias for the outcome of thetrans-tagging assay is the tendency of group I intron variantsto lose their 39-exon via side reactions known as 39-specifichydrolysis and G-exchange (Inoue et al 1986 Thompsonand Herrin 1991 van der Horst and Inoue 1993 Haugenet al 2004) Indeed a significant loss of 39-exons wasrevealed during the separation of in vitro transcribed ribo-zymes on denaturing polyacrylamide gels (above) Specificallyfor each band of a full-length ribozyme we detected a fastermigrating band which corresponded in size to ribozymes thathad lost their 39-exon If the loss of 39-exons duringtranscription and during trans-splicing depends on the

IGS then the pool of transcribed ribozymes used in thetrans-tagging assay will be biased against specific splicesites

To determine whether the loss of 39-exons varied withIGS under in vitro transcription conditions we quantitatedthe relevant bands on the same autoradiograms that wereused to measure transcription efficiency The resulting frac-tions of lost 39-exons varied from 12 6 4 to 396 4 withan average of 28 6 7 over all 18 ribozymes (SupplementalTable S1) This means that among the studied ribozymesbetween 88 and 61 of the transcribed ribozymes retainedtheir 39-exon generating only a 14-fold bias for the trans-tagging assay Therefore 39-exon loss during transcription wasnot a major contributor to the large bias observed in thetrans-tagging assay

The loss of 39-exons could also have taken place duringthe trans-splicing reaction Because these conditions weredifferent from the transcription conditions we also mea-sured the loss of 39-exons under trans-splicing conditionsInternally radiolabeled ribozymes with a 39-exon were size-purified by denaturing PAGE then incubated under trans-splicing conditions and the fraction of ribozymes that losttheir 39-exon was determined by a second denaturing PAGEand quantitation by phosphorimaging The fraction of 39-exons lost after four hours varied from 226 4 to 656 7with an average of 40 6 10 over all ribozymes (Supple-mental Table S1) Therefore we estimate that during the 1-hincubation under trans-tagging conditions between z80and z95 of the ribozymes retained their 39-exon suggest-ing that the small proportion of ribozymes that lost their 39-exon should not constitute a major bias in the trans-taggingassay

In summary we found that the results of the trans-taggingassay were skewed by experimental biases The strongestinfluences appeared to originate from a product-size biasin the PCR step and from different efficiencies in thetranscription of ribozymes whereas a smaller influencecame from the loss of 39-exons by ribozymes duringtranscription and trans-splicing

DISCUSSION

We have developed and experimentally tested a com-putational approach to identify efficient splice sites fortrans-splicing ribozymes This approach is based on thecomputation of the free energy change DGbind for thehybridization and secondary structure rearrangements ac-companying the first step of trans-splicing ie the bindingof IGS to substrate The computed values of DGbind werefound to correlate well with experimentally determinedtrans-splicing efficiencies at 18 different splice sites on CATmRNA The correlation was significantly better than thecorrelation of experimental trans-splicing efficiencies withthe results of the trans-tagging assay suggesting that theproposed computational approach could provide an alter-

FIGURE 7 Transcription efficiencies of ribozymes analyzed in thisstudy All values are relative to the transcription efficiency of theribozyme targeting splice site 97 Transcription efficiencies were de-termined from band intensities of internally [32P]-labeled ribozymeswhich had been transcribed in vitro and separated by denaturingpolyacrylamide gel electrophoresis The targeted splice site is in-dicated at the bottom of each bar Error bars are standard deviationsfrom seven independent transcriptions

Computational prediction of splice sites

wwwrnajournalorg 597

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

native solution to the identification of efficient splice sites fortrans-splicing ribozymes

In particular our results suggest that a set of candidateefficient splice sites for trans-splicing ribozymes could bedetermined by selecting those sites with DGbind lt 4 kcalmol Although other mRNAs may show different values forDGbind that better discriminate between efficient and in-efficient splice sites a threshold of DGbind lt 4 kcalmolmay provide a rough guideline for choosing candidate splicesites on other mRNA substrates Our results also suggest thatthe unfolding of substrate mRNA at the target site thehybridization of the substrate to the ribozyme and therelease of IGS from its secondary interactions with the 39-exon all contribute toward the efficiency of a given splicesite on long mRNAs albeit to varying degrees

The above results may seem unsurprising at first becausecomputations similar to ours have already been appliedto identify target sites for different types of RNA-bindingmolecules For example a thermodynamic cycle equivalentto our calculation of DGbind predicted antisense oligonu-cleotides with high binding affinity for rabbit b-globin andmouse tumor necrosis factor-a mRNAs achieving 60accuracy and a significant correlation with experimentaldata (Walton et al 1999) Analogous calculations includedthe concentration-dependent effects of oligonucleotide di-merization predicting oligonucleotide-target affinities con-sistent with results from an RNase-H mapping assay on theAT1 receptor mRNA and from a trans-tagging assay onsickle b-globin mRNA (Mathews et al 1999) More re-cently a target site accessibility measure provided by theprogram RNAPLFOLD and related to our DGunfold-target wasused to improve the prediction of effective siRNAs (Taferet al 2008) A similar calculation also improved andsimplified the prediction of effective microRNA targets inDrosophila melanogaster tissue culture cells (Kertesz et al2007) Another study used a statistical sampling techniqueto calculate DG values analogous to our DGbind (Shao et al2007b) These values were found to correlate significantlywith experimentally measured trans-cleavage activities of 15hammerhead ribozymes targeting transcripts of humanbreast cancer resistance protein Contrary to our resultsthis study found that DGunfold-target contributes more thanDGhybrid toward the observed correlation underscoring thestructural and mechanistic differences between trans-cleav-ing and trans-splicing ribozymes (Scott 2007) Lastly theprogram INTARNA which we used to calculate DGbind wasmore effective than other software in predicting 18 targetsof small bacterial regulatory RNAs (Busch et al 2008)

Despite the previous achievements by similar methodsit was not obvious at the outset whether our proposedcomputations of DGbind would correlate well with theefficiency of trans-splicing ribozymes because these mol-ecules are more complex than those investigated in mostof the previous reports Although Mathews et al (1999)found some agreement between their DGbind calculations

and the counts of accessible splice sites reported by Lanet al (1998) this comparison considered only a short70-nt region of the mRNA substrate and did not directlymeasure trans-splicing efficiency In contrast our studyindividually tested the trans-splicing efficiency on 18 differ-ent splice sites distributed over the full length of the CATmRNA and compared the experimental results to the com-puted values of DGbind Our study also included theribozyme 39-exon into the calculations The good correla-tion we observed between experiments and computationconfirms that a calculation of DGbind based on establishedRNA folding algorithms can be used to identify efficientsplice sites for trans-splicing ribozymes

The scope of the present study was intentionally limited toproviding a first assessment of the proposed calculation ofDGbind as a tool for predicting efficient splice sites Thereforewe carried out experiments using a single model mRNAsubstrate and we targeted a subset of the possible splice siteson this substrate On the other hand the trans-tagging assayhas already been successfully applied to map efficient splicesites on at least eight different mRNA substrates (Lan et al1998 Watanabe and Sullenger 2000 Rogers et al 2002 Parket al 2003 Ryu and Lee 2003 Jung and Lee 2005 Fiskaa et al2006 Kim et al 2007) Moreover the proposed calculation ofDGbind relies on secondary structure prediction algorithmsthat are very fast but not always accurate because they do nottake into account all possible RNA interactions and they aresensitive to errors in experimental energy parameters (Laytonand Bundschuh 2005 Condon and Jabbari 2009) Thusfuture experimental studies on additional mRNA substrateswill show whether calculations of DGbind provide a reliablemeans of predicting efficient splice sites

The proposed method is attractive because the calcula-tions of DGbind do not require bench-work and can beperformed in one afternoon In contrast the experimentalassay from the preparation of ribozymes and substratemRNA to the analysis of trans-splicing product sequencesrequires at least one week of work Thus a computationalapproach once proven reliable will likely be cheaper andquicker than the experimental route Moreover the com-putational approach can be developed further to satisfyadditional design requirements For example if the aimof the ribozymes is to repair a mutated mRNA sequencethen the sequence of the 39-exon must be adjusted accord-ing to the targeted splice site (Long and Sullenger 1999)This adjustment which is likely important based on ouranalysis of energetic contributions toward trans-splicing effi-ciency would be difficult to achieve with the trans-taggingassay whose pool of ribozymes presently carries the same39-exon sequence for all targeted splice sites On the otherhand the proposed computational approach can easily bemodified to adjust the 39-exon sequence in accordance withthe splice site

Another design requirement might be to avoid trans-splicing ribozymes that react at multiple splice sites This

Meluzzi et al

598 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

behavior was observed in the present study for ribozymestargeting splice sites 405 and 448 (Fig 3A) Such cross-reactivity was likely caused by the similarity of the corre-sponding IGSs which differ in only one out of 6 nt (GGGGAAfor splice site 405 versus GGGGAU for splice site 448) Acomputational approach can easily overcome this prob-lem as a simple comparison of the predicted IGSs sufficesto detect and discard potentially cross-reactive ribozymes

The computational approach could also accommodatethe design of 59-extended guide sequences (EGSs) whichhave been shown to strongly increase trans-splicing effi-ciency (Kohler et al 1999 Fiskaa and Birgisdottir 2010)The design of these EGSs depends critically on the mRNAsequence immediately downstream from the splice site Ifthe trans-tagging assay were to be used for the simultaneousoptimization of IGSs and EGSs then the sequence of theEGSs would have to covary with the randomized IGS Thiswould be a very laborious task because each sequence wouldhave to be synthesized individually Therefore EGS optimi-zation cannot be easily achieved with the trans-tagging assayIn contrast our computational approach can be modified toinclude a splice site-specific EGS optimization which wouldallow one not only to identify efficient splice sites on a givenmRNA but also to maximize their efficiency

In conclusion the proposed computational approachtogether with future algorithmic extensions aimed at opti-mizing 39-exon and EGS could facilitate the development ofefficient trans-splicing ribozymes while elucidating the in-teractions between trans-splicing ribozymes and their mRNAsubstrates

MATERIALS AND METHODS

Prediction of DGbind

The values of the binding free energy change DGbind and itscomponents were initially computed using the Vienna RNA package(Hofacker et al 1994) Specifically we used RNAEVAL to calculateDGhybrid and RNAFOLD to calculate DGunfold-target and DGrelease-IGSusing the partition function approach which yields an ensembleaverage of DG values associated with all possible pseudo knot-freesecondary structures achievable with a given RNA sequence(McCaskill 1990) We then found that the same calculations canbe carried out more conveniently using the INTARNA software((Busch et al 2008) httprnainformatikuni-freiburgde8080IntaRNAjsp) which was developed to find putative target sitesfor bacterial small regulatory RNAs (sRNAs typically 50ndash250 nt inlength) on long mRNA sequences This software takes as input twosequences a short sRNA and a long target mRNA The softwareoutputs a list of possible base-pairing interactions between the twoRNAs together with the corresponding DGbind = DGhybrid +DGunfold-mRNA + DGunfold-sRNA where DGhybrid is the hybridizationfree energy component of the interaction and DGunfold-mRNAsRNA

is the cost in free energy for locally unfolding the region of mRNAsRNA involved in the base-pairing interaction To calculateDGbind for the possible splice sites on CAT mRNA the 678-nt

sequence of this substrate was specified as the input mRNA se-quence and the shortened ribozyme sequence GXXXXXNNNNNACGCACGTCAATTGGCCGCTGGATGGGGCCCCTGTGAAGTGTTGCTGAGCAACGCGCTGGCGCGGCTCAGAGGCTTC wasspecified as the input sRNA sequence where GXXXXX denotesa specific IGS NNNNN is a linker replacing the body of theribozyme and the rest of the sequence is the 77-nt 39-exon usedfor the trans-tagging and trans-splicing experiments in this studyTo calculate DGbind for the short 13-mer substrates the inputsRNA sequence was the shortened ribozyme sequence shownabove and the input mRNA sequence was the 13-mer substratesequence GGYYYYYUAAAAA where YYYYY is the reverse com-plement of the five IGS bases XXXXX For all calculations themaximum length of the hybridized region was seven the upperenergy threshold was 10 kcalmol the exact number of seed basepairs was three the maximum number of reported suboptimalinteractions was 10000 while all other parameters of the softwarehad default values Requesting 10000 suboptimal interactions wasnecessary to obtain DGbind values for as many as possible of theleast efficient splice sites tested in this study However requesting40 suboptimal interactions was sufficient to predict the five mostefficient of the tested splice sites Only interactions involving eachIGS and the corresponding target site on the substrate wereextracted from the programrsquos output The above procedure wasautomated using a short Perl script which is available from GAupon request Values of DGbind for CAT mRNA were computedusing window sizes of 100 200 300 400 500 and 600 nt whichcover uniformly the length of the substrate Because each windowsize produced different outliers in the plot of experimental trans-splicing efficiency versus DGbind (data not shown) we averagedDGbind over all six window sizes thus obtaining a more consistenttrend (Fig 3B) Error bars of DGbind in Figure 3B are standarddeviations over the six window sizes Splice sites whose DGbind

value was reported for less than three window sizes were notassessed for energetic contributions and are not shown in Figure 3BWhen no DGbind values were returned for specific combinations ofwindow size and splice site the value zero was used for DGbind inFigure 2 All calculations for the 13-mer substrates were done witha window size of 13 nt

Ribozymes

To prepare DNA templates for transcription of trans-splicingribozymes with specific IGSs (Supplemental Table S1) a DNAfragment for each such ribozyme was amplified by PCR usingforward primer GCGTAATACGACTCACTATAGXXXXXAAAAGTTATCAGGCATGCACC and reverse primer GAAGCCTCTGAGCCGCGCCAG (PR1) from a plasmid containing the Tetra-hymena ribozyme gene linked to a 77-nt segment of the alpha-mannosidase gene as the 39-exon (Einvik et al 2004) The aboveforward primer contains the promoter for T7 RNA polymerasethe desired IGS nucleotides GXXXXX and the first 21 nucleotidesdownstream from the IGS in the Tetrahymena ribozyme sequenceAll PCR products were cloned into the EcoRI BamHI sites ofpUC19 using appropriate PCR primers and the ribozyme se-quences were confirmed by sequencing DNA templates for ribo-zyme transcription were obtained from these ribozyme-encodingplasmids by PCR amplification using forward primer GCGTAATACGACTCACTATAG and reverse primer PR1 To minimize theloss of 39-exons ribozyme transcriptions were carried out at 30degC for

Computational prediction of splice sites

wwwrnajournalorg 599

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

20ndash30 min The transcripts were purified by denaturing poly-acrylamide gel electrophoresis (PAGE) and eluted from gel slicesRibozyme concentrations were determined from absorbancemeasurements at 260 nm using an extinction coefficient of40 mM1cm1

Substrates

The template for run-off transcription of CAT mRNA was am-plified by PCR from plasmid pLysS (Novagen) in two consecutivePCRs The first PCR used forward primer CAGGAGCTAAGGAAGCTAAAATG and reverse primer CGCCCCGCCCTGCCACTCATC the second PCR used forward primer GCGTAATACGACTCACTATAGCAGGAGCTAAGGAAGCTAAAATG and the same re-verse primer After transcription the full-length CAT mRNA waspurified on Micro Bio-Spin 6 columns (Bio-Rad) dephosphory-lated with Antarctic phosphatase (NEB) and radiolabeled at the 59terminus using g-[32P]-ATP (Perkin Elmer) and polynucleotidekinase (NEB) The radiolabeled substrate was purified by de-naturing 5 PAGE and eluted from gel slices using RNA elutionsolution (20ndash 50 mM MOPSNaOH pH 70 02 SDS 300 mMNaCl) The 13-mer substrates were prepared as described previously(Milligan and Uhlenbeck 1989) to obtain the sequences GGYYYYYUAAAAA where YYYYY is the reverse complement of the fiveIGS bases XXXXX adjacent to the 59-terminal G of the ribozymeTemplates for transcription were obtained by annealing the senseoligonucleotide GCTAATACGACTCACTATAG which containsa T7 RNA polymerase promoter with antisense DNA oligonucle-otides TTTTTAXXXXXCCTATAGTGAGTCGTATTAGC In vitrotranscription was carried out at 37degC for 1 h in the presence ofa-[32P]-UTP (Perkin Elmer) using 500 nM DNA template andz1unitmL of T7 RNA polymerase The radiolabeled transcripts werepurified by denaturing 20 PAGE

Trans-splicing reactions

Ribozymes and substrates were mixed in a buffer containing 1mM MgCl2 135 mM KCl 50 mM MOPSNaOH pH 70 20 mMGTP and 2 mM spermidine and incubated at 37degC Before mixingwith the substrate the ribozymes were preincubated for 10 min at37degC in the absence of magnesium In reactions with CAT mRNAsubstrate ribozyme concentrations were 15 mM substrate con-centration was z50 nM and the incubation time was 4 h Thisreaction time was chosen as a measure for trans-splicing efficiencybecause after this time the signal for the most efficient splice site(258) began saturating while the signals for most other splice siteswere relatively weak or absent In reactions with 13-mer substratesribozyme concentrations were 100 nM substrate concentrationswere z10 nM and the incubation time was 8 min At this timenone of the reaction products appeared saturated and all ribozyme-substrate pairs yielded bands sufficiently strong for quantitationThe samples were separated by denaturing 20 PAGE and visualizedby phosphorimaging (Bio-Rad PMI) Band intensities in theresulting digital images were quantified using custom computersoftware that employs previously described curve-fitting pro-cedures (Shadle et al 1997 Mitov et al 2009) Trans-splicingefficiencies were calculated using the formula (trans-splicingefficiency) = (trans-splicing product) (trans-splicing product +59-cleavage product + unreacted substrate) where the amount ofeach species was assumed proportional to the corresponding

band intensity Error bars were calculated as standard deviationsover three independent experiments

Confirmation of specific trans-splicing products

Samples from trans-splicing reactions that yielded visible bands onautoradiograms of polyacrylamide gels were reverse-transcribedusing AMV reverse transcriptase (NEB) with reverse primer PR2The RT products were amplified by PCR using the nested reverseprimer PR3 and one of the forward primers PF1 CGGAATTCGATATGGGATAGTGTTCACCC CGGAATTCCCGTTGATATATCCCAATGGC CGGAATTCTGGCCTATTTCCCTAAAGGG andGCCTTTATTCACATTCTTGCC which anneal at different posi-tions on the CAT mRNA sequence The PCR products were clonedinto the plasmid pUC19 and sequenced

Trans-tagging assay

The trans-tagging assay was carried out essentially as described(Einvik et al 2004) All ribozymes were synthesized by run-off invitro transcription using T7 RNA polymerase from DNA tem-plates that were generated by PCR Preparation of the ribozymepool was as described above for ribozymes with specific IGSsexcept that here the forward primer encoding the T7 promotercontained five randomized nucleotides at the positions denotedwith X The trans-tagging reactions contained 100 nM substrate10 nM ribozyme 02 mM GTP 5 mM MgCl2 50 mM MOPSKOH pH 70 135 mM KCl and 2 mM spermidine After thesubstrate and the ribozyme were preincubated in separate tubes inthe absence of MgCl2 for 10 min at 37degC all reagents were mixedand incubated for 1 h at 37degC Reaction products were reversetranscribed with AMV reverse transcriptase (NEB) using primerGAAGCCTCTGAGCCGCG (PR2) The template RNA was hy-drolyzed by incubation in 200 mM NaOH at 90degC for 10 min RTproducts were amplified by PCR using nested reverse primer GCGGATCCTGCTCAGCAACACTTCACAGG (PR3) and forwardprimers CGAATTCCAGGAGCTAAGGAAGCTAAAATG (PF1)or CGGAATTCCATTCTTGCCCGCCTGATGAATGC cloned intopUC19 and sequenced The resulting sequences were comparedwith the ribozyme 39-exon sequence and the CAT mRNA sequenceto determine the positions of the splice sites on CAT mRNA

Ribozyme transcription efficiency and 39-exon loss

Ribozymes with specific IGSs (Supplemental Table S1) wereindividually transcribed by T7 RNA polymerase in 20-mL reactionscontaining a-[32P]-GTP (Perkin Elmer) from DNA templates thatwere prepared as described above The reactions were carried out at30degC for 20 min and were stopped by the addition of formamideloading buffer The ribozymes were heat-denatured at 80degC for2 min and separated by denaturing 5 PAGE The resulting gels wereanalyzed as described above for the trans-splicing reactions A totalof seven independent reactions per ribozyme were carried out Tocalculate the relative transcription efficiencies the absolute intensityof the band corresponding to each intact ribozyme was divided bythe absolute intensity of the band corresponding to ribozyme 97Error bars are standard deviations of the relative transcriptionefficiencies To calculate the fraction of 39-exons lost during trans-cription for each ribozyme the quantitated intensity of the fastermigrating band was divided by the sum of the intensities of this

Meluzzi et al

600 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 6: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

mRNA supports the notion that the ribozyme-substratebinding process can be approximately modeled by assum-ing three idealized molecular events unfolding of targetsite release of the IGS and IGS-target site hybridizationHere we examine the relative energetic contributions ofthese hypothetical molecular events to the observed effi-ciency of a given splice site

The thermodynamic model (Fig 1A) suggests thata very negative hybridization energy DGhybrid wouldfavor trans-splicing whereas very positive energies of sub-strate unfolding DGunfold-target and of IGS releaseDGrelease-IGS would disfavor trans-splicing Figure 5Aand Supplemental Table S2 show the three energy compo-nents of DGbind for each of the experimentally tested splicesites on CAT mRNA The average of DGunfold-target overall splice sites is 18 6 08 kcalmol the average ofDGrelease-IGS is 18 6 09 kcalmol and the average ofDGhybrid is 72 6 14 kcalmol Thus both the strongestenergetic contribution (72 kcalmol) and the largestvariation among splice sites (614 kcalmol) come fromthe hybridization of IGS with the target sites Theseresults indicate that the free energy of IGS-target sitehybridization is more important on average than the freeenergies of target unfolding and IGS release in modulatingDGbind

To assess the importance of the energetic componentsDGunfold-target DGrelease-IGS and DGhybrid toward the corre-lation observed between experimental trans-splicing effi-ciency and DGbind (Fig 3B) we recomputed DGbind byomitting each component in turn and we calculated thecoefficient of determination associated with a least-mean-squares exponential fit to the experimental trans-splicingefficiency as a function of computed DGbind For splice siteson CAT mRNA omitting any of the energetic componentsresulted in a poorer correlation and a consequent loss ofa reliable threshold bounding values of DGbind associatedwith efficient splice sites (Fig 5BndashD) In particular omissionof DGhybrid resulted in a nonnegative DGbind for all splicesites (Fig 5D) whereas omission of either DGunfold-target orDGrelease-IGS resulted in the appearance of up to six false-positives ie splice sites with DGbind lt 4 kcalmol butwithout any experimentally detected trans-splicing product(Fig 5BC) A poor correlation also ensued when omittingpairs of DG components from DGbind (Supplemental Fig S2Supplemental Table S2) Comparing the coefficients ofdetermination (Figs 3B 5BndashD) indicates that the impor-tance of individual energetic contributions toward theobserved correlation between trans-splicing efficiency andDGbind follows the order DGhybrid gt DGunfold-target gtDGrelease-IGS but these energetic contributions are all

necessary to achieve a good correlationWhen the same analysis was performed

for splice sites on the short substrateswe found that omitting DGunfold-target

made no significant difference becausethis component was always lt04 kcalmolas expected (Supplemental Fig S1AB)Omitting either of the other compo-nents resulted in a worse correlation(Supplemental Fig S1CD) The valuesof DGrelease-IGS for the 13-mers wereidentical to the values for correspond-ing splice sites on CAT mRNA whichwas expected because DGrelease-IGS doesnot depend on the substrate Valuesof DGhybrid however differed by up toz1 kcalmol between correspondingsplices sites on 13-mers and CATmRNAThese small energetic differences arosefrom the different nucleotides flankingthe target sites in the two contexts Suchnucleotides vary with splice site on CATmRNA (Supplemental Table S1) but donot vary with splice site on the shortsubstrates (see Materials and Methods)

Trans-tagging assay on CAT mRNA

To determine whether the results of thetrans-tagging assay also generate a good

FIGURE 5 Energetic contributions to the computed binding free energy on CAT mRNA fromthe three molecular events described in Figure 1A (A) The computed energetic contributionsfrom target site unfolding (DGunfold-target gray) ribozyme IGS release (DGrelease-IGS white) andIGS-target site hybridization (DGhybrid black) are shown for each tested splice site Three of the 18tested splice sites were omitted (131 240 369) because their computed energetic values were notamong the strongest 10000 interactions reported by INTARNA Error bars are standard deviationsof the energies calculated with three to six different window sizes as in Figure 3 (BndashD) Plots oftrans-splicing efficiency as a function of computed DGbind values when the contribution of (B)DGunfold-target (C) DGrelease-IGS and (D) DGhybrid is omitted in turn The thick gray lines in BndashDrepresent exponential fits as in Figures 3B and 4B with coefficients of determination R2 = 038057 and 0076 for B C and D respectively For additional details see Figure 3

Computational prediction of splice sites

wwwrnajournalorg 595

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

correlation with the experimentally determined trans-splic-ing efficiencies we performed this assay using CAT mRNAas the substrate The output of the trans-tagging assay is thenumber of times a given splice site is identified in a set ofproduct sequences obtained from in vitro trans-splicingreactions with randomized IGSs (Jones et al 1996) Weobtained a total of 66 product sequences which wereconsistent with trans-splicing at 25 different uridines amongthe 186 uridines that follow the start codon in the CATmRNA sequence (Table 1) The splice sites detected mostfrequently were at uridines 97 and 33 with 14 and 12occurrences respectively On the other hand 15 of the 25detected splice sites were found only once The splice sitesdetected by the trans-tagging assay are represented bydiamonds in Figure 2A

When the experimentally determined trans-splicing ef-ficiencies were plotted as a function of the splice site countsin the trans-tagging assay no correlation was evident (Fig6) More strikingly the assay did not find splice sites 258and 405 even though these splice sites were more efficientthan splice site 97 which was detected 14 times by thetrans-splicing assay Similarly the assay found each of thesplice sites 448 and 197 only once in 66 product sequenceseven though these splice sites were two of the five mostefficient ones among the 18 tested splice sites Thereforethe number of times that a given splice site is found by thetrans-tagging assay does not necessarily reflect the trans-splicing efficiency measured for that splice site

Possible biases of the trans-tagging assay

Why did the trans-tagging results only partially reflect themeasured splice site efficiency One possible explanation isthat the PCR step in the assay disfavors long RT-PCRproducts a well-known phenomenon in quantitative PCR(Bustin 2000) In our trans-tagging experiments this phe-nomenon may have caused the pattern seen in Figure 2AHere the forward PCR primer was designed to identifysplice sites between positions 14 and 643 on CAT mRNAbut only four out of the 25 identified splice sites were found

at positions beyond uridine 197 (Table 1) In contrast thefive most efficient splice sites (also with DGbind below thethreshold of4 kcalmol) were more evenly distributed overthe CAT mRNA (positions 97 197 258 405 and 448) Wereasoned that if the PCR step indeed caused more splicesites to be identified near the forward PCR primer thanelsewhere then moving this primer farther downstreamfrom the mRNA 59 terminus should reveal a differentpattern of detected splice sites now crowded near the newposition of the forward primer Therefore we repeated thetrans-tagging experiment on CAT mRNA this time usinga forward primer designed to identify only splice sites atpositions 207 to 643 The resulting nine product sequencesyielded eight new splice sites those at uridines 248 271273 321 350 378 384 and 405 (data not shown in Fig2A) None of these splice sites had been found among theprevious set of 66 trans-tagging product sequences The newsplice sites were again crowded near the forward primerThese results suggest that the PCR step may cause thetrans-tagging assay to miss efficient splice sites located 200nt or more downstream from the forward PCR primer

A second possible explanation for a bias in the trans-tagging assay lies in the different rates at which ribozymeswith different IGSs are transcribed by T7 RNA polymerasein vitro The assay employs a pool of trans-splicing ribo-zymes whose IGS differs between ribozymes Because the IGSrepresents the 59 terminus of the transcript and because thetranscription efficiency of T7 RNA polymerase is stronglydependent on the sequence of the first 6 nt in the transcript(Milligan et al 1987 Milligan and Uhlenbeck 1989) wehypothesized that the difference in transcription efficiencyachieved for different ribozymes may bias the results of thetrans-tagging assay

To test this second hypothesis we measured the tran-scription efficiency for the 18 studied ribozymes Specifi-cally the ribozymes were individually transcribed in thepresence of a-[32P]-GTP the transcription products were

TABLE 1 Results of the trans-tagging assay

Counta Position of the splice siteb

14 9712 338 544 993 322 18 83 87 166 1771 14 65 73 84 110 131 153 175

179 180 197 356 448 499 573

aNumber of times that a splice site was identified among 66 productsequencesbPosition of a uridine relative to the adenine of the start codon inCAT mRNA

FIGURE 6 Comparison between trans-splicing efficiency and trans-tagging results The measured trans-splicing efficiency for each splicesite is plotted against the number of times that the splice site wasfound in 66 product sequences obtained with the trans-tagging assay(filled diamonds) Open diamonds denote splice sites that were notfound by the trans-tagging assay Note that eight of the 18 sequencesare clustered at the origin of the plot

Meluzzi et al

596 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

separated by denaturing polyacrylamide gel electrophoresisand the transcription efficiency was measured by quanti-tating the relevant bands on autoradiograms The measuredtranscription efficiency varied eightfold among the differ-ent ribozymes (Fig 7 Supplemental Table S1) Notablyribozyme 97 had the highest transcription efficiency five-fold higher than that of ribozyme 258 These measureddifferences in ribozyme amounts represent lower boundsfor the actual differences affecting the trans-tagging assaySpecifically in the individual transcription reactions par-tial saturation may have been reached for the more efficienttranscriptions but not for the less efficient ones This partialsaturation would have reduced the observed difference be-tween efficient and less efficient transcriptions On the otherhand in the trans-tagging assay all ribozymes were tran-scribed together from equimolar amounts of templatesunder the same reaction conditions leading to equal extentsof product saturation for all ribozymes Hence the differencein the amounts of ribozymes 97 and 258 was likely greaterthan fivefold in the trans-tagging assay Therefore theobserved bias in transcription efficiency for ribozymeswith different IGSs could explain why splice site 97 wasfound most frequently in the trans-tagging assay whilesplice site 258 which was the most efficient in trans-splicingexperiments with CAT mRNA was absent in all 75 productsequences that we obtained with the trans-tagging assay

A third potential source of bias for the outcome of thetrans-tagging assay is the tendency of group I intron variantsto lose their 39-exon via side reactions known as 39-specifichydrolysis and G-exchange (Inoue et al 1986 Thompsonand Herrin 1991 van der Horst and Inoue 1993 Haugenet al 2004) Indeed a significant loss of 39-exons wasrevealed during the separation of in vitro transcribed ribo-zymes on denaturing polyacrylamide gels (above) Specificallyfor each band of a full-length ribozyme we detected a fastermigrating band which corresponded in size to ribozymes thathad lost their 39-exon If the loss of 39-exons duringtranscription and during trans-splicing depends on the

IGS then the pool of transcribed ribozymes used in thetrans-tagging assay will be biased against specific splicesites

To determine whether the loss of 39-exons varied withIGS under in vitro transcription conditions we quantitatedthe relevant bands on the same autoradiograms that wereused to measure transcription efficiency The resulting frac-tions of lost 39-exons varied from 12 6 4 to 396 4 withan average of 28 6 7 over all 18 ribozymes (SupplementalTable S1) This means that among the studied ribozymesbetween 88 and 61 of the transcribed ribozymes retainedtheir 39-exon generating only a 14-fold bias for the trans-tagging assay Therefore 39-exon loss during transcription wasnot a major contributor to the large bias observed in thetrans-tagging assay

The loss of 39-exons could also have taken place duringthe trans-splicing reaction Because these conditions weredifferent from the transcription conditions we also mea-sured the loss of 39-exons under trans-splicing conditionsInternally radiolabeled ribozymes with a 39-exon were size-purified by denaturing PAGE then incubated under trans-splicing conditions and the fraction of ribozymes that losttheir 39-exon was determined by a second denaturing PAGEand quantitation by phosphorimaging The fraction of 39-exons lost after four hours varied from 226 4 to 656 7with an average of 40 6 10 over all ribozymes (Supple-mental Table S1) Therefore we estimate that during the 1-hincubation under trans-tagging conditions between z80and z95 of the ribozymes retained their 39-exon suggest-ing that the small proportion of ribozymes that lost their 39-exon should not constitute a major bias in the trans-taggingassay

In summary we found that the results of the trans-taggingassay were skewed by experimental biases The strongestinfluences appeared to originate from a product-size biasin the PCR step and from different efficiencies in thetranscription of ribozymes whereas a smaller influencecame from the loss of 39-exons by ribozymes duringtranscription and trans-splicing

DISCUSSION

We have developed and experimentally tested a com-putational approach to identify efficient splice sites fortrans-splicing ribozymes This approach is based on thecomputation of the free energy change DGbind for thehybridization and secondary structure rearrangements ac-companying the first step of trans-splicing ie the bindingof IGS to substrate The computed values of DGbind werefound to correlate well with experimentally determinedtrans-splicing efficiencies at 18 different splice sites on CATmRNA The correlation was significantly better than thecorrelation of experimental trans-splicing efficiencies withthe results of the trans-tagging assay suggesting that theproposed computational approach could provide an alter-

FIGURE 7 Transcription efficiencies of ribozymes analyzed in thisstudy All values are relative to the transcription efficiency of theribozyme targeting splice site 97 Transcription efficiencies were de-termined from band intensities of internally [32P]-labeled ribozymeswhich had been transcribed in vitro and separated by denaturingpolyacrylamide gel electrophoresis The targeted splice site is in-dicated at the bottom of each bar Error bars are standard deviationsfrom seven independent transcriptions

Computational prediction of splice sites

wwwrnajournalorg 597

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

native solution to the identification of efficient splice sites fortrans-splicing ribozymes

In particular our results suggest that a set of candidateefficient splice sites for trans-splicing ribozymes could bedetermined by selecting those sites with DGbind lt 4 kcalmol Although other mRNAs may show different values forDGbind that better discriminate between efficient and in-efficient splice sites a threshold of DGbind lt 4 kcalmolmay provide a rough guideline for choosing candidate splicesites on other mRNA substrates Our results also suggest thatthe unfolding of substrate mRNA at the target site thehybridization of the substrate to the ribozyme and therelease of IGS from its secondary interactions with the 39-exon all contribute toward the efficiency of a given splicesite on long mRNAs albeit to varying degrees

The above results may seem unsurprising at first becausecomputations similar to ours have already been appliedto identify target sites for different types of RNA-bindingmolecules For example a thermodynamic cycle equivalentto our calculation of DGbind predicted antisense oligonu-cleotides with high binding affinity for rabbit b-globin andmouse tumor necrosis factor-a mRNAs achieving 60accuracy and a significant correlation with experimentaldata (Walton et al 1999) Analogous calculations includedthe concentration-dependent effects of oligonucleotide di-merization predicting oligonucleotide-target affinities con-sistent with results from an RNase-H mapping assay on theAT1 receptor mRNA and from a trans-tagging assay onsickle b-globin mRNA (Mathews et al 1999) More re-cently a target site accessibility measure provided by theprogram RNAPLFOLD and related to our DGunfold-target wasused to improve the prediction of effective siRNAs (Taferet al 2008) A similar calculation also improved andsimplified the prediction of effective microRNA targets inDrosophila melanogaster tissue culture cells (Kertesz et al2007) Another study used a statistical sampling techniqueto calculate DG values analogous to our DGbind (Shao et al2007b) These values were found to correlate significantlywith experimentally measured trans-cleavage activities of 15hammerhead ribozymes targeting transcripts of humanbreast cancer resistance protein Contrary to our resultsthis study found that DGunfold-target contributes more thanDGhybrid toward the observed correlation underscoring thestructural and mechanistic differences between trans-cleav-ing and trans-splicing ribozymes (Scott 2007) Lastly theprogram INTARNA which we used to calculate DGbind wasmore effective than other software in predicting 18 targetsof small bacterial regulatory RNAs (Busch et al 2008)

Despite the previous achievements by similar methodsit was not obvious at the outset whether our proposedcomputations of DGbind would correlate well with theefficiency of trans-splicing ribozymes because these mol-ecules are more complex than those investigated in mostof the previous reports Although Mathews et al (1999)found some agreement between their DGbind calculations

and the counts of accessible splice sites reported by Lanet al (1998) this comparison considered only a short70-nt region of the mRNA substrate and did not directlymeasure trans-splicing efficiency In contrast our studyindividually tested the trans-splicing efficiency on 18 differ-ent splice sites distributed over the full length of the CATmRNA and compared the experimental results to the com-puted values of DGbind Our study also included theribozyme 39-exon into the calculations The good correla-tion we observed between experiments and computationconfirms that a calculation of DGbind based on establishedRNA folding algorithms can be used to identify efficientsplice sites for trans-splicing ribozymes

The scope of the present study was intentionally limited toproviding a first assessment of the proposed calculation ofDGbind as a tool for predicting efficient splice sites Thereforewe carried out experiments using a single model mRNAsubstrate and we targeted a subset of the possible splice siteson this substrate On the other hand the trans-tagging assayhas already been successfully applied to map efficient splicesites on at least eight different mRNA substrates (Lan et al1998 Watanabe and Sullenger 2000 Rogers et al 2002 Parket al 2003 Ryu and Lee 2003 Jung and Lee 2005 Fiskaa et al2006 Kim et al 2007) Moreover the proposed calculation ofDGbind relies on secondary structure prediction algorithmsthat are very fast but not always accurate because they do nottake into account all possible RNA interactions and they aresensitive to errors in experimental energy parameters (Laytonand Bundschuh 2005 Condon and Jabbari 2009) Thusfuture experimental studies on additional mRNA substrateswill show whether calculations of DGbind provide a reliablemeans of predicting efficient splice sites

The proposed method is attractive because the calcula-tions of DGbind do not require bench-work and can beperformed in one afternoon In contrast the experimentalassay from the preparation of ribozymes and substratemRNA to the analysis of trans-splicing product sequencesrequires at least one week of work Thus a computationalapproach once proven reliable will likely be cheaper andquicker than the experimental route Moreover the com-putational approach can be developed further to satisfyadditional design requirements For example if the aimof the ribozymes is to repair a mutated mRNA sequencethen the sequence of the 39-exon must be adjusted accord-ing to the targeted splice site (Long and Sullenger 1999)This adjustment which is likely important based on ouranalysis of energetic contributions toward trans-splicing effi-ciency would be difficult to achieve with the trans-taggingassay whose pool of ribozymes presently carries the same39-exon sequence for all targeted splice sites On the otherhand the proposed computational approach can easily bemodified to adjust the 39-exon sequence in accordance withthe splice site

Another design requirement might be to avoid trans-splicing ribozymes that react at multiple splice sites This

Meluzzi et al

598 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

behavior was observed in the present study for ribozymestargeting splice sites 405 and 448 (Fig 3A) Such cross-reactivity was likely caused by the similarity of the corre-sponding IGSs which differ in only one out of 6 nt (GGGGAAfor splice site 405 versus GGGGAU for splice site 448) Acomputational approach can easily overcome this prob-lem as a simple comparison of the predicted IGSs sufficesto detect and discard potentially cross-reactive ribozymes

The computational approach could also accommodatethe design of 59-extended guide sequences (EGSs) whichhave been shown to strongly increase trans-splicing effi-ciency (Kohler et al 1999 Fiskaa and Birgisdottir 2010)The design of these EGSs depends critically on the mRNAsequence immediately downstream from the splice site Ifthe trans-tagging assay were to be used for the simultaneousoptimization of IGSs and EGSs then the sequence of theEGSs would have to covary with the randomized IGS Thiswould be a very laborious task because each sequence wouldhave to be synthesized individually Therefore EGS optimi-zation cannot be easily achieved with the trans-tagging assayIn contrast our computational approach can be modified toinclude a splice site-specific EGS optimization which wouldallow one not only to identify efficient splice sites on a givenmRNA but also to maximize their efficiency

In conclusion the proposed computational approachtogether with future algorithmic extensions aimed at opti-mizing 39-exon and EGS could facilitate the development ofefficient trans-splicing ribozymes while elucidating the in-teractions between trans-splicing ribozymes and their mRNAsubstrates

MATERIALS AND METHODS

Prediction of DGbind

The values of the binding free energy change DGbind and itscomponents were initially computed using the Vienna RNA package(Hofacker et al 1994) Specifically we used RNAEVAL to calculateDGhybrid and RNAFOLD to calculate DGunfold-target and DGrelease-IGSusing the partition function approach which yields an ensembleaverage of DG values associated with all possible pseudo knot-freesecondary structures achievable with a given RNA sequence(McCaskill 1990) We then found that the same calculations canbe carried out more conveniently using the INTARNA software((Busch et al 2008) httprnainformatikuni-freiburgde8080IntaRNAjsp) which was developed to find putative target sitesfor bacterial small regulatory RNAs (sRNAs typically 50ndash250 nt inlength) on long mRNA sequences This software takes as input twosequences a short sRNA and a long target mRNA The softwareoutputs a list of possible base-pairing interactions between the twoRNAs together with the corresponding DGbind = DGhybrid +DGunfold-mRNA + DGunfold-sRNA where DGhybrid is the hybridizationfree energy component of the interaction and DGunfold-mRNAsRNA

is the cost in free energy for locally unfolding the region of mRNAsRNA involved in the base-pairing interaction To calculateDGbind for the possible splice sites on CAT mRNA the 678-nt

sequence of this substrate was specified as the input mRNA se-quence and the shortened ribozyme sequence GXXXXXNNNNNACGCACGTCAATTGGCCGCTGGATGGGGCCCCTGTGAAGTGTTGCTGAGCAACGCGCTGGCGCGGCTCAGAGGCTTC wasspecified as the input sRNA sequence where GXXXXX denotesa specific IGS NNNNN is a linker replacing the body of theribozyme and the rest of the sequence is the 77-nt 39-exon usedfor the trans-tagging and trans-splicing experiments in this studyTo calculate DGbind for the short 13-mer substrates the inputsRNA sequence was the shortened ribozyme sequence shownabove and the input mRNA sequence was the 13-mer substratesequence GGYYYYYUAAAAA where YYYYY is the reverse com-plement of the five IGS bases XXXXX For all calculations themaximum length of the hybridized region was seven the upperenergy threshold was 10 kcalmol the exact number of seed basepairs was three the maximum number of reported suboptimalinteractions was 10000 while all other parameters of the softwarehad default values Requesting 10000 suboptimal interactions wasnecessary to obtain DGbind values for as many as possible of theleast efficient splice sites tested in this study However requesting40 suboptimal interactions was sufficient to predict the five mostefficient of the tested splice sites Only interactions involving eachIGS and the corresponding target site on the substrate wereextracted from the programrsquos output The above procedure wasautomated using a short Perl script which is available from GAupon request Values of DGbind for CAT mRNA were computedusing window sizes of 100 200 300 400 500 and 600 nt whichcover uniformly the length of the substrate Because each windowsize produced different outliers in the plot of experimental trans-splicing efficiency versus DGbind (data not shown) we averagedDGbind over all six window sizes thus obtaining a more consistenttrend (Fig 3B) Error bars of DGbind in Figure 3B are standarddeviations over the six window sizes Splice sites whose DGbind

value was reported for less than three window sizes were notassessed for energetic contributions and are not shown in Figure 3BWhen no DGbind values were returned for specific combinations ofwindow size and splice site the value zero was used for DGbind inFigure 2 All calculations for the 13-mer substrates were done witha window size of 13 nt

Ribozymes

To prepare DNA templates for transcription of trans-splicingribozymes with specific IGSs (Supplemental Table S1) a DNAfragment for each such ribozyme was amplified by PCR usingforward primer GCGTAATACGACTCACTATAGXXXXXAAAAGTTATCAGGCATGCACC and reverse primer GAAGCCTCTGAGCCGCGCCAG (PR1) from a plasmid containing the Tetra-hymena ribozyme gene linked to a 77-nt segment of the alpha-mannosidase gene as the 39-exon (Einvik et al 2004) The aboveforward primer contains the promoter for T7 RNA polymerasethe desired IGS nucleotides GXXXXX and the first 21 nucleotidesdownstream from the IGS in the Tetrahymena ribozyme sequenceAll PCR products were cloned into the EcoRI BamHI sites ofpUC19 using appropriate PCR primers and the ribozyme se-quences were confirmed by sequencing DNA templates for ribo-zyme transcription were obtained from these ribozyme-encodingplasmids by PCR amplification using forward primer GCGTAATACGACTCACTATAG and reverse primer PR1 To minimize theloss of 39-exons ribozyme transcriptions were carried out at 30degC for

Computational prediction of splice sites

wwwrnajournalorg 599

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

20ndash30 min The transcripts were purified by denaturing poly-acrylamide gel electrophoresis (PAGE) and eluted from gel slicesRibozyme concentrations were determined from absorbancemeasurements at 260 nm using an extinction coefficient of40 mM1cm1

Substrates

The template for run-off transcription of CAT mRNA was am-plified by PCR from plasmid pLysS (Novagen) in two consecutivePCRs The first PCR used forward primer CAGGAGCTAAGGAAGCTAAAATG and reverse primer CGCCCCGCCCTGCCACTCATC the second PCR used forward primer GCGTAATACGACTCACTATAGCAGGAGCTAAGGAAGCTAAAATG and the same re-verse primer After transcription the full-length CAT mRNA waspurified on Micro Bio-Spin 6 columns (Bio-Rad) dephosphory-lated with Antarctic phosphatase (NEB) and radiolabeled at the 59terminus using g-[32P]-ATP (Perkin Elmer) and polynucleotidekinase (NEB) The radiolabeled substrate was purified by de-naturing 5 PAGE and eluted from gel slices using RNA elutionsolution (20ndash 50 mM MOPSNaOH pH 70 02 SDS 300 mMNaCl) The 13-mer substrates were prepared as described previously(Milligan and Uhlenbeck 1989) to obtain the sequences GGYYYYYUAAAAA where YYYYY is the reverse complement of the fiveIGS bases XXXXX adjacent to the 59-terminal G of the ribozymeTemplates for transcription were obtained by annealing the senseoligonucleotide GCTAATACGACTCACTATAG which containsa T7 RNA polymerase promoter with antisense DNA oligonucle-otides TTTTTAXXXXXCCTATAGTGAGTCGTATTAGC In vitrotranscription was carried out at 37degC for 1 h in the presence ofa-[32P]-UTP (Perkin Elmer) using 500 nM DNA template andz1unitmL of T7 RNA polymerase The radiolabeled transcripts werepurified by denaturing 20 PAGE

Trans-splicing reactions

Ribozymes and substrates were mixed in a buffer containing 1mM MgCl2 135 mM KCl 50 mM MOPSNaOH pH 70 20 mMGTP and 2 mM spermidine and incubated at 37degC Before mixingwith the substrate the ribozymes were preincubated for 10 min at37degC in the absence of magnesium In reactions with CAT mRNAsubstrate ribozyme concentrations were 15 mM substrate con-centration was z50 nM and the incubation time was 4 h Thisreaction time was chosen as a measure for trans-splicing efficiencybecause after this time the signal for the most efficient splice site(258) began saturating while the signals for most other splice siteswere relatively weak or absent In reactions with 13-mer substratesribozyme concentrations were 100 nM substrate concentrationswere z10 nM and the incubation time was 8 min At this timenone of the reaction products appeared saturated and all ribozyme-substrate pairs yielded bands sufficiently strong for quantitationThe samples were separated by denaturing 20 PAGE and visualizedby phosphorimaging (Bio-Rad PMI) Band intensities in theresulting digital images were quantified using custom computersoftware that employs previously described curve-fitting pro-cedures (Shadle et al 1997 Mitov et al 2009) Trans-splicingefficiencies were calculated using the formula (trans-splicingefficiency) = (trans-splicing product) (trans-splicing product +59-cleavage product + unreacted substrate) where the amount ofeach species was assumed proportional to the corresponding

band intensity Error bars were calculated as standard deviationsover three independent experiments

Confirmation of specific trans-splicing products

Samples from trans-splicing reactions that yielded visible bands onautoradiograms of polyacrylamide gels were reverse-transcribedusing AMV reverse transcriptase (NEB) with reverse primer PR2The RT products were amplified by PCR using the nested reverseprimer PR3 and one of the forward primers PF1 CGGAATTCGATATGGGATAGTGTTCACCC CGGAATTCCCGTTGATATATCCCAATGGC CGGAATTCTGGCCTATTTCCCTAAAGGG andGCCTTTATTCACATTCTTGCC which anneal at different posi-tions on the CAT mRNA sequence The PCR products were clonedinto the plasmid pUC19 and sequenced

Trans-tagging assay

The trans-tagging assay was carried out essentially as described(Einvik et al 2004) All ribozymes were synthesized by run-off invitro transcription using T7 RNA polymerase from DNA tem-plates that were generated by PCR Preparation of the ribozymepool was as described above for ribozymes with specific IGSsexcept that here the forward primer encoding the T7 promotercontained five randomized nucleotides at the positions denotedwith X The trans-tagging reactions contained 100 nM substrate10 nM ribozyme 02 mM GTP 5 mM MgCl2 50 mM MOPSKOH pH 70 135 mM KCl and 2 mM spermidine After thesubstrate and the ribozyme were preincubated in separate tubes inthe absence of MgCl2 for 10 min at 37degC all reagents were mixedand incubated for 1 h at 37degC Reaction products were reversetranscribed with AMV reverse transcriptase (NEB) using primerGAAGCCTCTGAGCCGCG (PR2) The template RNA was hy-drolyzed by incubation in 200 mM NaOH at 90degC for 10 min RTproducts were amplified by PCR using nested reverse primer GCGGATCCTGCTCAGCAACACTTCACAGG (PR3) and forwardprimers CGAATTCCAGGAGCTAAGGAAGCTAAAATG (PF1)or CGGAATTCCATTCTTGCCCGCCTGATGAATGC cloned intopUC19 and sequenced The resulting sequences were comparedwith the ribozyme 39-exon sequence and the CAT mRNA sequenceto determine the positions of the splice sites on CAT mRNA

Ribozyme transcription efficiency and 39-exon loss

Ribozymes with specific IGSs (Supplemental Table S1) wereindividually transcribed by T7 RNA polymerase in 20-mL reactionscontaining a-[32P]-GTP (Perkin Elmer) from DNA templates thatwere prepared as described above The reactions were carried out at30degC for 20 min and were stopped by the addition of formamideloading buffer The ribozymes were heat-denatured at 80degC for2 min and separated by denaturing 5 PAGE The resulting gels wereanalyzed as described above for the trans-splicing reactions A totalof seven independent reactions per ribozyme were carried out Tocalculate the relative transcription efficiencies the absolute intensityof the band corresponding to each intact ribozyme was divided bythe absolute intensity of the band corresponding to ribozyme 97Error bars are standard deviations of the relative transcriptionefficiencies To calculate the fraction of 39-exons lost during trans-cription for each ribozyme the quantitated intensity of the fastermigrating band was divided by the sum of the intensities of this

Meluzzi et al

600 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 7: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

correlation with the experimentally determined trans-splic-ing efficiencies we performed this assay using CAT mRNAas the substrate The output of the trans-tagging assay is thenumber of times a given splice site is identified in a set ofproduct sequences obtained from in vitro trans-splicingreactions with randomized IGSs (Jones et al 1996) Weobtained a total of 66 product sequences which wereconsistent with trans-splicing at 25 different uridines amongthe 186 uridines that follow the start codon in the CATmRNA sequence (Table 1) The splice sites detected mostfrequently were at uridines 97 and 33 with 14 and 12occurrences respectively On the other hand 15 of the 25detected splice sites were found only once The splice sitesdetected by the trans-tagging assay are represented bydiamonds in Figure 2A

When the experimentally determined trans-splicing ef-ficiencies were plotted as a function of the splice site countsin the trans-tagging assay no correlation was evident (Fig6) More strikingly the assay did not find splice sites 258and 405 even though these splice sites were more efficientthan splice site 97 which was detected 14 times by thetrans-splicing assay Similarly the assay found each of thesplice sites 448 and 197 only once in 66 product sequenceseven though these splice sites were two of the five mostefficient ones among the 18 tested splice sites Thereforethe number of times that a given splice site is found by thetrans-tagging assay does not necessarily reflect the trans-splicing efficiency measured for that splice site

Possible biases of the trans-tagging assay

Why did the trans-tagging results only partially reflect themeasured splice site efficiency One possible explanation isthat the PCR step in the assay disfavors long RT-PCRproducts a well-known phenomenon in quantitative PCR(Bustin 2000) In our trans-tagging experiments this phe-nomenon may have caused the pattern seen in Figure 2AHere the forward PCR primer was designed to identifysplice sites between positions 14 and 643 on CAT mRNAbut only four out of the 25 identified splice sites were found

at positions beyond uridine 197 (Table 1) In contrast thefive most efficient splice sites (also with DGbind below thethreshold of4 kcalmol) were more evenly distributed overthe CAT mRNA (positions 97 197 258 405 and 448) Wereasoned that if the PCR step indeed caused more splicesites to be identified near the forward PCR primer thanelsewhere then moving this primer farther downstreamfrom the mRNA 59 terminus should reveal a differentpattern of detected splice sites now crowded near the newposition of the forward primer Therefore we repeated thetrans-tagging experiment on CAT mRNA this time usinga forward primer designed to identify only splice sites atpositions 207 to 643 The resulting nine product sequencesyielded eight new splice sites those at uridines 248 271273 321 350 378 384 and 405 (data not shown in Fig2A) None of these splice sites had been found among theprevious set of 66 trans-tagging product sequences The newsplice sites were again crowded near the forward primerThese results suggest that the PCR step may cause thetrans-tagging assay to miss efficient splice sites located 200nt or more downstream from the forward PCR primer

A second possible explanation for a bias in the trans-tagging assay lies in the different rates at which ribozymeswith different IGSs are transcribed by T7 RNA polymerasein vitro The assay employs a pool of trans-splicing ribo-zymes whose IGS differs between ribozymes Because the IGSrepresents the 59 terminus of the transcript and because thetranscription efficiency of T7 RNA polymerase is stronglydependent on the sequence of the first 6 nt in the transcript(Milligan et al 1987 Milligan and Uhlenbeck 1989) wehypothesized that the difference in transcription efficiencyachieved for different ribozymes may bias the results of thetrans-tagging assay

To test this second hypothesis we measured the tran-scription efficiency for the 18 studied ribozymes Specifi-cally the ribozymes were individually transcribed in thepresence of a-[32P]-GTP the transcription products were

TABLE 1 Results of the trans-tagging assay

Counta Position of the splice siteb

14 9712 338 544 993 322 18 83 87 166 1771 14 65 73 84 110 131 153 175

179 180 197 356 448 499 573

aNumber of times that a splice site was identified among 66 productsequencesbPosition of a uridine relative to the adenine of the start codon inCAT mRNA

FIGURE 6 Comparison between trans-splicing efficiency and trans-tagging results The measured trans-splicing efficiency for each splicesite is plotted against the number of times that the splice site wasfound in 66 product sequences obtained with the trans-tagging assay(filled diamonds) Open diamonds denote splice sites that were notfound by the trans-tagging assay Note that eight of the 18 sequencesare clustered at the origin of the plot

Meluzzi et al

596 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

separated by denaturing polyacrylamide gel electrophoresisand the transcription efficiency was measured by quanti-tating the relevant bands on autoradiograms The measuredtranscription efficiency varied eightfold among the differ-ent ribozymes (Fig 7 Supplemental Table S1) Notablyribozyme 97 had the highest transcription efficiency five-fold higher than that of ribozyme 258 These measureddifferences in ribozyme amounts represent lower boundsfor the actual differences affecting the trans-tagging assaySpecifically in the individual transcription reactions par-tial saturation may have been reached for the more efficienttranscriptions but not for the less efficient ones This partialsaturation would have reduced the observed difference be-tween efficient and less efficient transcriptions On the otherhand in the trans-tagging assay all ribozymes were tran-scribed together from equimolar amounts of templatesunder the same reaction conditions leading to equal extentsof product saturation for all ribozymes Hence the differencein the amounts of ribozymes 97 and 258 was likely greaterthan fivefold in the trans-tagging assay Therefore theobserved bias in transcription efficiency for ribozymeswith different IGSs could explain why splice site 97 wasfound most frequently in the trans-tagging assay whilesplice site 258 which was the most efficient in trans-splicingexperiments with CAT mRNA was absent in all 75 productsequences that we obtained with the trans-tagging assay

A third potential source of bias for the outcome of thetrans-tagging assay is the tendency of group I intron variantsto lose their 39-exon via side reactions known as 39-specifichydrolysis and G-exchange (Inoue et al 1986 Thompsonand Herrin 1991 van der Horst and Inoue 1993 Haugenet al 2004) Indeed a significant loss of 39-exons wasrevealed during the separation of in vitro transcribed ribo-zymes on denaturing polyacrylamide gels (above) Specificallyfor each band of a full-length ribozyme we detected a fastermigrating band which corresponded in size to ribozymes thathad lost their 39-exon If the loss of 39-exons duringtranscription and during trans-splicing depends on the

IGS then the pool of transcribed ribozymes used in thetrans-tagging assay will be biased against specific splicesites

To determine whether the loss of 39-exons varied withIGS under in vitro transcription conditions we quantitatedthe relevant bands on the same autoradiograms that wereused to measure transcription efficiency The resulting frac-tions of lost 39-exons varied from 12 6 4 to 396 4 withan average of 28 6 7 over all 18 ribozymes (SupplementalTable S1) This means that among the studied ribozymesbetween 88 and 61 of the transcribed ribozymes retainedtheir 39-exon generating only a 14-fold bias for the trans-tagging assay Therefore 39-exon loss during transcription wasnot a major contributor to the large bias observed in thetrans-tagging assay

The loss of 39-exons could also have taken place duringthe trans-splicing reaction Because these conditions weredifferent from the transcription conditions we also mea-sured the loss of 39-exons under trans-splicing conditionsInternally radiolabeled ribozymes with a 39-exon were size-purified by denaturing PAGE then incubated under trans-splicing conditions and the fraction of ribozymes that losttheir 39-exon was determined by a second denaturing PAGEand quantitation by phosphorimaging The fraction of 39-exons lost after four hours varied from 226 4 to 656 7with an average of 40 6 10 over all ribozymes (Supple-mental Table S1) Therefore we estimate that during the 1-hincubation under trans-tagging conditions between z80and z95 of the ribozymes retained their 39-exon suggest-ing that the small proportion of ribozymes that lost their 39-exon should not constitute a major bias in the trans-taggingassay

In summary we found that the results of the trans-taggingassay were skewed by experimental biases The strongestinfluences appeared to originate from a product-size biasin the PCR step and from different efficiencies in thetranscription of ribozymes whereas a smaller influencecame from the loss of 39-exons by ribozymes duringtranscription and trans-splicing

DISCUSSION

We have developed and experimentally tested a com-putational approach to identify efficient splice sites fortrans-splicing ribozymes This approach is based on thecomputation of the free energy change DGbind for thehybridization and secondary structure rearrangements ac-companying the first step of trans-splicing ie the bindingof IGS to substrate The computed values of DGbind werefound to correlate well with experimentally determinedtrans-splicing efficiencies at 18 different splice sites on CATmRNA The correlation was significantly better than thecorrelation of experimental trans-splicing efficiencies withthe results of the trans-tagging assay suggesting that theproposed computational approach could provide an alter-

FIGURE 7 Transcription efficiencies of ribozymes analyzed in thisstudy All values are relative to the transcription efficiency of theribozyme targeting splice site 97 Transcription efficiencies were de-termined from band intensities of internally [32P]-labeled ribozymeswhich had been transcribed in vitro and separated by denaturingpolyacrylamide gel electrophoresis The targeted splice site is in-dicated at the bottom of each bar Error bars are standard deviationsfrom seven independent transcriptions

Computational prediction of splice sites

wwwrnajournalorg 597

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

native solution to the identification of efficient splice sites fortrans-splicing ribozymes

In particular our results suggest that a set of candidateefficient splice sites for trans-splicing ribozymes could bedetermined by selecting those sites with DGbind lt 4 kcalmol Although other mRNAs may show different values forDGbind that better discriminate between efficient and in-efficient splice sites a threshold of DGbind lt 4 kcalmolmay provide a rough guideline for choosing candidate splicesites on other mRNA substrates Our results also suggest thatthe unfolding of substrate mRNA at the target site thehybridization of the substrate to the ribozyme and therelease of IGS from its secondary interactions with the 39-exon all contribute toward the efficiency of a given splicesite on long mRNAs albeit to varying degrees

The above results may seem unsurprising at first becausecomputations similar to ours have already been appliedto identify target sites for different types of RNA-bindingmolecules For example a thermodynamic cycle equivalentto our calculation of DGbind predicted antisense oligonu-cleotides with high binding affinity for rabbit b-globin andmouse tumor necrosis factor-a mRNAs achieving 60accuracy and a significant correlation with experimentaldata (Walton et al 1999) Analogous calculations includedthe concentration-dependent effects of oligonucleotide di-merization predicting oligonucleotide-target affinities con-sistent with results from an RNase-H mapping assay on theAT1 receptor mRNA and from a trans-tagging assay onsickle b-globin mRNA (Mathews et al 1999) More re-cently a target site accessibility measure provided by theprogram RNAPLFOLD and related to our DGunfold-target wasused to improve the prediction of effective siRNAs (Taferet al 2008) A similar calculation also improved andsimplified the prediction of effective microRNA targets inDrosophila melanogaster tissue culture cells (Kertesz et al2007) Another study used a statistical sampling techniqueto calculate DG values analogous to our DGbind (Shao et al2007b) These values were found to correlate significantlywith experimentally measured trans-cleavage activities of 15hammerhead ribozymes targeting transcripts of humanbreast cancer resistance protein Contrary to our resultsthis study found that DGunfold-target contributes more thanDGhybrid toward the observed correlation underscoring thestructural and mechanistic differences between trans-cleav-ing and trans-splicing ribozymes (Scott 2007) Lastly theprogram INTARNA which we used to calculate DGbind wasmore effective than other software in predicting 18 targetsof small bacterial regulatory RNAs (Busch et al 2008)

Despite the previous achievements by similar methodsit was not obvious at the outset whether our proposedcomputations of DGbind would correlate well with theefficiency of trans-splicing ribozymes because these mol-ecules are more complex than those investigated in mostof the previous reports Although Mathews et al (1999)found some agreement between their DGbind calculations

and the counts of accessible splice sites reported by Lanet al (1998) this comparison considered only a short70-nt region of the mRNA substrate and did not directlymeasure trans-splicing efficiency In contrast our studyindividually tested the trans-splicing efficiency on 18 differ-ent splice sites distributed over the full length of the CATmRNA and compared the experimental results to the com-puted values of DGbind Our study also included theribozyme 39-exon into the calculations The good correla-tion we observed between experiments and computationconfirms that a calculation of DGbind based on establishedRNA folding algorithms can be used to identify efficientsplice sites for trans-splicing ribozymes

The scope of the present study was intentionally limited toproviding a first assessment of the proposed calculation ofDGbind as a tool for predicting efficient splice sites Thereforewe carried out experiments using a single model mRNAsubstrate and we targeted a subset of the possible splice siteson this substrate On the other hand the trans-tagging assayhas already been successfully applied to map efficient splicesites on at least eight different mRNA substrates (Lan et al1998 Watanabe and Sullenger 2000 Rogers et al 2002 Parket al 2003 Ryu and Lee 2003 Jung and Lee 2005 Fiskaa et al2006 Kim et al 2007) Moreover the proposed calculation ofDGbind relies on secondary structure prediction algorithmsthat are very fast but not always accurate because they do nottake into account all possible RNA interactions and they aresensitive to errors in experimental energy parameters (Laytonand Bundschuh 2005 Condon and Jabbari 2009) Thusfuture experimental studies on additional mRNA substrateswill show whether calculations of DGbind provide a reliablemeans of predicting efficient splice sites

The proposed method is attractive because the calcula-tions of DGbind do not require bench-work and can beperformed in one afternoon In contrast the experimentalassay from the preparation of ribozymes and substratemRNA to the analysis of trans-splicing product sequencesrequires at least one week of work Thus a computationalapproach once proven reliable will likely be cheaper andquicker than the experimental route Moreover the com-putational approach can be developed further to satisfyadditional design requirements For example if the aimof the ribozymes is to repair a mutated mRNA sequencethen the sequence of the 39-exon must be adjusted accord-ing to the targeted splice site (Long and Sullenger 1999)This adjustment which is likely important based on ouranalysis of energetic contributions toward trans-splicing effi-ciency would be difficult to achieve with the trans-taggingassay whose pool of ribozymes presently carries the same39-exon sequence for all targeted splice sites On the otherhand the proposed computational approach can easily bemodified to adjust the 39-exon sequence in accordance withthe splice site

Another design requirement might be to avoid trans-splicing ribozymes that react at multiple splice sites This

Meluzzi et al

598 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

behavior was observed in the present study for ribozymestargeting splice sites 405 and 448 (Fig 3A) Such cross-reactivity was likely caused by the similarity of the corre-sponding IGSs which differ in only one out of 6 nt (GGGGAAfor splice site 405 versus GGGGAU for splice site 448) Acomputational approach can easily overcome this prob-lem as a simple comparison of the predicted IGSs sufficesto detect and discard potentially cross-reactive ribozymes

The computational approach could also accommodatethe design of 59-extended guide sequences (EGSs) whichhave been shown to strongly increase trans-splicing effi-ciency (Kohler et al 1999 Fiskaa and Birgisdottir 2010)The design of these EGSs depends critically on the mRNAsequence immediately downstream from the splice site Ifthe trans-tagging assay were to be used for the simultaneousoptimization of IGSs and EGSs then the sequence of theEGSs would have to covary with the randomized IGS Thiswould be a very laborious task because each sequence wouldhave to be synthesized individually Therefore EGS optimi-zation cannot be easily achieved with the trans-tagging assayIn contrast our computational approach can be modified toinclude a splice site-specific EGS optimization which wouldallow one not only to identify efficient splice sites on a givenmRNA but also to maximize their efficiency

In conclusion the proposed computational approachtogether with future algorithmic extensions aimed at opti-mizing 39-exon and EGS could facilitate the development ofefficient trans-splicing ribozymes while elucidating the in-teractions between trans-splicing ribozymes and their mRNAsubstrates

MATERIALS AND METHODS

Prediction of DGbind

The values of the binding free energy change DGbind and itscomponents were initially computed using the Vienna RNA package(Hofacker et al 1994) Specifically we used RNAEVAL to calculateDGhybrid and RNAFOLD to calculate DGunfold-target and DGrelease-IGSusing the partition function approach which yields an ensembleaverage of DG values associated with all possible pseudo knot-freesecondary structures achievable with a given RNA sequence(McCaskill 1990) We then found that the same calculations canbe carried out more conveniently using the INTARNA software((Busch et al 2008) httprnainformatikuni-freiburgde8080IntaRNAjsp) which was developed to find putative target sitesfor bacterial small regulatory RNAs (sRNAs typically 50ndash250 nt inlength) on long mRNA sequences This software takes as input twosequences a short sRNA and a long target mRNA The softwareoutputs a list of possible base-pairing interactions between the twoRNAs together with the corresponding DGbind = DGhybrid +DGunfold-mRNA + DGunfold-sRNA where DGhybrid is the hybridizationfree energy component of the interaction and DGunfold-mRNAsRNA

is the cost in free energy for locally unfolding the region of mRNAsRNA involved in the base-pairing interaction To calculateDGbind for the possible splice sites on CAT mRNA the 678-nt

sequence of this substrate was specified as the input mRNA se-quence and the shortened ribozyme sequence GXXXXXNNNNNACGCACGTCAATTGGCCGCTGGATGGGGCCCCTGTGAAGTGTTGCTGAGCAACGCGCTGGCGCGGCTCAGAGGCTTC wasspecified as the input sRNA sequence where GXXXXX denotesa specific IGS NNNNN is a linker replacing the body of theribozyme and the rest of the sequence is the 77-nt 39-exon usedfor the trans-tagging and trans-splicing experiments in this studyTo calculate DGbind for the short 13-mer substrates the inputsRNA sequence was the shortened ribozyme sequence shownabove and the input mRNA sequence was the 13-mer substratesequence GGYYYYYUAAAAA where YYYYY is the reverse com-plement of the five IGS bases XXXXX For all calculations themaximum length of the hybridized region was seven the upperenergy threshold was 10 kcalmol the exact number of seed basepairs was three the maximum number of reported suboptimalinteractions was 10000 while all other parameters of the softwarehad default values Requesting 10000 suboptimal interactions wasnecessary to obtain DGbind values for as many as possible of theleast efficient splice sites tested in this study However requesting40 suboptimal interactions was sufficient to predict the five mostefficient of the tested splice sites Only interactions involving eachIGS and the corresponding target site on the substrate wereextracted from the programrsquos output The above procedure wasautomated using a short Perl script which is available from GAupon request Values of DGbind for CAT mRNA were computedusing window sizes of 100 200 300 400 500 and 600 nt whichcover uniformly the length of the substrate Because each windowsize produced different outliers in the plot of experimental trans-splicing efficiency versus DGbind (data not shown) we averagedDGbind over all six window sizes thus obtaining a more consistenttrend (Fig 3B) Error bars of DGbind in Figure 3B are standarddeviations over the six window sizes Splice sites whose DGbind

value was reported for less than three window sizes were notassessed for energetic contributions and are not shown in Figure 3BWhen no DGbind values were returned for specific combinations ofwindow size and splice site the value zero was used for DGbind inFigure 2 All calculations for the 13-mer substrates were done witha window size of 13 nt

Ribozymes

To prepare DNA templates for transcription of trans-splicingribozymes with specific IGSs (Supplemental Table S1) a DNAfragment for each such ribozyme was amplified by PCR usingforward primer GCGTAATACGACTCACTATAGXXXXXAAAAGTTATCAGGCATGCACC and reverse primer GAAGCCTCTGAGCCGCGCCAG (PR1) from a plasmid containing the Tetra-hymena ribozyme gene linked to a 77-nt segment of the alpha-mannosidase gene as the 39-exon (Einvik et al 2004) The aboveforward primer contains the promoter for T7 RNA polymerasethe desired IGS nucleotides GXXXXX and the first 21 nucleotidesdownstream from the IGS in the Tetrahymena ribozyme sequenceAll PCR products were cloned into the EcoRI BamHI sites ofpUC19 using appropriate PCR primers and the ribozyme se-quences were confirmed by sequencing DNA templates for ribo-zyme transcription were obtained from these ribozyme-encodingplasmids by PCR amplification using forward primer GCGTAATACGACTCACTATAG and reverse primer PR1 To minimize theloss of 39-exons ribozyme transcriptions were carried out at 30degC for

Computational prediction of splice sites

wwwrnajournalorg 599

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

20ndash30 min The transcripts were purified by denaturing poly-acrylamide gel electrophoresis (PAGE) and eluted from gel slicesRibozyme concentrations were determined from absorbancemeasurements at 260 nm using an extinction coefficient of40 mM1cm1

Substrates

The template for run-off transcription of CAT mRNA was am-plified by PCR from plasmid pLysS (Novagen) in two consecutivePCRs The first PCR used forward primer CAGGAGCTAAGGAAGCTAAAATG and reverse primer CGCCCCGCCCTGCCACTCATC the second PCR used forward primer GCGTAATACGACTCACTATAGCAGGAGCTAAGGAAGCTAAAATG and the same re-verse primer After transcription the full-length CAT mRNA waspurified on Micro Bio-Spin 6 columns (Bio-Rad) dephosphory-lated with Antarctic phosphatase (NEB) and radiolabeled at the 59terminus using g-[32P]-ATP (Perkin Elmer) and polynucleotidekinase (NEB) The radiolabeled substrate was purified by de-naturing 5 PAGE and eluted from gel slices using RNA elutionsolution (20ndash 50 mM MOPSNaOH pH 70 02 SDS 300 mMNaCl) The 13-mer substrates were prepared as described previously(Milligan and Uhlenbeck 1989) to obtain the sequences GGYYYYYUAAAAA where YYYYY is the reverse complement of the fiveIGS bases XXXXX adjacent to the 59-terminal G of the ribozymeTemplates for transcription were obtained by annealing the senseoligonucleotide GCTAATACGACTCACTATAG which containsa T7 RNA polymerase promoter with antisense DNA oligonucle-otides TTTTTAXXXXXCCTATAGTGAGTCGTATTAGC In vitrotranscription was carried out at 37degC for 1 h in the presence ofa-[32P]-UTP (Perkin Elmer) using 500 nM DNA template andz1unitmL of T7 RNA polymerase The radiolabeled transcripts werepurified by denaturing 20 PAGE

Trans-splicing reactions

Ribozymes and substrates were mixed in a buffer containing 1mM MgCl2 135 mM KCl 50 mM MOPSNaOH pH 70 20 mMGTP and 2 mM spermidine and incubated at 37degC Before mixingwith the substrate the ribozymes were preincubated for 10 min at37degC in the absence of magnesium In reactions with CAT mRNAsubstrate ribozyme concentrations were 15 mM substrate con-centration was z50 nM and the incubation time was 4 h Thisreaction time was chosen as a measure for trans-splicing efficiencybecause after this time the signal for the most efficient splice site(258) began saturating while the signals for most other splice siteswere relatively weak or absent In reactions with 13-mer substratesribozyme concentrations were 100 nM substrate concentrationswere z10 nM and the incubation time was 8 min At this timenone of the reaction products appeared saturated and all ribozyme-substrate pairs yielded bands sufficiently strong for quantitationThe samples were separated by denaturing 20 PAGE and visualizedby phosphorimaging (Bio-Rad PMI) Band intensities in theresulting digital images were quantified using custom computersoftware that employs previously described curve-fitting pro-cedures (Shadle et al 1997 Mitov et al 2009) Trans-splicingefficiencies were calculated using the formula (trans-splicingefficiency) = (trans-splicing product) (trans-splicing product +59-cleavage product + unreacted substrate) where the amount ofeach species was assumed proportional to the corresponding

band intensity Error bars were calculated as standard deviationsover three independent experiments

Confirmation of specific trans-splicing products

Samples from trans-splicing reactions that yielded visible bands onautoradiograms of polyacrylamide gels were reverse-transcribedusing AMV reverse transcriptase (NEB) with reverse primer PR2The RT products were amplified by PCR using the nested reverseprimer PR3 and one of the forward primers PF1 CGGAATTCGATATGGGATAGTGTTCACCC CGGAATTCCCGTTGATATATCCCAATGGC CGGAATTCTGGCCTATTTCCCTAAAGGG andGCCTTTATTCACATTCTTGCC which anneal at different posi-tions on the CAT mRNA sequence The PCR products were clonedinto the plasmid pUC19 and sequenced

Trans-tagging assay

The trans-tagging assay was carried out essentially as described(Einvik et al 2004) All ribozymes were synthesized by run-off invitro transcription using T7 RNA polymerase from DNA tem-plates that were generated by PCR Preparation of the ribozymepool was as described above for ribozymes with specific IGSsexcept that here the forward primer encoding the T7 promotercontained five randomized nucleotides at the positions denotedwith X The trans-tagging reactions contained 100 nM substrate10 nM ribozyme 02 mM GTP 5 mM MgCl2 50 mM MOPSKOH pH 70 135 mM KCl and 2 mM spermidine After thesubstrate and the ribozyme were preincubated in separate tubes inthe absence of MgCl2 for 10 min at 37degC all reagents were mixedand incubated for 1 h at 37degC Reaction products were reversetranscribed with AMV reverse transcriptase (NEB) using primerGAAGCCTCTGAGCCGCG (PR2) The template RNA was hy-drolyzed by incubation in 200 mM NaOH at 90degC for 10 min RTproducts were amplified by PCR using nested reverse primer GCGGATCCTGCTCAGCAACACTTCACAGG (PR3) and forwardprimers CGAATTCCAGGAGCTAAGGAAGCTAAAATG (PF1)or CGGAATTCCATTCTTGCCCGCCTGATGAATGC cloned intopUC19 and sequenced The resulting sequences were comparedwith the ribozyme 39-exon sequence and the CAT mRNA sequenceto determine the positions of the splice sites on CAT mRNA

Ribozyme transcription efficiency and 39-exon loss

Ribozymes with specific IGSs (Supplemental Table S1) wereindividually transcribed by T7 RNA polymerase in 20-mL reactionscontaining a-[32P]-GTP (Perkin Elmer) from DNA templates thatwere prepared as described above The reactions were carried out at30degC for 20 min and were stopped by the addition of formamideloading buffer The ribozymes were heat-denatured at 80degC for2 min and separated by denaturing 5 PAGE The resulting gels wereanalyzed as described above for the trans-splicing reactions A totalof seven independent reactions per ribozyme were carried out Tocalculate the relative transcription efficiencies the absolute intensityof the band corresponding to each intact ribozyme was divided bythe absolute intensity of the band corresponding to ribozyme 97Error bars are standard deviations of the relative transcriptionefficiencies To calculate the fraction of 39-exons lost during trans-cription for each ribozyme the quantitated intensity of the fastermigrating band was divided by the sum of the intensities of this

Meluzzi et al

600 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 8: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

separated by denaturing polyacrylamide gel electrophoresisand the transcription efficiency was measured by quanti-tating the relevant bands on autoradiograms The measuredtranscription efficiency varied eightfold among the differ-ent ribozymes (Fig 7 Supplemental Table S1) Notablyribozyme 97 had the highest transcription efficiency five-fold higher than that of ribozyme 258 These measureddifferences in ribozyme amounts represent lower boundsfor the actual differences affecting the trans-tagging assaySpecifically in the individual transcription reactions par-tial saturation may have been reached for the more efficienttranscriptions but not for the less efficient ones This partialsaturation would have reduced the observed difference be-tween efficient and less efficient transcriptions On the otherhand in the trans-tagging assay all ribozymes were tran-scribed together from equimolar amounts of templatesunder the same reaction conditions leading to equal extentsof product saturation for all ribozymes Hence the differencein the amounts of ribozymes 97 and 258 was likely greaterthan fivefold in the trans-tagging assay Therefore theobserved bias in transcription efficiency for ribozymeswith different IGSs could explain why splice site 97 wasfound most frequently in the trans-tagging assay whilesplice site 258 which was the most efficient in trans-splicingexperiments with CAT mRNA was absent in all 75 productsequences that we obtained with the trans-tagging assay

A third potential source of bias for the outcome of thetrans-tagging assay is the tendency of group I intron variantsto lose their 39-exon via side reactions known as 39-specifichydrolysis and G-exchange (Inoue et al 1986 Thompsonand Herrin 1991 van der Horst and Inoue 1993 Haugenet al 2004) Indeed a significant loss of 39-exons wasrevealed during the separation of in vitro transcribed ribo-zymes on denaturing polyacrylamide gels (above) Specificallyfor each band of a full-length ribozyme we detected a fastermigrating band which corresponded in size to ribozymes thathad lost their 39-exon If the loss of 39-exons duringtranscription and during trans-splicing depends on the

IGS then the pool of transcribed ribozymes used in thetrans-tagging assay will be biased against specific splicesites

To determine whether the loss of 39-exons varied withIGS under in vitro transcription conditions we quantitatedthe relevant bands on the same autoradiograms that wereused to measure transcription efficiency The resulting frac-tions of lost 39-exons varied from 12 6 4 to 396 4 withan average of 28 6 7 over all 18 ribozymes (SupplementalTable S1) This means that among the studied ribozymesbetween 88 and 61 of the transcribed ribozymes retainedtheir 39-exon generating only a 14-fold bias for the trans-tagging assay Therefore 39-exon loss during transcription wasnot a major contributor to the large bias observed in thetrans-tagging assay

The loss of 39-exons could also have taken place duringthe trans-splicing reaction Because these conditions weredifferent from the transcription conditions we also mea-sured the loss of 39-exons under trans-splicing conditionsInternally radiolabeled ribozymes with a 39-exon were size-purified by denaturing PAGE then incubated under trans-splicing conditions and the fraction of ribozymes that losttheir 39-exon was determined by a second denaturing PAGEand quantitation by phosphorimaging The fraction of 39-exons lost after four hours varied from 226 4 to 656 7with an average of 40 6 10 over all ribozymes (Supple-mental Table S1) Therefore we estimate that during the 1-hincubation under trans-tagging conditions between z80and z95 of the ribozymes retained their 39-exon suggest-ing that the small proportion of ribozymes that lost their 39-exon should not constitute a major bias in the trans-taggingassay

In summary we found that the results of the trans-taggingassay were skewed by experimental biases The strongestinfluences appeared to originate from a product-size biasin the PCR step and from different efficiencies in thetranscription of ribozymes whereas a smaller influencecame from the loss of 39-exons by ribozymes duringtranscription and trans-splicing

DISCUSSION

We have developed and experimentally tested a com-putational approach to identify efficient splice sites fortrans-splicing ribozymes This approach is based on thecomputation of the free energy change DGbind for thehybridization and secondary structure rearrangements ac-companying the first step of trans-splicing ie the bindingof IGS to substrate The computed values of DGbind werefound to correlate well with experimentally determinedtrans-splicing efficiencies at 18 different splice sites on CATmRNA The correlation was significantly better than thecorrelation of experimental trans-splicing efficiencies withthe results of the trans-tagging assay suggesting that theproposed computational approach could provide an alter-

FIGURE 7 Transcription efficiencies of ribozymes analyzed in thisstudy All values are relative to the transcription efficiency of theribozyme targeting splice site 97 Transcription efficiencies were de-termined from band intensities of internally [32P]-labeled ribozymeswhich had been transcribed in vitro and separated by denaturingpolyacrylamide gel electrophoresis The targeted splice site is in-dicated at the bottom of each bar Error bars are standard deviationsfrom seven independent transcriptions

Computational prediction of splice sites

wwwrnajournalorg 597

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

native solution to the identification of efficient splice sites fortrans-splicing ribozymes

In particular our results suggest that a set of candidateefficient splice sites for trans-splicing ribozymes could bedetermined by selecting those sites with DGbind lt 4 kcalmol Although other mRNAs may show different values forDGbind that better discriminate between efficient and in-efficient splice sites a threshold of DGbind lt 4 kcalmolmay provide a rough guideline for choosing candidate splicesites on other mRNA substrates Our results also suggest thatthe unfolding of substrate mRNA at the target site thehybridization of the substrate to the ribozyme and therelease of IGS from its secondary interactions with the 39-exon all contribute toward the efficiency of a given splicesite on long mRNAs albeit to varying degrees

The above results may seem unsurprising at first becausecomputations similar to ours have already been appliedto identify target sites for different types of RNA-bindingmolecules For example a thermodynamic cycle equivalentto our calculation of DGbind predicted antisense oligonu-cleotides with high binding affinity for rabbit b-globin andmouse tumor necrosis factor-a mRNAs achieving 60accuracy and a significant correlation with experimentaldata (Walton et al 1999) Analogous calculations includedthe concentration-dependent effects of oligonucleotide di-merization predicting oligonucleotide-target affinities con-sistent with results from an RNase-H mapping assay on theAT1 receptor mRNA and from a trans-tagging assay onsickle b-globin mRNA (Mathews et al 1999) More re-cently a target site accessibility measure provided by theprogram RNAPLFOLD and related to our DGunfold-target wasused to improve the prediction of effective siRNAs (Taferet al 2008) A similar calculation also improved andsimplified the prediction of effective microRNA targets inDrosophila melanogaster tissue culture cells (Kertesz et al2007) Another study used a statistical sampling techniqueto calculate DG values analogous to our DGbind (Shao et al2007b) These values were found to correlate significantlywith experimentally measured trans-cleavage activities of 15hammerhead ribozymes targeting transcripts of humanbreast cancer resistance protein Contrary to our resultsthis study found that DGunfold-target contributes more thanDGhybrid toward the observed correlation underscoring thestructural and mechanistic differences between trans-cleav-ing and trans-splicing ribozymes (Scott 2007) Lastly theprogram INTARNA which we used to calculate DGbind wasmore effective than other software in predicting 18 targetsof small bacterial regulatory RNAs (Busch et al 2008)

Despite the previous achievements by similar methodsit was not obvious at the outset whether our proposedcomputations of DGbind would correlate well with theefficiency of trans-splicing ribozymes because these mol-ecules are more complex than those investigated in mostof the previous reports Although Mathews et al (1999)found some agreement between their DGbind calculations

and the counts of accessible splice sites reported by Lanet al (1998) this comparison considered only a short70-nt region of the mRNA substrate and did not directlymeasure trans-splicing efficiency In contrast our studyindividually tested the trans-splicing efficiency on 18 differ-ent splice sites distributed over the full length of the CATmRNA and compared the experimental results to the com-puted values of DGbind Our study also included theribozyme 39-exon into the calculations The good correla-tion we observed between experiments and computationconfirms that a calculation of DGbind based on establishedRNA folding algorithms can be used to identify efficientsplice sites for trans-splicing ribozymes

The scope of the present study was intentionally limited toproviding a first assessment of the proposed calculation ofDGbind as a tool for predicting efficient splice sites Thereforewe carried out experiments using a single model mRNAsubstrate and we targeted a subset of the possible splice siteson this substrate On the other hand the trans-tagging assayhas already been successfully applied to map efficient splicesites on at least eight different mRNA substrates (Lan et al1998 Watanabe and Sullenger 2000 Rogers et al 2002 Parket al 2003 Ryu and Lee 2003 Jung and Lee 2005 Fiskaa et al2006 Kim et al 2007) Moreover the proposed calculation ofDGbind relies on secondary structure prediction algorithmsthat are very fast but not always accurate because they do nottake into account all possible RNA interactions and they aresensitive to errors in experimental energy parameters (Laytonand Bundschuh 2005 Condon and Jabbari 2009) Thusfuture experimental studies on additional mRNA substrateswill show whether calculations of DGbind provide a reliablemeans of predicting efficient splice sites

The proposed method is attractive because the calcula-tions of DGbind do not require bench-work and can beperformed in one afternoon In contrast the experimentalassay from the preparation of ribozymes and substratemRNA to the analysis of trans-splicing product sequencesrequires at least one week of work Thus a computationalapproach once proven reliable will likely be cheaper andquicker than the experimental route Moreover the com-putational approach can be developed further to satisfyadditional design requirements For example if the aimof the ribozymes is to repair a mutated mRNA sequencethen the sequence of the 39-exon must be adjusted accord-ing to the targeted splice site (Long and Sullenger 1999)This adjustment which is likely important based on ouranalysis of energetic contributions toward trans-splicing effi-ciency would be difficult to achieve with the trans-taggingassay whose pool of ribozymes presently carries the same39-exon sequence for all targeted splice sites On the otherhand the proposed computational approach can easily bemodified to adjust the 39-exon sequence in accordance withthe splice site

Another design requirement might be to avoid trans-splicing ribozymes that react at multiple splice sites This

Meluzzi et al

598 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

behavior was observed in the present study for ribozymestargeting splice sites 405 and 448 (Fig 3A) Such cross-reactivity was likely caused by the similarity of the corre-sponding IGSs which differ in only one out of 6 nt (GGGGAAfor splice site 405 versus GGGGAU for splice site 448) Acomputational approach can easily overcome this prob-lem as a simple comparison of the predicted IGSs sufficesto detect and discard potentially cross-reactive ribozymes

The computational approach could also accommodatethe design of 59-extended guide sequences (EGSs) whichhave been shown to strongly increase trans-splicing effi-ciency (Kohler et al 1999 Fiskaa and Birgisdottir 2010)The design of these EGSs depends critically on the mRNAsequence immediately downstream from the splice site Ifthe trans-tagging assay were to be used for the simultaneousoptimization of IGSs and EGSs then the sequence of theEGSs would have to covary with the randomized IGS Thiswould be a very laborious task because each sequence wouldhave to be synthesized individually Therefore EGS optimi-zation cannot be easily achieved with the trans-tagging assayIn contrast our computational approach can be modified toinclude a splice site-specific EGS optimization which wouldallow one not only to identify efficient splice sites on a givenmRNA but also to maximize their efficiency

In conclusion the proposed computational approachtogether with future algorithmic extensions aimed at opti-mizing 39-exon and EGS could facilitate the development ofefficient trans-splicing ribozymes while elucidating the in-teractions between trans-splicing ribozymes and their mRNAsubstrates

MATERIALS AND METHODS

Prediction of DGbind

The values of the binding free energy change DGbind and itscomponents were initially computed using the Vienna RNA package(Hofacker et al 1994) Specifically we used RNAEVAL to calculateDGhybrid and RNAFOLD to calculate DGunfold-target and DGrelease-IGSusing the partition function approach which yields an ensembleaverage of DG values associated with all possible pseudo knot-freesecondary structures achievable with a given RNA sequence(McCaskill 1990) We then found that the same calculations canbe carried out more conveniently using the INTARNA software((Busch et al 2008) httprnainformatikuni-freiburgde8080IntaRNAjsp) which was developed to find putative target sitesfor bacterial small regulatory RNAs (sRNAs typically 50ndash250 nt inlength) on long mRNA sequences This software takes as input twosequences a short sRNA and a long target mRNA The softwareoutputs a list of possible base-pairing interactions between the twoRNAs together with the corresponding DGbind = DGhybrid +DGunfold-mRNA + DGunfold-sRNA where DGhybrid is the hybridizationfree energy component of the interaction and DGunfold-mRNAsRNA

is the cost in free energy for locally unfolding the region of mRNAsRNA involved in the base-pairing interaction To calculateDGbind for the possible splice sites on CAT mRNA the 678-nt

sequence of this substrate was specified as the input mRNA se-quence and the shortened ribozyme sequence GXXXXXNNNNNACGCACGTCAATTGGCCGCTGGATGGGGCCCCTGTGAAGTGTTGCTGAGCAACGCGCTGGCGCGGCTCAGAGGCTTC wasspecified as the input sRNA sequence where GXXXXX denotesa specific IGS NNNNN is a linker replacing the body of theribozyme and the rest of the sequence is the 77-nt 39-exon usedfor the trans-tagging and trans-splicing experiments in this studyTo calculate DGbind for the short 13-mer substrates the inputsRNA sequence was the shortened ribozyme sequence shownabove and the input mRNA sequence was the 13-mer substratesequence GGYYYYYUAAAAA where YYYYY is the reverse com-plement of the five IGS bases XXXXX For all calculations themaximum length of the hybridized region was seven the upperenergy threshold was 10 kcalmol the exact number of seed basepairs was three the maximum number of reported suboptimalinteractions was 10000 while all other parameters of the softwarehad default values Requesting 10000 suboptimal interactions wasnecessary to obtain DGbind values for as many as possible of theleast efficient splice sites tested in this study However requesting40 suboptimal interactions was sufficient to predict the five mostefficient of the tested splice sites Only interactions involving eachIGS and the corresponding target site on the substrate wereextracted from the programrsquos output The above procedure wasautomated using a short Perl script which is available from GAupon request Values of DGbind for CAT mRNA were computedusing window sizes of 100 200 300 400 500 and 600 nt whichcover uniformly the length of the substrate Because each windowsize produced different outliers in the plot of experimental trans-splicing efficiency versus DGbind (data not shown) we averagedDGbind over all six window sizes thus obtaining a more consistenttrend (Fig 3B) Error bars of DGbind in Figure 3B are standarddeviations over the six window sizes Splice sites whose DGbind

value was reported for less than three window sizes were notassessed for energetic contributions and are not shown in Figure 3BWhen no DGbind values were returned for specific combinations ofwindow size and splice site the value zero was used for DGbind inFigure 2 All calculations for the 13-mer substrates were done witha window size of 13 nt

Ribozymes

To prepare DNA templates for transcription of trans-splicingribozymes with specific IGSs (Supplemental Table S1) a DNAfragment for each such ribozyme was amplified by PCR usingforward primer GCGTAATACGACTCACTATAGXXXXXAAAAGTTATCAGGCATGCACC and reverse primer GAAGCCTCTGAGCCGCGCCAG (PR1) from a plasmid containing the Tetra-hymena ribozyme gene linked to a 77-nt segment of the alpha-mannosidase gene as the 39-exon (Einvik et al 2004) The aboveforward primer contains the promoter for T7 RNA polymerasethe desired IGS nucleotides GXXXXX and the first 21 nucleotidesdownstream from the IGS in the Tetrahymena ribozyme sequenceAll PCR products were cloned into the EcoRI BamHI sites ofpUC19 using appropriate PCR primers and the ribozyme se-quences were confirmed by sequencing DNA templates for ribo-zyme transcription were obtained from these ribozyme-encodingplasmids by PCR amplification using forward primer GCGTAATACGACTCACTATAG and reverse primer PR1 To minimize theloss of 39-exons ribozyme transcriptions were carried out at 30degC for

Computational prediction of splice sites

wwwrnajournalorg 599

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

20ndash30 min The transcripts were purified by denaturing poly-acrylamide gel electrophoresis (PAGE) and eluted from gel slicesRibozyme concentrations were determined from absorbancemeasurements at 260 nm using an extinction coefficient of40 mM1cm1

Substrates

The template for run-off transcription of CAT mRNA was am-plified by PCR from plasmid pLysS (Novagen) in two consecutivePCRs The first PCR used forward primer CAGGAGCTAAGGAAGCTAAAATG and reverse primer CGCCCCGCCCTGCCACTCATC the second PCR used forward primer GCGTAATACGACTCACTATAGCAGGAGCTAAGGAAGCTAAAATG and the same re-verse primer After transcription the full-length CAT mRNA waspurified on Micro Bio-Spin 6 columns (Bio-Rad) dephosphory-lated with Antarctic phosphatase (NEB) and radiolabeled at the 59terminus using g-[32P]-ATP (Perkin Elmer) and polynucleotidekinase (NEB) The radiolabeled substrate was purified by de-naturing 5 PAGE and eluted from gel slices using RNA elutionsolution (20ndash 50 mM MOPSNaOH pH 70 02 SDS 300 mMNaCl) The 13-mer substrates were prepared as described previously(Milligan and Uhlenbeck 1989) to obtain the sequences GGYYYYYUAAAAA where YYYYY is the reverse complement of the fiveIGS bases XXXXX adjacent to the 59-terminal G of the ribozymeTemplates for transcription were obtained by annealing the senseoligonucleotide GCTAATACGACTCACTATAG which containsa T7 RNA polymerase promoter with antisense DNA oligonucle-otides TTTTTAXXXXXCCTATAGTGAGTCGTATTAGC In vitrotranscription was carried out at 37degC for 1 h in the presence ofa-[32P]-UTP (Perkin Elmer) using 500 nM DNA template andz1unitmL of T7 RNA polymerase The radiolabeled transcripts werepurified by denaturing 20 PAGE

Trans-splicing reactions

Ribozymes and substrates were mixed in a buffer containing 1mM MgCl2 135 mM KCl 50 mM MOPSNaOH pH 70 20 mMGTP and 2 mM spermidine and incubated at 37degC Before mixingwith the substrate the ribozymes were preincubated for 10 min at37degC in the absence of magnesium In reactions with CAT mRNAsubstrate ribozyme concentrations were 15 mM substrate con-centration was z50 nM and the incubation time was 4 h Thisreaction time was chosen as a measure for trans-splicing efficiencybecause after this time the signal for the most efficient splice site(258) began saturating while the signals for most other splice siteswere relatively weak or absent In reactions with 13-mer substratesribozyme concentrations were 100 nM substrate concentrationswere z10 nM and the incubation time was 8 min At this timenone of the reaction products appeared saturated and all ribozyme-substrate pairs yielded bands sufficiently strong for quantitationThe samples were separated by denaturing 20 PAGE and visualizedby phosphorimaging (Bio-Rad PMI) Band intensities in theresulting digital images were quantified using custom computersoftware that employs previously described curve-fitting pro-cedures (Shadle et al 1997 Mitov et al 2009) Trans-splicingefficiencies were calculated using the formula (trans-splicingefficiency) = (trans-splicing product) (trans-splicing product +59-cleavage product + unreacted substrate) where the amount ofeach species was assumed proportional to the corresponding

band intensity Error bars were calculated as standard deviationsover three independent experiments

Confirmation of specific trans-splicing products

Samples from trans-splicing reactions that yielded visible bands onautoradiograms of polyacrylamide gels were reverse-transcribedusing AMV reverse transcriptase (NEB) with reverse primer PR2The RT products were amplified by PCR using the nested reverseprimer PR3 and one of the forward primers PF1 CGGAATTCGATATGGGATAGTGTTCACCC CGGAATTCCCGTTGATATATCCCAATGGC CGGAATTCTGGCCTATTTCCCTAAAGGG andGCCTTTATTCACATTCTTGCC which anneal at different posi-tions on the CAT mRNA sequence The PCR products were clonedinto the plasmid pUC19 and sequenced

Trans-tagging assay

The trans-tagging assay was carried out essentially as described(Einvik et al 2004) All ribozymes were synthesized by run-off invitro transcription using T7 RNA polymerase from DNA tem-plates that were generated by PCR Preparation of the ribozymepool was as described above for ribozymes with specific IGSsexcept that here the forward primer encoding the T7 promotercontained five randomized nucleotides at the positions denotedwith X The trans-tagging reactions contained 100 nM substrate10 nM ribozyme 02 mM GTP 5 mM MgCl2 50 mM MOPSKOH pH 70 135 mM KCl and 2 mM spermidine After thesubstrate and the ribozyme were preincubated in separate tubes inthe absence of MgCl2 for 10 min at 37degC all reagents were mixedand incubated for 1 h at 37degC Reaction products were reversetranscribed with AMV reverse transcriptase (NEB) using primerGAAGCCTCTGAGCCGCG (PR2) The template RNA was hy-drolyzed by incubation in 200 mM NaOH at 90degC for 10 min RTproducts were amplified by PCR using nested reverse primer GCGGATCCTGCTCAGCAACACTTCACAGG (PR3) and forwardprimers CGAATTCCAGGAGCTAAGGAAGCTAAAATG (PF1)or CGGAATTCCATTCTTGCCCGCCTGATGAATGC cloned intopUC19 and sequenced The resulting sequences were comparedwith the ribozyme 39-exon sequence and the CAT mRNA sequenceto determine the positions of the splice sites on CAT mRNA

Ribozyme transcription efficiency and 39-exon loss

Ribozymes with specific IGSs (Supplemental Table S1) wereindividually transcribed by T7 RNA polymerase in 20-mL reactionscontaining a-[32P]-GTP (Perkin Elmer) from DNA templates thatwere prepared as described above The reactions were carried out at30degC for 20 min and were stopped by the addition of formamideloading buffer The ribozymes were heat-denatured at 80degC for2 min and separated by denaturing 5 PAGE The resulting gels wereanalyzed as described above for the trans-splicing reactions A totalof seven independent reactions per ribozyme were carried out Tocalculate the relative transcription efficiencies the absolute intensityof the band corresponding to each intact ribozyme was divided bythe absolute intensity of the band corresponding to ribozyme 97Error bars are standard deviations of the relative transcriptionefficiencies To calculate the fraction of 39-exons lost during trans-cription for each ribozyme the quantitated intensity of the fastermigrating band was divided by the sum of the intensities of this

Meluzzi et al

600 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 9: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

native solution to the identification of efficient splice sites fortrans-splicing ribozymes

In particular our results suggest that a set of candidateefficient splice sites for trans-splicing ribozymes could bedetermined by selecting those sites with DGbind lt 4 kcalmol Although other mRNAs may show different values forDGbind that better discriminate between efficient and in-efficient splice sites a threshold of DGbind lt 4 kcalmolmay provide a rough guideline for choosing candidate splicesites on other mRNA substrates Our results also suggest thatthe unfolding of substrate mRNA at the target site thehybridization of the substrate to the ribozyme and therelease of IGS from its secondary interactions with the 39-exon all contribute toward the efficiency of a given splicesite on long mRNAs albeit to varying degrees

The above results may seem unsurprising at first becausecomputations similar to ours have already been appliedto identify target sites for different types of RNA-bindingmolecules For example a thermodynamic cycle equivalentto our calculation of DGbind predicted antisense oligonu-cleotides with high binding affinity for rabbit b-globin andmouse tumor necrosis factor-a mRNAs achieving 60accuracy and a significant correlation with experimentaldata (Walton et al 1999) Analogous calculations includedthe concentration-dependent effects of oligonucleotide di-merization predicting oligonucleotide-target affinities con-sistent with results from an RNase-H mapping assay on theAT1 receptor mRNA and from a trans-tagging assay onsickle b-globin mRNA (Mathews et al 1999) More re-cently a target site accessibility measure provided by theprogram RNAPLFOLD and related to our DGunfold-target wasused to improve the prediction of effective siRNAs (Taferet al 2008) A similar calculation also improved andsimplified the prediction of effective microRNA targets inDrosophila melanogaster tissue culture cells (Kertesz et al2007) Another study used a statistical sampling techniqueto calculate DG values analogous to our DGbind (Shao et al2007b) These values were found to correlate significantlywith experimentally measured trans-cleavage activities of 15hammerhead ribozymes targeting transcripts of humanbreast cancer resistance protein Contrary to our resultsthis study found that DGunfold-target contributes more thanDGhybrid toward the observed correlation underscoring thestructural and mechanistic differences between trans-cleav-ing and trans-splicing ribozymes (Scott 2007) Lastly theprogram INTARNA which we used to calculate DGbind wasmore effective than other software in predicting 18 targetsof small bacterial regulatory RNAs (Busch et al 2008)

Despite the previous achievements by similar methodsit was not obvious at the outset whether our proposedcomputations of DGbind would correlate well with theefficiency of trans-splicing ribozymes because these mol-ecules are more complex than those investigated in mostof the previous reports Although Mathews et al (1999)found some agreement between their DGbind calculations

and the counts of accessible splice sites reported by Lanet al (1998) this comparison considered only a short70-nt region of the mRNA substrate and did not directlymeasure trans-splicing efficiency In contrast our studyindividually tested the trans-splicing efficiency on 18 differ-ent splice sites distributed over the full length of the CATmRNA and compared the experimental results to the com-puted values of DGbind Our study also included theribozyme 39-exon into the calculations The good correla-tion we observed between experiments and computationconfirms that a calculation of DGbind based on establishedRNA folding algorithms can be used to identify efficientsplice sites for trans-splicing ribozymes

The scope of the present study was intentionally limited toproviding a first assessment of the proposed calculation ofDGbind as a tool for predicting efficient splice sites Thereforewe carried out experiments using a single model mRNAsubstrate and we targeted a subset of the possible splice siteson this substrate On the other hand the trans-tagging assayhas already been successfully applied to map efficient splicesites on at least eight different mRNA substrates (Lan et al1998 Watanabe and Sullenger 2000 Rogers et al 2002 Parket al 2003 Ryu and Lee 2003 Jung and Lee 2005 Fiskaa et al2006 Kim et al 2007) Moreover the proposed calculation ofDGbind relies on secondary structure prediction algorithmsthat are very fast but not always accurate because they do nottake into account all possible RNA interactions and they aresensitive to errors in experimental energy parameters (Laytonand Bundschuh 2005 Condon and Jabbari 2009) Thusfuture experimental studies on additional mRNA substrateswill show whether calculations of DGbind provide a reliablemeans of predicting efficient splice sites

The proposed method is attractive because the calcula-tions of DGbind do not require bench-work and can beperformed in one afternoon In contrast the experimentalassay from the preparation of ribozymes and substratemRNA to the analysis of trans-splicing product sequencesrequires at least one week of work Thus a computationalapproach once proven reliable will likely be cheaper andquicker than the experimental route Moreover the com-putational approach can be developed further to satisfyadditional design requirements For example if the aimof the ribozymes is to repair a mutated mRNA sequencethen the sequence of the 39-exon must be adjusted accord-ing to the targeted splice site (Long and Sullenger 1999)This adjustment which is likely important based on ouranalysis of energetic contributions toward trans-splicing effi-ciency would be difficult to achieve with the trans-taggingassay whose pool of ribozymes presently carries the same39-exon sequence for all targeted splice sites On the otherhand the proposed computational approach can easily bemodified to adjust the 39-exon sequence in accordance withthe splice site

Another design requirement might be to avoid trans-splicing ribozymes that react at multiple splice sites This

Meluzzi et al

598 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

behavior was observed in the present study for ribozymestargeting splice sites 405 and 448 (Fig 3A) Such cross-reactivity was likely caused by the similarity of the corre-sponding IGSs which differ in only one out of 6 nt (GGGGAAfor splice site 405 versus GGGGAU for splice site 448) Acomputational approach can easily overcome this prob-lem as a simple comparison of the predicted IGSs sufficesto detect and discard potentially cross-reactive ribozymes

The computational approach could also accommodatethe design of 59-extended guide sequences (EGSs) whichhave been shown to strongly increase trans-splicing effi-ciency (Kohler et al 1999 Fiskaa and Birgisdottir 2010)The design of these EGSs depends critically on the mRNAsequence immediately downstream from the splice site Ifthe trans-tagging assay were to be used for the simultaneousoptimization of IGSs and EGSs then the sequence of theEGSs would have to covary with the randomized IGS Thiswould be a very laborious task because each sequence wouldhave to be synthesized individually Therefore EGS optimi-zation cannot be easily achieved with the trans-tagging assayIn contrast our computational approach can be modified toinclude a splice site-specific EGS optimization which wouldallow one not only to identify efficient splice sites on a givenmRNA but also to maximize their efficiency

In conclusion the proposed computational approachtogether with future algorithmic extensions aimed at opti-mizing 39-exon and EGS could facilitate the development ofefficient trans-splicing ribozymes while elucidating the in-teractions between trans-splicing ribozymes and their mRNAsubstrates

MATERIALS AND METHODS

Prediction of DGbind

The values of the binding free energy change DGbind and itscomponents were initially computed using the Vienna RNA package(Hofacker et al 1994) Specifically we used RNAEVAL to calculateDGhybrid and RNAFOLD to calculate DGunfold-target and DGrelease-IGSusing the partition function approach which yields an ensembleaverage of DG values associated with all possible pseudo knot-freesecondary structures achievable with a given RNA sequence(McCaskill 1990) We then found that the same calculations canbe carried out more conveniently using the INTARNA software((Busch et al 2008) httprnainformatikuni-freiburgde8080IntaRNAjsp) which was developed to find putative target sitesfor bacterial small regulatory RNAs (sRNAs typically 50ndash250 nt inlength) on long mRNA sequences This software takes as input twosequences a short sRNA and a long target mRNA The softwareoutputs a list of possible base-pairing interactions between the twoRNAs together with the corresponding DGbind = DGhybrid +DGunfold-mRNA + DGunfold-sRNA where DGhybrid is the hybridizationfree energy component of the interaction and DGunfold-mRNAsRNA

is the cost in free energy for locally unfolding the region of mRNAsRNA involved in the base-pairing interaction To calculateDGbind for the possible splice sites on CAT mRNA the 678-nt

sequence of this substrate was specified as the input mRNA se-quence and the shortened ribozyme sequence GXXXXXNNNNNACGCACGTCAATTGGCCGCTGGATGGGGCCCCTGTGAAGTGTTGCTGAGCAACGCGCTGGCGCGGCTCAGAGGCTTC wasspecified as the input sRNA sequence where GXXXXX denotesa specific IGS NNNNN is a linker replacing the body of theribozyme and the rest of the sequence is the 77-nt 39-exon usedfor the trans-tagging and trans-splicing experiments in this studyTo calculate DGbind for the short 13-mer substrates the inputsRNA sequence was the shortened ribozyme sequence shownabove and the input mRNA sequence was the 13-mer substratesequence GGYYYYYUAAAAA where YYYYY is the reverse com-plement of the five IGS bases XXXXX For all calculations themaximum length of the hybridized region was seven the upperenergy threshold was 10 kcalmol the exact number of seed basepairs was three the maximum number of reported suboptimalinteractions was 10000 while all other parameters of the softwarehad default values Requesting 10000 suboptimal interactions wasnecessary to obtain DGbind values for as many as possible of theleast efficient splice sites tested in this study However requesting40 suboptimal interactions was sufficient to predict the five mostefficient of the tested splice sites Only interactions involving eachIGS and the corresponding target site on the substrate wereextracted from the programrsquos output The above procedure wasautomated using a short Perl script which is available from GAupon request Values of DGbind for CAT mRNA were computedusing window sizes of 100 200 300 400 500 and 600 nt whichcover uniformly the length of the substrate Because each windowsize produced different outliers in the plot of experimental trans-splicing efficiency versus DGbind (data not shown) we averagedDGbind over all six window sizes thus obtaining a more consistenttrend (Fig 3B) Error bars of DGbind in Figure 3B are standarddeviations over the six window sizes Splice sites whose DGbind

value was reported for less than three window sizes were notassessed for energetic contributions and are not shown in Figure 3BWhen no DGbind values were returned for specific combinations ofwindow size and splice site the value zero was used for DGbind inFigure 2 All calculations for the 13-mer substrates were done witha window size of 13 nt

Ribozymes

To prepare DNA templates for transcription of trans-splicingribozymes with specific IGSs (Supplemental Table S1) a DNAfragment for each such ribozyme was amplified by PCR usingforward primer GCGTAATACGACTCACTATAGXXXXXAAAAGTTATCAGGCATGCACC and reverse primer GAAGCCTCTGAGCCGCGCCAG (PR1) from a plasmid containing the Tetra-hymena ribozyme gene linked to a 77-nt segment of the alpha-mannosidase gene as the 39-exon (Einvik et al 2004) The aboveforward primer contains the promoter for T7 RNA polymerasethe desired IGS nucleotides GXXXXX and the first 21 nucleotidesdownstream from the IGS in the Tetrahymena ribozyme sequenceAll PCR products were cloned into the EcoRI BamHI sites ofpUC19 using appropriate PCR primers and the ribozyme se-quences were confirmed by sequencing DNA templates for ribo-zyme transcription were obtained from these ribozyme-encodingplasmids by PCR amplification using forward primer GCGTAATACGACTCACTATAG and reverse primer PR1 To minimize theloss of 39-exons ribozyme transcriptions were carried out at 30degC for

Computational prediction of splice sites

wwwrnajournalorg 599

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

20ndash30 min The transcripts were purified by denaturing poly-acrylamide gel electrophoresis (PAGE) and eluted from gel slicesRibozyme concentrations were determined from absorbancemeasurements at 260 nm using an extinction coefficient of40 mM1cm1

Substrates

The template for run-off transcription of CAT mRNA was am-plified by PCR from plasmid pLysS (Novagen) in two consecutivePCRs The first PCR used forward primer CAGGAGCTAAGGAAGCTAAAATG and reverse primer CGCCCCGCCCTGCCACTCATC the second PCR used forward primer GCGTAATACGACTCACTATAGCAGGAGCTAAGGAAGCTAAAATG and the same re-verse primer After transcription the full-length CAT mRNA waspurified on Micro Bio-Spin 6 columns (Bio-Rad) dephosphory-lated with Antarctic phosphatase (NEB) and radiolabeled at the 59terminus using g-[32P]-ATP (Perkin Elmer) and polynucleotidekinase (NEB) The radiolabeled substrate was purified by de-naturing 5 PAGE and eluted from gel slices using RNA elutionsolution (20ndash 50 mM MOPSNaOH pH 70 02 SDS 300 mMNaCl) The 13-mer substrates were prepared as described previously(Milligan and Uhlenbeck 1989) to obtain the sequences GGYYYYYUAAAAA where YYYYY is the reverse complement of the fiveIGS bases XXXXX adjacent to the 59-terminal G of the ribozymeTemplates for transcription were obtained by annealing the senseoligonucleotide GCTAATACGACTCACTATAG which containsa T7 RNA polymerase promoter with antisense DNA oligonucle-otides TTTTTAXXXXXCCTATAGTGAGTCGTATTAGC In vitrotranscription was carried out at 37degC for 1 h in the presence ofa-[32P]-UTP (Perkin Elmer) using 500 nM DNA template andz1unitmL of T7 RNA polymerase The radiolabeled transcripts werepurified by denaturing 20 PAGE

Trans-splicing reactions

Ribozymes and substrates were mixed in a buffer containing 1mM MgCl2 135 mM KCl 50 mM MOPSNaOH pH 70 20 mMGTP and 2 mM spermidine and incubated at 37degC Before mixingwith the substrate the ribozymes were preincubated for 10 min at37degC in the absence of magnesium In reactions with CAT mRNAsubstrate ribozyme concentrations were 15 mM substrate con-centration was z50 nM and the incubation time was 4 h Thisreaction time was chosen as a measure for trans-splicing efficiencybecause after this time the signal for the most efficient splice site(258) began saturating while the signals for most other splice siteswere relatively weak or absent In reactions with 13-mer substratesribozyme concentrations were 100 nM substrate concentrationswere z10 nM and the incubation time was 8 min At this timenone of the reaction products appeared saturated and all ribozyme-substrate pairs yielded bands sufficiently strong for quantitationThe samples were separated by denaturing 20 PAGE and visualizedby phosphorimaging (Bio-Rad PMI) Band intensities in theresulting digital images were quantified using custom computersoftware that employs previously described curve-fitting pro-cedures (Shadle et al 1997 Mitov et al 2009) Trans-splicingefficiencies were calculated using the formula (trans-splicingefficiency) = (trans-splicing product) (trans-splicing product +59-cleavage product + unreacted substrate) where the amount ofeach species was assumed proportional to the corresponding

band intensity Error bars were calculated as standard deviationsover three independent experiments

Confirmation of specific trans-splicing products

Samples from trans-splicing reactions that yielded visible bands onautoradiograms of polyacrylamide gels were reverse-transcribedusing AMV reverse transcriptase (NEB) with reverse primer PR2The RT products were amplified by PCR using the nested reverseprimer PR3 and one of the forward primers PF1 CGGAATTCGATATGGGATAGTGTTCACCC CGGAATTCCCGTTGATATATCCCAATGGC CGGAATTCTGGCCTATTTCCCTAAAGGG andGCCTTTATTCACATTCTTGCC which anneal at different posi-tions on the CAT mRNA sequence The PCR products were clonedinto the plasmid pUC19 and sequenced

Trans-tagging assay

The trans-tagging assay was carried out essentially as described(Einvik et al 2004) All ribozymes were synthesized by run-off invitro transcription using T7 RNA polymerase from DNA tem-plates that were generated by PCR Preparation of the ribozymepool was as described above for ribozymes with specific IGSsexcept that here the forward primer encoding the T7 promotercontained five randomized nucleotides at the positions denotedwith X The trans-tagging reactions contained 100 nM substrate10 nM ribozyme 02 mM GTP 5 mM MgCl2 50 mM MOPSKOH pH 70 135 mM KCl and 2 mM spermidine After thesubstrate and the ribozyme were preincubated in separate tubes inthe absence of MgCl2 for 10 min at 37degC all reagents were mixedand incubated for 1 h at 37degC Reaction products were reversetranscribed with AMV reverse transcriptase (NEB) using primerGAAGCCTCTGAGCCGCG (PR2) The template RNA was hy-drolyzed by incubation in 200 mM NaOH at 90degC for 10 min RTproducts were amplified by PCR using nested reverse primer GCGGATCCTGCTCAGCAACACTTCACAGG (PR3) and forwardprimers CGAATTCCAGGAGCTAAGGAAGCTAAAATG (PF1)or CGGAATTCCATTCTTGCCCGCCTGATGAATGC cloned intopUC19 and sequenced The resulting sequences were comparedwith the ribozyme 39-exon sequence and the CAT mRNA sequenceto determine the positions of the splice sites on CAT mRNA

Ribozyme transcription efficiency and 39-exon loss

Ribozymes with specific IGSs (Supplemental Table S1) wereindividually transcribed by T7 RNA polymerase in 20-mL reactionscontaining a-[32P]-GTP (Perkin Elmer) from DNA templates thatwere prepared as described above The reactions were carried out at30degC for 20 min and were stopped by the addition of formamideloading buffer The ribozymes were heat-denatured at 80degC for2 min and separated by denaturing 5 PAGE The resulting gels wereanalyzed as described above for the trans-splicing reactions A totalof seven independent reactions per ribozyme were carried out Tocalculate the relative transcription efficiencies the absolute intensityof the band corresponding to each intact ribozyme was divided bythe absolute intensity of the band corresponding to ribozyme 97Error bars are standard deviations of the relative transcriptionefficiencies To calculate the fraction of 39-exons lost during trans-cription for each ribozyme the quantitated intensity of the fastermigrating band was divided by the sum of the intensities of this

Meluzzi et al

600 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 10: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

behavior was observed in the present study for ribozymestargeting splice sites 405 and 448 (Fig 3A) Such cross-reactivity was likely caused by the similarity of the corre-sponding IGSs which differ in only one out of 6 nt (GGGGAAfor splice site 405 versus GGGGAU for splice site 448) Acomputational approach can easily overcome this prob-lem as a simple comparison of the predicted IGSs sufficesto detect and discard potentially cross-reactive ribozymes

The computational approach could also accommodatethe design of 59-extended guide sequences (EGSs) whichhave been shown to strongly increase trans-splicing effi-ciency (Kohler et al 1999 Fiskaa and Birgisdottir 2010)The design of these EGSs depends critically on the mRNAsequence immediately downstream from the splice site Ifthe trans-tagging assay were to be used for the simultaneousoptimization of IGSs and EGSs then the sequence of theEGSs would have to covary with the randomized IGS Thiswould be a very laborious task because each sequence wouldhave to be synthesized individually Therefore EGS optimi-zation cannot be easily achieved with the trans-tagging assayIn contrast our computational approach can be modified toinclude a splice site-specific EGS optimization which wouldallow one not only to identify efficient splice sites on a givenmRNA but also to maximize their efficiency

In conclusion the proposed computational approachtogether with future algorithmic extensions aimed at opti-mizing 39-exon and EGS could facilitate the development ofefficient trans-splicing ribozymes while elucidating the in-teractions between trans-splicing ribozymes and their mRNAsubstrates

MATERIALS AND METHODS

Prediction of DGbind

The values of the binding free energy change DGbind and itscomponents were initially computed using the Vienna RNA package(Hofacker et al 1994) Specifically we used RNAEVAL to calculateDGhybrid and RNAFOLD to calculate DGunfold-target and DGrelease-IGSusing the partition function approach which yields an ensembleaverage of DG values associated with all possible pseudo knot-freesecondary structures achievable with a given RNA sequence(McCaskill 1990) We then found that the same calculations canbe carried out more conveniently using the INTARNA software((Busch et al 2008) httprnainformatikuni-freiburgde8080IntaRNAjsp) which was developed to find putative target sitesfor bacterial small regulatory RNAs (sRNAs typically 50ndash250 nt inlength) on long mRNA sequences This software takes as input twosequences a short sRNA and a long target mRNA The softwareoutputs a list of possible base-pairing interactions between the twoRNAs together with the corresponding DGbind = DGhybrid +DGunfold-mRNA + DGunfold-sRNA where DGhybrid is the hybridizationfree energy component of the interaction and DGunfold-mRNAsRNA

is the cost in free energy for locally unfolding the region of mRNAsRNA involved in the base-pairing interaction To calculateDGbind for the possible splice sites on CAT mRNA the 678-nt

sequence of this substrate was specified as the input mRNA se-quence and the shortened ribozyme sequence GXXXXXNNNNNACGCACGTCAATTGGCCGCTGGATGGGGCCCCTGTGAAGTGTTGCTGAGCAACGCGCTGGCGCGGCTCAGAGGCTTC wasspecified as the input sRNA sequence where GXXXXX denotesa specific IGS NNNNN is a linker replacing the body of theribozyme and the rest of the sequence is the 77-nt 39-exon usedfor the trans-tagging and trans-splicing experiments in this studyTo calculate DGbind for the short 13-mer substrates the inputsRNA sequence was the shortened ribozyme sequence shownabove and the input mRNA sequence was the 13-mer substratesequence GGYYYYYUAAAAA where YYYYY is the reverse com-plement of the five IGS bases XXXXX For all calculations themaximum length of the hybridized region was seven the upperenergy threshold was 10 kcalmol the exact number of seed basepairs was three the maximum number of reported suboptimalinteractions was 10000 while all other parameters of the softwarehad default values Requesting 10000 suboptimal interactions wasnecessary to obtain DGbind values for as many as possible of theleast efficient splice sites tested in this study However requesting40 suboptimal interactions was sufficient to predict the five mostefficient of the tested splice sites Only interactions involving eachIGS and the corresponding target site on the substrate wereextracted from the programrsquos output The above procedure wasautomated using a short Perl script which is available from GAupon request Values of DGbind for CAT mRNA were computedusing window sizes of 100 200 300 400 500 and 600 nt whichcover uniformly the length of the substrate Because each windowsize produced different outliers in the plot of experimental trans-splicing efficiency versus DGbind (data not shown) we averagedDGbind over all six window sizes thus obtaining a more consistenttrend (Fig 3B) Error bars of DGbind in Figure 3B are standarddeviations over the six window sizes Splice sites whose DGbind

value was reported for less than three window sizes were notassessed for energetic contributions and are not shown in Figure 3BWhen no DGbind values were returned for specific combinations ofwindow size and splice site the value zero was used for DGbind inFigure 2 All calculations for the 13-mer substrates were done witha window size of 13 nt

Ribozymes

To prepare DNA templates for transcription of trans-splicingribozymes with specific IGSs (Supplemental Table S1) a DNAfragment for each such ribozyme was amplified by PCR usingforward primer GCGTAATACGACTCACTATAGXXXXXAAAAGTTATCAGGCATGCACC and reverse primer GAAGCCTCTGAGCCGCGCCAG (PR1) from a plasmid containing the Tetra-hymena ribozyme gene linked to a 77-nt segment of the alpha-mannosidase gene as the 39-exon (Einvik et al 2004) The aboveforward primer contains the promoter for T7 RNA polymerasethe desired IGS nucleotides GXXXXX and the first 21 nucleotidesdownstream from the IGS in the Tetrahymena ribozyme sequenceAll PCR products were cloned into the EcoRI BamHI sites ofpUC19 using appropriate PCR primers and the ribozyme se-quences were confirmed by sequencing DNA templates for ribo-zyme transcription were obtained from these ribozyme-encodingplasmids by PCR amplification using forward primer GCGTAATACGACTCACTATAG and reverse primer PR1 To minimize theloss of 39-exons ribozyme transcriptions were carried out at 30degC for

Computational prediction of splice sites

wwwrnajournalorg 599

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

20ndash30 min The transcripts were purified by denaturing poly-acrylamide gel electrophoresis (PAGE) and eluted from gel slicesRibozyme concentrations were determined from absorbancemeasurements at 260 nm using an extinction coefficient of40 mM1cm1

Substrates

The template for run-off transcription of CAT mRNA was am-plified by PCR from plasmid pLysS (Novagen) in two consecutivePCRs The first PCR used forward primer CAGGAGCTAAGGAAGCTAAAATG and reverse primer CGCCCCGCCCTGCCACTCATC the second PCR used forward primer GCGTAATACGACTCACTATAGCAGGAGCTAAGGAAGCTAAAATG and the same re-verse primer After transcription the full-length CAT mRNA waspurified on Micro Bio-Spin 6 columns (Bio-Rad) dephosphory-lated with Antarctic phosphatase (NEB) and radiolabeled at the 59terminus using g-[32P]-ATP (Perkin Elmer) and polynucleotidekinase (NEB) The radiolabeled substrate was purified by de-naturing 5 PAGE and eluted from gel slices using RNA elutionsolution (20ndash 50 mM MOPSNaOH pH 70 02 SDS 300 mMNaCl) The 13-mer substrates were prepared as described previously(Milligan and Uhlenbeck 1989) to obtain the sequences GGYYYYYUAAAAA where YYYYY is the reverse complement of the fiveIGS bases XXXXX adjacent to the 59-terminal G of the ribozymeTemplates for transcription were obtained by annealing the senseoligonucleotide GCTAATACGACTCACTATAG which containsa T7 RNA polymerase promoter with antisense DNA oligonucle-otides TTTTTAXXXXXCCTATAGTGAGTCGTATTAGC In vitrotranscription was carried out at 37degC for 1 h in the presence ofa-[32P]-UTP (Perkin Elmer) using 500 nM DNA template andz1unitmL of T7 RNA polymerase The radiolabeled transcripts werepurified by denaturing 20 PAGE

Trans-splicing reactions

Ribozymes and substrates were mixed in a buffer containing 1mM MgCl2 135 mM KCl 50 mM MOPSNaOH pH 70 20 mMGTP and 2 mM spermidine and incubated at 37degC Before mixingwith the substrate the ribozymes were preincubated for 10 min at37degC in the absence of magnesium In reactions with CAT mRNAsubstrate ribozyme concentrations were 15 mM substrate con-centration was z50 nM and the incubation time was 4 h Thisreaction time was chosen as a measure for trans-splicing efficiencybecause after this time the signal for the most efficient splice site(258) began saturating while the signals for most other splice siteswere relatively weak or absent In reactions with 13-mer substratesribozyme concentrations were 100 nM substrate concentrationswere z10 nM and the incubation time was 8 min At this timenone of the reaction products appeared saturated and all ribozyme-substrate pairs yielded bands sufficiently strong for quantitationThe samples were separated by denaturing 20 PAGE and visualizedby phosphorimaging (Bio-Rad PMI) Band intensities in theresulting digital images were quantified using custom computersoftware that employs previously described curve-fitting pro-cedures (Shadle et al 1997 Mitov et al 2009) Trans-splicingefficiencies were calculated using the formula (trans-splicingefficiency) = (trans-splicing product) (trans-splicing product +59-cleavage product + unreacted substrate) where the amount ofeach species was assumed proportional to the corresponding

band intensity Error bars were calculated as standard deviationsover three independent experiments

Confirmation of specific trans-splicing products

Samples from trans-splicing reactions that yielded visible bands onautoradiograms of polyacrylamide gels were reverse-transcribedusing AMV reverse transcriptase (NEB) with reverse primer PR2The RT products were amplified by PCR using the nested reverseprimer PR3 and one of the forward primers PF1 CGGAATTCGATATGGGATAGTGTTCACCC CGGAATTCCCGTTGATATATCCCAATGGC CGGAATTCTGGCCTATTTCCCTAAAGGG andGCCTTTATTCACATTCTTGCC which anneal at different posi-tions on the CAT mRNA sequence The PCR products were clonedinto the plasmid pUC19 and sequenced

Trans-tagging assay

The trans-tagging assay was carried out essentially as described(Einvik et al 2004) All ribozymes were synthesized by run-off invitro transcription using T7 RNA polymerase from DNA tem-plates that were generated by PCR Preparation of the ribozymepool was as described above for ribozymes with specific IGSsexcept that here the forward primer encoding the T7 promotercontained five randomized nucleotides at the positions denotedwith X The trans-tagging reactions contained 100 nM substrate10 nM ribozyme 02 mM GTP 5 mM MgCl2 50 mM MOPSKOH pH 70 135 mM KCl and 2 mM spermidine After thesubstrate and the ribozyme were preincubated in separate tubes inthe absence of MgCl2 for 10 min at 37degC all reagents were mixedand incubated for 1 h at 37degC Reaction products were reversetranscribed with AMV reverse transcriptase (NEB) using primerGAAGCCTCTGAGCCGCG (PR2) The template RNA was hy-drolyzed by incubation in 200 mM NaOH at 90degC for 10 min RTproducts were amplified by PCR using nested reverse primer GCGGATCCTGCTCAGCAACACTTCACAGG (PR3) and forwardprimers CGAATTCCAGGAGCTAAGGAAGCTAAAATG (PF1)or CGGAATTCCATTCTTGCCCGCCTGATGAATGC cloned intopUC19 and sequenced The resulting sequences were comparedwith the ribozyme 39-exon sequence and the CAT mRNA sequenceto determine the positions of the splice sites on CAT mRNA

Ribozyme transcription efficiency and 39-exon loss

Ribozymes with specific IGSs (Supplemental Table S1) wereindividually transcribed by T7 RNA polymerase in 20-mL reactionscontaining a-[32P]-GTP (Perkin Elmer) from DNA templates thatwere prepared as described above The reactions were carried out at30degC for 20 min and were stopped by the addition of formamideloading buffer The ribozymes were heat-denatured at 80degC for2 min and separated by denaturing 5 PAGE The resulting gels wereanalyzed as described above for the trans-splicing reactions A totalof seven independent reactions per ribozyme were carried out Tocalculate the relative transcription efficiencies the absolute intensityof the band corresponding to each intact ribozyme was divided bythe absolute intensity of the band corresponding to ribozyme 97Error bars are standard deviations of the relative transcriptionefficiencies To calculate the fraction of 39-exons lost during trans-cription for each ribozyme the quantitated intensity of the fastermigrating band was divided by the sum of the intensities of this

Meluzzi et al

600 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 11: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

20ndash30 min The transcripts were purified by denaturing poly-acrylamide gel electrophoresis (PAGE) and eluted from gel slicesRibozyme concentrations were determined from absorbancemeasurements at 260 nm using an extinction coefficient of40 mM1cm1

Substrates

The template for run-off transcription of CAT mRNA was am-plified by PCR from plasmid pLysS (Novagen) in two consecutivePCRs The first PCR used forward primer CAGGAGCTAAGGAAGCTAAAATG and reverse primer CGCCCCGCCCTGCCACTCATC the second PCR used forward primer GCGTAATACGACTCACTATAGCAGGAGCTAAGGAAGCTAAAATG and the same re-verse primer After transcription the full-length CAT mRNA waspurified on Micro Bio-Spin 6 columns (Bio-Rad) dephosphory-lated with Antarctic phosphatase (NEB) and radiolabeled at the 59terminus using g-[32P]-ATP (Perkin Elmer) and polynucleotidekinase (NEB) The radiolabeled substrate was purified by de-naturing 5 PAGE and eluted from gel slices using RNA elutionsolution (20ndash 50 mM MOPSNaOH pH 70 02 SDS 300 mMNaCl) The 13-mer substrates were prepared as described previously(Milligan and Uhlenbeck 1989) to obtain the sequences GGYYYYYUAAAAA where YYYYY is the reverse complement of the fiveIGS bases XXXXX adjacent to the 59-terminal G of the ribozymeTemplates for transcription were obtained by annealing the senseoligonucleotide GCTAATACGACTCACTATAG which containsa T7 RNA polymerase promoter with antisense DNA oligonucle-otides TTTTTAXXXXXCCTATAGTGAGTCGTATTAGC In vitrotranscription was carried out at 37degC for 1 h in the presence ofa-[32P]-UTP (Perkin Elmer) using 500 nM DNA template andz1unitmL of T7 RNA polymerase The radiolabeled transcripts werepurified by denaturing 20 PAGE

Trans-splicing reactions

Ribozymes and substrates were mixed in a buffer containing 1mM MgCl2 135 mM KCl 50 mM MOPSNaOH pH 70 20 mMGTP and 2 mM spermidine and incubated at 37degC Before mixingwith the substrate the ribozymes were preincubated for 10 min at37degC in the absence of magnesium In reactions with CAT mRNAsubstrate ribozyme concentrations were 15 mM substrate con-centration was z50 nM and the incubation time was 4 h Thisreaction time was chosen as a measure for trans-splicing efficiencybecause after this time the signal for the most efficient splice site(258) began saturating while the signals for most other splice siteswere relatively weak or absent In reactions with 13-mer substratesribozyme concentrations were 100 nM substrate concentrationswere z10 nM and the incubation time was 8 min At this timenone of the reaction products appeared saturated and all ribozyme-substrate pairs yielded bands sufficiently strong for quantitationThe samples were separated by denaturing 20 PAGE and visualizedby phosphorimaging (Bio-Rad PMI) Band intensities in theresulting digital images were quantified using custom computersoftware that employs previously described curve-fitting pro-cedures (Shadle et al 1997 Mitov et al 2009) Trans-splicingefficiencies were calculated using the formula (trans-splicingefficiency) = (trans-splicing product) (trans-splicing product +59-cleavage product + unreacted substrate) where the amount ofeach species was assumed proportional to the corresponding

band intensity Error bars were calculated as standard deviationsover three independent experiments

Confirmation of specific trans-splicing products

Samples from trans-splicing reactions that yielded visible bands onautoradiograms of polyacrylamide gels were reverse-transcribedusing AMV reverse transcriptase (NEB) with reverse primer PR2The RT products were amplified by PCR using the nested reverseprimer PR3 and one of the forward primers PF1 CGGAATTCGATATGGGATAGTGTTCACCC CGGAATTCCCGTTGATATATCCCAATGGC CGGAATTCTGGCCTATTTCCCTAAAGGG andGCCTTTATTCACATTCTTGCC which anneal at different posi-tions on the CAT mRNA sequence The PCR products were clonedinto the plasmid pUC19 and sequenced

Trans-tagging assay

The trans-tagging assay was carried out essentially as described(Einvik et al 2004) All ribozymes were synthesized by run-off invitro transcription using T7 RNA polymerase from DNA tem-plates that were generated by PCR Preparation of the ribozymepool was as described above for ribozymes with specific IGSsexcept that here the forward primer encoding the T7 promotercontained five randomized nucleotides at the positions denotedwith X The trans-tagging reactions contained 100 nM substrate10 nM ribozyme 02 mM GTP 5 mM MgCl2 50 mM MOPSKOH pH 70 135 mM KCl and 2 mM spermidine After thesubstrate and the ribozyme were preincubated in separate tubes inthe absence of MgCl2 for 10 min at 37degC all reagents were mixedand incubated for 1 h at 37degC Reaction products were reversetranscribed with AMV reverse transcriptase (NEB) using primerGAAGCCTCTGAGCCGCG (PR2) The template RNA was hy-drolyzed by incubation in 200 mM NaOH at 90degC for 10 min RTproducts were amplified by PCR using nested reverse primer GCGGATCCTGCTCAGCAACACTTCACAGG (PR3) and forwardprimers CGAATTCCAGGAGCTAAGGAAGCTAAAATG (PF1)or CGGAATTCCATTCTTGCCCGCCTGATGAATGC cloned intopUC19 and sequenced The resulting sequences were comparedwith the ribozyme 39-exon sequence and the CAT mRNA sequenceto determine the positions of the splice sites on CAT mRNA

Ribozyme transcription efficiency and 39-exon loss

Ribozymes with specific IGSs (Supplemental Table S1) wereindividually transcribed by T7 RNA polymerase in 20-mL reactionscontaining a-[32P]-GTP (Perkin Elmer) from DNA templates thatwere prepared as described above The reactions were carried out at30degC for 20 min and were stopped by the addition of formamideloading buffer The ribozymes were heat-denatured at 80degC for2 min and separated by denaturing 5 PAGE The resulting gels wereanalyzed as described above for the trans-splicing reactions A totalof seven independent reactions per ribozyme were carried out Tocalculate the relative transcription efficiencies the absolute intensityof the band corresponding to each intact ribozyme was divided bythe absolute intensity of the band corresponding to ribozyme 97Error bars are standard deviations of the relative transcriptionefficiencies To calculate the fraction of 39-exons lost during trans-cription for each ribozyme the quantitated intensity of the fastermigrating band was divided by the sum of the intensities of this

Meluzzi et al

600 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 12: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

band and of the band for the intact ribozyme Error bars arestandard deviations of these fractions Ribozymes were eluted fromthe gel slices and individually incubated without substrate underthe same trans-splicing conditions described above Samples takenat times 0 and 4 h reaction time were analyzed by denaturing 5PAGE and autoradiography as described above for trans-splicingreaction products No loss of 39-exon was seen at time 0 Fractionsof lost 39-exon were calculated as for the transcription reactions

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article

ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health(T32DK007233 to KEO Hemoglobin and Blood Protein Chem-istry training grant to E Komives) the ARCS Foundation SanDiego Chapter (Scholarship to DM) and UC San Diego (toGA) We thank the reviewers for their constructive commentswhich resulted in several improvements to the manuscript

Received August 11 2011 accepted December 2 2011

REFERENCES

Adams PL Stahley MR Kosek AB Wang J Strobel SA 2004 Crystalstructure of a self-splicing group I intron with both exons Nature430 45ndash50

Ayre BG Kohler U Goodman HM Haseloff J 1999 Design of highlyspecific cytotoxins by using trans-splicing ribozymes Proc NatlAcad Sci 96 3507ndash3512

Backofen R Hess WR 2010 Computational prediction of sRNAs andtheir targets in bacteria RNA Biol 7 33ndash42

Bai Y Gong H Li H Vu GP Lu S Liu F 2011 Oral delivery of RNaseP ribozymes by Salmonella inhibits viral infection in mice ProcNatl Acad Sci 108 3222ndash3227

Been MD Cech TR 1986 One binding site determines sequencespecificity of Tetrahymena pre-rRNA self-splicing trans-splicingand RNA enzyme activity Cell 47 207ndash216

Bell MA Sinha J Johnson AK Testa SM 2004 Enhancing the secondstep of the trans excision-splicing reaction of a group I ribozyme byexploiting P90 and P10 for intermolecular recognition Biochemistry43 4323ndash4331

Busch A Richter AS Backofen R 2008 IntaRNA Efficient predictionof bacterial sRNA targets incorporating target site accessibility andseed regions Bioinformatics 24 2849ndash2856

Bustin SA 2000 Absolute quantification of mRNA using real-timereverse transcription polymerase chain reaction assays J MolEndocrinol 25 169ndash193

Byun J Lan N Long M Sullenger BA 2003 Efficient and specificrepair of sickle beta-globin RNA by trans-splicing ribozymes RNA9 1254ndash1263

Campbell TB Cech TR 1995 Identification of ribozymes withina ribozyme library that efficiently cleave a long substrate RNARNA 1 598ndash609

Carter JR Keith JH Barde PV Fraser TS Fraser MJ Jr 2010Targeting of highly conserved Dengue virus sequences with anti-Dengue virus trans-splicing group I introns BMC Mol Biol 11 84doi 1011861471-2199-11-84

Cech TR Damberger SH Gutell RR 1994 Representation of thesecondary and tertiary structure of group I introns Nat Struct Biol1 273ndash280

Chan JH Lim S Wong WS 2006 Antisense oligonucleotides Fromdesign to therapeutic application Clin Exp Pharmacol Physiol 33533ndash540

Condon A Jabbari H 2009 Computational prediction of nucleic acidsecondary structure Methods applications and challenges TheorComput Sci 410 294ndash301

Ding Y Lawrence CE 2001 Statistical prediction of single-strandedregions in RNA secondary structure and application to predictingeffective antisense target sites and beyond Nucleic Acids Res 291034ndash1046

Dotson PP II Johnson AK Testa SM 2008 Tetrahymena thermophilaand Candida albicans group I intron-derived ribozymes cancatalyze the trans-excision-splicing reaction Nucleic Acids Res36 5281ndash5289

Doudna JA Cormack BP Szostak JW 1989 RNA structure notsequence determines the 59 splice-site specificity of a group Iintron Proc Natl Acad Sci 86 7402ndash7406

Einvik C Fiskaa T Lundblad EW Johansen S 2004 Optimizationand application of the group I ribozyme trans-splicing reactionMethods Mol Biol 252 359ndash371

Far RK Leppert J Frank K Sczakiel G 2005 Technical improvementsin the computational target search for antisense oligonucleotidesOligonucleotides 15 223ndash233

Fiskaa T Birgisdottir AB 2010 RNA reprogramming and repair basedon trans-splicing group I ribozymes New Biotechnol 27 194ndash203

Fiskaa T Lundblad EW Henriksen JR Johansen SD Einvik C 2006RNA reprogramming of alpha-mannosidase mRNA sequences invitro by myxomycete group IC1 and IE ribozymes FEBS J 2732789ndash2800

Gao Y Liu XL Li XR 2011 Research progress on siRNA delivery withnonviral carriers Int J Nanomedicine 6 1017ndash1025

Golden BL Kim H Chase E 2005 Crystal structure of a phage Twortgroup I ribozyme-product complex Nat Struct Mol Biol 12 82ndash89

Guo F Gooding AR Cech TR 2004 Structure of the Tetrahymenaribozyme Base triple sandwich and metal ion at the active siteMol Cell 16 351ndash362

Guo P Coban O Snead NM Trebley J Hoeprich S Guo S Shu Y2010 Engineering RNA for targeted siRNA delivery and medicalapplication Adv Drug Deliv Rev 62 650ndash666

Haugen P Andreassen M Birgisdottir AB Johansen S 2004 Hydro-lytic cleavage by a group I intron ribozyme is dependent on RNAstructures not important for splicing Eur J Biochem 271 1015ndash1024

Hofacker IL Fontana W Stadler PF Bonhoeffer LS Tacker MSchuster P 1994 Fast folding and comparison of RNA secondarystructures Monatsh Chem 125 167ndash188

Inoue T Sullivan FX Cech TR 1985 Intermolecular exon ligation ofthe rRNA precursor of Tetrahymena Oligonucleotides can func-tion as 59 exons Cell 43 431ndash437

Inoue T Sullivan FX Cech TR 1986 New reactions of the ribosomalRNA precursor of Tetrahymena and the mechanism of self-splicingJ Mol Biol 189 143ndash165

Johnson AK Sinha J Testa SM 2005 Trans insertion-splicingRibozyme-catalyzed insertion of targeted sequences into RNAsBiochemistry 44 10702ndash10710

Jones JT Lee SW Sullenger BA 1996 Tagging ribozyme reactionsites to follow trans-splicing in mammalian cells Nat Med 2643ndash648

Jung HS Lee SW 2005 Re-engineering of carcinoembryonic antigenRNA with the group I intron of Tetrahymena thermophila bytargeted trans-splicing J Microbiol Biotechnol 15 1408ndash1413

Jung HS Lee SW 2006 Ribozyme-mediated selective killing of cancercells expressing carcinoembryonic antigen RNA by targeted trans-splicing Biochem Biophys Res Commun 349 556ndash563

Karbstein K Carroll KS Herschlag D 2002 Probing the Tetrahymenagroup I ribozyme reaction in both directions Biochemistry 4111171ndash11183

Kay PS Inoue T 1987 Catalysis of splicing-related reactions betweendinucleotides by a ribozyme Nature 327 343ndash346

Computational prediction of splice sites

wwwrnajournalorg 601

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from

Page 13: Computational prediction of efficient splice sites for trans-splicing …maeresearch.ucsd.edu/~arya/paper31.pdf · 2012. 2. 21. · METHOD Computational prediction of efficient splice

Kertesz M Iovino N Unnerstall U Gaul U Segal E 2007 The role ofsite accessibility in microRNA target recognition Nat Genet 391278ndash1284

Kim A Ban G Song MS Bae CD Park J Lee SW 2007 Selectiveregression of cells expressing mouse cytoskeleton-associated pro-tein 2 transcript by trans-splicing ribozyme Oligonucleotides 1795ndash103

Kohler U Ayre BG Goodman HM Haseloff J 1999 Trans-splicingribozymes for targeted gene delivery J Mol Biol 285 1935ndash1950

Kruger K Grabowski PJ Zaug AJ Sands J Gottschling DE Cech TR1982 Self-splicing RNA Autoexcision and autocyclization of theribosomal RNA intervening sequence of Tetrahymena Cell 31147ndash157

Kwon BS Jung HS Song MS Cho KS Kim SC Kimm K Jeong JSKim IH Lee SW 2005 Specific regression of human cancer cellsby ribozyme-mediated targeted replacement of tumor-specifictranscript Mol Ther 12 824ndash834

Lan N Howrey RP Lee SW Smith CA Sullenger BA 1998Ribozyme-mediated repair of sickle beta-globin mRNAs in eryth-rocyte precursors Science 280 1593ndash1596

Lan N Rooney BL Lee SW Howrey RP Smith CA Sullenger BA2000 Enhancing RNA repair efficiency by combining trans-splicing ribozymes that recognize different accessible sites ona target RNA Mol Ther 2 245ndash255

Layton DM Bundschuh R 2005 A statistical analysis of RNA foldingalgorithms through thermodynamic parameter perturbationNucleic Acids Res 33 519ndash524

Lee SJ Lee SW Jeong JS Kim IH 2010 In vivo reprogramming ofhuman telomerase reverse transcriptase (hTERT) by trans-splicingribozyme to target tumor cells Methods Mol Biol 629 307ndash321

Lehnert V Jaeger L Michel F Westhof E 1996 New loop-looptertiary interactions in self-splicing introns of subgroup IC and IDA complete 3D model of the Tetrahymena thermophila ribozymeChem Biol 3 993ndash1009

Lipchock SV Strobel SA 2008 A relaxed active site after exon ligationby the group I intron Proc Natl Acad Sci 105 5699ndash5704

Long MB Sullenger BA 1999 Evaluating group I intron catalyticefficiency in mammalian cells Mol Cell Biol 19 6479ndash6487

Lu ZJ Mathews DH 2008 Efficient siRNA selection using hybrid-ization thermodynamics Nucleic Acids Res 36 640ndash647

Mathews DH Burkard ME Freier SM Wyatt JR Turner DH 1999Predicting oligonucleotide affinity to nucleic acid targets RNA 51458ndash1469

McCaskill JS 1990 The equilibrium partition function and base pairbinding probabilities for RNA secondary structure Biopolymers29 1105ndash1119

Milligan JF Uhlenbeck OC 1989 Synthesis of small RNAs using T7RNA polymerase Methods Enzymol 180 51ndash62

Milligan JF Groebe DR Witherell GW Uhlenbeck OC 1987Oligoribonucleotide synthesis using T7 RNA polymerase andsynthetic DNA templates Nucleic Acids Res 15 8783ndash8798

Mitov MI Greaser ML Campbell KS 2009 GelBandFitterndashA com-puter program for analysis of closely spaced electrophoretic andimmunoblotted bands Electrophoresis 30 848ndash851

Pan J Thirumalai D Woodson SA 1997 Folding of RNA involvesparallel pathways J Mol Biol 273 7ndash13

Park Y-H Jung H-S Kwon B-S Lee S-W 2003 Replacement ofthymidine phosphorylase RNA with group I intron of Tetrahymenathermophila by targeted trans-splicing J Microbiol 41 340ndash344

Phylactou LA Darrah C Wood MJ 1998 Ribozyme-mediated trans-splicing of a trinucleotide repeat Nat Genet 18 378ndash381

Pichon C Felden B 2008 Small RNA gene identification and mRNAtarget predictions in bacteria Bioinformatics 24 2807ndash2813

Rogers CS Vanoye CG Sullenger BA George AL Jr 2002 Functionalrepair of a mutant chloride channel using a trans-splicing ribozymeJ Clin Invest 110 1783ndash1789

Ryu KJ Lee SW 2003 Identification of the most accessible sites toribozymes on the hepatitis C virus internal ribosome entry siteJ Biochem Mol Biol 36 538ndash544

Ryu KJ Kim JH Lee SW 2003 Ribozyme-mediated selective in-duction of new gene activity in hepatitis C virus internal ribosomeentry site-expressing cells by targeted trans-splicing Mol Ther 7386ndash395

Scott WG 2007 Ribozymes Curr Opin Struct Biol 17 280ndash286Shadle SE Allen DF Guo H Pogozelski WK Bashkin JS Tullius TD

1997 Quantitative analysis of electrophoresis data Novel curvefitting methodology and its application to the determination ofa protein-DNA binding constant Nucleic Acids Res 25 850ndash860

Shao Y Wu Y Chan CY McDonough K Ding Y 2006 Rationaldesign and rapid screening of antisense oligonucleotides forprokaryotic gene modulation Nucleic Acids Res 34 5660ndash5669

Shao Y Chan CY Maliyekkel A Lawrence CE Roninson IB Ding Y2007a Effect of target secondary structure on RNAi efficiencyRNA 13 1631ndash1640

Shao Y Wu S Chan CY Klapper JR Schneider E Ding Y 2007bA structural analysis of in vitro catalytic activities of hammerheadribozymes BMC Bioinformatics 8 469 doi 1011861471-2105-8-469

Shi X Mollova ET Pljevaljcic G Millar DP Herschlag D 2009Probing the dynamics of the P1 helix within the Tetrahymenagroup I intron J Am Chem Soc 131 9571ndash9578

Sullenger BA Cech TR 1994 Ribozyme-mediated repair of defectivemRNA by targeted trans-splicing Nature 371 619ndash622

Tafer H Ameres SL Obernosterer G Gebeshuber CA Schroeder RMartinez J Hofacker IL 2008 The impact of target site accessibilityon the design of effective siRNAs Nat Biotechnol 26 578ndash583

Thomas CE Ehrhardt A Kay MA 2003 Progress and problems withthe use of viral vectors for gene therapy Nat Rev Genet 4 346ndash358

Thomas M Lieberman J Lal A 2010 Desperately seeking microRNAtargets Nat Struct Mol Biol 17 1169ndash1174

Thompson AJ Herrin DL 1991 In vitro self-splicing reactions of thechloroplast group I intron CrLSU from Chlamydomonas rein-hardtii and in vivo manipulation via gene-replacement NucleicAcids Res 19 6611ndash6618

van der Horst G Inoue T 1993 Requirements of a group I intron forreactions at the 39 splice site J Mol Biol 229 685ndash694

Walton SP Stephanopoulos GN Yarmush ML Roth CM 1999Prediction of antisense oligonucleotide binding affinity to a struc-tured RNA target Biotechnol Bioeng 65 1ndash9

Walton SP Wu M Gredell JA Chan C 2010 Designing highly activesiRNAs for therapeutic applications FEBS J 277 4806ndash4813

Wang JY Drlica K 2004 Computational identification of antisenseoligonucleotides that rapidly hybridize to RNA Oligonucleotides14 167ndash175

Waring RB Towner P Minter SJ Davies RW 1986 Splice-siteselection by a self-splicing RNA of Tetrahymena Nature 321133ndash139

Watanabe T Sullenger BA 2000 Induction of wild-type p53 activityin human cancer cells by ribozymes that repair mutant p53transcripts Proc Natl Acad Sci 97 8490ndash8494

Zarrinkar PP Sullenger BA 1998 Probing the interplay between thetwo steps of group I intron splicing Competition of exogenousguanosine with omega G Biochemistry 37 18056ndash18063

Meluzzi et al

602 RNA Vol 18 No 3

Cold Spring Harbor Laboratory Press on February 15 2012 - Published by rnajournalcshlporgDownloaded from