syn bases: their prevalence, relevance, and utility in

The Pennsylvania State University The Graduate School College of Science SYN BASES: THEIR PREVALENCE, RELEVANCE, AND UTILITY IN FUNCTIONAL RNA A Thesis in Chemistry by Stephanie A. Reigh © 2009 Stephanie A. Reigh Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science December 2009


Page 1: SYN BASES: THEIR PREVALENCE, RELEVANCE, AND UTILITY IN

The Pennsylvania State University

The Graduate School

College of Science

SYN BASES: THEIR PREVALENCE,

RELEVANCE, AND UTILITY

IN FUNCTIONAL RNA

A Thesis in

Chemistry

by

Stephanie A. Reigh

© 2009 Stephanie A. Reigh

Submitted in Partial Fulfillment

of the Requirements

for the Degree of

Master of Science

December 2009


The thesis of Stephanie A. Reigh was reviewed and approved* by the following:

Philip C. Bevilacqua

Professor of Chemistry

Thesis Advisor

Scott Showalter

Assistant Professor of Chemistry

Scott Phillips

Assistant Professor of Chemistry

Kenneth Keiler

Associate Professor of Biochemistry and Molecular Biology

Barbara J. Garrison

Professor of Chemistry

Head of the Department of Chemistry

*Signatures are on file in the Graduate School


Abstract

Due to a high number of rotatable bonds in both the ribose sugar and phosphate

backbone, nucleotides in RNA can occupy a wide ensemble of conformational states.

One conformational state of interest is when a base takes the syn conformation, in which

the base resides over the sugar and the Watson-Crick face of a nucleotide is positioned

towards the phosphate backbone. I show herein that the syn conformation is common in

functional RNA, often in functional locations in riboswitches, aptamers, and ribozymes.

In the hepatitis delta virus ribozyme, as an example, only one base in 100 takes the syn

conformation, but mutation of that base reduces catalytic activity as much as 3000-fold.

Syn bases cluster in the binding pockets of both the lysine riboswitch and the malachite

green aptamer, participating in stacking and hydrogen bonding interactions with their

respective ligands.

To further investigate the utility of syn bases in functional RNA, conformationally

restricted nucleotides (CRNs) are used to populate the native state, either through

stabilization of the native state or destabilization of a misfolded state. 8-bromopurines

can be successfully incorporated into RNA during transcription, and these CRNs favor

the syn conformation. These CRNs have already been incorporated systematically to

improve kinetics in the leadzyme system. I present preliminary evidence that 8BrGTP

and 8BrATP can be incorporated during transcription. Future directions of this project

will incorporate CRNs at random sites to see whether function can be restored or

enhanced from syn base insertion.


Table of Contents

List of Figures ......................................................................................................................v

List of Tables ..................................................................................................................... vi

List of Abbreviations ........................................................................................................ vii

Acknowledgements .......................................................................................................... viii

Chapter 1: Introduction to RNA Chemistry, Structure, and Function .................................1

1.1 The evolutionary beginning of life ..........................................................................1

1.2 The chemistry and versatility of RNA .....................................................................2

1.3 Conformationally restricted nucleotides ..................................................................4

1.4 Mechanism of RNA self-cleavage ...........................................................................6

1.5 Aptamers and riboswitches ......................................................................................7

1.6 Outline of thesis .......................................................................................................9

References ....................................................................................................................10

Chapter 2: The Prevalence and Relevance of Syn Bases in Functional RNA ...................11

2.1 The ribose ring and RNA bases can take on different conformations ...................11

2.2 Building an RNA database for analysis of syn bases .............................................13

2.3 General statistics of syn bases across all data ........................................................16

2.4 Analysis of syn bases by category of functional RNA ..........................................23

2.5 Conclusion .............................................................................................................30

References ....................................................................................................................31

Chapter 3: Towards NAME: Incorporation of 8-Bromopurines into Functional RNA

During Transcription ....................................................................................................32

3.1 CRNs and RNA structure/function relationships ...................................................32

3.2 Incorporation of modified nucleotides into RNA ..................................................33

3.3 Future directions: Detecting modified nucleotides in enhanced RNA ..................39

References ....................................................................................................................43


List of Figures

Figure 1.1 Important conformations in RNA ................................................................................... 2

Figure 1.2 Guanosine in the anti or syn conformation .................................................................... 2

Figure 1.3 Energy diagram of RNA folding .................................................................................... 3

Figure 1.4 8BrATP and 8BrGTP take the syn conformation .......................................................... 4

Figure 1.5 Structures of the leadzyme, with the cleavage site indicated by an arrow. .................... 5

Figure 1.6 A YNMG hairpin in equilibrium with a misfolded duplex state .................................... 6

Figure 1.7 Mechanism of RNA self-cleavage. ................................................................................. 7

Figure 1.8 Two structures of the Malachite Green Aptamer ........................................................... 8

Figure 2.1 Overhead view of syn versus anti bases. ...................................................................... 12

Figure 2.2 Distribution of chi angles in syn and anti bases ........................................................... 17

Figure 2.3 Bar graph correlating chi angle with sugar pucker for syn G ....................................... 20

Figure 2.4 5’ and 3’ nearest neighbors of syn bases ...................................................................... 22

Figure 2.5 Examples of syn base locations in RNA aptamers and riboswitches ........................... 25

Figure 2.6 G1 of glmS hydrogen bonding to glucosamine-6-phosphate ....................................... 27

Figure 2.7 G24 of the leadzyme, MC-Sym structure ..................................................................... 27

Figure 2.8 G206 of the self-splicing Group I intron ...................................................................... 29

Figure 2.9 G25 of the HDV ribozyme and A38 of the hairpin ribozyme ...................................... 29

Figure 3.1 Removal of the C2 amine in guanosine converts the nucleotide to inosine ................. 33

Figure 3.2 Comparing protocols 1 and 2 for plasmid transcription, ATP variable ........................ 36

Figure 3.3 Comparing multiple transcription variables simultaneously ........................................ 37

Figure 3.4 An alpha-thiotriphosphate and a phosphorothioate incorporated into an RNA

backbone .................................................................................................................................. 41


List of Tables

Table 2.1 Syn base statistics ........................................................................................................... 16

Table 2.2 Sugar pucker frequency by base type ............................................................................ 19

Table 2.3 Stacking and base pairing interactions of individual bases ............................................ 23

Table 3.1 Transcription conditions for two protocols .................................................................... 35


List of Abbreviations

8BrA 8-bromoadenosine

8BrG 8-bromoguanosine

CRN conformationally restricted nucleotide

GlcN6P glucosamine-6-phosphate

MG malachite green

MGA malachite green aptamer

NAIM nucleotide analogue interference mapping

NAME nucleotide analogue mapping of enhancement

RNP ribonucleoprotein

RT reverse transcription

TMR tetramethylrosamine

UTR untranslated region

YNMG pyrimidine, any, A or C, G


Acknowledgements

I would like to thank my advisor, Phil Bevilacqua, for his patience and support. I

would like to acknowledge my committee, Scott Phillips, Scott Showalter, and Ken

Keiler for taking the time to read my thesis. I appreciate the help of all of my lab mates

for teaching me molecular biology techniques for RNA. I would like to especially thank

Sarah Krahe and Joshua Sokoloski for the preliminary work and cooperation in collecting

the data to write this thesis.

I want to thank my family and friends for supporting me throughout life,

especially during graduate school. My fiancé, AJ, deserves a huge thank you for always

listening to me, even if he did not understand the science. And lastly, I would like to

thank God, without whom none of this would have been possible.


Chapter 1

Introduction to RNA Chemistry, Structure, and Function

1.1 The evolutionary beginning of life

The simplest cell is full of chemical complexity. The origin of cells from

primordial soup can seem statistically impossible, but somehow, life exists on Earth.

Since Louis Pasteur disproved the theory of spontaneous generation, we have been

searching for answers to the question of how life began. The definition of life requires

the ability to self-assemble, self-sustain, and reproduce.1 Because of these criteria, some

scientists believe the earliest biomolecules were not DNA or proteins, but RNA. RNA

has the ability to transmit a genetic code like DNA (mRNA), interpret it (tRNA/rRNA),

and perform catalysis like proteins (ribozymes). Also, less energy is required to

synthesize RNA as compared to DNA and proteins.

The RNA World Hypothesis states that life began as RNA recombination, which

eventually began to synthesize proteins.2 This theory was sparked by the discovery of

ribozymes. The theory states that, as life began to evolve, proteins improved on the

reaction rates of ribozymes, causing RNA enzymes to become less prevalent. Ribozymes

are found in less evolved life, and understanding their chemical properties could reveal

aspects of the earliest forms of life on Earth.


1.2 The chemistry and versatility of RNA

Ribonucleic acid, or RNA, is a polymer of nucleotides. Each nucleotide consists

of a phosphate, a ribose sugar, and a nucleobase (Figure 1.1 A). The phosphate is

attached to the 5’-carbon of the ribose, and successive nucleotides are added to the

3’-hydroxyl. The ribose sugar has a total of 10 distinct conformations, describing which

atom is above (endo) or below (exo) the plane of the ring (Figure 1.1 B). The four

typical bases in RNA are adenine (A), guanine (G), cytosine (C), and uracil (U). Each

base can rotate freely around its bond to C1’ of the ribose. When a base points away

from the sugar, with the Watson-Crick face exposed (like in the DNA double helix), the

base is in the anti conformation, which is the most common conformation. Occasionally

the base points inward and sits overtop the sugar in the syn conformation (Figure 1.2).

Figure 1.1. Important conformations in RNA. A. An RNA chain, where R represents a

nucleobase. B. Sample conformations of the ribose sugar pucker. C. The 10 sugar puckers.

Figure 1.2. Guanosine in the anti (left) or syn (right) conformation



Despite its limited chemical diversity, RNA has the ability to catalyze reactions

including self-cleavage,3 ligation,4 and even Diels-Alder reactions.5 Because of folding

issues, some ribozymes, like the ribosome, are supported by a protein scaffold. It has

been demonstrated that the proteins in the ribosome are necessary only for structure and

do not participate in function.6 Theoretically, ribozymes and other ribonucleoprotein

(RNP) complexes could be catalytically active without their proteins if they were able to

fold correctly.

Part of my thesis research asserts that by increasing the native-state population of

a folded ribozyme, catalytic RNA can have improved reaction rates. Increasing the

population of the native state can be accomplished by two means: stabilizing the native

state or destabilizing misfolded states (Figure 1.3). The native-state population of some

RNA can be increased by the incorporation of conformationally restricted nucleotides

(CRNs).7


Figure 1.3. Energy diagram for RNA folding. The energy distance between native-state (n) and

misfolded-state (m) conformations can be widened by two methods: stabilizing the native state

(scheme 1, left) or destabilizing the misfolded state (scheme 2, right). The unfolded state (u)

should theoretically have the highest energy.
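The effect of the two schemes on the native-state population can be illustrated with a two-state Boltzmann estimate. The sketch below is an illustration only, not a calculation from this thesis: it assumes that only the native (N) and misfolded (M) states are populated, neglects the unfolded state, and defines the free energy gap as G(native) minus G(misfolded) in kcal/mol at 37 °C. The function name and the example energies are hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_37C = 310.15      # 37 degrees C in kelvin


def native_fraction(dg_mn: float, temperature: float = T_37C) -> float:
    """Fraction of molecules in the native state for a two-state N/M system.

    dg_mn is G(native) - G(misfolded) in kcal/mol; negative values mean the
    native state is more stable. The unfolded state is ignored (assumption).
    """
    return 1.0 / (1.0 + math.exp(dg_mn / (R_KCAL * temperature)))


# Hypothetical energy gaps, chosen only to show the trend.
for dg in (0.0, -1.0, -2.0):
    print(f"dG_MN = {dg:+.1f} kcal/mol -> native fraction = {native_fraction(dg):.2f}")
```

Because only the energy difference enters the expression, lowering the energy of N (scheme 1) and raising the energy of M (scheme 2) by the same amount produce the same population shift in this simplified picture.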


1.3 Conformationally restricted nucleotides

In double-stranded RNA, G can base pair with either C or U, causing misfolding

to be a major problem for RNA. Nature has evolved proteins to support complex RNA

structures, forming ribonucleoprotein complexes (RNPs). The ribosome is a classic

example of an RNP. In smaller RNA systems, where proteins are not incorporated, the

native-state conformations can be stabilized through the incorporation of CRNs. Present

CRNs consist of two main types: locked nucleic acids (LNAs) and 8-bromopurine

triphosphates (8BrATP, 8BrGTP, Figure 1.4). LNAs force a ribose ring to assume the

C3’-endo conformation through the use of a carbon bridge connecting the 2’-OH to the 4’

position of the ring.8 8-bromopurine triphosphates encourage the base to take the syn

conformation by disfavoring the anti conformation due to the steric clash of the bromine.

Our research focuses on syn bases and their importance to RNA structure and function.

CRNs have been experimentally demonstrated to improve native-state population

through both schemes: stabilization of a native state and destabilization of a misfolded

state. An example of Scheme 1 stabilization is the analysis of the native state of the

lead-dependent ribozyme (leadzyme) using 8BrG. The leadzyme is a ribozyme where syn

bases appear in the active site. When three different structures of the leadzyme were

compared (crystal, NMR, and molecular model), each structure had a syn base in the

Figure 1.4. 8BrATP (left) and 8BrGTP (right) take the syn conformation


active site, but in a different position (Figure 1.5, from Yajima et al.). To elucidate which

structure was the most catalytically relevant, Yajima and co-workers inserted 8BrG into

each respective position and recorded the rate of cleavage. Three different synthetic

RNA constructs were designed containing an 8BrG at G7, G9, or G24, and the cleavage

rates were observed. When the syn base was inserted at G24, the syn G in the molecular

model, the observed kinetic rate was 30-fold faster than for wild type.9 The MC-Sym

molecular model structure was determined to be the most catalytically active structure.

Insertion of 8BrG where syn bases are predicted to occur is an example of stabilizing the

native state.

Figure 1.5. Structure of the leadzyme, with the site of cleavage indicated by an arrow.7 The active

site is in the dotted box. For B-D, the syn base is shown in a solid box. Insertion of 8BrG at G24

caused a 30-fold increase in rate, supporting the MC-Sym structure.


An example of Scheme 2, destabilization of a misfolded state through CRN insertion, is a

simple hairpin/duplex equilibrium. The native state of a YNMG hairpin (where Y = pyrimidine,

N = any, M = A or C), such as UUCG, was found to be similar in free energy to

the duplex state (Figure 1.6, from Proctor et al.).7 Using an 8BrG in the

hairpin, however, increases the energy of the misfolded state. When the G in the

tetraloop is substituted with 8BrG, the G favors the syn conformation. The syn

conformation disfavors G-Y hydrogen bonding, destabilizing the duplex state.

1.4 Mechanism of RNA self-cleavage

Catalytic RNAs were discovered by Tom Cech and coworkers and reported in 1982.10

Later studies determined that the ribosome is a ribozyme,11 with RNA rather than proteins

performing the chemistry. Interest in catalytic RNA has continued to increase.

Valadkhan and coworkers have attempted to analyze a spliceosome model system,

another large RNP complex found in living organisms, to determine if it, too, is a

ribozyme.12

Ribozyme chemistry is possible due to the presence of the 2’-OH (Figure 1.7,

Yajima et al.). In large ribozymes, an exogenous nucleophile attacks the phosphate

Figure 1.6. Example of a YNMG hairpin (h) in equilibrium with a misfolded duplex (d) state.8

This equilibrium is driven to the left by insertion of 8BrG at the base highlighted in red, which

destabilizes the duplex state. The Watson-Crick face of G is unavailable for base pairing when

forced into the syn conformation by the 8Br.


backbone. The cleavage reaction leaves a 2’,3’-cis diol and a 5’ monophosphate. In small

ribozymes, the 2’ oxygen on the -1 nucleotide acts as a nucleophile, attacking the phosphate

functionality attached to the 3’ oxygen. The +1 nucleotide acts as a leaving group and

has a 5’-OH. RNA catalysis necessitates a distinct tertiary structure, and syn bases, as

shown in this thesis, often play important roles.

1.5 Aptamers and Riboswitches

Ribozymes are not the only types of functional RNA. Aptamers are RNA

selected in vitro to bind proteins or small molecules. Most, if not all, functional RNAs

have the potential to benefit from syn base insertion at key sites, as shown in this thesis.

The malachite green (MG) aptamer is a good example (Figure 1.8). The MG aptamer has two

potential ligands: the cognate ligand, malachite green, and the non-cognate ligand,

tetramethylrosamine (TMR).13 Crystal structures show structural differences in the MG

aptamer when MG or TMR is bound. When MG is bound, the MG aptamer has three syn

bases. When TMR is bound, the MG aptamer has two syn bases. The structures of

Figure 1.7. Mechanism of RNA self-cleavage.10 Left: large ribozyme mechanism, with an

exogenous nucleophile. Right: Self-splicing of a small ribozyme. The 2’ hydroxyl makes this

reaction possible.


TMR-bound and MG-bound MG aptamer have one syn base in common. CRN insertion

could be used in this system to see if changing which bases take the syn conformation

alters the aptamer’s specificity for the cognate versus non-cognate ligand.

In contrast to aptamers which are in vitro selected, riboswitches are functional

RNA aptamers that bind ligands and are found in vivo. The glucosamine-6-phosphate

(GlcN6P) riboswitch, which has been found in archaea and bacteria, also has ribozyme

functionality.14 The 5’ untranslated region (UTR) of the gene that codes for the

glucosamine synthetase (glmS) enzyme has tertiary structure that can bind GlcN6P.

Figure 1.8. Two structures of the Malachite Green Aptamer (MGA). Left: MGA with MG bound.

The blue syn base (G24) is common to both structures. The two bases in teal (G29 and A31) are

syn bases that occur uniquely when MG is bound. The ligand (MG) is shown in pale green.

Right: MGA with TMR bound. The base shown in red (A30) is a syn base. The ligand (TMR) is

shown in pink.


When GlcN6P is in excess, ligand binding alters the tertiary structure, causing the RNA

to self-cleave.15 The dual purpose of the glmS system (riboswitch and ribozyme) makes

it an interesting molecule for further study and is discussed later.

1.6 Outline of Thesis

Chapter 2 outlines the computational chemistry study of functional RNA

structures. NMR and crystal structures of more than one hundred functional RNAs were

analyzed for the presence of syn bases. We recorded several structural aspects of all the

syn bases, including stacking, base-pairing, and nearest neighbor interactions. The

collected statistics show many types of RNA structures (riboswitch, ribozyme, RNA

aptamers, and the ribosome) have syn bases in functional locations in the molecules. This

thesis helped to expand the current information about syn bases in functional RNA

beyond that of the leadzyme and malachite green aptamer. The generated database will

be useful in further experiments in which syn bases are probed by chemical means.

In Chapter 3, RNA transcriptions, which were used to investigate the incorporation

of 8BrNTPs, are described. The efficiency of incorporation is found to vary by

transcription conditions and 8BrNTP identity. Investigation of 8BrNTP incorporation

lays the groundwork for the eventual goal of this project, a method to uncover or enhance

function in ribozymes or RNPs, similar to the leadzyme study. Using random

incorporation of 8BrNTPs can show stabilization of ribozymes either by stabilizing the

native state or destabilizing misfolded states.


References

1. Koshland, D. E., Jr., Special essay. The seven pillars of life. Science, 2002, 295, 2215-6.

2. Gilbert, W., Origin of Life: The RNA World. Nature, 1986, 319, 618.

3. Cech, T. R., The chemistry of self-splicing RNA and RNA enzymes. Science, 1987, 236,

1532-9.

4. Briones, C.; Stich, M.; Manrubia, S. C., The dawn of the RNA World: toward functional

complexity through ligation of random RNA oligomers. RNA, 2009, 15, 743-9.

5. Seelig, B.; Jaschke, A., A small catalytic RNA motif with Diels-Alderase activity. Chem Biol,

1999, 6, 167-76.

6. Rodnina, M. V.; Beringer, M.; Wintermeyer, W., How ribosomes make peptide bonds.

Trends Biochem Sci, 2007, 32, 20-6.

7. Proctor, D. J.; Kierzek, E.; Kierzek, R.; Bevilacqua, P. C., Restricting the conformational

heterogeneity of RNA by specific incorporation of 8-bromoguanosine. J Am Chem Soc, 2003,

125, 2390-1.

8. Julien, K. R.; Sumita, M.; Chen, P. H.; Laird-Offringa, I. A.; Hoogstraten, C. G.,

Conformationally restricted nucleotides as a probe of structure-function relationships in

RNA. RNA, 2008, 14, 1632-43.

9. Yajima, R.; Proctor, D. J.; Kierzek, R.; Kierzek, E.; Bevilacqua, P. C., A conformationally

restricted guanosine analog reveals the catalytic relevance of three structures of an RNA

enzyme. Chem Biol, 2007, 14, 23-30.

10. Kruger, K.; Grabowski, P. J.; Zaug, A. J.; Sands, J.; Gottschling, D. E.; Cech, T. R., Self-

splicing RNA: autoexcision and autocyclization of the ribosomal RNA intervening sequence

of Tetrahymena. Cell, 1982, 31, 147-57.

11. Noller, H. F.; Hoffarth, V.; Zimniak, L., Unusual resistance of peptidyl transferase to protein

extraction procedures. Science, 1992, 256, 1416-9.

12. Valadkhan, S., The spliceosome: a ribozyme at heart? Biol Chem, 2007, 388, 693-7.

13. Flinders, J.; DeFina, S. C.; Brackett, D. M.; Baugh, C.; Wilson, C.; Dieckmann, T.,

Recognition of planar and nonplanar ligands in the malachite green-RNA aptamer complex.

Chembiochem, 2004, 5, 62-72.

14. Klein, D. J.; Been, M. D.; Ferre-D'Amare, A. R., Essential role of an active-site guanine in

glmS ribozyme catalysis. J Am Chem Soc, 2007, 129, 14858-9.

15. Winkler, W. C.; Nahvi, A.; Roth, A.; Collins, J. A.; Breaker, R. R., Control of gene

expression by a natural metabolite-responsive ribozyme. Nature, 2004, 428, 281-6.


Chapter 2

The Prevalence and Relevance of Syn Bases in Functional RNA

This chapter is a computational study analyzing the statistics of syn bases in functional

RNA. The work was performed in cooperation with Joshua Sokoloski, graduate student in

the Bevilacqua lab. Most of the experiments were performed jointly, except where noted.

2.1 The ribose ring and RNA bases can take on different conformations

Due to a high number of rotatable bonds in both the ribose sugar and phosphate

backbone, nucleotides in RNA can occupy a wide ensemble of conformational states.

One conformational state of particular interest is the syn conformation, in which the base

resides over the sugar and the Watson-Crick face of a nucleotide is pointed towards the

phosphate backbone. In this study, we examine functional RNAs with the nucleic acid

structure analysis program MC-Annotate1

(http://www-lbit.iro.umontreal.ca/mcannotate-simple/), a web-based system for analyzing RNA conformations based on the more

extensive MC-Sym program, for the occurrence, interactions, and functionality of the

bases possessing the syn glycosidic conformation. The motivation for this study is the

possibility that syn bases cluster in the active sites of RNAs where they play important

functional roles.


The most common and energetically favorable orientation of a base is the anti

conformation (Figure 2.1A). This conformation has the Watson-Crick face exposed as it

would be in a double helix. In the syn conformation, the base is rotated about the

glycosidic bond to occupy the space directly above the ribose ring (Figure 2.1B). Owing

to sterics, the syn conformation is higher in energy and therefore less populated,

particularly for pyrimidines where the O2 points towards the sugar. Both experiments

and calculations validate this prediction. Most A-form RNA duplexes (and B-form DNA

helices) feature bases entirely in the anti conformation. Z-form structure is the only

instance where helical nucleic acids have bases which regularly adopt the syn

conformation. However, crystal and solution structures of functional RNA (aptamers,

riboswitches, ribozymes, tRNA, and the ribosome) reveal that, with the presence of

tertiary structure, comes a small but significant population of syn bases.

For the syn state to populate appreciably, one of two possible conditions should be

met. Either the penalty in conformational energy must be matched or exceeded by

favorable inter- or intramolecular interactions by the base in the syn state, or the base in

Figure 2.1 Overhead view of anti (A) versus syn (B, C) bases. For the sake of this study, syn

bases were distributed between two categories: weak (B) and strong (C). Parameters for these

designations are described in the text. The angles in degrees in each panel designate median

angles based on all data studied.


the anti conformation must present an even greater steric clash with another portion of

the RNA, making a syn base relatively favored.

Recent efforts at analyzing the substantial structural information available on

functional RNAs have focused on identifying and characterizing key structural motifs.

These studies have looked at the backbone conformation and the hydrogen bonding and

stacking patterns among RNA structures but have not analyzed prevalence and relevance

of the syn conformation in those molecules. Here, we present a survey of syn bases in

aptamers, riboswitches, ribozymes, and the ribosome using the MC-Annotate program.

2.2 Building an RNA database for analysis of syn bases

Definition of Syn: In this study, the syn conformation is defined by the IUPAC

designation of a glycosidic torsion angle (χ) of 0 ± 90°.2 Our study subdivides these bases

into strongly (-45° ≤ χ ≤ 90°) or weakly (-90° ≤ χ < -45°) syn. This delineation is based upon

the torsion angles where the base is syn and directly above the sugar (strong) and where it

is syn but not above the ribose sugar (weak). This classification can be seen in Figure 2.1.

As the average χ value for A-form RNA is -100°, it is possible that weak syn bases can

still participate in inter- and intra-molecular interactions like anti bases in secondary and

tertiary structure. Therefore, weak syn bases can be considered as a class intermediate to

anti and strong syn conformations. The following data are therefore presented in terms of

total syn bases, strong syn bases, and weak syn bases.
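The classification above can be sketched as a small function. This is a minimal illustration, not the procedure used to build the database: it assumes χ is given in degrees, reads the thresholds as strong syn for -45° ≤ χ ≤ 90° and weak syn for -90° ≤ χ < -45°, and wraps angles reported on a 0 to 360 scale onto the -180 to 180 range. The function name is hypothetical.

```python
def classify_glycosidic(chi_deg: float) -> str:
    """Classify a glycosidic torsion angle (degrees) as anti, weak syn, or strong syn."""
    # Wrap the input onto [-180, 180) so that, for example, 300 is treated as -60.
    chi = (chi_deg + 180.0) % 360.0 - 180.0
    if -45.0 <= chi <= 90.0:
        return "strong syn"   # base sits directly above the sugar
    if -90.0 <= chi < -45.0:
        return "weak syn"     # syn by the IUPAC range, but not above the ribose
    return "anti"


for angle in (60.0, -60.0, -100.0, 300.0):
    print(angle, classify_glycosidic(angle))
```

Treating the two syn classes as separate return values makes it straightforward to tabulate total, strong, and weak syn counts from the same list of angles.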

Database Assembly: Structures for analysis were obtained via the RCSB Protein Data

Bank by searching with the following terms: “RNA aptamer,” “ribozyme,” “riboswitch,”


“tRNA,” and “ribosome.” Candidate structures were downloaded as pdb.gz files and

analyzed with the program MC-Annotate1 to find syn bases. MC-Annotate provided

glycosidic conformation, sugar pucker, stacking, and base-pairing data. Exact torsion

angles were measured using DSViewerPro (Accelrys, San Diego, CA). Functional

location data were assessed in terms of direct ligand contact, active site presence, or

indirect functional roles (as determined by biochemical studies from the primary

literature). Direct ligand contact was scored when the syn base either hydrogen bonded

or stacked with a ligand in aptamers and riboswitches. Hydrogen bonding was

determined by use of the H-Bond Monitor tool in DSViewerPro, while stacking was

assigned on the basis of a distance of 4 Å or less between the base and an aromatic

moiety on the ligand. To assign putative functional roles to bases at a distance from

the active site or binding pocket, the original experimental literature for each structure

was consulted. If the publication stated that the base participated in function, it was

scored as such. No additional assessments of functional relevance, other than direct

ligand contact, were made.
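The 4 Å stacking criterion can be expressed as a simple distance test. The text does not state how the base-to-ligand distance was measured, so the sketch below assumes the minimum atom-to-atom distance between the base ring atoms and the aromatic atoms of the ligand; the function names and the coordinates are hypothetical and are not taken from any structure in the database.

```python
import math

STACKING_CUTOFF = 4.0  # angstroms, the cutoff quoted in the text


def min_distance(atoms_a, atoms_b):
    """Smallest distance between any atom in atoms_a and any atom in atoms_b."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def is_stacked(base_atoms, ligand_ring_atoms, cutoff=STACKING_CUTOFF):
    """Score a base as stacked when it lies within the cutoff of the ligand ring."""
    return min_distance(base_atoms, ligand_ring_atoms) <= cutoff


# Invented coordinates: a base in the z = 0 plane and a ligand ring above it.
base = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (0.7, 1.2, 0.0)]
ligand_near = [(0.0, 0.0, 3.4), (1.4, 0.0, 3.4)]
ligand_far = [(0.0, 0.0, 6.0), (1.4, 0.0, 6.0)]

print(is_stacked(base, ligand_near))
print(is_stacked(base, ligand_far))
```

A centroid-to-centroid distance would be an equally reasonable reading of the criterion; only the distance function would need to change.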

The assembled database was parsed to ensure that no structures or bases were

overrepresented in the statistics. The individual syn base database was parsed

specifically to include every unique base, where a unique base is defined as having a

characteristic combination of the following terms: molecule name, base type, residue

number, sugar pucker, and 5’/3’ neighbors. For example, the streptomycin bound RNA

aptamer has two structures available: 1NTA and 1NTB. 1NTA and 1NTB have two syn

bases in common, G12 and C18, while 1NTB has one unique syn base, A8. The sugar

pucker and nearest neighbors for each structure were examined, and G12 and C18 were

found to have the same sugar puckers and nearest neighbors in both structures. Thus, out

of 5 raw database entries (two in 1NTA and three in 1NTB), three were considered for

analysis: G12 and C18, which are identical with respect to the two structures, and A8

from 1NTB. When two entries have all five parsing criteria the same, but different

stacking or base pairing interactions, sugar pucker and nearest neighbor statistics

contained one entry for the two candidates, while stacking or base-pairing statistics listed

two entries.
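The parsing rule above amounts to de-duplicating raw entries on a five-part key. A minimal sketch follows, using the 1NTA/1NTB example; the molecule label, sugar pucker, and neighbor values are hypothetical placeholders, since the text does not list them.

```python
from collections import namedtuple

# One raw database entry; field names are illustrative, not from the thesis.
SynEntry = namedtuple("SynEntry", "pdb_id molecule base residue pucker neighbors")


def unique_key(entry):
    """The five parsing criteria: molecule, base type, residue number,
    sugar pucker, and 5'/3' neighbors (the PDB ID is deliberately excluded)."""
    return (entry.molecule, entry.base, entry.residue, entry.pucker, entry.neighbors)


def parse_unique(entries):
    """Keep the first entry seen for each unique key."""
    unique = {}
    for entry in entries:
        unique.setdefault(unique_key(entry), entry)
    return list(unique.values())


MOL = "streptomycin aptamer"
# Pucker and neighbor values below are hypothetical.
raw_entries = [
    SynEntry("1NTA", MOL, "G", 12, "C2'-endo", ("A", "U")),
    SynEntry("1NTA", MOL, "C", 18, "C3'-endo", ("G", "A")),
    SynEntry("1NTB", MOL, "G", 12, "C2'-endo", ("A", "U")),
    SynEntry("1NTB", MOL, "C", 18, "C3'-endo", ("G", "A")),
    SynEntry("1NTB", MOL, "A", 8, "C2'-endo", ("U", "G")),
]

parsed = parse_unique(raw_entries)
print(len(raw_entries), "raw entries,", len(parsed), "unique bases")
```

Stacking and base-pairing statistics, which the text counts separately when interactions differ, would need those fields added to the key.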

In order to determine the statistical significance of some aspects of syn base

structural features, a control database of anti conformation bases was assembled with the

same RNA molecules that were used for the syn data. The anti bases of the 50S (PDB

1K73) and 30S (2OW8) ribosomal subunits were used to assemble the control database

on every parameter except χ torsion angles. 170 anti bases from the ribosome (120 from

the 50S and 50 from the 30S) and 120 anti bases from the other structures examined were

chosen at random for the control database. Statistics and plots were generated using

Origin (OriginLab, Northampton, Massachusetts) and Microsoft Excel. PyMOL (DeLano

Scientific, San Francisco, California) was used for all molecular images.
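The random selection of the control set can be sketched as a stratified draw without replacement. The group sizes (120, 50, and 120) come from the text; the pool contents, the seed, and the function name are assumptions made only so the example runs.

```python
import random

# Number of anti bases drawn from each source, as stated in the text.
CONTROL_SIZES = {"50S": 120, "30S": 50, "other": 120}


def sample_control(anti_pools, sizes=CONTROL_SIZES, seed=2009):
    """Draw the requested number of anti bases from each pool without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {source: rng.sample(anti_pools[source], n) for source, n in sizes.items()}


# Hypothetical pools of anti-base identifiers.
pools = {
    "50S": [f"50S:{i}" for i in range(1, 2001)],
    "30S": [f"30S:{i}" for i in range(1, 1001)],
    "other": [f"other:{i}" for i in range(1, 3001)],
}

control = sample_control(pools)
print({source: len(bases) for source, bases in control.items()})
```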

2.3 General statistics of syn bases across all data

Statistics on individual nucleotides

In the first phase of Protein Databank analysis for syn bases, we assayed RNA

length, number, and syn base type. This was done in order to establish a baseline of

general frequency and relevance of syn bases in functional RNAs. In RNAs other

than the ribosome, length ranged from 12 to 316 bases, with an average molecule length of

62 nt. In initial studies of 8833 unparsed nt across 144 RNAs, 272 bases (or 3.1%) were in the

syn conformation. The parsed data including the ribosome had 325 of 8630 bases in the

syn conformation, or 3.8%. Of these bases, syn A and G were found to comprise 41%

and 39% of all syn bases, respectively (Table 2.1). The distribution of syn bases

depended on the RNA cases examined (see below). Adenine was more commonly syn

than G in riboswitches and protein aptamers, but G was more commonly syn than A in

small molecule aptamers and ribozymes. C was more commonly syn than U in protein

aptamers, and no syn C’s were found in tRNA.

Table 2.1 Number (Percent) Syn Base

Molecule type              A               C             G               U               Total % syn
Aptamer (Protein)          11/21 (52.4%)   2/21 (9.5%)   6/21 (28.6%)    2/21 (9.5%)     21/425 (4.9%)
Aptamer (Small Molecule)   6/26 (23.1%)    3/26 (11.5%)  15/26 (57.7%)   3/26 (11.5%)    26/505 (5.1%)
Riboswitch                 31/58 (53.4%)   6/58 (10.3%)  12/58 (20.7%)   9/58 (15.5%)    58/1548 (3.7%)
Ribozyme                   10/43 (23.3%)   4/43 (9.3%)   23/43 (53.5%)   6/43 (14.0%)    43/1122 (3.8%)
tRNA                       5/12 (35.7%)    0/12 (0.0%)   6/12 (42.9%)    1/12 (7.1%)     12/564 (2.1%)
Ribosome (50S)             53/120 (44.2%)  9/120 (7.5%)  48/120 (40.0%)  10/120 (8.3%)   120/2876 (4.2%)
Ribosome (30S)             20/45 (44.4%)   3/45 (6.7%)   17/45 (37.8%)   5/45 (11.1%)    45/1490 (3.0%)
Total                      135 (41.0%)     29 (8.8%)     128 (38.9%)     37 (11.2%)      325/8630 (3.8%)

Table 2.1 Syn base statistics. “11/21” means that 21 syn bases were found, 11 of which were A’s.

A and G take the syn conformation in similar frequency, with A slightly more common overall. In

RNA-protein systems, such as protein aptamers and the ribosome, A is more commonly syn than

G. C is the most rare syn base in all cases except for aptamers.

To determine the relative strength or weakness of a syn base, each pdb file containing at least one unique syn base was opened in DSViewerPro. The χ angles were measured and recorded. The frequencies of base types in specific χ ranges are represented in Figure 2.2. These frequencies were also compared to the control database of anti χ angles. We found that χ angles of 0±45° were less common, comprising only 7% of all syn bases studied. Syn bases with χ angles of -45° to -90° have intermediate frequency (33%), and 45° to 90° were the most common at 60%. No anti bases were found to have χ angles in the 90° to 180° range, while -90° to -135° and -135° to -180° are

equally common.
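The χ measurement can also be reproduced directly from atomic coordinates as an ordinary dihedral angle. The sketch below assumes the IUPAC atom definitions of χ (O4'-C1'-N9-C4 for purines, O4'-C1'-N1-C2 for pyrimidines) and sorts the result into the ranges quoted above; the function names, the bin labels, and the toy coordinates are illustrative and are not taken from the original analysis.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (range -180 to 180) for four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _dot(b2, _cross(n1, n2)) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def chi_range(chi):
    """Assign a chi value (degrees) to one of the ranges discussed in the text."""
    if -45.0 <= chi <= 45.0:
        return "0±45 (syn)"
    if 45.0 < chi <= 90.0:
        return "45 to 90 (syn)"
    if -90.0 <= chi < -45.0:
        return "-45 to -90 (syn)"
    if chi > 90.0:
        return "90 to 180"
    if chi >= -135.0:
        return "-90 to -135 (anti)"
    return "-135 to -180 (anti)"


# Toy geometry: four points arranged to give a +90 degree dihedral.
chi = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(chi_range(chi))
```

In practice the four points would be the coordinates of the χ atoms read from the PDB file for the residue of interest.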

Next, we looked at the frequency of syn bases within specific sub-categories of

RNA (Table 2.1). Aptamers had the largest fraction of syn bases per nucleotide, both

protein and small molecule around 5%. In tRNA, syn bases are the rarest at 2.1%. For

ribozymes, 3.8% of all bases were syn. In the ribosome, 4.2% of bases in the 50S subunit

(length: 2753 nt) were syn, compared to only 3.0% of the 30S subunit (length: 1490 nt).

Figure 2.2. Distribution of χ angles in syn and anti bases (panels A, B, and C). χ angles in the range of 0±45° were less common than other syn angles. Anti bases studied were entirely in the range of -90° to -180°.

By far, the most common sugar pucker when a base has the anti conformation is

C3’-endo (80-90%, data not shown), while the most common in syn bases is C2’-endo,

but only at 35-40% (Table 2.2A). For instance, in the ribosome (Table 2.2B), A and

G assume 7 of 10 sugar puckers, and U and C only take 4 of 10 puckers. O4’-exo is

never observed as a sugar pucker. For ribonucleosides, the energy difference between

C3’-endo and C2’-endo is negligible in all bases, which likely accounts for variable

puckers in syn bases.3 The exception is C, where C3’-endo is favored by ~1 kcal/mol.

This energy difference may indicate why syn C is the rarest syn base.
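The ten pucker classes referred to above follow the order of the pseudorotation wheel, which is also the row order of Table 2.2. As a hedged sketch, the mapping below assumes that each class corresponds to a 36° sector of the Altona-Sundaralingam pseudorotation phase angle P, starting with C3'-endo at 0° to 36°; whether MC-Annotate uses exactly these boundaries is an assumption, and the function name is illustrative.

```python
# Pucker names in pseudorotation order (same order as the rows of Table 2.2).
PUCKERS = [
    "C3'-endo", "C4'-exo", "O4'-endo", "C1'-exo", "C2'-endo",
    "C3'-exo", "C4'-endo", "O4'-exo", "C1'-endo", "C2'-exo",
]


def pucker_from_phase(p_deg):
    """Map a pseudorotation phase angle (degrees) to one of ten 36-degree sectors.

    Assumes sector 0 (C3'-endo) spans 0 <= P < 36; angles are wrapped to 0-360.
    """
    sector = int((p_deg % 360.0) // 36.0)
    return PUCKERS[sector]


for phase in (18.0, 162.0, 350.0):
    print(phase, pucker_from_phase(phase))
```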

The χ angles were then correlated with sugar puckers (Figure 2.3). The bar graph

reveals that some sugar puckers (such as C3’-endo and C2’-endo) have a wide range of

available χ angles, while some (C4’-endo) display very few angles, which may be the

reason for the rarity of these puckers. This is in agreement with the RNA conformational

map compiled by Murthy and co-workers.4

Table 2.2A Sugar Pucker Frequency (Percent) For All Syn Bases

Pucker     A            C            G            U            Total
C3'-endo   32 (24.1%)   3 (10.3%)    26 (20.5%)   9 (23.7%)    70 (21.4%)
C4'-exo    6 (4.5%)     2 (6.9%)     13 (10.2%)   3 (7.9%)     24 (7.3%)
O4'-endo   7 (5.3%)     1 (3.4%)     4 (3.1%)     1 (2.6%)     13 (4.0%)
C1'-exo    13 (9.8%)    1 (3.4%)     15 (11.8%)   4 (10.5%)    33 (10.1%)
C2'-endo   48 (36.1%)   17 (58.6%)   44 (34.6%)   14 (36.8%)   123 (37.6%)
C3'-exo    19 (14.3%)   2 (6.9%)     15 (11.8%)   4 (10.5%)    40 (12.2%)
C4'-endo   0            0            2 (1.6%)     1 (2.6%)     3 (0.9%)
O4'-exo    0            0            0            0            0
C1'-endo   0            1 (3.4%)     1 (0.8%)     0            2 (0.6%)
C2'-exo    8 (6.0%)     2 (6.9%)     7 (5.5%)     2 (5.3%)     19 (5.8%)
Total      133          29           127          38           327

Table 2.2B Sugar Pucker Frequency (Percent) For Syn Bases in the Ribosome only

Pucker     A            C            G            U            Total
C3'-endo   18 (25.4%)   1 (8.3%)     19 (29.2%)   6 (40%)      44 (27.0%)
C4'-exo    3 (4.2%)     0            5 (7.7%)     0            8 (4.9%)
O4'-endo   2 (2.8%)     0            0            0            2 (1.2%)
C1'-exo    1 (1.4%)     0            3 (5.6%)     0            4 (2.5%)
C2'-endo   30 (42.3%)   9 (75.0%)    22 (33.8%)   6 (40%)      67 (41.1%)
C3'-exo    12 (16.9%)   1 (8.3%)     11 (16.9%)   1 (6.7%)     25 (15.3%)
C4'-endo   0            0            1 (1.5%)     0            1 (0.6%)
O4'-exo    0            0            0            0            0
C1'-endo   0            0            0            0            0
C2'-exo    5 (7.0%)     1 (8.3%)     4 (6.2%)     2 (13.3%)    12 (7.4%)
Total      71           12           65           15           163

Table 2.2. Sugar pucker frequency by base type. (A.) These data include the ribosome. While C

is most rarely syn, it can incorporate all but two sugar puckers. G is the most versatile syn base,

able to take all but one sugar pucker. C2’-endo is the most common sugar pucker for all bases.

(B.) The ribosome only. C3’-endo is the second most common sugar pucker in all cases.

Figure 2.3. Bar graph correlating χ angle with sugar pucker for syn G. C2’-endo and C3’-endo are the most common sugar puckers and have the largest range of possible values. C4’-exo is consistently strongly syn, while C3’-exo is typically weakly syn. Legend: weak syn, strong syn.

Nearest neighbor and intermolecular interactions

In order to determine if RNA sequence had any effect on the ability of a base to

adopt the syn conformation, the nearest neighbor of each syn base was recorded, and the

information content of nearest neighbors was calculated (Figure 2.4). The Shannon

uncertainty principle5 is used to calculate information content for a single nucleobase in a

given position. The information content is a measure of sequence consistency across

similar structures. This information content is in the range of 0-2 bits, with 0 being no

certainty and 2 being absolute certainty. The information content is calculated by the

following equation: $H = 2 + \sum_{i=1}^{4} P_i \log_2 P_i$, where H is the information content in bits, $P_i$ is

the probability of a certain base, and the sum runs across all four bases.

For example, in a sample size of 40 bases, if the base was always A, the

information content is 2 bits (Pi = 1). If A occurs 20 times and G occurs 20 times, the

information content is 1 bit. If A, U, C, and G are observed 10 times each at that

position, the information content is 0 bits (Pi = ¼ for each base).
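The three worked cases above can be checked with a short calculation. This sketch computes the information content as 2 bits minus the Shannon entropy of the observed base frequencies, which is the form that reproduces the stated values (2, 1, and 0 bits); the function name and input format are illustrative, and no small-sample correction is applied.

```python
import math
from collections import Counter


def information_content(bases):
    """Information content in bits (0 to 2) for a list of observed bases.

    Computed as 2 - entropy, where entropy = -sum(P_i * log2(P_i)) over the
    bases actually observed (terms with P_i = 0 contribute nothing).
    """
    counts = Counter(bases)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy


always_a = ["A"] * 40
half_a_half_g = ["A"] * 20 + ["G"] * 20
uniform = ["A"] * 10 + ["U"] * 10 + ["C"] * 10 + ["G"] * 10

for sample in (always_a, half_a_half_g, uniform):
    print(round(information_content(sample), 2))
```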

The information content for nearest neighbors of all syn bases was <0.25 (Figure

2.4), with one exception. The 5’ neighbor of U, where C was observed as the 5’ neighbor

of syn U only once out of the 37 syn U’s studied, gives an information content of 0.43. U

was the most common 5’ neighbor for syn A and C, while A was the most common 5’

neighbor for syn G and U. The information content values for syn G’s nearest neighbors are the

lowest, both less than 0.1. Therefore, sequence does not play an appreciable role

in determining the identity or position of a syn base.

Next, we analyzed the nature of stacking interactions. Across all unique syn bases

in the database, 74% participate in stacking, with 82% of purines and 40% of pyrimidines

involved (Table 2.3). A (87%) was found to stack slightly more often than G (78%) in

the RNA structures analyzed. Of all stacking interactions observed, 75% are classified

by MC-Annotate as non-adjacent stacking, meaning that they take place between non-

neighboring nucleotides and thus are purely tertiary interactions. This striking finding

agrees with the functional data shown below, which indicates that syn bases are used by

the RNA molecule to form functionally important tertiary structure. The low percentage

of adjacent stacks can be attributed to unsuitable orientation of the π-system of the bases

when only one of the bases is syn.

57% of all bases (62% of all purines and 33% of all pyrimidines) were observed to take part in hydrogen bonding. In terms of base pairing location, 65% of all interactions were found to comprise tertiary structure interactions, consistent with our hypothesis that syn bases are important components for RNA’s tertiary architecture. There were no significant trends with regards to purine base pair type.

Figure 2.4. 5’ and 3’ nearest neighbors of syn bases. The syn base is shown in the center, and the height of the letters on each side indicates percent frequency. The numbers below the 5’ and 3’ neighbors are the information content as calculated by the Shannon uncertainty principle. Values shown in the figure: 0.09, 0.23, 0.22, 0.13 and 0.09, 0.01, 0.43, 0.09.

Table 2.3: Stacking and base pairing interactions of individual bases. Anti bases participate in mostly adjacent stacks, while syn bases participate in mostly non-adjacent stacks. Pyrimidines are less likely to stack when in the syn conformation.

2.4 Analysis of syn bases by category of functional RNA

Aptamers and riboswitches (work by Joshua Sokoloski)

Syn bases are plentiful within both in vitro selected RNA aptamers and natural

RNA riboswitches. 70% of unique aptamer structures in the PDB (21 of 30) have at least

one syn base, with 50% of these aptamers having a syn base playing a functional role. Syn

bases are found in all riboswitch structures listed in the PDB, although there are only six

at present: purine (A and G), lysine, M-box, SAM, TPP (prokaryotic and eukaryotic), and

FMN riboswitches.

Of all syn bases in RNA aptamers, 76% play some functional role via direct ligand interaction or tertiary structure formation. 55% of the syn bases are found in the binding pocket, with 70% of this subset (38% of the total syn) directly hydrogen bonding or stacking to the ligand. Weak and strong syn bases have differing functional roles. In riboswitches, 64% of all syn bases contribute to function, but only 24% are in the binding pocket and only 18% directly interact with the target ligand. The remaining functional syn bases in riboswitches are involved in tertiary interactions removed from the aptameric domain. It should be noted that the sample size for riboswitches (8 molecules) is necessarily smaller than that for in vitro selected aptamers (30 molecules).

Table 2.3: Stacking and Base Pairing Interactions

Base          Stacking/Anti        Nonadjacent Stacking/Anti   Base Pairing/Anti
A             87% (91/105)         79% (72/91)                 62% (65/105)
G             78% (72/92)          74% (53/72)                 63% (58/92)
C             50% (11/22)          64% (7/11)                  45% (10/22)
U             31% (8/26)           75% (6/8)                   23% (6/26)
Purines       83% (163/197)/91%    77% (125/163)/33%           62% (123/197)/61%
Pyrimidines   40% (19/48)/76%      68% (13/19)/16%             33% (16/48)/80%
Total         74% (182/245)        76% (138/182)               57% (139/245)

Next, the syn bases’ positions in the RNA were examined. Figure 2.5 displays an

illustrative example of roles of syn bases in aptamers and riboswitches. Syn bases in

aptamers tend to be clustered in the binding pockets and make direct contacts to the

ligands, emphasizing their functional importance. For the citrulline aptamer (Figure

2.5A), three of the eight bases that bind the ligand through hydrogen bonding contacts are

the syn nucleotides G29, G30, and G35. In the malachite green aptamer (Figure 2.5B),

the interactions with the ligand are through stacking interactions where the ligand is

stacked between a GC base pair and a base quadruple. Syn bases G29 and A31 make half

of the base quadruple, with G29 directly stacking to malachite green, while syn G24

stacks to the ligand from the side of the binding pocket. In the ATP aptamer (Figure

2.5C), syn bases play a prominent role, with one-third of the binding pocket being syn

(A9, A12, and G30). Note here that syn bases also appear in non-functional aspects of

the structure such as U23 and G25 in the tetraloop at the bottom of the structure. While

most aptamers do use syn bases in their binding motifs, some do not have any syn

conformations among their nucleotides, for example, the theophylline and caffeine

aptamers.

Riboswitches contain both aptameric and signal transduction domains, so in these

structures, non-binding roles for syn bases might occur. The purine and lysine

riboswitches have syn bases in both the binding pockets and involved with long-range

tertiary interactions that are crucial to the signaling domain. In the guanine riboswitch

(Figure 2.5D), A23 is used in forming the binding pocket and A65 is directly involved in

a loop/loop interaction removed from the binding pocket that is important in forming the

global fold of the riboswitch. The lysine riboswitch (Figure 2.5E) contains 7 syn bases,

four of which (G8, G9, C10, and G77) are clustered in the binding pocket.

Figure 2.5. Examples of syn base locations in RNA aptamers and riboswitches. A. Citrulline aptamer (1KOD). B. Malachite Green Aptamer (1Q8N). C. ATP Aptamer (1RAW). D. Guanine Riboswitch (1U8D). E. Lysine Riboswitch (3D0U). Ligands are in red, and space-filled. Syn bases are shown as blue sticks.

Ribozymes (work by Stephanie Reigh)

The analysis of syn bases in ribozymes is a bit more of a challenge than for RNA

aptamers, because functional relevance cannot usually be determined by simple hydrogen

bonding or stacking interactions. Determining the relevance of syn bases in pre-cleaved

ribozymes requires biochemical studies to interrogate sites for functional relevance. As a

result, ribozymes that have been previously investigated biochemically are the most

applicable to this study.

Typically, ribozymes do not hydrogen bond to a ligand. A notable exception is the

glucosamine-6-phosphate (glmS) ribozyme (Figure 2.6). The cleavage site is shown in

cyan and is indicated by an arrow (Figure 2.6 inset). This ribozyme appears in the 5’-

untranslated region (UTR) of the gene that codes for the glucosamine synthetase enzyme

and is found in archaea and bacteria.6 When glucosamine-6-phosphate (GlcN6P) is in

excess, it acts as a ligand and binds to the 5’-UTR. Ligand binding causes the RNA to

self-cleave, silencing gene expression. G1 is syn and hydrogen bonds to GlcN6P, as does the

scissile phosphate. A35 takes the syn conformation to stack with syn G1, stabilizing the

interaction.

The leadzyme, discussed in detail in Chapter 1, is an ideal case study for

functional relevance of syn bases. By incorporating 8BrG at G24, the highest rate of

cleavage was obtained,7 consistent with the computational structure (Figure 2.7). The

NMR and x-ray structures both also contain syn Gs, at positions 7 and 9, respectively.

When G24 is syn, it is less than 4 Å away from both Pb2+ metal ions (shown in red) in the

structure. A25 and G26 also take the syn conformation.

Figure 2.6. G1 of glmS hydrogen bonding to glucosamine-6-phosphate (GlcN6P, shown in red). PDB ID: 3b4b. GlmS is a self-cleaving riboswitch that controls the GlcN6P biosynthetic pathway. The glmS cleaves when GlcN6P concentrations are high, which turns off the pathway. The scissile phosphate is indicated by an arrow and cyan coloring. All syn bases are blue. G1 is a syn base that hydrogen bonds to the substrate. A35 is a syn base that stacks on G1.

Figure 2.7. G24 of the leadzyme, MC-Sym structure. Syn bases (labeled G24, A25, and G26) are shown in blue. Red spheres are Pb2+. Structural analysis of the leadzyme has determined that when G24 is syn, the leadzyme cleavage reaction has the highest kcat.

The self-splicing Group I intron (Figure 2.8) has a syn G at G206, which is ΩG

(the terminal nucleotide of the intron).8 The scissile phosphate is shown in cyan and is

indicated by an arrow (Figure 2.8 inset). The syn base density in the Group I intron is

quite low, only 4/219 nucleotides (1.8%). It is remarkable then that two of these syn

bases are near the splice site (A205 and G206). A205 likely takes the syn conformation

to stabilize G206 in its catalytically active state through stacking interactions. The Group

II intron has the highest percentage of syn bases of any ribozyme yet recorded, 6.3%.

One syn base is close to the catalytic triad,9 and several syn bases cluster in an interesting

helix motif. The relevance of these syn bases has not yet been biochemically analyzed.

The hepatitis delta virus (HDV) ribozyme is self-cleaving, and only one of its bases is syn, G25 (Figure 2.9). Mutation of that base to an A reduces enzyme activity ~3000-fold.10 Even with the low frequency of syn bases in this molecule (1%), mutation of the syn base has a devastating effect on the kinetic rate.

The hairpin ribozyme is found in viruses and also self-cleaves (Figure 2.9).11 A38 adopts the syn conformation, and attempts to mutate this residue showed that other bases took the anti conformation, which disrupted local structure.12 Ferré-D’Amaré states that G1 of the molecule is syn, but this residue does not appear as such in MC-Annotate analysis. Upon manually measuring the χ angle for this base, the angle is 102.0°. This angle is just outside of the IUPAC definition of a syn base, and has characteristics more similar to a strong syn base than an anti base. In the future, if syn bases are shown to be increasingly relevant, the definition of syn may need to be reexamined.

Figure 2.8. G206 of the self-splicing Group I intron. PDB ID: 1u6b. G206 is syn and ΩG for this structure (the site of cleavage). A205 takes the syn conformation to stack on G206.

Figure 2.9. G25 of the HDV ribozyme (left, PDB ID 1vc6), A38 of the hairpin ribozyme (right, PDB ID 1m5k). Mutation of G25 in HDV drastically reduces catalytic activity. In the hairpin ribozyme, the N1 imino group draws the A38 base toward the scissile bond and plays a vital role in substrate positioning.11

[Figure labels: G206, A205, G25, A38, U1A binding protein.]

2.5 Conclusion

Unlike their anti counterparts, syn bases adopt a diverse range of sugar

puckers and χ angles. Even with their Watson-Crick faces situated over the ribose sugar,

these bases often participate in hydrogen bonding and stacking, supporting important

tertiary interactions. Syn bases occur with high frequency in functional RNA, and often

cluster in active and binding sites. Aptamers and riboswitches both commonly include

syn bases when ligands are bound. In ribozymes, even if fewer than 4% of bases take the

syn conformation, those syn bases are often functionally important.

References

1. Gendron, P.; Lemieux, S.; Major, F., Quantitative analysis of nucleic acid three-dimensional

structures. J. Mol. Biol., 2001, 308, 919-36.

2. IUPAC-IUB Joint Commission on Biochemical Nomenclature (JCBN). Abbreviations and

symbols for the description of conformations of polynucleotide chains. Recommendations

1982. Eur. J. Biochem., 1983, 131, 9-15.

3. Hocquet, A.; Leulliot, N.; Ghomi, M., Ground-State Properties of Nucleic Acid Constituents

Studied by Density Functional Calculations. 3. Role of Sugar Puckering and Base Orientation

on the Energetics and Geometry of 2'-Deoxyribonucleosides and Ribonucleosides. J. Phys.

Chem. B, 2000, 104, 9.

4. Murthy, V. L.; Srinivasan, R.; Draper, D. E.; Rose, G. D., A complete conformational map

for RNA. J. Mol. Biol., 1999, 291, 313-27.

5. Shannon, C. E., A Mathematical Theory of Communication. Bell System Tech. J., 1948, 27,

379-423, 623-656.

6. Klein, D. J.; Been, M. D.; Ferre-D'Amare, A. R., Essential role of an active-site guanine in

glmS ribozyme catalysis. J. Am. Chem. Soc., 2007, 129, 14858-9.

7. Yajima, R.; Proctor, D. J.; Kierzek, R.; Kierzek, E.; Bevilacqua, P. C., A conformationally

restricted guanosine analog reveals the catalytic relevance of three structures of an RNA

enzyme. Chem. Biol., 2007, 14, 23-30.

8. Adams, P. L.; Stahley, M. R.; Kosek, A. B.; Wang, J.; Strobel, S. A., Crystal structure of a

self-splicing group I intron with both exons. Nature, 2004, 430, 45-50.

9. Toor, N.; Keating, K. S.; Taylor, S. D.; Pyle, A. M., Crystal structure of a self-spliced group

II intron. Science, 2008, 320, 77-82.

10. Sefcikova, J.; Krasovska, M. V.; Sponer, J.; Walter, N. G., The genomic HDV ribozyme

utilizes a previously unnoticed U-turn motif to accomplish fast site-specific catalysis. Nucleic

Acids Res., 2007, 35, 1933-46.

11. Rupert, P. B.; Ferre-D'Amare, A. R., Crystal structure of a hairpin ribozyme-inhibitor

complex with implications for catalysis. Nature, 2001, 410, 780-6.

12. Spitale, R. C.; Volpini, R.; Heller, M. G.; Krucinska, J.; Cristalli, G.; Wedekind, J. E.,

Identification of an imino group indispensable for cleavage by a small ribozyme. J. Am.

Chem. Soc., 2009, 131, 6093-5.

Chapter 3

Towards NAME: Incorporation of 8-Bromopurines into Functional RNA During

Transcription

3.1 CRNs and RNA structure/function relationships

The leadzyme was an excellent model system for the initial interrogation of the

importance of syn bases in functional RNA. In Chapter 1, the three possible structures of

the leadzyme generated by different techniques were shown (Figure 1.6). The crystal

structure had a syn G at position 9, the NMR structure at position 7, and the MC-Sym

(computational) structure at position 24.1 The systematic insertion of 8-bromoguanosine

(8BrG) into three synthetic RNA constructs at each of these sites revealed that, when G24

was syn, cleavage rates 30-fold faster than wild-type were obtained.

This leadzyme experiment hints at the difficulties of misfolding in functional

RNA, even in ribozymes as small as the leadzyme, only 30 nt long. Large

ribonucleoprotein (RNP) complexes such as the ribosome have evolved to use proteins to

reinforce native structure. Using synthetic RNA for the leadzyme study was possible

because of its small size and known structures. The goal, however, is to be able to

determine at which sites the incorporation of conformationally restricted nucleotides

(CRNs) can enhance or reveal new function either in larger RNA, where synthesis is not

possible, or in functional RNA that have no available structure.

3.2 Incorporation of modified nucleotides in RNA

Conformationally restricted nucleotides such as 8BrG, which was used in the

leadzyme to enhance function, fall under the general heading of modified nucleotides.

Incorporation of modified nucleotides is an effective technique for probing relevant sites

in functional RNA. Strobel and coworkers have performed extensive studies on the

Tetrahymena Group I ribozyme through incorporation of modified nucleotides2 as well as

other ribozymes. The method they developed is called nucleotide analogue interference

mapping, or NAIM. Incorporation of modified nucleotides, in studies performed by the

Strobel lab, reveals specific sites that are important through interfering with the native

state. As an example, inosine (Figure 3.1) is a guanosine analog missing the 2-amino group.

This modification interferes with the Watson-Crick hydrogen bonding face and weakens

secondary structure.

Looking for sites of interference is useful because it potentially reveals which

bases are significant for function. The difficulty with this method is that it does not

reveal specifically why any single base is necessary. Using inosine as an example,

removal of the C2 amine may cause inhibition by destabilizing strong G-C bonds,

interfering with favorable electrostatics, or disfavoring wobble base pairing. Additional investigations would be needed to reveal which specific interaction was altered to cause the inhibition, and determining that could potentially be a difficult task.

Figure 3.1. Removal of the C2 amine in guanosine (left) converts the nucleotide to inosine (right).

Rather than investigate inhibition, we have instead chosen to study ways to

enhance function through incorporation of nucleotide analogs, specifically 8-

bromopurine triphosphates. Precedent for this is our ability to use 8BrG to favor

population of the hairpin state over the duplex state in a YNMG hairpin (Figure 1.5) and

to drive leadzyme catalysis. This method, which follows similar principles to NAIM,

will be called Nucleotide Analogue Mapping of Enhancement, or NAME. Herein, I

focus on random incorporation of 8BrG or 8BrA to analyze enhancement of ribozyme

function.

The first concern for determining if NAME is a viable method is to determine if

these CRNs are able to be incorporated at all. The initial phase of this study involved

transcription of a model system to test for CRN incorporation. Initial work was

performed by Sarah Krahe, undergraduate research assistant in the Bevilacqua lab. She

performed a series of experiments investigating the efficacy of different methods to

perform transcription. She used the malachite green aptamer as a model system and a

hemiduplex DNA template for transcription. She compared use of the standard lab

protocol for RNA transcription to a method suggested by Gopalakrishna et al.3 The

protocol used by Gopalakrishna and coworkers was designed to incorporate 8-azidoATP

using T7 polymerase and magnesium in solution. Her research concluded that the lab

protocol for transcription worked as efficiently (or better in some cases) than the

Gopalakrishna method for incorporation of these CRNs.4 Most of her experiments,

however, involved doping of CRNs during transcriptions, 10% or less of 8-bromopurine.

At this time, however, the ideal concentration of CRNs for transcription incorporation is

not yet known.
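The doping levels discussed above translate into working concentrations by simple proportion. The sketch below assumes that the 4 mM NTP concentration listed for protocol 1 in Table 3.1 applies to each nucleotide individually and that doping replaces a fraction of the natural NTP with its 8-bromo analog; both points are assumptions made for illustration, and the function name is hypothetical.

```python
def doped_ntp_concentrations(total_mM, analog_fraction):
    """Split a total NTP concentration between the natural NTP and its analog.

    total_mM        -- combined concentration of natural NTP plus analog (mM)
    analog_fraction -- fraction supplied as the 8-bromo analog (0.0 to 1.0)
    """
    if not 0.0 <= analog_fraction <= 1.0:
        raise ValueError("analog_fraction must be between 0 and 1")
    analog = total_mM * analog_fraction
    return {"natural_mM": total_mM - analog, "analog_mM": analog}


# 10% doping and full (100%) substitution at an assumed 4 mM total.
for fraction in (0.10, 1.0):
    print(fraction, doped_ntp_concentrations(4.0, fraction))
```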

First, we investigated under what conditions 8BrNTPs are incorporated into an

RNA transcript. Transcription conditions for the Gopalakrishna and lab protocols are

shown below (Table 3.1). The first experiment was performed on the same DNA

template that Sarah used, the hemiduplex malachite green aptamer DNA template. Initially, protocol

2 transcriptions appeared to incorporate 8BrG at 100% about five-fold better than

protocol 1 (data not shown).

Table 3.1 Transcription protocols

Protocol 1 Protocol 2

Laboratory Gopalakrishna

400 mM TRIS 400 mM TRIS

250 mM MgCl2 25 mM MgCl2

10 mM spermidine 20 mM spermidine

4 mM NTPs 0.4 mM NTPs

0.1 µg/µL DNA 0.1 µg/µL DNA

2 mM DTT 5 mM BME

0.01% Triton X-100

2 mM Mn

The next experiment utilized an HDV plasmid DNA template and varied the ATP

concentration (Figure 3.2). In this instance, protocol 1 transcription yield (lanes 1 and 2)

was about two-fold better for both 100% ATP and 100% 8BrATP conditions than

protocol 2 (lanes 4 and 5). Additionally, protocol 1 transcription lanes

contained better defined bands and fewer abort sequences. Lanes containing no ATP

(lanes 3 and 6) yielded no full-length transcript, an initial indication that there was no

ATP contaminant in the remaining three NTPs.

Table 3.1. Transcription conditions. All transcriptions were run at 37 °C for two hours except

where noted. Transcriptions were 20 µL volume.

Next, we tested the variability between hemiduplex and plasmid transcription

templates. The DNA template was varied under both protocols, with 8BrATP and

8BrGTP as the nucleotide variables. Both protocols obtained 5- to 10-fold better yields using a plasmid

template (data not shown). In addition, 8BrATP was found to incorporate 5- to 6-fold

better into RNA transcripts than 8BrGTP, most likely because T7 transcription requires G

starts, and incorporating two syn G’s at the beginning of a transcript could be difficult for

the polymerase. The efficacy of 8BrGTP incorporation with a plasmid

template was tested next.

              Lab Protocol   Gopalakrishna
              1   2   3      1   2   3
100% ATP      +   -   -      +   -   -
100% 8BrATP   -   +   -      -   +   -
Lane          1   2   3      4   5   6

Figure 3.2. Comparing protocols 1 and 2 for plasmid transcription, ATP variable. This experiment used T7 polymerase and a two hour incubation period. Protocol 1 yields fewer aborts and comparable levels of incorporation of 8BrATP. (Band label: Full Length.)

To further investigate plasmid transcription, protocol 1 was used to test both 8BrATP and 8BrGTP variables. For protocol 2, 8BrGTP incorporation was further tested. Other experimental variables were changed to see how transcription would be affected. Modifications were made to the standard lab transcription in attempts to improve incorporation of 8BrGTP at 100% concentration. Work by Sarah indicated that incubation at 30 °C as opposed to 37 °C could give better CRN incorporation. Manganese was incorporated as a transcription variable since protocol 2 cited its use as a contributing factor for the polymerase to be more flexible when incorporating a bulky group at the 8 position of a purine. The spermidine concentration was cut in half to test if this would make the polymerase more permissive. Finally, all of the experimental conditions were attempted for two-hour and four-hour incubation trials (Figure 3.3).

Lane        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Protocol    1  1  1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  1  1  1  1  1  1
Time (hrs)  2  2  2  4  4  4  2  2  2  4  4  4  2  2  2  4  4  4  2  2  2  4  4  4
ATP         +  -  -  +  -  -  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
8BrATP      -  +  -  -  +  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
GTP         +  +  +  +  +  +  +  -  -  +  -  -  +  -  -  +  -  -  +  -  -  +  -  -
8BrGTP      -  -  -  -  -  -  -  +  -  -  +  -  -  +  -  -  +  -  -  +  -  -  +  -
Spec.       -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  3  M  ½  3  M  ½

Figure 3.3. Comparing multiple transcription variables simultaneously. The Spec. row delineates any special treatment beyond the protocol 1 conditions. The first six lanes are standard lab transcription, variable ATP concentration. The next six lanes are standard lab transcription, variable GTP concentration. For protocol 2 transcriptions, only the GTP variable was analyzed. The last six lanes are the protocol 1 transcription variations: at 30 °C (3), containing 2 mM Mn²⁺ (M), and with half the concentration of spermidine (½). Running the experiment for four hours did not seem to improve yields in any case.

Protocol 1 transcriptions containing 100% GTP (lane 7 and lane 10) appear to be

inconsistent across this gel. This phenomenon was observed at least twice both before

and after the running of this experiment. One possible explanation is that when the

reaction is mixed in such a manner that all of the components are added except the GTP,

and the GTP is added just prior to transcription, erratic G quartet formation could occur.

No further investigation was made into these occurrences. No variable (time,

temperature, spermidine, or manganese, lanes 8, 11, 19-24) appeared to improve the

incorporation of 8BrGTP at 100% concentration (less than 3% yield in all cases).

8BrATP, however, incorporates well and at reasonable levels (around 20% yield, lanes 2

and 5).

Next, TLC was performed to verify that these 8BrNTPs are being successfully

incorporated and that the bands do not arise from NTP impurities in the reagent. TLC was

performed on the purchased 8BrNTPs, and both 8BrATP and 8BrGTP gave only one

band, which is good evidence for reagent purity. The band for 8BrATP was distinct

from the band for ATP, and the band for 8BrGTP was distinct from the band for GTP.

Summary

For initial investigation of 8-bromopurine triphosphate incorporation into RNA,

some key findings were made. First, it is surprising that these 8BrNTPs can be

incorporated at all because of the syn conformation they take. Transcription reactions

containing only three of four NTPs do not yield full-length transcript, and 8BrNTPs are

found to be pure by TLC, so impurities are not causing full-length bands in the 8Br

transcriptions. Also promising is that they incorporate with reasonable yield. Second,

plasmid transcriptions give quantifiably improved yields after two hours when compared

to a hemiduplex template. This finding holds true for transcriptions with and without

CRNs. Next, 8BrATP is easier to incorporate by a factor of 5 when compared with

8BrGTP. This incorporation difference has the potential to be a factor in some

transcriptions, but when attempting to dope in the CRN at a lower frequency, the

difference in transcription efficacy should not be a problem. A larger ratio of 8BrGTP to

GTP can make up for the difficulty of incorporation when attempting to dope in the CRN.

Lastly, two hours is sufficient to give full extension of plasmid transcription. Doubling

the transcription time does not grant any increase in yield at this scale.
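
The idea that a larger ratio of 8BrGTP to GTP can offset poorer incorporation can be made concrete with a simple competition model. The sketch below is illustrative only. It assumes the polymerase selects between the analog and the natural NTP in proportion to their concentrations, weighted by a relative incorporation efficiency. The function name, the efficiency value of 0.2, and the 5% target are hypothetical placeholders, not values measured in this work.

```python
def analog_fraction_needed(target_site_fraction, relative_efficiency):
    """Mole fraction of analog in the (analog + natural NTP) pool needed so
    that `target_site_fraction` of the relevant sites carry the analog.

    Assumed model (an illustration, not a measured relationship):
        p = e*x / (e*x + (1 - x))
    where x is the analog mole fraction, e is the incorporation efficiency of
    the analog relative to the natural NTP, and p is the per-site fraction.
    Solving for x gives x = p / (p + e*(1 - p)).
    """
    p, e = target_site_fraction, relative_efficiency
    if not 0 <= p < 1:
        raise ValueError("target_site_fraction must be in [0, 1)")
    if e <= 0:
        raise ValueError("relative_efficiency must be positive")
    return p / (p + e * (1 - p))


if __name__ == "__main__":
    # Hypothetical inputs: 5% of sites substituted, analog used 5-fold less
    # readily than the natural NTP (e = 0.2).
    print(round(analog_fraction_needed(0.05, 0.2), 3))
```

Under this assumed model, an analog that is used less efficiently simply needs to make up a larger share of the nucleotide pool to reach the same doping level.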

3.3 Future directions: Detecting modified nucleotides in enhanced RNA

The next phase of this project will determine at what concentrations 8BrATP

should be incorporated to achieve random incorporation of about one CRN per

transcribed RNA. To obtain this information, two main experimental routes can be

taken: reverse transcription or phosphorothioate chemistry. To prepare the RNA for both

methods, the initial reaction and purification steps are the same. After transcription, a

ribozyme is placed in catalytic conditions and permitted to react. Using the leadzyme as

an example, lead is added to the isolated transcription product. This reaction mixture is

run on a gel, where the uncleaved transcript separates from the cleavage products. The

cleaved RNA is isolated and purified. The purified RNA is next analyzed by one of the

two main experimental routes.
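
The goal of about one CRN per transcript can be explored with a binomial model before optimizing at the bench. The following is a minimal sketch, assuming that every candidate site incorporates the analog independently and with the same probability. The site count of 20 and the function name are hypothetical and chosen only for illustration.

```python
from math import comb


def crn_count_distribution(n_sites, p_site, max_k=None):
    """Probability of k analogs per transcript, for k = 0..max_k.

    Assumes independent incorporation with probability `p_site` at each of
    `n_sites` candidate positions (a binomial model).
    """
    if max_k is None:
        max_k = n_sites
    return [
        comb(n_sites, k) * p_site**k * (1 - p_site) ** (n_sites - k)
        for k in range(max_k + 1)
    ]


if __name__ == "__main__":
    n = 20          # hypothetical number of candidate sites in the RNA
    p = 1 / n       # per-site probability giving a mean of one CRN per RNA
    for k, prob in enumerate(crn_count_distribution(n, p, max_k=3)):
        print(f"{k} CRN(s): {prob:.3f}")
```

Even when the average is one CRN per RNA, this model predicts that a sizable share of transcripts carry none or more than one, which is worth keeping in mind when interpreting band intensities.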

Reverse transcription (RT) has the potential to simplify the experimental

procedure for the analysis of cleaved RNA. After the reacted ribozyme of interest is

purified, ³²P-labeled DNA primer is annealed to the RNA. The RNA is then reverse

transcribed, and the products are run on a sequencing gel and compared to dideoxy

sequencing lanes. In theory, the reverse transcriptase is unable to read the Watson-Crick

face of an 8BrNTP and will release the RNA when it reaches such a base. Any site

where CRN incorporation enhances function should yield a band on an RT sequencing gel.

This method could reveal all sites at which syn base incorporation causes enhancement of

function.

RT has the potential to be a simpler detection method because it involves fewer

experimental steps. Also, RT does not involve any special reagents beyond what can be

purchased, and all materials are readily available in the laboratory. It is not clear,

however, whether RT will give stops at the brominated bases. First, while the reverse

transcriptase would need to be able to fit the CRN in its binding pocket, it has already

been demonstrated that the T7 polymerase can accommodate the extra bromine at the 8

position. Second, while the anti conformation of the CRN is disfavored, syn bases

can still participate in hydrogen bonding, and the CRNs may not be strongly syn. Third,

the reverse transcriptase may not read the Watson-Crick face of the base; the enzyme

may work by base shape, like DNA polymerase.5 The reverse transcriptase may be able

to determine the identity of a base, even in the syn position, by the base shape rather than

its hydrogen bonding face.

If reverse transcriptase reads through the CRNs, the phosphorothioate method, which

has been used successfully in the past, will be attempted. The Strobel research group

popularized the use of a phosphorothioate with NAIM. The group's studies demonstrated that by

incorporating nucleotide analogues and isolating nonreactive ribozyme species, sites

where these analogues interfere with structure and function can be analyzed. The

phosphorothioate functionality (Figure 3.4), when incorporated into an RNA backbone,

can be cleaved with iodine. The iodine cleavage products are run on a gel, and the site of

phosphorothioate incorporation is determined by fragment length. The first step of using

these phosphorothioate species for transcription is to synthesize them, because the 8BrNTPs are

not commercially available as alpha-thiotriphosphates.

Once the thiotriphosphates are synthesized, they then need to be incorporated into

RNA during transcription at a rate of one per RNA. Once these conditions are found, the

ribozyme will be placed in cleavage conditions, just as for the RT procedure. The

cleaved product is then isolated and purified, and submitted to iodine cleavage.
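
Reading incorporation sites from iodine cleavage fragments amounts to a simple length-to-position conversion. The sketch below assumes a 5'-end-labeled RNA numbered from 1 at its 5' end, and assumes that the phosphorothioate lies in the linkage immediately 5' of the analog, as expected when the analog enters as an alpha-thiotriphosphate. The function name and the fragment lengths in the example are hypothetical. In practice, fragment lengths would be assigned against a sequencing ladder.

```python
def sites_from_fragment_lengths(fragment_lengths, rna_length):
    """Convert 5'-labeled iodine cleavage fragment lengths to analog positions.

    Assumed convention: cleavage at the phosphorothioate 5' of nucleotide i
    leaves a labeled fragment containing nucleotides 1..(i - 1), so a fragment
    of length L reports an analog at position L + 1.
    """
    sites = []
    for length in sorted(set(fragment_lengths)):
        if not 1 <= length < rna_length:
            raise ValueError(f"fragment length {length} is outside the RNA")
        sites.append(length + 1)
    return sites


if __name__ == "__main__":
    # Hypothetical band lengths read from a gel of a 30-nucleotide RNA.
    print(sites_from_fragment_lengths([23, 9, 23], rna_length=30))
```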

When the NAME experimental details are finalized, the last phase will be to

choose model systems, to prove that the method works, and then to test it in unknown

RNA. The two chosen model systems will be designed to work by the two schemes laid

out in Chapter 1 (Figure 1.3). The leadzyme is an ideal model system for scheme 1,

stabilization of the native state. When 8BrG is inserted at random, incorporation at G24

should enhance leadzyme function more than incorporation at other sites. The

hepatitis delta virus (HDV) ribozyme could be used for scheme 2, destabilization of

misfolded states. The HDV -30/99 construct has a misfold that slows enzyme kinetics.6

This misfolded state can be disfavored by sequestering the -30/-1 region in a hairpin by adding

nucleotides to the end of the RNA transcript. Incorporation of CRNs into the -30/-1

region of the ribozyme should destabilize misfolds that arise from alternate pairings.

Figure 3.4. An alpha-thiotriphosphate (left) and a phosphorothioate incorporated into an RNA

backbone (right).

Finally, choosing a ribozyme with unknown structure/function relationships will be the

ultimate test of this methodology. Systematic CRN incorporation has already been an

effective strategy to learn more about ribozyme structure. Random CRN incorporation

will be the next step in revealing what is hidden in ribozymes and RNPs, and another step

towards the RNA world.

References

1. Yajima, R.; Proctor, D. J.; Kierzek, R.; Kierzek, E.; Bevilacqua, P. C., A conformationally

restricted guanosine analog reveals the catalytic relevance of three structures of an RNA

enzyme. Chem. Biol., 2007, 14, 23-30.

2. Strobel, S. A., Ribozyme chemogenetics. Biopolymers, 1998, 48, 65-81.

3. Gopalakrishna, S.; Gusti, V.; Nair, S.; Sahar, S.; Gaur, R. K., Template-dependent

incorporation of 8-N3AMP into RNA with bacteriophage T7 RNA polymerase. RNA, 2004,

10, 1820-30.

4. Krahe, S., Thermodynamics of binding of cognate and noncognate ligands to an RNA

aptamer, and enhancement of specificity through incorporation of modified nucleotides.

(unpublished), 2008, 1-61.

5. Morales, J. C.; Kool, E. T., Efficient replication between non-hydrogen-bonded nucleoside

shape analogs. Nat. Struct. Biol., 1998, 5, 950-4.

6. Brown, T. S.; Chadalavada, D. M.; Bevilacqua, P. C., Design of a highly reactive HDV

ribozyme sequence uncovers facilitation of RNA folding by alternative pairings and

physiological ionic strength. J. Mol. Biol., 2004, 341, 695-712.