flexible and universal multiple target nucleic acid

1
DoktoratsKolleg RNA Biology Abstract Introducon to RNA design Movaon: Due to the close structure to funcon relaonship, the availability of good structure predicon methods, and energy models, RNA is perfectly suited to design molecules with predened properes. Currently available RNA design tools implement very specialized usecases which cannot be easily adapted to new applicaons. Oen, complicated sampling and opmizaon methods were developed to suit a specic design goal. Results: We developed RNAblueprint, a C++ library implemenng a graph coloring approach to uniformly sample sequences compable to structural and sequence constraints from the typically huge soluon space. Fair sampling from the soluon space makes opmizaon runs much more performant and raises the probability to nd beer soluons. Scripng interfaces allow to easily adapt exisng code to new scenarios which makes the whole design process very universal and exible. We implemented novel design approaches such as a mulstable thermoswitch or a sRNA mediated translaon regulaon system to show the advantages of our soware. G A G C C U 50 U Sequences AGAUGACUACUGACAUC Experimental validation: in-vitro in-vivo Structural constraints Sequence constraints Landscape properties .((((.(....).)))) ANNNNNCGAAAGNNNNN -1.8 Kcal/mol Energy Congurations E E A Function In silico characterisation To be able to de novo design RNA molecules it was necessary to develop methods to solve the inverse folding problem and approaches for an in silico characterizaon. A design task can be generally split into three components: (1) generate valid sequences compable to sequence and structural constraints, (2) formulate an opmizaon problem where the sampled sequences are opmized towards a opmal score calculated by the objecve funcon and (3) use in silico characterizaon methods such as ltering and clustering with respect to problem specic features not included in the objecve. A Frequency Frequency of the solution found B Sample size relative to size of solution space [%] Unique solutions [%] Fair sampling of RNA sequences Figure 1: Dierences between fair and unfair stochasc sampling for a small design example. A: The histogram shows how frequent unique soluons were found when sampling sequences using fair and unfair sampling. B: Even when sampling much more soluons than available in the soluon space (230%), we sll get only 50% of all possible soluons with unfair sampling, while we are able to sample about 90% with the fair method. The developed soware [1] implements a graph theorecal approach [2] to be able to handle mulple structural constraints. In addion, it takes any sequence constraint in IUPAC annotaon as input and generates sequences compable to all constraints. RNAblueprint oers funcons to get properes of the soluon space and current search spaces, such as the number of soluons. This feature can, for example, be used for mutaonal studies. For ecient opmizaons we guarantee to sample every soluon with the same probability from the whole soluon space as shown in Figure 1. Mul�‐state thermoswitch To show the advantages of our soware, namely the exibility, universality and eciency, we implemented serveral designs including a mulstate thermoswitch. This RNA molecule is able to fold into three disnct states at specic temperatures. The objecve funcon not only includes a term to favour the given structures at their temperatures, but also to counter select for the structures at all other temperatures. Objeve Funcon 10 20 30 10 20 30 10 20 30 Probability Specic heat [kcal (mol K) -1 ] 0 20 40 60 80 100 0 1 2 3 4 1 heat capacity (((((((((((((....))))))))))))) 5°C (((((.....)))))(((((.....))))) 10°C (((((.....)))))............... 37°C probability of the current MFE GAUCUGUGUGGGGUCGAUUUUGUGUGGGUU Temperature [Celsius] References Figure 3: Heat capacity curve and thermodynamic probabilies of the target structures evidence the sucessful design. The molecule folds into the desired states at the given temperatures with high probabilies and shows disnct transions between the states. The RNA completely unfolds above 70 degrees to then exhibit the open chain formaon as minimum free energy structure. Objeve Funcon sRNA regulated gene expression We furthermore implemented a regulatory 5'UTR that retulates expression of its downstream gene by responding to the presence of a small RNA. We therefore came up with a novel objecve funcon which not only contains the accessibility of the RBS and the sRNA binding site, but also the concentraon of the formed sRNA/5'UTR complex for ecient binding. An addional term minimizes the crosstalk of the 5'UTR with the downstream reporter gene. mRNA Complex RBS + sRNA X RBS G G G A A G C A C C A G U C G G U U U A U C A C A A G C C G C U G G U G C U U A G A G U A A A G A U A A G G A G U A A U G A A A U G C G T A A G G G C G A A G A G C T T T T T A C C G G T G T T G T G C C T A T T C T C G T A G A G T T A G A T G G C G A C G T T A A T 1 10 20 30 40 50 60 70 80 90 100 110 120 130 132 G G G U C C U A G C G U C A U U C A C G C G G A C C C U U C A U U U C A U U A C U C C U U A U C U U U A U U C A C C U A G C A U A A C C C C U U G G G G C C U C U A A A C G G G U C U U G A G G G G U U U U U U G 1 10 20 30 40 50 60 70 80 90 100 105 Figure 4: Graphical representaon of the sRNAmRNA pair design result. Top le: dot plot and centroid fold structure of the designed 5'UTR including some nucleodes of the reporter gene. The 5' stem loop structure is very likely (large lled squares) whereas the remaining depicted base pairs (small lled squares) indicate high exibility. The mRNA binding site (orange) is unstructured as intended. Boom right: same for sRNA. Again, desired stems dominate the structural ensemble, whereas the binding site (orange) is accessible. The dot plot above the diagonal depicts possible base pairs of the sRNA:mRNA complex. The upper right rectangle shows the inter molecular base pairs whereas the small triangles next to it indicate intra molecular base pairs. The corresponding minimum free energy structure of the complex is depicted in the lower lecorner (outside the dot plot) surrounded by the individual terms of the objecve funcon. We provide a soware soluon in form of the RNAblueprint library which makes it possible to uniformly sample RNA sequences compable to structural and sequence constraints. With our framework it is possible to incorporate almost any soware like NUPACK [4] or ViennaRNA [5] in the calculaon of the objecve funcon and it is now feasible to explore a much broader range of objecves. We showed examples on how to use the developed soware to implement a design method using de novo objecve funcons that incorporate terms which were not available in other exisng design frameworks. We are working hard to set up an extendible webservice to not only make our design tools easily available, but also to provide a range of analysis tools for in silico vericaon and characterizaon of the generated designs or any other RNA of interest. Conclusion hp://nibiru.tbi.univie.ac.at/blueprint/

Upload: others

Post on 18-Oct-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Flexible and universal multiple target nucleic acid

DoktoratsKollegRNA Biology

Abstract

Introduc�on to RNA design

Mo�va�on: Due to the close structure to func�on rela�onship, the availability of good structure predic�on methods, and energy models, RNA is perfectly suited to design molecules with predefined proper�es. Currently available RNA design

tools implement very specialized use‐cases which cannot be easily adapted to new applica�ons. O�en, complicated sampling and op�miza�on methods were developed to suit a specific design goal.Results: We developed RNAblueprint, a C++ library implemen�ng a graph coloring approach to uniformly sample sequences compa�ble to structural and sequence constraints from the typically huge solu�on space. Fair sampling from the solu�on space makes op�miza�on runs much more performant and raises the probability to find be�er solu�ons. Scrip�ng interfaces allow to easily adapt exis�ng code to new scenarios which makes the whole design process very universal and flexible. We implemented novel design approaches such as a mul�stable thermoswitch or a sRNA mediated transla�on regula�on system to show the advantages of our so�ware.

A

G

A

G

C U

U

A

AA

C

U

C

U

50U

SequencesAGAUGACUACUGACAUC

Experimental validation:◾ in-vitro◾ in-vivo

Structural constraintsSequence constraintsLandscape properties.((((.(....).))))ANNNNNCGAAAGNNNNN-1.8 Kcal/mol

Energ

y

Configurations

𝚫E

EA

Function

In silicocharacterisation

To be able to de novo design RNA molecules it was necessary to develop methods to solve the inverse folding problem and approaches for an in silico characteriza�on. A design task can be generally split into three components: (1) generate valid sequences compa�ble to sequence and structural constraints, (2) formulate an op�miza�on problem where the sampled sequences are op�mized towards a op�mal score calculated by the objec�ve func�on and (3) use in silico characteriza�on methods such as filtering and clustering with respect to problem specific features not included in the objec�ve.

A

Frequency

Frequency of the solution foundB

Sample size relative to size of solution space [%]

Uniq

ue s

olu

tions

[%]

Fair sampling of RNA sequences

Figure 1: Differences between fair and unfair stochas�c sampling for a small design example. A: The histogram shows how frequent unique solu�ons were found when sampling sequences using fair and unfair sampling. B: Even when sampling much more solu�ons than available in the solu�on

space (∼230%), we s�ll get only 50% of all possible solu�ons with unfair sampling, while we are able to sample about 90% with the fair method.

The developed so�ware [1] implements a graph theore�cal approach [2] to be able to handle mul�ple structural constraints. In addi�on, it takes any sequence constraint in IUPAC annota�on as input and generates sequences compa�ble to all constraints.RNAblueprint offers func�ons to get proper�es of the solu�on space and current search spaces, such as the number of solu�ons. This feature can, for example, be used for muta�onal studies. For efficient op�miza�ons we guarantee to sample every solu�on with the same probability from the whole solu�on space as shown in Figure 1.

Mul�‐state thermoswitchTo show the advantages of our so�ware, namely theflexibility, universality and efficiency, weimplemented serveral designs including a mul�statethermoswitch. This RNA molecule is able to fold intothree dis�nct states at specific temperatures. Theobjec�ve func�on not only includes a term to favourthe given structures at their temperatures, but alsoto counter select for the structures at all othertemperatures.

Obje�ve Func�on

10

20

30

10

20

30

10

20

30

Pro

bab

ility

Sp

ecific

heat

[kca

l ⨉ (

mol ⨉ K

)-1]

0 20 40 60 80 100

0

1

2

3

4

1

heat capacity(((((((((((((....))))))))))))) 5°C(((((.....)))))(((((.....))))) 10°C(((((.....)))))............... 37°Cprobability of the current MFE

GAUCUGUGUGGGGUCGAUUUUGUGUGGGUU

Temperature [Celsius]

References[1] Stefan Hammer, Birgit Tschiatschek, Christoph Flamm, Ivo L. Hofacker and Sven Findeiß. RNAblueprint: Flexibleand universal mul�ple target nucleic acid sequence design. Bioinforma�cs, submi�ed 2016.[2] Chris�an Höner zu Siederdissen, Stefan Hammer, Ingrid Abfalter, Ivo L. Hofacker, Christoph Flamm, and Peter F.Stadler. Computa�onal design of RNAs with complex energy landscapes. Biopolymers, 99(12):1124–1136, 2013.[3] Christoph Flamm, I. L. Hofacker, S. Maurer‐Stroh, P. F. Stadler, and M. Zehl. Design of mul�stable RNA molecules.RNA, 7(2):254 265, February 2001.[4] Joseph N. Zadeh, Conrad D. Steenberg, Jus�n S. Bois, Brian R. Wolfe, Marshall B. Pierce, Asif R. Khan, Robert M.Dirks, and Niles A. Pierce. NUPACK: Analysis and design of nucleic acid systems. Journal ofComputa�onal Chemistry, 32(1):170–173, 2011.[5] Ronny Lorenz, Stephan H. Bernhart, Chris�an Höner zu Siederdissen, Hakim Tafer, Christoph Flamm, Peter F.Stadler, and Ivo L. Hofacker. ViennaRNA Package 2.0. Algorithms for Molecular Biology, 6(1):26, November 2011.

Figure 3: Heat capacity curve and thermodynamic probabili�es of the target structures evidence the sucessfuldesign. The molecule folds into the desired states at the given temperatures with high probabili�es and showsdis�nct transi�ons between the states. The RNA completely unfolds above 70 degrees to then exhibit the openchain forma�on as minimum free energy structure.

Obje�ve Func�on

sRNA regulated gene expression

We furthermore implemented a regulatory5'UTR that retulates expression of itsdownstream gene by responding to thepresence of a small RNA. We therefore cameup with a novel objec�ve func�on which notonly contains the accessibility of the RBS andthe sRNA binding site, but also theconcentra�on of the formed sRNA/5'UTRcomplex for efficient binding. An addi�onalterm minimizes the crosstalk of the 5'UTRwith the downstream reporter gene.

mRNA ComplexRBS

+

sRNA

XRBS

G

G

G

AAGCACCAGUC

GG

UU

UA

U

C

A C

AA

GC

CG C U G G U G C U U

A

G

A

G

U

A

A

A

G

A

U

A

A

G

G

A

G

U

A

A

U

G

A

A

A

U

G

C

GT

AAG

GG

CGA

A

GA G

CT

TT T

T

AC

C

G

G

T

G

T

T

G

T

G

C

C

T

A

T

T

C

T

C

G

T

A

G

A

G

T

T

A

G

A

T

G

G

C

G

A

C

G

T

T

A

A

T

1

10

20

3040

50

60

70

80

90

100

110

120

130

132

G

G

G

U

C

C

U

AG

C

G

U

CA

U

U

CA

C

G

C

G

G

A

C

C

C U U C A U U U C A U U A C U C C U U A U C U U U A U U C A C C U A G C A U A

A

C

C

C

C

U

U

G

G

G

G

C

C

U

C

U

A AA

C

G

G

G

U

C

U

U

G

A

G

G

G

G

U

U U U U U G

1

10

20

30 40 50 60

70

80

90

100105

G

G

G

A

A

G

C

A

C

C

A

GU

C

G

G

U

U

U

A

U

C A

C

A

A

G

C

C

G

C

U

G

G

U

G

C

U

U

A

G

A

G

U

A

A

A

G

A

U

A

A

G

G

A

G

U

A

A

U

G

A

A

A

U

G

C

G

U

A

A

G

GG

CG

A

A

G A

GC U

U

U

U

U

A

C

C

GG

U

G

U

UG

U

G

CC

UA

UU

C

UCG

UA

G A

G

U U

A

G

A

U

GG

C

G

A

C

G

U

U

A

A

U

1

10

20

30

40

50

60

70

80

90

100

110

120130

132

G

G

G

U

C

CU

A

G

C

G

U

C

AU

U

C

A

C

G

C

G

G

A

C

C

C

U

U

C

A

U

U

U

C

A

U

U

A

C

U

C

C

U

U

A

U

C

U

U

U

A

U

U

C

A

C

C

U

A

G

C

A

U

A

A

C

C

C

C

U

U

G

G

G

G

C

C

U C

U

A

AAC

G

G

G

U

C

U

U

G

A

G

G

G

G

U

U

U

U

U

U

G

1

1020

30

40

50

60

70

8090

100

105

[Scomplex]

[mRNA0]=1

P(5'UTRunpaired) = 0.94

P(sRNAunpaired) = 0.91

P(mRNAcontext) = 0.07

Figure 4: Graphical representa�on of the sRNA‐mRNA pair design result. Top le�: dot‐plot and centroid fold structure of the designed 5'UTR including some nucleo�des of the reporter gene. The 5' stem loop structure is very likely (large filled squares) whereas the remaining depicted base pairs (small filled squares) indicate high flexibility. The mRNA binding site (orange) is unstructured as intended. Bo�om right: same for sRNA. Again, desired stems dominate the structural ensemble, whereas the binding site (orange) is accessible. The dot plot above the diagonal depicts possible base pairs of the sRNA:mRNA complex. The upper right rectangle shows the inter molecular base pairs whereas the small triangles next to it indicate intra molecular base pairs. The corresponding minimum free energy structure of the complex is depicted in the lower le� corner (outside the dot plot) surrounded by the individual terms of the objec�ve func�on.

We provide a so�ware solu�on in form of the RNAblueprintlibrary which makes it possible to uniformly sample RNAsequences compa�ble to structural and sequence constraints. With our framework it is possible to incorporate almost anyso�ware like NUPACK [4] or ViennaRNA [5] in the calcula�onof the objec�ve func�on and it is now feasible to explore amuch broader range of objec�ves. We showed examples onhow to use the developed so�ware to implement a designmethod using de novo objec�ve func�ons that incorporateterms which were not available in other exis�ng designframeworks.

We are working hard to set up an extendible web‐service tonot only make our design tools easily available, but also toprovide a range of analysis tools for in silico verifica�on andcharacteriza�on of the generated designs or any other RNA ofinterest.

Conclusion

This work has been supported by the European Commission under the Environment Theme of the 7th Framework Program for Research and Technological Development (Grant agreement number 323987).

h�p://nibiru.tbi.univie.ac.at/blueprint/

Flexible and universal multiple targetnucleic acid sequence designStefan Hammer1,2, Birgit Tschiatschek2, Christoph Flamm1, Ivo L. Hofacker1,2,3 and Sven Findeiß1,2

1 University of Vienna, Department of Theoretical Chemistry, Vienna, A-1090, Austria2 University of Vienna, Bioinformatics and Computational Biology Research Group, Vienna, A-1090, Austria3 University of Copenhagen, Center for Non-coding RNA in Technology and Health, Copenhagen, DK-1870, Denmark