flexible and universal multiple target nucleic acid
TRANSCRIPT
DoktoratsKollegRNA Biology
Abstract
Introduc�on to RNA design
Mo�va�on: Due to the close structure to func�on rela�onship, the availability of good structure predic�on methods, and energy models, RNA is perfectly suited to design molecules with predefined proper�es. Currently available RNA design
tools implement very specialized use‐cases which cannot be easily adapted to new applica�ons. O�en, complicated sampling and op�miza�on methods were developed to suit a specific design goal.Results: We developed RNAblueprint, a C++ library implemen�ng a graph coloring approach to uniformly sample sequences compa�ble to structural and sequence constraints from the typically huge solu�on space. Fair sampling from the solu�on space makes op�miza�on runs much more performant and raises the probability to find be�er solu�ons. Scrip�ng interfaces allow to easily adapt exis�ng code to new scenarios which makes the whole design process very universal and flexible. We implemented novel design approaches such as a mul�stable thermoswitch or a sRNA mediated transla�on regula�on system to show the advantages of our so�ware.
A
G
A
G
C U
U
A
AA
C
U
C
U
50U
SequencesAGAUGACUACUGACAUC
Experimental validation:◾ in-vitro◾ in-vivo
Structural constraintsSequence constraintsLandscape properties.((((.(....).))))ANNNNNCGAAAGNNNNN-1.8 Kcal/mol
Energ
y
Configurations
𝚫E
EA
Function
In silicocharacterisation
To be able to de novo design RNA molecules it was necessary to develop methods to solve the inverse folding problem and approaches for an in silico characteriza�on. A design task can be generally split into three components: (1) generate valid sequences compa�ble to sequence and structural constraints, (2) formulate an op�miza�on problem where the sampled sequences are op�mized towards a op�mal score calculated by the objec�ve func�on and (3) use in silico characteriza�on methods such as filtering and clustering with respect to problem specific features not included in the objec�ve.
A
Frequency
Frequency of the solution foundB
Sample size relative to size of solution space [%]
Uniq
ue s
olu
tions
[%]
Fair sampling of RNA sequences
Figure 1: Differences between fair and unfair stochas�c sampling for a small design example. A: The histogram shows how frequent unique solu�ons were found when sampling sequences using fair and unfair sampling. B: Even when sampling much more solu�ons than available in the solu�on
space (∼230%), we s�ll get only 50% of all possible solu�ons with unfair sampling, while we are able to sample about 90% with the fair method.
The developed so�ware [1] implements a graph theore�cal approach [2] to be able to handle mul�ple structural constraints. In addi�on, it takes any sequence constraint in IUPAC annota�on as input and generates sequences compa�ble to all constraints.RNAblueprint offers func�ons to get proper�es of the solu�on space and current search spaces, such as the number of solu�ons. This feature can, for example, be used for muta�onal studies. For efficient op�miza�ons we guarantee to sample every solu�on with the same probability from the whole solu�on space as shown in Figure 1.
Mul�‐state thermoswitchTo show the advantages of our so�ware, namely theflexibility, universality and efficiency, weimplemented serveral designs including a mul�statethermoswitch. This RNA molecule is able to fold intothree dis�nct states at specific temperatures. Theobjec�ve func�on not only includes a term to favourthe given structures at their temperatures, but alsoto counter select for the structures at all othertemperatures.
Obje�ve Func�on
10
20
30
10
20
30
10
20
30
Pro
bab
ility
Sp
ecific
heat
[kca
l ⨉ (
mol ⨉ K
)-1]
0 20 40 60 80 100
0
1
2
3
4
1
heat capacity(((((((((((((....))))))))))))) 5°C(((((.....)))))(((((.....))))) 10°C(((((.....)))))............... 37°Cprobability of the current MFE
GAUCUGUGUGGGGUCGAUUUUGUGUGGGUU
Temperature [Celsius]
References[1] Stefan Hammer, Birgit Tschiatschek, Christoph Flamm, Ivo L. Hofacker and Sven Findeiß. RNAblueprint: Flexibleand universal mul�ple target nucleic acid sequence design. Bioinforma�cs, submi�ed 2016.[2] Chris�an Höner zu Siederdissen, Stefan Hammer, Ingrid Abfalter, Ivo L. Hofacker, Christoph Flamm, and Peter F.Stadler. Computa�onal design of RNAs with complex energy landscapes. Biopolymers, 99(12):1124–1136, 2013.[3] Christoph Flamm, I. L. Hofacker, S. Maurer‐Stroh, P. F. Stadler, and M. Zehl. Design of mul�stable RNA molecules.RNA, 7(2):254 265, February 2001.[4] Joseph N. Zadeh, Conrad D. Steenberg, Jus�n S. Bois, Brian R. Wolfe, Marshall B. Pierce, Asif R. Khan, Robert M.Dirks, and Niles A. Pierce. NUPACK: Analysis and design of nucleic acid systems. Journal ofComputa�onal Chemistry, 32(1):170–173, 2011.[5] Ronny Lorenz, Stephan H. Bernhart, Chris�an Höner zu Siederdissen, Hakim Tafer, Christoph Flamm, Peter F.Stadler, and Ivo L. Hofacker. ViennaRNA Package 2.0. Algorithms for Molecular Biology, 6(1):26, November 2011.
Figure 3: Heat capacity curve and thermodynamic probabili�es of the target structures evidence the sucessfuldesign. The molecule folds into the desired states at the given temperatures with high probabili�es and showsdis�nct transi�ons between the states. The RNA completely unfolds above 70 degrees to then exhibit the openchain forma�on as minimum free energy structure.
Obje�ve Func�on
sRNA regulated gene expression
We furthermore implemented a regulatory5'UTR that retulates expression of itsdownstream gene by responding to thepresence of a small RNA. We therefore cameup with a novel objec�ve func�on which notonly contains the accessibility of the RBS andthe sRNA binding site, but also theconcentra�on of the formed sRNA/5'UTRcomplex for efficient binding. An addi�onalterm minimizes the crosstalk of the 5'UTRwith the downstream reporter gene.
mRNA ComplexRBS
+
sRNA
XRBS
G
G
G
AAGCACCAGUC
GG
UU
UA
U
C
A C
AA
GC
CG C U G G U G C U U
A
G
A
G
U
A
A
A
G
A
U
A
A
G
G
A
G
U
A
A
U
G
A
A
A
U
G
C
GT
AAG
GG
CGA
A
GA G
CT
TT T
T
AC
C
G
G
T
G
T
T
G
T
G
C
C
T
A
T
T
C
T
C
G
T
A
G
A
G
T
T
A
G
A
T
G
G
C
G
A
C
G
T
T
A
A
T
1
10
20
3040
50
60
70
80
90
100
110
120
130
132
G
G
G
U
C
C
U
AG
C
G
U
CA
U
U
CA
C
G
C
G
G
A
C
C
C U U C A U U U C A U U A C U C C U U A U C U U U A U U C A C C U A G C A U A
A
C
C
C
C
U
U
G
G
G
G
C
C
U
C
U
A AA
C
G
G
G
U
C
U
U
G
A
G
G
G
G
U
U U U U U G
1
10
20
30 40 50 60
70
80
90
100105
G
G
G
A
A
G
C
A
C
C
A
GU
C
G
G
U
U
U
A
U
C A
C
A
A
G
C
C
G
C
U
G
G
U
G
C
U
U
A
G
A
G
U
A
A
A
G
A
U
A
A
G
G
A
G
U
A
A
U
G
A
A
A
U
G
C
G
U
A
A
G
GG
CG
A
A
G A
GC U
U
U
U
U
A
C
C
GG
U
G
U
UG
U
G
CC
UA
UU
C
UCG
UA
G A
G
U U
A
G
A
U
GG
C
G
A
C
G
U
U
A
A
U
1
10
20
30
40
50
60
70
80
90
100
110
120130
132
G
G
G
U
C
CU
A
G
C
G
U
C
AU
U
C
A
C
G
C
G
G
A
C
C
C
U
U
C
A
U
U
U
C
A
U
U
A
C
U
C
C
U
U
A
U
C
U
U
U
A
U
U
C
A
C
C
U
A
G
C
A
U
A
A
C
C
C
C
U
U
G
G
G
G
C
C
U C
U
A
AAC
G
G
G
U
C
U
U
G
A
G
G
G
G
U
U
U
U
U
U
G
1
1020
30
40
50
60
70
8090
100
105
[Scomplex]
[mRNA0]=1
P(5'UTRunpaired) = 0.94
P(sRNAunpaired) = 0.91
P(mRNAcontext) = 0.07
Figure 4: Graphical representa�on of the sRNA‐mRNA pair design result. Top le�: dot‐plot and centroid fold structure of the designed 5'UTR including some nucleo�des of the reporter gene. The 5' stem loop structure is very likely (large filled squares) whereas the remaining depicted base pairs (small filled squares) indicate high flexibility. The mRNA binding site (orange) is unstructured as intended. Bo�om right: same for sRNA. Again, desired stems dominate the structural ensemble, whereas the binding site (orange) is accessible. The dot plot above the diagonal depicts possible base pairs of the sRNA:mRNA complex. The upper right rectangle shows the inter molecular base pairs whereas the small triangles next to it indicate intra molecular base pairs. The corresponding minimum free energy structure of the complex is depicted in the lower le� corner (outside the dot plot) surrounded by the individual terms of the objec�ve func�on.
We provide a so�ware solu�on in form of the RNAblueprintlibrary which makes it possible to uniformly sample RNAsequences compa�ble to structural and sequence constraints. With our framework it is possible to incorporate almost anyso�ware like NUPACK [4] or ViennaRNA [5] in the calcula�onof the objec�ve func�on and it is now feasible to explore amuch broader range of objec�ves. We showed examples onhow to use the developed so�ware to implement a designmethod using de novo objec�ve func�ons that incorporateterms which were not available in other exis�ng designframeworks.
We are working hard to set up an extendible web‐service tonot only make our design tools easily available, but also toprovide a range of analysis tools for in silico verifica�on andcharacteriza�on of the generated designs or any other RNA ofinterest.
Conclusion
This work has been supported by the European Commission under the Environment Theme of the 7th Framework Program for Research and Technological Development (Grant agreement number 323987).
h�p://nibiru.tbi.univie.ac.at/blueprint/
Flexible and universal multiple targetnucleic acid sequence designStefan Hammer1,2, Birgit Tschiatschek2, Christoph Flamm1, Ivo L. Hofacker1,2,3 and Sven Findeiß1,2
1 University of Vienna, Department of Theoretical Chemistry, Vienna, A-1090, Austria2 University of Vienna, Bioinformatics and Computational Biology Research Group, Vienna, A-1090, Austria3 University of Copenhagen, Center for Non-coding RNA in Technology and Health, Copenhagen, DK-1870, Denmark