17.02.2005
Fast and effective prediction of miRNA targets
Marc Rehmsmeier
CeBiTec, Bielefeld University, Germany
Junior Research Group Bioinformatics of Regulation
Small interfering RNAs versus small temporal RNAs
Hannon. Nature. 418:244-251, 2002.
miRNA/target duplexes
Grosshans and Slack. The Journal of Cell Biology, 156(1):17-21, 2002.
A direct approach
Given a miRNA and a potential target: What are the energetically most favourable binding sites?
Calculation of multiple mfe secondary structure duplexes
The language of RNA duplexes
hybrid = nil <<< tt (region, region) ||| unpaired_left_top ||| closed ... h
unpaired_left_top = ult <<< tt (base, empty) ~~~ unpaired_left_top ||| unpaired_left_bot ... h
unpaired_left_bot = ulb <<< tt (empty, base) ~~~ unpaired_left_bot ||| edangle ... h
edangle = eds <<< tt (base, base) ~~~ closed ||| edt <<< tt (base, emptybase) ~~~ closed ||| edb <<< tt (emptybase, base) ~~~ closed ... h
closed = stacking_region ||| bulge_top ||| bulge_bottom ||| internal_loop ||| end_loop ... h
stacking_region = sr <<< basepair ~~~ closed
bulge_top = (bt <<< basepair ~~~ tt (uregion, empty)) `topbound` closed
bulge_bottom = (bb <<< basepair ~~~ tt (empty, uregion)) `botbound` closed
internal_loop = (il <<< basepair ~~~ tt (uregion, uregion)) `symbound` closed
end_loop = el <<< basepair ~~~ tt (region, region)
Dynamic Programming recurrences
Time/memory complexity: linear in target length
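The linear running time can be illustrated with a minimal sketch. It assumes a toy energy model (-1 per base pair, a fixed penalty per unpaired loop base, no dangles) instead of RNAhybrid's thermodynamic parameters. The function name, `max_loop`, `pair_energy` and `loop_penalty` are illustrative assumptions, not part of RNAhybrid. Because the miRNA length and the loop size are bounded, each target position costs a constant amount of work.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def best_site_energies(target, mirna, max_loop=2, pair_energy=-1.0, loop_penalty=0.5):
    """Toy duplex DP: best energy of a duplex starting at each target position.

    Hypothetical scoring, not RNAhybrid's energy model.
    """
    mir = mirna[::-1]  # the miRNA binds antiparallel, so read it 3'->5'
    m, n = len(target), len(mir)
    inf = float("inf")
    # closed[i][j]: best energy of a duplex whose first pair is (target[i], mir[j])
    closed = [[inf] * n for _ in range(m)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if (target[i], mir[j]) not in PAIRS:
                continue
            best = pair_energy  # end loop: this pair closes the duplex
            # stacking (di = dj = 1), bulges and internal loops of bounded size
            for di in range(1, max_loop + 2):
                for dj in range(1, max_loop + 2):
                    ii, jj = i + di, j + dj
                    if ii < m and jj < n and closed[ii][jj] < inf:
                        unpaired = (di - 1) + (dj - 1)
                        cand = pair_energy + loop_penalty * unpaired + closed[ii][jj]
                        best = min(best, cand)
            closed[i][j] = best
    # for every target position, the best duplex over all miRNA start positions
    return [min(row) for row in closed]

energies = best_site_energies("AAACGUAA", "ACGU")
print(energies.index(min(energies)), min(energies))
```

The table has one row per target position and one column per miRNA position, so both time and memory grow linearly with the target length for a fixed miRNA.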
let-7/lin-41 binding sites
position: 688, mfe: -28.0 kcal/mol
position: 737, mfe: -29.0 kcal/mol
Requirements
For prediction of miRNA targets in large databases we need:
• A fast program
• Good statistics
Length normalisation of minimum free energies
$e_n = \frac{e}{\log(mn)}$
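A minimal sketch of the normalisation, assuming that e is the minimum free energy of a site and that m and n are the lengths of the target and the miRNA. The function name and the sequence lengths in the example are hypothetical; only the energy of -28.0 kcal/mol is taken from the let-7/lin-41 slide.

```python
import math

def normalised_energy(mfe, target_len, mirna_len):
    """Length-normalised minimum free energy e_n = e / log(m * n)."""
    return mfe / math.log(target_len * mirna_len)

# The same mfe is less remarkable in a longer target (lengths are made up).
short_target = normalised_energy(-28.0, target_len=500, mirna_len=21)
long_target = normalised_energy(-28.0, target_len=5000, mirna_len=21)
print(short_target, long_target)
```

Dividing by log(mn) makes energies from targets of different lengths comparable before p-values are computed.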
p-values of individual binding sites
Poisson statistics of multiple binding sites
Probability of k binding sites:
$P[N = k] = \frac{\lambda^k}{k!} \exp(-\lambda)$ with $\lambda = E[N]$
For small p-values: $E[N] \approx p$, $p \ll 1$
The probability of at least k binding sites:
$P[N \ge k] = 1 - \sum_{i=0}^{k-1} P[N = i]$
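A minimal sketch of the Poisson statistics, assuming the expected number of sites is approximated by the individual p-value of a site. The function names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P[N = k] for a Poisson-distributed number of binding sites."""
    return lam ** k / math.factorial(k) * math.exp(-lam)

def prob_at_least(k, lam):
    """P[N >= k] = 1 - sum_{i=0}^{k-1} P[N = i]."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

p = 1e-3                      # hypothetical individual p-value, used as E[N]
print(prob_at_least(1, p))    # close to p itself for small p
print(prob_at_least(2, p))    # two such sites are far less likely
```

For small p the probability of at least one site is nearly p, which is why the p-value can stand in for the expectation.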
Comparative analysis of orthologous targets
Multi-species p-values
Poisson p-values of the three orthologous targets: $p_1$, $p_2$, $p_3$
Multi-species p-value:
$P[P_1 \le p_1, P_2 \le p_2, P_3 \le p_3] = (\max\{p_1, \ldots, p_3\})^3$
General case: k species
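A minimal sketch of the general case, assuming that the multi-species p-value for k independent orthologous targets is the largest of their Poisson p-values raised to the power k. The function name and the example p-values are hypothetical.

```python
def multi_species_pvalue(pvalues):
    """(max{p_1, ..., p_k})^k for k orthologous targets, assuming independence."""
    k = len(pvalues)
    return max(pvalues) ** k

# Three hypothetical Poisson p-values, one per species.
print(multi_species_pvalue([0.01, 0.001, 0.005]))
```

The weakest of the orthologous hits determines the result, so a target only scores well if it is conserved in every species.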
A dependence problem
A p-value should be observed as often as it states (blue curve), but it is not (red curve).
let-7b/NME4 (human/mouse) binding sites
-GGCTCAAGCTGCCCTTACCACCCCATCCCCCACGCAGGACCAACTACCTCCGTCAGCAAGAACCCAAGCCCACATCCAAACCTGCCTGTCCCAAACCAC
GGGCTTGCACTGCCTTCTGCACTTCAGGTCT-ACCCATGACCTACTACCTCTGTCAACAAGAAGTCAAGCCCCCATGC---TTCCCATGTCCCCAAAC--
**** ***** * *** ** * ** ** **** ******** **** ****** ******* *** * * ****** ** *
TTACTTCCCTGTTCACCTCTGCCCCACCCCAGCCCAGAGGAGTTTGAGCCACCAACTTCAGTGCCTTTCTGTACCCCAAGCCAGCACAAGATTGGACCAA
-CACTCCCTACTCCCGCTCTACCCAACTCCAGCCCAGGGGAGTCTAAGCCTCAACTCTATGTGCCTTTTTGTATCCTAAGTCAATACAATATTGGACCAT
*** ** * * **** *** ** ********* ***** * **** * * * ******** **** ** *** ** **** *********
TCCTTTTTGCACCAAAGTGCCGGACAACCTTTGTGGTGGGGGGGGGTCTTCACATTATCATAACCTCTCCTCTAAAGGGGAGGCATTAAAATTCACTGTG
GTCCTTGTGTACAAAAGTGCCAGACAACCTTTG--------GGGCATTGTCA-AAGGTGACTTCACCTGCCTCAAAGGAGAGACATTAAAATTT--TATG
* ** ** ** ******** *********** *** * *** * * * * ** * ***** *** ********** * **
CCCAGCACATGGGTGGTACACTAATTATGACTTCCCCCAGCTCTGAGGTAGAAATGACGCCTTTATGCAAGTTGTAAGGAGTTGAACAGTAAAGAGGAAG
CTTAAAAT--------------------------------------------------------------------------------------------
* * *
Multi-species p-value with k = 1.1: 5.0e-05
Multi-species p-value with k = 2: 1.5e-08
k = 1.1 is the effective k
Effective number of orthologous targets
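$k_{\mathrm{eff}} = \arg\min_{k'} \sum_{(x,y) \in F} \frac{1}{x} \left(y - x^{k'}\right)^2$
$1 \le k_{\mathrm{eff}} \le k$
$P[P_1 \le p_1, P_2 \le p_2] = (\max\{p_1, p_2\})^{k_{\mathrm{eff}}}$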
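A minimal sketch of how an effective k changes the multi-species p-value and how it could be fitted. Two assumptions are made here: the multi-species p-value is the maximum p-value raised to the power k, and the fit is a grid search over a weighted least-squares objective between observed frequencies y and predicted values x^k'. The objective, the grid step and all function names are illustrative guesses, not a confirmed description of RNAhybrid. The maximum p-value in the example is back-calculated from the k = 2 value of 1.5e-08 given for let-7b/NME4.

```python
import math

def multi_species_pvalue(pvalues, k_eff):
    """Maximum p-value raised to the effective number of species."""
    return max(pvalues) ** k_eff

def fit_effective_k(points, k_max, step=0.01):
    """Grid search for k' in [1, k_max] minimising sum((y - x**k')**2 / x).

    points: (x, y) pairs, x a maximum p-value, y its observed frequency.
    """
    best_k, best_cost = 1.0, float("inf")
    for s in range(int(round((k_max - 1.0) / step)) + 1):
        k = 1.0 + s * step
        cost = sum((y - x ** k) ** 2 / x for x, y in points)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

p_max = math.sqrt(1.5e-08)  # inferred from the k = 2 result on the slide
print(f"{multi_species_pvalue([p_max], 2):.1e}")
print(f"{multi_species_pvalue([p_max], 1.1):.1e}")

# Synthetic frequencies that follow x**1.3 exactly; the fit recovers 1.3.
synthetic = [(x, x ** 1.3) for x in (0.01, 0.05, 0.1, 0.5)]
print(fit_effective_k(synthetic, k_max=2))
```

With dependent orthologous sequences, the smaller exponent yields a much more conservative p-value than the naive k = 2.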
True and false positives and negatives
[Figure: positives and negatives, split by the classifier into "classified as positives" (TP, FP) and "classified as negatives" (FN, TN)]
$\mathrm{Sens} = \frac{TP}{TP + FN}$
$\mathrm{Sel} = 1 - \frac{FP}{TP + FP}$
Sensitivity and specificity
p-values control specificity
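[Figure: the same diagram of TP, FP, TN and FN, with regions labelled Spec and 1 - Spec]
$\mathrm{Sens} = \frac{TP}{TP + FN}$
$\mathrm{Sel} = 1 - \frac{FP}{TP + FP}$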
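A minimal sketch of the three measures. Sensitivity and selectivity follow the formulas on the slide; the specificity formula TN / (TN + FP) is the standard definition and is an addition here, as are the function names and the counts in the example.

```python
def sensitivity(tp, fn):
    """Fraction of true targets that are predicted: TP / (TP + FN)."""
    return tp / (tp + fn)

def selectivity(tp, fp):
    """Fraction of predictions that are true targets: 1 - FP / (TP + FP)."""
    return 1 - fp / (tp + fp)

def specificity(tn, fp):
    """Fraction of non-targets that are rejected: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical confusion counts.
tp, fn, fp, tn = 8, 2, 4, 86
print(sensitivity(tp, fn), selectivity(tp, fp), specificity(tn, fp))
```

A p-value threshold bounds the share of non-targets that pass, which is 1 - Spec; it says nothing directly about sensitivity.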
RNAhybrid
Target prediction workflow
[Workflow diagram: target db, miRNA registry → individual p-values → Poisson p-values → multi-species p-values]
bantam
target gene   E-value       #sites Dm   #sites Dp   #sites Ag
CG13906       0.000141369   2           1           1
CG3629        0.029351532   2           2           0
CG17136       0.047489474   2           0           1
CG5123        0.048580874   2           2           0
CG13761       0.120263377   0           2           2
CG11624       0.605310610   0           3           0
CG1142        0.677123716   0           0           1
CG13333       0.714171923   2           0           0
Prediction of Drosophila miRNA targets
• 78 miRNAs
• 28,645 3' UTRs (one third each from D. melanogaster, D. pseudoobscura and A. gambiae)
Bantam hits
target              E-value   #sites Dm   #sites Dp   #sites Ag
nervous fingers 1   0.00014   2           1           1
Distal-less         0.029     2           2           0
Wrinkled (Hid)      0.049     2           2           0
miR-7 hits
target                          E-value    #sites Dm   #sites Dp   #sites Ag
CG8394                          0.000095   2           3           3
Twin of m4                      0.00014    2           2           0
E(spl) region transcript m3     0.0083     1           1           0
E(spl) region transcript m      0.094      1           2           0
CG7342                          0.21       1           1           0
CG10444                         0.27       1           1           1
Him                             0.30       1           2           0
CG11132                         0.86       1           1           0
Arginine methyltransferase 1    0.87       1           1           0
miR-2 hits
target   miR-2a               miR-2b               miR-2c
         E-value   #sites     E-value   #sites     E-value   #sites
grim     0.014     1 1 0      0.071     1 2 0      0.045     1 1 0
reaper   0.00061   1 1 0      0.11      1 1 0      0.0095    1 1 0
sickle   E-value 0.054, #sites 2 2 0 (miRNA column not recoverable)
plus a number of others
RNAhybrid functionality
length normalisation
Poisson statistics
web server
seed/loop constraints
miRNA specific statistics
effective k
comparative analysis
multiple binding sites
RNAhybrid
miRNA target selection
surprise
miRNA target selection
rank based
p-values E-values
user guidance
p-values indicate not only biochemical possibility, but also biological function.
Acknowledgements
• Peter Steffen, Robert Giegerich, Jan Krüger
• Matthias Höchsmann
• Alexander Stark, Julius Brennecke, Stephen M. Cohen
• Sven Rahmann
• Gregor Obernosterer
• Robert Heinen
• Leonie Ringrose
References
Rehmsmeier M, Steffen P, Höchsmann M and Giegerich R. Fast and effective prediction of microRNA/target duplexes. RNA, 10:1507-1517, 2004.
bibiserv.techfak.uni-bielefeld.de/rnahybrid