17.02.2005
Fast and effective prediction of miRNA targets
Marc Rehmsmeier
CeBiTec, Bielefeld University, Germany
Junior Research Group Bioinformatics of Regulation
Small interfering RNAs versus small temporal RNAs
Hannon. Nature. 418:244-251, 2002.
miRNA/target duplexes
Grosshans and Slack. The Journal of Cell Biology, 156(1):17-21, 2002.
A direct approach
Given a miRNA and a potential target: What are the energetically most favourable binding sites?
Calculation of multiple minimum free energy (mfe) secondary structure duplexes
The language of RNA duplexes
hybrid = nil ><< tt (region,region) |||
         unpaired_left_top |||
         closed ... h

unpaired_left_top = ult <<< tt (base,empty) ~~~ unpaired_left_top |||
                    unpaired_left_bot ... h

unpaired_left_bot = ulb <<< tt (empty,base) ~~~ unpaired_left_bot |||
                    edangle ... h

edangle = eds <<< tt (base,base) ~~~ closed |||
          edt <<< tt (base,emptybase) ~~~ closed |||
          edb <<< tt (emptybase,base) ~~~ closed ... h

closed = stacking_region |||
         bulge_top |||
         bulge_bottom |||
         internal_loop |||
         end_loop ... h

stacking_region = sr <<< basepair ~~~ closed
bulge_top = (bt <<< basepair ~~~ tt (uregion, empty)) `topbound` closed
bulge_bottom = (bb <<< basepair ~~~ tt (empty, uregion)) `botbound` closed
internal_loop = (il <<< basepair ~~~ tt (uregion,uregion)) `symbound` closed
end_loop = el <<< basepair ~~~ tt (region,region)
The language of RNA duplexes
Dynamic Programming recurrences
Time/memory complexity: linear in target length
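To illustrate why the cost grows only linearly with target length, here is a minimal toy sketch. It is not RNAhybrid's energy model: the pair scores, the mismatch and gap penalties, and the function name `best_duplex_score` are assumptions made for illustration. It aligns the target against the reversed miRNA with a local dynamic programme that keeps two rows, so time is proportional to target length times miRNA length and memory is proportional to miRNA length.

```python
# Toy pairing scores (assumed for illustration, not thermodynamic energies).
PAIR_SCORE = {
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "C"): 2, ("C", "G"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}


def best_duplex_score(target, mirna, mismatch=-1, gap=-2):
    """Return (best score, 1-based target end position) of a toy duplex.

    The miRNA is reversed because the two strands pair antiparallel.
    Only two rows of length len(mirna) + 1 are kept in memory.
    """
    rev = mirna[::-1]
    n = len(rev)
    prev = [0] * (n + 1)
    best, best_end = 0, 0
    for i, t in enumerate(target, start=1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            s = PAIR_SCORE.get((t, rev[j - 1]), mismatch)
            cur[j] = max(
                0,                  # start a new duplex here
                prev[j - 1] + s,    # pair or mismatch (stack / internal loop)
                prev[j] + gap,      # unpaired target base (bulge)
                cur[j - 1] + gap,   # unpaired miRNA base (bulge)
            )
            if cur[j] > best:
                best, best_end = cur[j], i
        prev = cur
    return best, best_end


print(best_duplex_score("AAAAGGGG", "CCCC"))
```

For a miRNA of fixed length (about 22 nt), the inner loop is constant work per target position, which is where the linear behaviour comes from.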
let-7/lin-41 binding sites
position: 688, mfe: -28.0 kcal/mol
position: 737, mfe: -29.0 kcal/mol
Requirements
For prediction of miRNA targets in large databases we need:
• A fast program
• Good statistics
Length normalisation of minimum free energies
$$e_n = \frac{e}{\log(mn)}$$
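A minimal sketch of the normalisation, assuming e is the minimum free energy in kcal/mol, m and n are the target and miRNA lengths, and the logarithm is natural. The function name and the example lengths are hypothetical.

```python
import math


def normalised_energy(mfe, target_length, mirna_length):
    """Length-normalised energy e_n = e / log(m * n) (natural log assumed)."""
    product = target_length * mirna_length
    if product <= 1:
        raise ValueError("m * n must be greater than 1")
    return mfe / math.log(product)


# Hypothetical lengths: a 1000 nt target and a 21 nt miRNA.
print(normalised_energy(-29.0, 1000, 21))
```

The same mfe counts for less on a longer target, because a longer sequence offers more chances of a low energy arising by chance.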
p-values of individual binding sites
Poisson statistics of multiple binding sites
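Probability of k binding sites:
$$P[N = k] = \frac{\lambda^k}{k!} \exp(-\lambda)$$
with
$$\lambda = E[N]$$
For small p-values:
$$E[N] \approx p, \quad p \ll 1$$
The probability of at least k binding sites:
$$P[N \geq k] = 1 - \sum_{i=0}^{k-1} P[N = i]$$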
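The Poisson quantities can be sketched in a few lines. The parameter `lam` stands for the expected number of binding sites E[N], which for small p-values is approximated by the single-site p-value. The function names are assumptions for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P[N = k] for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_at_least(k, lam):
    """P[N >= k] = 1 - sum of P[N = i] for i = 0 .. k-1."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


# With a small single-site p-value used as the mean, P[N >= 1] stays close to p.
p = 0.01
print(prob_at_least(1, p), prob_at_least(2, p))
```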
Comparative analysis of orthologous targets
Multi-species p-values
Poisson p-values: $p_1$, $p_2$, $p_3$
multi-species p-value:
$$P[P_1 \leq p_1, P_2 \leq p_2, P_3 \leq p_3] = (\max\{p_1, \ldots, p_3\})^3$$
General case: k species
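A short sketch of the multi-species p-value for k species, assuming the per-species Poisson p-values are independent. The function name and the example p-values are hypothetical.

```python
def multi_species_p_value(p_values, k=None):
    """Joint p-value (max p_i) ** k; k defaults to the number of species."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if k is None:
        k = len(p_values)
    return max(p_values) ** k


# Hypothetical Poisson p-values for three orthologous targets.
print(multi_species_p_value([0.01, 0.02, 0.05]))
```

The optional `k` argument lets the exponent differ from the species count, which matters when the orthologous sequences are not independent.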
A dependence problem
We should see a p-value as often as it says (blue curve), but we don't (red curve).
let-7b/NME4 (human/mouse) binding sites
-GGCTCAAGCTGCCCTTACCACCCCATCCCCCACGCAGGACCAACTACCTCCGTCAGCAAGAACCCAAGCCCACATCCAAACCTGCCTGTCCCAAACCAC
GGGCTTGCACTGCCTTCTGCACTTCAGGTCT-ACCCATGACCTACTACCTCTGTCAACAAGAAGTCAAGCCCCCATGC---TTCCCATGTCCCCAAAC--
**** ***** * *** ** * ** ** **** ******** **** ****** ******* *** * * ****** ** *
TTACTTCCCTGTTCACCTCTGCCCCACCCCAGCCCAGAGGAGTTTGAGCCACCAACTTCAGTGCCTTTCTGTACCCCAAGCCAGCACAAGATTGGACCAA
-CACTCCCTACTCCCGCTCTACCCAACTCCAGCCCAGGGGAGTCTAAGCCTCAACTCTATGTGCCTTTTTGTATCCTAAGTCAATACAATATTGGACCAT
*** ** * * **** *** ** ********* ***** * **** * * * ******** **** ** *** ** **** *********
TCCTTTTTGCACCAAAGTGCCGGACAACCTTTGTGGTGGGGGGGGGTCTTCACATTATCATAACCTCTCCTCTAAAGGGGAGGCATTAAAATTCACTGTG
GTCCTTGTGTACAAAAGTGCCAGACAACCTTTG--------GGGCATTGTCA-AAGGTGACTTCACCTGCCTCAAAGGAGAGACATTAAAATTT--TATG
* ** ** ** ******** *********** *** * *** * * * * ** * ***** *** ********** * **
CCCAGCACATGGGTGGTACACTAATTATGACTTCCCCCAGCTCTGAGGTAGAAATGACGCCTTTATGCAAGTTGTAAGGAGTTGAACAGTAAAGAGGAAG
CTTAAAAT--------------------------------------------------------------------------------------------
* * *
Multi-species p-value with k = 1.1: 5.0e-05
Multi-species p-value with k = 2: 1.5e-08
k = 1.1 is the effective k
Effective number of orthologous targets
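[Formula: $k_{\text{eff}}$ is the $k'$ that minimises a sum over points $(x, y) \in F$ of a squared term in $x$, $y$ and $k'$]
$$1 \leq k_{\text{eff}} \leq k$$
$$P[P_1 \leq p_1, P_2 \leq p_2] = (\max\{p_1, p_2\})^{k_{\text{eff}}}$$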
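As a worked check of the let-7b/NME4 numbers quoted above, the sketch below recovers the larger single-species p-value from the k = 2 result and re-raises it to the effective k of 1.1. Recovering the maximum this way is an assumption, since the slides do not state the individual p-values. The function name is hypothetical.

```python
def joint_p_value(max_p, k_eff):
    """Joint p-value (max p_i) ** k_eff for an effective number of species."""
    return max_p ** k_eff


p_k2 = 1.5e-08                        # quoted multi-species p-value for k = 2
max_p = p_k2 ** 0.5                   # implied maximum single-species p-value
p_keff = joint_p_value(max_p, 1.1)    # effective k from the slide

print(max_p, p_keff)
```

The result agrees with the quoted 5.0e-05 to two significant digits, showing how strongly the dependence between human and mouse sequences weakens the joint evidence.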
Requirements
For prediction of miRNA targets in large databases we need:
• A fast program
• Good statistics
True and false positives and negatives
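[Figure: diagram of Positives and Negatives split into those classified as Positives and those classified as Negatives, with regions TP, FP, TN, FN]
$$\mathrm{Sens} = \frac{TP}{TP + FN}$$
$$1 - \mathrm{Sel} = \frac{FP}{TP + FP}$$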
Sensitivity and specificity
p-values control specificity
[Figure: the TP, FP, TN, FN diagram with Spec marked]
$$\mathrm{Sens} = \frac{TP}{TP + FN}$$
$$1 - \mathrm{Sel} = \frac{FP}{TP + FP}$$
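The measures can be computed directly from the four counts. This is a small sketch with hypothetical counts. Specificity is taken in its standard form TN / (TN + FP), which the slides name but do not spell out.

```python
def sensitivity(tp, fn):
    """Fraction of true targets that are predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def selectivity(tp, fp):
    """Fraction of predictions that are true targets: TP / (TP + FP)."""
    return tp / (tp + fp)


def specificity(tn, fp):
    """Fraction of non-targets that are rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts for illustration only.
tp, fp, tn, fn = 8, 2, 85, 5
print(sensitivity(tp, fn), selectivity(tp, fp), specificity(tn, fp))
```

A p-value cutoff bounds the fraction of non-targets that pass, which is why p-values control specificity rather than sensitivity.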
RNAhybrid
Target prediction workflow
[Figure: workflow diagram with inputs "target db" and "miRNA registry" and stages "individual p-values", "multi-species p-values", "Poisson p-values"]
bantam
target gene   E-value       #sites Dm   #sites Dp   #sites Ag
CG13906       0.000141369   2           1           1
CG3629        0.029351532   2           2           0
CG17136       0.047489474   2           0           1
CG5123        0.048580874   2           2           0
CG13761       0.120263377   0           2           2
CG11624       0.605310610   0           3           0
CG1142        0.677123716   0           0           1
CG13333       0.714171923   2           0           0
Prediction of Drosophila miRNA targets
• 78 miRNAs
• 28,645 3' UTRs (1/3 from D. melanogaster, 1/3 from D. pseudoobscura, 1/3 from A. gambiae)
Bantam hits
target              E-value   #sites Dm   #sites Dp   #sites Ag
Wrinkled (Hid)      0.049     2           2           0
Distal-less         0.029     2           2           0
nervous fingers 1   0.00014   2           1           1
miR-7 hits
target                         E-value    #sites Dm   #sites Dp   #sites Ag
CG8394                         0.000095   2           3           3
Twin of m4                     0.00014    2           2           0
E(spl) region transcript m3    0.0083     1           1           0
E(spl) region transcript m     0.094      1           2           0
CG7342                         0.21       1           1           0
CG10444                        0.27       1           1           1
Him                            0.30       1           2           0
CG11132                        0.86       1           1           0
Arginine methyltransferase 1   0.87       1           1           0
miR-2 hits
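target   miR-2a E-value (#sites)   miR-2b E-value (#sites)   miR-2c E-value (#sites)
reaper   0.00061 (1 1 0)           0.11 (1 1 0)              0.0095 (1 1 0)
grim     0.014 (1 1 0)             0.071 (1 2 0)             0.045 (1 1 0)
sickle   0.054 (2 2 0); the transcript gives a single entry without its miRNA column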
plus a number of others
RNAhybrid functionality
length normalisation
Poisson statistics
web server
seed/loop constraints
miRNA specific statistics
effective k
comparative analysis
multiple binding sites
RNAhybrid
miRNA target selection
surprise
miRNA target selection
rank based
p-values E-values
user guidance
p-values indicate not only biochemical possibility, but also biological function.
Acknowledgements
• Peter Steffen, Robert Giegerich, Jan Krüger
• Matthias Höchsmann
• Alexander Stark, Julius Brennecke, Stephen M. Cohen
• Sven Rahmann
• Gregor Obernosterer
• Robert Heinen
• Leonie Ringrose
References
Rehmsmeier M, Steffen P, Höchsmann M and Giegerich R. Fast and effective prediction of microRNA/target duplexes. RNA, 10:1507-1517, 2004.
bibiserv.techfak.uni-bielefeld.de/rnahybrid