Prof. Michal Ziv-Ukelson
Ben-Gurion University (BGU), Department of Computer Science
http://www.cs.bgu.ac.il/~negevcb/index.php
String Algorithms
Computational Biology
Structural RNAomics
Outline
• RNA and its structure.
• The “RNA Revolution” and principles of regulation by non-coding RNAs
• RNA structure implies function: deciphering the evidence
What is RNA?
• A biological molecule, composed as a
sequence over 4 types of building blocks
called bases or nucleotides.
• The different base types are denoted by
the letters A, G, C, and U.
• RNA bases: A, C, G, U
• Canonical base pairs:
– A-U (2 hydrogen bonds)
– G-C (3 hydrogen bonds, more stable)
– G-U (“wobble” pairing)
– Each base can pair with only one other base.
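The pairing rules on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the names `CANONICAL_PAIRS`, `can_pair`, and `is_valid_pairing` are hypothetical and do not come from the lecture, and a structure is assumed to be given as a list of 0-based index pairs.

```python
# Canonical RNA base pairs listed on the slide: A-U, G-C, and the G-U wobble pair.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def can_pair(x: str, y: str) -> bool:
    """Return True if bases x and y form a canonical (or wobble) pair."""
    return (x.upper(), y.upper()) in CANONICAL_PAIRS


def is_valid_pairing(seq: str, pairs) -> bool:
    """Check a proposed set of base pairs against the slide's two rules.

    pairs is an iterable of (i, j) positions (0-based, an assumption of
    this sketch). Every pair must be canonical, and each base may take
    part in at most one pair.
    """
    used = set()
    for i, j in pairs:
        if i == j or i in used or j in used:
            return False  # a base can pair with only one other base
        if not can_pair(seq[i], seq[j]):
            return False  # not A-U, G-C, or G-U
        used.update((i, j))
    return True
```

For example, `is_valid_pairing("GGGAAACCC", [(0, 8), (1, 7), (2, 6)])` accepts a three-pair stem, while any pairing that reuses a position is rejected.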
RNA Structure, Dimensions 1-3: Folding
RNA quaternary structure:
• microRNA:mRNA interaction (Wang et al., PlosCB 2010)
• Bicoid mRNA dimerization (Ferrandon et al., EMBO 1997)
RNA Structure, Dimension 4: Self-Dimerization and RNA-RNA Interactions
Zhang W, Chen S. PNAS 2002;99:1931-1936
©2002 by National Academy of Sciences
RNA Structure, Dimension 5: Folding Dynamics
Outline
• RNA and its structure.
• The “RNA Revolution” and principles of regulation by non-coding RNAs
• RNA Structure prediction from sequences
> DNA sequence
AATTCATGAAAATCGTATACTGGTCTGGTACCGG
CAACACTGAGAAAATGGCAGAGCTCATCGCTAAA
GGTATCATCGAATCTGGTAAAGACGTCAACACCA
TCAACGTGTCTGACGTTAACATCGATGAACTGCT
GAACGAAGATATCCTGATCCTGGGTTGCTCTGCC
ATGGGCGATGAAGTTCTCGAGGAAAGCGAATTTG
Gene Function
> Protein sequence
MKIVYWSGTGNTEKMAELIAKGIIES
GKDVNTINVSDVNIDELLNEDILILGC
SAMGDEVLEESEFEPFIEEISTKISG
KKVALFGSYGWGDGKWMRDFEER
MNGYGCVVVETPLIVQNEPDEAEQD
CIEFGKKIANI
The Central Dogma of Molecular Biology
DNA → RNA → PROTEIN
Genome: The digital backbone of molecular biology
Transcripts: Perform functions encoded in the genome
What are RNA and mRNA?
• Traditional role as messenger molecule (mRNA)
• RNA: a polymer of nucleotides A,U,C,G.
CS374 Stanford
RNA
AUUGCCGAUGACGGCAGUGAUGUAGUA
Down Regulation by mRNA Silencing
Up Regulation by mRNA stabilization
RNA
AUUGCCGAUGACGGCAGUGAUGUAGU
binding site
The Central Dogma of Molecular Biology
DNA: CCTGAGCCAACTATTGATGAA
(transcription)
RNA: CCUGAGCCAACUAUUGAUGAA
(translation)
Protein: PEPTIDE
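The slide's example (DNA CCTGAGCCAACTATTGATGAA, RNA CCUGAGCCAACUAUUGAUGAA, protein PEPTIDE) can be reproduced with a short sketch. It assumes the DNA shown is the coding strand, so transcription reduces to replacing T with U, and it uses the standard genetic code. The function names `transcribe` and `translate` are hypothetical, chosen for this illustration.

```python
# Standard genetic code, written compactly: codons are enumerated with the
# base order U, C, A, G at each of the three positions.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA to RNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate RNA codon by codon from position 0, stopping at a stop codon."""
    protein = []
    for pos in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("CCTGAGCCAACTATTGATGAA")
print(rna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(rna))  # PEPTIDE
```

The sketch reads codons in the first reading frame only and does not search for a start codon, which is enough for the slide's seven-codon example.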
Non-Coding RNA
- An RNA molecule that is not translated into a protein
- Non-coding RNAs have been found to play roles in a great variety of processes
The Central Dogma of Biology
DNA → RNA → PROTEINS
Ron Unger – Bar-Ilan University 2009
RNA: the molecule of the year 2002.
Couzin J. (2002). Breakthrough of the Year: Small RNAs Make Big Splash. Science 298, 2296.
Andrew Z. Fire and Craig C. Mello
The Nobel Prize in Physiology or Medicine 2006
"for their discovery of RNA interference - gene
silencing by double-stranded RNA"
A Recent Example
DNA → RNA → Protein
Anti-Sense: RNA complementarity yields gene silencing
AUUGCCGAUGACGGCAGUGAUGUAGUA
CCGUCAC
Shutting down a gene via hybridization between an mRNA and a complementary small RNA, which prevents the mRNA from being translated into a protein.
Interpretation: the light-blue rectangle symbolizes the ribosome, the gray cloverleaf represents the tRNA, and the green circle represents an amino acid.
However, single-stranded anti-sense fragments are unstable, are digested by proteins, and seemingly are not very useful as a therapy.
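The hybridization shown on the anti-sense slides can be sketched as a search for the stretch of mRNA that is complementary to the small RNA. Two assumptions are made here: the small RNA is written in the orientation in which it is drawn under the mRNA (so it is compared position by position, without reversing), and only strict A-U and G-C pairs count. The name `find_antisense_sites` is hypothetical.

```python
# Strict Watson-Crick complements (wobble pairs are ignored in this sketch).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_antisense_sites(mrna: str, small_rna_aligned: str) -> list:
    """Return 0-based start positions where the small RNA can hybridize.

    small_rna_aligned is assumed to be written as drawn under the mRNA,
    i.e. its first letter pairs with the first letter of the site.
    """
    target = "".join(COMPLEMENT[base] for base in small_rna_aligned)
    k = len(target)
    return [i for i in range(len(mrna) - k + 1) if mrna[i:i + k] == target]


mrna = "AUUGCCGAUGACGGCAGUGAUGUAGUA"
print(find_antisense_sites(mrna, "CCGUCAC"))  # [12], the site GGCAGUG
```

With the sequences from the slide, the small RNA CCGUCAC matches exactly one site, GGCAGUG, starting at position 12 of the mRNA.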
Down Regulation by mRNA degradation
RISC, loaded with small RNA: GCUACUG
Target mRNA: AUUGCCGAUGACGGCAGUGAUGUAGUA
RISC binds the mRNA at the binding site.
mRNA after degradation: AUUGCCGAUGAC
Types of RNA molecules: a partial list
Long non-coding RNAs
http://rfam.sanger.ac.uk/
In which processes do ncRNA molecules participate?
• Translation (tRNA and rRNA)
• Ribosome maturation and RNA processing (snRNA and snoRNA)
• Splicing (U1, U2, U4, U5)
• Replication (telomerase RNA)
• Gene regulation (miRNA, siRNA)
• Editing (RNA editing, e.g. serotonin receptor)
• Protein translocation (SRP RNA)
• Fighting pathogens (vRNA, CRISPR)
• Translation quality control (tmRNA).
Regulation by noncoding RNA
• A possible explanation for the complexity puzzle: in terms of the protein world, there is no great difference between worms and humans.
• 95% of the expressed sequence is not translated into proteins: introns, intergenic regions, repetitive sequences.
• In general, the length and number of introns are related to the complexity of the organism:
– In simple eukaryotes, 10-20% of transcripts have introns, about 100 bp long.
– In plants, 50% of transcripts have introns, with an average length of 500 bp.
– In nematodes and flies, the average length is also 500 bp.
– In humans, the average intron length is 3400 bp.
• Many ncRNAs reside within introns.
• The cell devotes many mechanisms to handling RNA.
• Between human and dog there are about 400,000 conserved sequence segments with an average length of 456 bp; more than half of them are not associated with protein-coding genes.
• ncRNAs may be associated with regulatory mechanisms that contribute to the complexity of organisms.
Advantages of regulation by ncRNA
• Regulation by ncRNA allows flexible and efficient control, just as in modern communication systems, where the control lines are separate from the data lines.
• Such regulation makes it possible, for example, to perform updates in real time without producing new molecules.
• Such regulation also makes it possible, for example, to synchronize activity: when a protein-coding gene is transcribed, ncRNA molecules located in its introns can report on the production of the protein.
Mattick J. Non-coding RNAs: the architects of eukaryotic complexity. EMBO Reports, 21:986-991, 2001.
Why is it hard to locate ncRNA molecules in the genome?
• We lack good experimental tools for detecting such molecules, especially those that are short-lived and present in small quantities.
• Bioinformatically, the problem is that, unlike proteins, where we can rely on signals such as start and stop codons, codon periodicity, promoters, etc., we know of no such general signals for ncRNA.
So what do we have?
• Secondary structure prediction
• Conservation (of sequence and structure) across organisms
• Signals specific to certain families
Outline
• RNA and its structure.
• The “RNA Revolution” and principles of regulation by non-coding RNAs
• RNA structure implies function: deciphering the evidence
Why is RNA Structure Interesting?
Accessible RNA binding sites
Binding site accessibility
A target-site motif has to be accessible to binding.
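One simple way to make "accessible" concrete is to ask how much of the binding site is unpaired in a given secondary structure. The sketch below assumes the structure is supplied in dot-bracket notation (a dot for an unpaired base, parentheses for paired bases), which is a common convention but is not introduced in these slides. The function names and both example structures are hypothetical. The site used is positions 5 to 11 (0-based) of the example mRNA, where CGAUGAC, the complement of GCUACUG, occurs.

```python
def unpaired_fraction(structure: str, start: int, length: int) -> float:
    """Fraction of positions in structure[start:start+length] that are unpaired."""
    site = structure[start:start + length]
    if length <= 0 or len(site) != length:
        raise ValueError("site does not fit inside the structure")
    return site.count(".") / length


def is_accessible(structure: str, start: int, length: int) -> bool:
    """Treat a site as accessible only if every base in it is unpaired."""
    return unpaired_fraction(structure, start, length) == 1.0


# Two made-up structures for the 27-nt example mRNA (illustration only).
open_site = "." * 12 + "((((...))))" + "." * 4          # site 5..11 is unpaired
blocked_site = "(((((((" + "....." + ")))))))" + "." * 8  # site overlaps a stem

print(is_accessible(open_site, 5, 7))     # True
print(is_accessible(blocked_site, 5, 7))  # False
```

Requiring a fully unpaired site is a deliberately strict choice for this sketch; a threshold on `unpaired_fraction` would be a softer alternative.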
RNA Structure Prediction
Challenges?
Conserved sequence in stem of “hairpin” structure
Figure labels: Gene (an inverted repeat), Dicer, a target gene.
Witness 3: Structural (Co-evolutionary) Conservation
Hairpin structure adapted for recognition by Drosha/Dicer.
Within the hairpin structure, the conserved sequence preserves complementarity with the target site.
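An inverted repeat that can fold into a hairpin stem can be found with a brute-force scan. The sketch below is a minimal illustration, not a miRNA-precursor predictor: it looks for two arms of a fixed length separated by a short loop, such that the arms pair antiparallel under the A-U, G-C, and G-U rules. The function name and the default parameter values are assumptions made for this example.

```python
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def find_hairpins(seq: str, stem_len: int = 4, min_loop: int = 3, max_loop: int = 8):
    """Return (i, j) pairs: i is the start of the 5' arm, j the start of the 3' arm.

    The arms seq[i:i+stem_len] and seq[j:j+stem_len] must pair antiparallel,
    with a loop of min_loop..max_loop unpaired bases between them.
    """
    n = len(seq)
    hits = []
    for i in range(n - 2 * stem_len - min_loop + 1):
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop
            if j + stem_len > n:
                break  # the 3' arm would run past the end of the sequence
            if all((seq[i + k], seq[j + stem_len - 1 - k]) in PAIRS
                   for k in range(stem_len)):
                hits.append((i, j))
    return hits


print(find_hairpins("GGGCAAAAGCCC"))  # [(0, 8)]: GGGC pairs with GCCC around loop AAAA
```

The scan costs roughly O(N · loop range · stem length), which is fine for short sequences; genome-scale searches would need a more careful method.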
Problem 1a: Find these
Problem 1b: How do these fold?
Problem 2a: How to predict these?
Problem 2b: What are the targets these bind to?
How to Solve Problem 1a: Find these
Structural Cis-Elements: Purine Riboswitch
“GGUAU” “CCGUA”
[Mandal et al., 2003] predicted a potential pseudoknot between the two arms of the purine riboswitch aptamer.
Three Structural Witnesses for RNA functionality
Witness 1: Structure Stability.
Witness 2: Conserved Structure.
Witness 3: Conserved Sequence (within its structural context).
AUCCCCGUAUCGAUC
AAAAUCCAUGGGUAC
CCUAGUGAAAGUGUA
UAUACGUGCUCUGAU
UCUUUACUGAGGAGU
CAGUGAACGAACUGA
Witness 1: Stability of Structure
Measurements of stability:
• Max base pairs
• Minimum free energy
• Partition function
• Statistical scores based on SCFGs / machine learning
Witness 2: Conserved Structure
Base pairs at corresponding positions (Lactobacillus acidophilus vs. Lactobacillus delbrueckii):
• G-U vs. U-A
• G-C vs. C-G
• U-A vs. C-G
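The three pair comparisons on these slides share a pattern: the bases differ between the two species, yet both versions can still pair. A small sketch can classify such changes. The labels "compensatory" (both bases changed) and "consistent" (one base changed) follow common usage in comparative RNA analysis; the function name and the "X-Y" string format are assumptions of this example.

```python
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def classify_pair_change(pair_a: str, pair_b: str) -> str:
    """Compare one base pair in species A with the aligned pair in species B.

    Pairs are given as strings such as "G-U" (format assumed for this sketch).
    """
    a = tuple(pair_a.upper().split("-"))
    b = tuple(pair_b.upper().split("-"))
    if a not in CANONICAL or b not in CANONICAL:
        return "pairing lost"
    if a == b:
        return "identical"
    changed = sum(x != y for x, y in zip(a, b))
    return "compensatory" if changed == 2 else "consistent"


for pair_a, pair_b in [("G-U", "U-A"), ("G-C", "C-G"), ("U-A", "C-G")]:
    print(pair_a, pair_b, classify_pair_change(pair_a, pair_b))
```

All three comparisons from the slides come out as "compensatory": both bases changed while the ability to pair was preserved, which is the kind of evidence Witness 2 relies on.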
Witness 3: Conserved Sequence (within its structural context)
Lactobacillus acidophilus: CAUCUUUGA
Lactobacillus delbrueckii: CAUCUCUGA
Sequence and structure conservation in the context of imprinted structural conservation
Lactobacillus acidophilus: GGUAU / CCGUA
Lactobacillus delbrueckii: GGUAU / CCGUA
Witness 1: Stability of Structure (2D, predicted)
RNA Secondary Structure Prediction: O(N^3)
[Nussinov-Jacobson 1980, Zuker-Stiegler 1981]
MFOLD: http://www.rpi.edu/~zukerm
Vienna RNA Package: http://www.tbi.univie.ac.at/~ivo/RNA
Nussinov Algorithm
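A minimal sketch of a Nussinov-style dynamic program is given below. It maximizes the number of nested base pairs (the "Max Base Pairs" measure) in O(N^3) time. It is written from the textbook recurrence rather than taken from the lecture: the `min_loop` parameter, the inclusion of G-U pairs, and the traceback details are choices made for this illustration.

```python
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def nussinov(seq: str, min_loop: int = 0):
    """Return (max_pairs, pairs) for a nested structure with the most base pairs.

    dp[i][j] is the best pair count for seq[i..j]. Position j is either
    unpaired, or paired with some k in i..j-min_loop-1, which splits the
    interval into seq[i..k-1] and seq[k+1..j-1].
    """
    n = len(seq)
    if n == 0:
        return 0, []
    dp = [[0] * n for _ in range(n)]

    def score(i, j, k):
        # Value of pairing k with j inside the interval i..j.
        left = dp[i][k - 1] if k > i else 0
        inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
        return left + 1 + inner

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: j is unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, score(i, j, k))
            dp[i][j] = best

    # Traceback: recover one optimal set of pairs.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS and dp[i][j] == score(i, j, k):
                pairs.append((k, j))
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return dp[0][n - 1], sorted(pairs)


print(nussinov("GGGAAACCC", min_loop=3))  # (3, [(0, 8), (1, 7), (2, 6)])
```

Counting pairs is only a rough proxy for stability. Energy-based methods such as the Zuker-Stiegler approach cited above replace the "+1 per pair" score with thermodynamic parameters while keeping a similar dynamic-programming structure.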