![Page 1: The Bytes of biological Data Artemis G. Hatzigeorgiou Professor of Bioinformatics Department of Electrical and Computer Engineering University of Thessaly](https://reader036.vdocuments.net/reader036/viewer/2022062321/56649e025503460f94aecd73/html5/thumbnails/1.jpg)
The Bytes of Biological Data
Artemis G. Hatzigeorgiou
Professor of Bioinformatics
Department of Electrical and Computer Engineering University of Thessaly
Hellenic Institute Pasteur
“Athena” Research Center
What is Bioinformatics?
• Bioinformatics is generally defined as the analysis, prediction, modeling and storage of biological data with the help of computers
Next Generation Sequencing
[Figure: costs, split 90% / 10%]
The central dogma
What are microRNAs (miRNAs)?
[Diagram: Gene B. DNA → (transcription) → RNA → (translation) → protein]
miRNAs are RNAs about 22 nt long.
They post-transcriptionally regulate the expression of protein-coding genes.
MicroRNAs are involved in …
• Development
• Stem cell proliferation
• Division
• Differentiation
• Regulation of innate & adaptive immunity
• Apoptosis
• Cell signaling
• Metabolism
Human pathologies:
• Cancer
• Viral infections
• Cardiovascular diseases
• Metabolic disorders
• Neurological pathologies
• Psychiatric disorders
• Renal disease
• Hepatological conditions
• Autoimmune diseases
• Gastroenterological conditions
• Obesity
• Reproductive disorders
• Musculoskeletal disorders
• Periodontal pathologies
Superlinear Increase of known miRNAs and relevant Research
Active Pathway Visualization
Citation: Wang D, Yan K-K, Sisu C, Cheng C, Rozowsky J, Meyerson W, et al. (2015) Loregic: A Method to Characterize the Cooperative Logic of Regulatory Factors. PLoS Comput Biol 11(4): e1004132. doi:10.1371/journal.pcbi.1004132
Location of miRNAs
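[Diagram: two arrangements of miRNA genes, each transcribed by Pol2 from a promoter: miRNAs on their own and miRNAs lying between the exons of a gene. The two arrangements are labeled 70% and 30%.]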
Why are the pri-miRNA genes not annotated?
Fast degradation in the nucleus
Megraw, M., Baev, V., Rusinov, V., Jensen, S.T., Kalantidis, K., Hatzigeorgiou, A.G. MicroRNA promoter element discovery in Arabidopsis (2006) RNA, 12 (9), pp. 1612-1619.
Recognition of Transcription Start Sites for pri-microRNA genes
• Weight matrices of Transcription Factors
• ChIP-seq data of Pol II occupancy
• ChIP-seq data of histone modifications (H3K4me3)
• Cap Analysis of Gene Expression (CAGE)
ChIP Sequencing Visualization
H3K4me3
Pol2
Drawback: wide range of predictions
Experimental identification of miRNA TSSs
A Drosha null/conditional-null (DroshaLacZ/e4COIN) mouse model was generated using the conditional by inversion (COIN) methodology of Aris Economides (Regeneron Pharmaceuticals).
Economides, A.N. et al. Conditionals by inversion provide a universal method for the generation of conditional alleles. Proceedings of the National Academy of Sciences Aug 20;110(34):E3179-88 (2013).
[Figure: RNA-seq coverage (normalized read count) over the Mir17hg lncRNA locus (8,856 bp), which contains Mir17, Mir18, Mir19a, Mir20a, Mir19b-1 and Mir92-1. Tracks: GSM973235 WT mESCs with 180M reads; Drosha -/- mESCs with 27M reads; Drosha +/+ mESCs with 19M reads.]
RNA-seq read depth is essential!
…but (deep RNA-seq is) not enough
[Figure tracks: miRNAs, putative TSS, RNA-seq coverage]
Which one is correct?
ChIP-seq information can effectively reduce putative TSSs
[Figure tracks: miRNAs, putative TSS, RNA-seq coverage, H3K4me3, Pol2, TF footprints]
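One way to picture how ChIP-seq evidence can prune candidates is a simple interval check. This is a minimal sketch and not the method from the talk: the function names, the 500 bp flank, and the rule that a candidate must be supported by both H3K4me3 and Pol2 peaks are all assumptions made for illustration.

```python
from typing import List, Tuple

Peak = Tuple[int, int]  # half-open interval (start, end) on one chromosome

def near_peak(pos: int, peaks: List[Peak], flank: int = 0) -> bool:
    """True if pos lies inside any peak widened by `flank` bp on each side."""
    return any(start - flank <= pos < end + flank for start, end in peaks)

def filter_candidates(candidates: List[int],
                      h3k4me3_peaks: List[Peak],
                      pol2_peaks: List[Peak],
                      flank: int = 500) -> List[int]:
    """Keep candidate TSS positions supported by both marks (assumed rule)."""
    return [p for p in candidates
            if near_peak(p, h3k4me3_peaks, flank)
            and near_peak(p, pol2_peaks, flank)]

# Toy coordinates, invented for the example.
candidates = [1_000, 12_000, 40_000]
h3k4me3 = [(11_500, 13_000), (39_000, 41_000)]
pol2 = [(11_800, 12_400)]
kept = filter_candidates(candidates, h3k4me3, pol2)  # → [12000]
```

Only the candidate at 12,000 survives here: the one at 40,000 has an H3K4me3 peak but no Pol2 signal nearby.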
Algorithm - First step: identify candidate TSSs
• Raw RNA-seq reads
• Map reads on the reference genome (mm10)
• Reads tend to cluster over the expressed genomic regions
• Apply a sliding window around miRNAs
• Filter the candidate transcription start sites to obtain putative TSSs
[Figure tracks on mm10: miRNA, coding]
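The sliding-window idea can be sketched on a per-base coverage vector. This is an illustrative toy, not the published implementation: it assumes a plus-strand miRNA, and the window size, `min_jump` threshold and `max_upstream` search range are hypothetical parameters.

```python
from typing import List

def candidate_tss(coverage: List[int], mirna_start: int, window: int = 5,
                  min_jump: float = 4.0, max_upstream: int = 50) -> List[int]:
    """Report upstream positions where mean coverage jumps from low to high.

    For every position, the mean coverage of the `window` bases before it is
    compared with the mean of the `window` bases starting at it. Consecutive
    positions whose jump is at least `min_jump` form one run, and the position
    with the largest jump in each run is reported as a candidate.
    """
    first = max(window, mirna_start - max_upstream)
    candidates = []
    best = None  # (jump, position) of the current run, if any
    for pos in range(first, mirna_start - window + 1):
        before = sum(coverage[pos - window:pos]) / window
        after = sum(coverage[pos:pos + window]) / window
        jump = after - before
        if jump >= min_jump:
            if best is None or jump > best[0]:
                best = (jump, pos)
        elif best is not None:
            candidates.append(best[1])
            best = None
    if best is not None:
        candidates.append(best[1])
    return candidates

# Toy locus: a transcript covering positions 10..29, miRNA starting at 25.
coverage = [0] * 10 + [8] * 20 + [0] * 5
found = candidate_tss(coverage, mirna_start=25)  # → [10]
```

Real coverage is far noisier than this toy vector, which is why several candidates per miRNA can survive this step.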
Algorithm - second step: Training of SVMs
An algorithm that can learn from examples: machine learning.
Here we used Support Vector Machines, a supervised machine learning approach.
Training with:
• positive examples (protein coding TSS)
• negative examples (random intergenic locations, flanking positions)
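To make the training step concrete, here is a minimal linear SVM fitted by batch sub-gradient descent on the hinge loss. It is a teaching sketch only: the slides do not say which SVM implementation, kernel or features were used, so the two standardized features (labeled here as H3K4me3 and Pol2 signal) and every hyperparameter are assumptions.

```python
from typing import List, Sequence, Tuple

def train_linear_svm(X: List[Sequence[float]], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Minimise lam/2 * |w|^2 + mean hinge loss with batch sub-gradient steps."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        sum_yx = [0.0] * d  # sum of y_i * x_i over margin violators
        sum_y = 0.0         # sum of y_i over margin violators
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # inside the margin or misclassified
                for j in range(d):
                    sum_yx[j] += yi * xi[j]
                sum_y += yi
        w = [wj - lr * (lam * wj - sj / n) for wj, sj in zip(w, sum_yx)]
        b += lr * sum_y / n
    return w, b

def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (TSS-like) or -1 (background)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Toy, standardized features: (H3K4me3 signal, Pol2 signal). Values invented.
X = [(1.0, 1.0), (1.0, 0.5), (0.5, 1.0),         # positives: protein coding TSSs
     (-1.0, -1.0), (-1.0, -0.5), (-0.5, -1.0)]   # negatives: intergenic/flanking
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

In this analogy, a model trained on protein-coding TSSs is then applied to the candidate positions around miRNAs, for example `predict(w, b, (0.8, 0.9))`.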
Algorithm overview
First step
Second step
Final step
Comparison between microTSS and available algorithms
[Plot: precision versus distance threshold for Marson et al, S-Peaker, PROmiRNA and microTSS]
Algorithms’ Precision and Sensitivity at 1 kbp distance threshold from validated TSSs in mESCs (N=47)

| Algorithm | Sensitivity | Precision |
|---|---|---|
| Marson et al | 54% (20/37) | 64.5% (20/31) |
| PROmiRNA | 78.7% (37/47) | 25.4% (95/373) |
| S-Peaker | 76.5% (36/47) | 18.8% (77/409) |
| microTSS | 93.6% (44/47) | 100% (44/44) |
• No prediction filtering based on distance
• Predictions located less than 1,000 bp from the validated TSS are considered True Positives and the rest are considered False Positives.
• Precision = TP / (TP+FP)
• Sensitivity = Correct Predictions / Total Correct
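The two definitions can be written out as a short evaluation routine. This is a sketch under simplifying assumptions: all positions are on one chromosome and strand, and the function name and toy coordinates are hypothetical.

```python
from typing import List, Tuple

def evaluate(predictions: List[int], validated: List[int],
             max_dist: int = 1000) -> Tuple[float, float]:
    """Return (precision, sensitivity) at a distance threshold in bp.

    A prediction is a true positive if it lies less than `max_dist` bp from
    some validated TSS; a validated TSS counts as correctly predicted if some
    prediction lies less than `max_dist` bp from it.
    """
    tp = sum(1 for p in predictions
             if any(abs(p - v) < max_dist for v in validated))
    fp = len(predictions) - tp
    recovered = sum(1 for v in validated
                    if any(abs(p - v) < max_dist for p in predictions))
    precision = tp / (tp + fp) if predictions else 0.0
    sensitivity = recovered / len(validated) if validated else 0.0
    return precision, sensitivity

# Toy coordinates, invented for the example.
validated = [10_000, 50_000, 90_000]
predictions = [10_400, 52_500, 89_200, 30_000]
precision, sensitivity = evaluate(predictions, validated)
# precision = 2/4, sensitivity = 2/3
```

Because precision is counted over predictions and sensitivity over validated TSSs, the two numerators need not be equal.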
Software on microRNA.gr
Maragkakis M, Vergoulis T, Alexiou P, Reczko M et al. DIANA-microT Web server upgrade supports Fly and Worm miRNA target prediction and bibliographic miRNA to disease association. Nucleic Acids Research, 2011.
• miRNA target predictions (microT)
• miRNA validated targets (TarBase)
• miRNA genomics (miRGen)
• miRNA experimentally supported targets on protein coding genes (TarBase)
• miRNA experimentally supported targets on long non-coding genes (LncBase)
• KEGG pathways analysis (mirPath)
• miRNA targets gene enrichment analysis (mirExTra)
• miRNA to disease associations
• automatic bibliographic searches
• miRNA naming history analysis
• extended connectivity to online databases
Primary data
Meta analysis
Other projects of DIANA lab on microrna.gr
Database of experimentally supported targets: DIANA-TarBase
• Initially released in 2006
– The first database to catalog published experimentally validated miRNA:gene interactions
• With more than 500,000 entries, the largest repository of experimentally validated miRNA:gene interactions
• Last update: DIANA-TarBase v7, http://www.microrna.gr/tarbase
S. Vlachos, M. D. Paraskevopoulou, D. Karagkouni, G. Georgakilas, T. Vergoulis, I. Kanellos, I-L. Anastasopoulos, S. Maniou, K. Karathanou, D. Kalfakakou, A. Fevgas, T. Dalamagas and A. G. Hatzigeorgiou. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucl. Acids Res. (2014)
Semi-Automatic Curation Pipeline
• Automatic detection of microRNA related articles
• Formation of XML-based efficient tree-like structures
• Detection of microRNA mentions
• Detection of gene mentions
• Detection of miRNA-gene-interaction triplets
• Text scoring
• Meta-data insertion and mark-up
• Score-based ranking and search capabilities
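The mention-detection steps can be illustrated with regular expressions and sentence-level co-occurrence. This is a rough sketch, not the DIANA pipeline: the miRNA pattern covers only common name shapes such as `hsa-miR-21-5p` or `let-7a`, and the gene lexicon, function names and example text are invented.

```python
import re
from typing import List, Set, Tuple

# Optional 3-letter species prefix, miR/let stem, number, optional letter,
# optional precursor index and optional -3p/-5p arm (an assumed pattern).
MIRNA_RE = re.compile(r"\b(?:[a-z]{3}-)?(?:miR|let)-\d+[a-z]?(?:-\d+)?(?:-[35]p)?\b")
# Candidate gene symbols: runs of at least two capitals or digits.
SYMBOL_RE = re.compile(r"\b[A-Z][A-Z0-9]{1,}\b")

def find_mirna_mentions(text: str) -> List[str]:
    """Return miRNA-like names in order of appearance."""
    return MIRNA_RE.findall(text)

def cooccurring_pairs(text: str, gene_symbols: Set[str]) -> List[Tuple[str, str]]:
    """Pair every miRNA mention with every known gene in the same sentence."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        mirnas = find_mirna_mentions(sentence)
        genes = [g for g in SYMBOL_RE.findall(sentence) if g in gene_symbols]
        pairs.extend((m, g) for m in mirnas for g in genes)
    return pairs

text = ("hsa-miR-21-5p represses PTEN. "
        "In contrast, let-7a targets KRAS and miR-155 is induced.")
pairs = cooccurring_pairs(text, {"PTEN", "KRAS"})
# → [('hsa-miR-21-5p', 'PTEN'), ('let-7a', 'KRAS'), ('miR-155', 'KRAS')]
```

The last pair is a false hit produced by co-occurrence alone, which suggests why a scoring step and human curation would still be needed after automatic detection.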
Growth of interactions per method
Evaluation in Poster # 66
Integration in Ensembl, the European genome browser at EBI
Long Non Coding RNAs
LncBase (http://www.microrna.gr/LncBase) is the largest available repository of miRNA:lncRNA interactions
• The Experimental Module contains more than 5,000 interactions between 2,958 lncRNAs and 120 miRNAs.
• The Prediction Module contains detailed information for more than 10 million interactions, between 56,097 lncRNAs and 3,078 miRNAs.
Integration into RNAcentral (EBI)
Paraskevopoulou, M.D., Georgakilas, G., Kostoulas, N., Reczko, M., Maragkakis, M., Dalamagas, T.M., Hatzigeorgiou, A.G. DIANA-LncBase: Experimentally verified and computationally predicted microRNA targets on long non-coding RNAs (2013) Nucleic Acids Research, 41 (D1), pp. D239-D245.
miRBase
• Also interconnects entries with external resources:
DIANA-Tools Visit us @ www.microrna.gr!
More than 130,000 visits per year, based on Google Analytics!
Integration of microT & TarBase in miRBase
First release
Discussion
• Check the citations of databases / web servers before publishing. For example, a question could be added for reviewers: has the researcher properly cited the data used?
• Are the data used for training and testing available?
• Can the data be reproduced?
• Availability of databases through time: diachronic data
• Credibility for diachronic databases / web services
Funding: Project “TOM” that is implemented under the "ARISTEIA" Action of the "OPERATIONAL PROGRAMME EDUCATION AND LIFELONG LEARNING" and is co-funded by the European Social Fund (ESF) and National Resources.