the power of graphs to analyze biological data - davy suvee @ graphconnect london 2013
DESCRIPTION
This talk will illustrate the power and flexibility of graph databases, and Neo4j specifically, in the analysis of biological data sets. Davy will show how to build a visual exploration environment that helps researchers identify clusters within various biological data sets, including gene expression and mutation prevalence data. Additionally, he will demo BRAIN (Bio Relations and Intelligence Network), a data exploration platform that combines various scientific data sources (including PubMed, Swiss-Prot and DrugBank). It uses Neo4j under the hood both to store the data and to enable powerful queries that provide key insights and deductions.
TRANSCRIPT
![Page 1: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/1.jpg)
GraphConnect
![Page 2: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/2.jpg)
the power of graphs to analyze biological data
![Page 3: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/3.jpg)
about me
who am i ...
Davy Suvee
@DSUVEE
➡ big data architect @ datablend - continuum
• provide big data and nosql consultancy
• 5 years of hands-on expertise in the pharma/biotech sector
![Page 4: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/4.jpg)
big data in pharma
massive data
full genome sequencing
complex data
biological networks
scalable number crunching platform
visual insights-driven platform
graphs!!
![Page 5: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/5.jpg)
big data in pharma (2 specific use cases)
➡ outlier detection platform
• neo4j, mongodb/cassandra and gephi
➡ euretos - brain
• neo4j, mongodb, solr and prefuse
![Page 6: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/6.jpg)
gene expression clustering
➡ oncology data set:
★ 4,800 samples
★ 27,000 genes
➡ question:
★ for a particular subset of samples, which genes are co-expressed?
![Page 7: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/7.jpg)
storing gene expressions (mongodb)
{
  "_id": { "$oid": "4f1fb64a1695629dd9d916e3" },
  "sample_name": "122551hp133a21.cel",
  "genomics_id": 122551,
  "sample_id": 343981,
  "donor_id": 143981,
  "sample_type": "Tissue",
  "sample_site": "Ascending colon",
  "pathology_category": "MALIGNANT",
  "pathology_morphology": "Adenocarcinoma",
  "pathology_type": "Primary malignant neoplasm of colon",
  "primary_site": "Colon",
  "expressions": [
    { "gene": "X1_at", "expression": 5.54217719084415 },
    { "gene": "X10_at", "expression": 3.92335121981739 },
    { "gene": "X100_at", "expression": 7.81638155662255 },
    { "gene": "X1000_at", "expression": 5.44318512260619 },
    …
  ]
}
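A minimal sketch of how one such sample document could be turned into a gene-to-expression lookup before any correlation is computed. The helper name `expression_vector` is hypothetical, and the document below is a trimmed copy of the one on the slide (two expression entries, no `_id`); this is not the speaker's actual code.

```python
import json

# Trimmed copy of the sample document from the slide: only two of the
# expression entries are kept and the "_id" field is left out.
raw = """
{"sample_name": "122551hp133a21.cel",
 "genomics_id": 122551,
 "primary_site": "Colon",
 "expressions": [
   {"gene": "X1_at", "expression": 5.54217719084415},
   {"gene": "X10_at", "expression": 3.92335121981739}
 ]}
"""


def expression_vector(doc):
    """Return a {gene: expression} dict for one sample document.

    Hypothetical helper, named here only for illustration.
    """
    return {entry["gene"]: entry["expression"] for entry in doc["expressions"]}


sample = json.loads(raw)
vector = expression_vector(sample)
print(sample["genomics_id"], vector)
```

Keying the values by gene name makes it straightforward to line up two samples on the genes they share before comparing them.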
![Page 8: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/8.jpg)
correlating samples (mongodb/map-reduce)
pearson correlation
x y
43 99
21 65
25 79
42 75
57 87
59 81
r = 0.52
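The Pearson coefficient for the six x/y pairs above can be reproduced with a few lines of plain Python. This is only a sketch of the formula itself; the talk computes the correlations with MongoDB map-reduce, which is not shown here, and the function name `pearson` is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# The value pairs from the slide.
x = [43, 21, 25, 42, 57, 59]
y = [99, 65, 79, 75, 87, 81]
r = pearson(x, y)
print(f"r = {r:.4f}")
```

For these pairs the result is roughly 0.53, a moderate positive correlation that falls well below the 0.8 cut-off used later for drawing edges.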
![Page 9: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/9.jpg)
co-expression graph (neo4j)
➡ create a node for each sample
➡ if correlation between two samples >= 0.8, create an edge between both nodes
[figure: sample nodes 122551, 122552 and 122553, with a "correlated" edge carrying value: 0.86]
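The two rules above (one node per sample, an edge when the correlation reaches 0.8) can be sketched in memory as follows. The expression vectors are made up for illustration, `build_coexpression_graph` is a hypothetical name, and writing the resulting nodes and edges into Neo4j is deliberately left out.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def build_coexpression_graph(samples, threshold=0.8):
    """Return (nodes, edges) for a dict of sample id -> expression vector.

    Every sample becomes a node; a pair of samples gets a "correlated"
    edge only when their correlation is at least `threshold`.
    """
    nodes = set(samples)
    edges = []
    for a, b in combinations(sorted(samples), 2):
        value = pearson(samples[a], samples[b])
        if value >= threshold:
            edges.append((a, b, value))
    return nodes, edges


# Made-up expression vectors; only the sample ids come from the slide.
samples = {
    122551: [5.5, 3.9, 7.8, 5.4],
    122552: [5.6, 4.1, 7.5, 5.2],
    122553: [7.9, 5.1, 3.2, 6.0],
}

nodes, edges = build_coexpression_graph(samples)
for a, b, value in edges:
    print(f"({a})-[correlated value={value:.2f}]-({b})")
```

Comparing every pair is quadratic in the number of samples, which is one reason a data set with thousands of samples benefits from a distributed computation step before the graph is built.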
![Page 10: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/10.jpg)
co-expression visualisation (gephi)
![Page 11: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/11.jpg)
euretos - brain
➡ pubmed: 23 million biomedical articles
• 1300 new ones added every day
• google-like search interface
➡ reading an article ...
• malaria is transferred by mosquitoes
![Page 12: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/12.jpg)
euretos - brain
authors references
![Page 13: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/13.jpg)
euretos - brain
ooooooh crap ...
![Page 14: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/14.jpg)
euretos - brain
➡ nanopub (nanopub.org)
• the smallest unit of publishable information
➡ assertion
• subject: malaria
• predicate: transferred by
• object: mosquito
➡ provenance
• how this came to be (meta-data)
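As a rough illustration of the assertion-plus-provenance structure described above, here is a small Python model. It is a simplification for this transcript, not the nanopub.org format or the BRAIN schema; the class names, the `as_triple` helper and the provenance keys are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """A single subject / predicate / object statement."""
    subject: str
    predicate: str
    object: str


@dataclass
class Nanopub:
    """An assertion together with free-form provenance meta-data."""
    assertion: Assertion
    provenance: dict = field(default_factory=dict)


def as_triple(pub):
    """Return the assertion as a plain (subject, predicate, object) tuple."""
    a = pub.assertion
    return (a.subject, a.predicate, a.object)


pub = Nanopub(
    assertion=Assertion("malaria", "transferred by", "mosquito"),
    # Hypothetical provenance entry, for illustration only.
    provenance={"source": "an article indexed in pubmed"},
)
print(as_triple(pub))
```

Keeping the provenance separate from the assertion means the same triple can be backed by several sources without duplicating the statement itself.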
![Page 15: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/15.jpg)
euretos - brain
➡ unfortunately, malaria is encoded in various ways ...
db1      db2     db3
malaria  P22384  AQ879
[figure: the three identifiers resolve to a single concept, malaria]
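One simple way to picture this identifier problem is a lookup table from (database, local identifier) pairs to a single canonical concept. The sketch below uses the three codes from the slide; the table and the `resolve` function are hypothetical and say nothing about how BRAIN actually performs the mapping.

```python
# (database, local identifier) -> canonical concept.
# The identifiers are the ones shown on the slide; the mapping table
# itself is an illustrative assumption.
SYNONYMS = {
    ("db1", "malaria"): "malaria",
    ("db2", "P22384"): "malaria",
    ("db3", "AQ879"): "malaria",
}


def resolve(db, identifier):
    """Return the canonical concept for a database-specific identifier,
    or None when the identifier is unknown."""
    return SYNONYMS.get((db, identifier))


for key in SYNONYMS:
    print(key, "->", resolve(*key))
```

Once every source identifier resolves to one concept, assertions coming from different databases can attach to the same node in the graph.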
![Page 16: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/16.jpg)
euretos - brain
malaria → transferred by → mosquito
![Page 17: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/17.jpg)
euretos - brain
➡ brain (http://www.euretos.com/brain)
• exploration and analysis platform
• millions of concepts/triples/nanopubs
• pubmed, uniprot, omim, pubchem, ...
➡ architectural stack
• meta-data is stored in mongodb
• graph in neo4j
• swing interface connecting to rest endpoints
![Page 18: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/18.jpg)
brain
![Page 19: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/19.jpg)
brain
![Page 20: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/20.jpg)
brain
![Page 21: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/21.jpg)
brain
![Page 22: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/22.jpg)
brain
![Page 23: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/23.jpg)
brain
![Page 24: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/24.jpg)
brain
![Page 25: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/25.jpg)
brain
![Page 26: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/26.jpg)
Questions?
![Page 27: The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013](https://reader036.vdocuments.net/reader036/viewer/2022081414/54b6b7a64a7959e55e8b4589/html5/thumbnails/27.jpg)
Follow us
twitter.com/data_blend
www.datablend.be
[email protected]
0499/05.00.89
datablend - continuum