bio2rdf presentation at combine 2012
DESCRIPTION
Bio2RDF's namespace SPARQL endpointTRANSCRIPT
![Page 1: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/1.jpg)
Bio2RDF's namespace SPARQL endpoint
François BelleauCentre de Biologie Computationnelle
du CRCHUQ
![Page 2: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/2.jpg)
You know them?
How can we help to navigate in the huge Bioinformatics databases cloud ?
![Page 3: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/3.jpg)
2005 BioPAX.gif next to Semantic Webimage vision of Tim Berner Lee
![Page 4: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/4.jpg)
Linked Data in 2007
![Page 5: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/5.jpg)
Bio2RDF contribution in 2009
![Page 6: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/6.jpg)
2011 Linked Data cloud
![Page 7: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/7.jpg)
Databases of databases names
● PathGuide● Bioformatics.ca Links Directory● Annual NAR Database issue● Go, Uniprot, Genbank cross-reference list● LSRN initiative● MIRIAM EBI project● BioPAX dataprovider community● Bio2RDF Linked Data space
![Page 8: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/8.jpg)
Two interesting questions
● Which namespace are the most popular for identifying database ?
● How far is the BioPAX community to adopt
MIRIAM new namespace standard ?
![Page 9: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/9.jpg)
Which namespaces are the most popular to identify a database ?
![Page 10: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/10.jpg)
Namespaces collection used by the BioPAX data provider community.
![Page 11: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/11.jpg)
How far is the BioPAX community to adopt MIRIAM new namespace standard ?
![Page 12: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/12.jpg)
To answer a complex question we first need to build the database that will potentially answer it: a semantic mashup.
How we did this ?
![Page 13: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/13.jpg)
A mashup, in web development, is a web page, or web application, that uses and combines data, presentation or functionality from two or more sources to create new services. The term implies easy, fast integration, frequently using open Application programming interfaces (API) and data sources to produce enriched results that were not necessarily the original reason for producing the raw source data.
http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid)
![Page 14: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/14.jpg)
Building a mashup is a lot easier when using Semantic Web technologies like RDF and SPARQL design for data interoperability.
![Page 15: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/15.jpg)
A three steps method
● Get the data form the data provider and transform it into RDF, we use Talend open source Eclipse base ETL software.
● Load the data in a triplestore many software are available (Virtuoso, Sesame, Jena, store, Mulgara, etc.) to load your mashup
● Explore the new dataset using specialised user interface (RelFinder, Virtuoso facet browser)
● Design your SPARQL query and get the answer
![Page 16: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/16.jpg)
Data provider xref resource
● http://www.ebi.ac.uk/miriam/main/ (XML format)
● Bio2RDF DNS zone description file (text)
● http://lsrn.org/ (RDF/XML)
● http://www.geneontology.org/doc/GO.xrf_abbs (key/value format)
● http://www.uniprot.org/docs/dbxref (key/value format)
● http://www.ncbi.nlm.nih.gov/genbank/collab/db_xref/ (HTML)
● 12 BioPAX providers (Reactome, Biomodels, Biocyc, Panther, INOH, etc)
![Page 17: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/17.jpg)
Lesson #1
Produce RDF triples with a profesionnal ETL tool.
![Page 19: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/19.jpg)
Talend workflows to convert HTML Genbank page to triples and MIRAM XML
![Page 20: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/20.jpg)
Lesson #2
Publish with a SPARQL endpoint.(to get a free 5 stars cup)
![Page 21: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/21.jpg)
Load RDF triples into a triplestore(we use Openlink Virtuoso)
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/
![Page 23: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/23.jpg)
Discover entity nameand browse the triplestore
![Page 24: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/24.jpg)
Lesson #3
Consume as you like.
HTTP GET to obtain RDF from URI,SPARQL endpoint,
SOAP services returning RDF, semantic web new software...
![Page 25: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/25.jpg)
The needed SPARQL queryto draw the previous graphusing ManyEyes service
http://namespace.bio2rdf.org/sparql
![Page 26: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/26.jpg)
Use a SOAP service
http://namespace.bio2rdf.org/bio2rdf/services.wsdl
![Page 27: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/27.jpg)
Discover your relations graphicly with RelFinder
http://www.visualdataweb.org/relfinder.php
![Page 28: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/28.jpg)
Conclusion
● Building a mashup is easy with the actual software, we still need the RDF data.
● One SPARQL query in the proper triplestore (Bio2RDF's namespace mashup) could answer our two initial questions.
● Why not consider publish your own SPARQL endpoint to make semantic hacker's life easier ?
![Page 29: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/29.jpg)
Acknowledgements
● Bio2RDF is a community project available at http://bio2rdf.org
● The community can be joined at
https://groups.google.com/forum/?fromgroups#!forum/bio2rdf
● This work was done under the supervision of Dr Arnaud Droit, assistant
professor and director of the Centre de Biologie Computationnelle du
CRCHUQ at Laval University, where a mirror of Bio2RDF is hosted.
● Michel Dumontier, from the Dumontier Lab at Carleton University, is also
hosting Bio2RDF server and actually leads the project
● Thanks to all the people member of the Bio2RDF community, and especially
Marc-Alexandre Nolin and Peter Ansell, initial developers.
![Page 30: Bio2RDF presentation at Combine 2012](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bc9de34a7959906e8b45ce/html5/thumbnails/30.jpg)
Come in Montreal July 2013 with your SPARQL endpoint an get a FREE cup!
http://www.unbsj.ca/sase/csas/data/semantic-trilogy-2013/