Stony Brook University
School of Medicine
8/25/2011
VIVO Mini-Grant: Integrating the UMLS Ontology into VIVO for Linking Biomedical Scientists
Moises Eisenberg* and Janos Hajagos*
Contributors:
• Erich Bremer*
• Jizu Zhi**
• Tammy DiPrima*
• Ann Gardner***
• Naresh Singh****
• Aniket Divecha****
Dept. of Medical Informatics* | OSA** | SOM*** | Dept. of Computer Science****
SUNY REACH
VIVO has an ontology
The Semantic Web starts with a simple RDF triple
Subject → Predicate → Object
Triples build into a more complex network of interlinked URIs
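To make the idea of a network of interlinked URIs concrete, the sketch below stores a few triples, indexes them by subject, and follows object URIs from one node to the next. All URIs are hypothetical example.org placeholders, not real VIVO or UMLS identifiers, and the traversal is a plain depth-first walk chosen for illustration.

```python
from collections import defaultdict

# Hypothetical placeholder namespace; not a real VIVO or UMLS URI scheme.
EX = "http://example.org/"

triples = [
    (EX + "person/jdoe", EX + "hasResearchArea", EX + "area/pediatric-hiv"),
    (EX + "area/pediatric-hiv", EX + "alignedTo", EX + "cui/C0000000"),  # placeholder CUI
    (EX + "cui/C0000000", EX + "label", '"Example concept"'),           # literal object
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under their subject URI."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def follow(index, start):
    """Depth-first walk along object URIs; literal objects are not followed."""
    seen, stack, order = {start}, [start], []
    while stack:
        node = stack.pop()
        order.append(node)
        for _, obj in index.get(node, []):
            if obj.startswith("http") and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return order


if __name__ == "__main__":
    for uri in follow(index_by_subject(triples), EX + "person/jdoe"):
        print(uri)
```

Each triple on its own is trivial; the links only emerge because the object of one triple is reused as the subject of another.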
Source data from NCIt
The CUI: Concept Unique Identifier
UMLS as linked data
• Developed a tool to publish large databases as RDFS
• Published the 2011AA version of the UMLS and the corresponding RxNorm release
– Publicly available sources only (source restriction level SRL=0)
– RxNorm is linked to DrugBank
• Includes attributes, relationships, and semantic types from the UMLS
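As a rough illustration of how UMLS rows can become RDF, the sketch below parses one pipe-delimited line of MRCONSO.RRF (the UMLS concept-names file), keeps only English atoms with SRL=0, and emits N-Triples linking the atom (AUI) to its concept (CUI). This is not the SBU publishing tool; the base URI and the predicates `hasCUI` and `source` are hypothetical, and only the column order follows the UMLS documentation.

```python
# Column order of MRCONSO.RRF per the UMLS Reference Manual.
MRCONSO_FIELDS = ["CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI",
                  "SAUI", "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR",
                  "SRL", "SUPPRESS", "CVF"]

BASE = "http://example.org/umls/"  # hypothetical base URI
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def parse_row(line):
    """Split one MRCONSO line; the trailing '|' yields an extra field that zip drops."""
    values = line.rstrip("\n").split("|")
    return dict(zip(MRCONSO_FIELDS, values))


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def row_to_ntriples(row):
    """Emit N-Triples for an English, SRL=0 atom; return [] for any other row."""
    if row["LAT"] != "ENG" or row["SRL"] != "0":
        return []
    cui = f"<{BASE}CUI/{row['CUI']}>"
    aui = f"<{BASE}AUI/{row['AUI']}>"
    return [
        f"{aui} <{BASE}hasCUI> {cui} .",                                 # hypothetical predicate
        f'{aui} <{BASE}source> "{escape_literal(row["SAB"])}" .',       # hypothetical predicate
        f'{aui} <{RDFS_LABEL}> "{escape_literal(row["STR"])}" .',
    ]
```

Filtering on LAT and SRL at conversion time mirrors the "English language; SRL=0" scope reported for the published UMLS RDF.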
UMLS RDF in the wider world
Faceted browser view
UMLS CUI Alignment Tool
Algorithm for aligning free text
• Parse the free text into its component words
• Build phrases of different word lengths
• Query the UMLS to check whether each phrase exists
• Sort matches in descending order of word count
• Break ties by the number of different source vocabularies in which the phrase occurs
– The most widely used phrase gets a higher rank
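The ranking steps above can be sketched as follows. A plain dictionary stands in for the UMLS query (the real tool queries the UMLS itself), the `max_words` limit is an assumption, and the lexicon entries, CUI placeholders, and source names are hypothetical.

```python
import re


def candidate_phrases(text, max_words=5):
    """Split text into words and build every contiguous phrase of 1..max_words words."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    phrases = []
    for n in range(1, min(max_words, len(words)) + 1):
        for i in range(len(words) - n + 1):
            phrases.append(" ".join(words[i:i + n]))
    return phrases


def align(text, lexicon, max_words=5):
    """Return (phrase, cui) pairs: longest phrases first, ties broken by
    the number of source vocabularies that contain the phrase."""
    matches = []
    for phrase in candidate_phrases(text, max_words):
        entry = lexicon.get(phrase.lower())  # stand-in for a UMLS lookup
        if entry is not None:
            cui, sources = entry
            matches.append((phrase, cui, len(phrase.split()), len(sources)))
    matches.sort(key=lambda m: (m[2], m[3]), reverse=True)
    return [(phrase, cui) for phrase, cui, _, _ in matches]


if __name__ == "__main__":
    # Hypothetical lexicon: lowercase phrase -> (placeholder CUI, set of source vocabularies).
    demo = {
        "pediatric hiv": ("CUI-A", {"SRC1"}),
        "hiv": ("CUI-B", {"SRC1", "SRC2", "SRC3"}),
        "pediatric": ("CUI-C", {"SRC1"}),
    }
    print(align("Pediatric HIV", demo))
```

Preferring longer phrases keeps a specific multi-word concept ahead of its component words; the source count then favors concepts shared across many vocabularies.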
UMLS Web Service
• Base address: http://link.informatics.stonybrook.edu
• Sample call: /MeaningLookup/MlServiceServlet?textToProcess=Pediatric%20HIV&format=json
• Response formats: JSON, N-Triples, RDF/XML
• Response content: best choices and all choices for matching CUIs
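A minimal sketch of building the sample call from the base address and a piece of free text. Spaces are percent-encoded as `%20`, matching the sample call; only `format=json` appears on the slide, so the values for the other response formats are not guessed here.

```python
from urllib.parse import quote, urlencode

BASE_ADDRESS = "http://link.informatics.stonybrook.edu"
LOOKUP_PATH = "/MeaningLookup/MlServiceServlet"


def meaning_lookup_url(text, fmt="json"):
    """Build a MeaningLookup request URL; quote_via=quote encodes spaces as %20."""
    query = urlencode({"textToProcess": text, "format": fmt}, quote_via=quote)
    return f"{BASE_ADDRESS}{LOOKUP_PATH}?{query}"


if __name__ == "__main__":
    print(meaning_lookup_url("Pediatric HIV"))
```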
Alignment to the UMLS CUIs
PubMed RDF Conversion
• Started with an XSLT stylesheet published in 2008 by Pierre Lindenbaum
• A prototype project linked the 2010 PubMed data to the internal Health Sciences Library MARC holdings data (>800,000,000 triples)
• Allowed linked-data searches joining article data with holdings data
• Updated the PubMed XSLT to the 2011 schema, with MeSH headings aligned to UMLS CUIs
• The current translation generated 1,973,880,813 triples
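The project used XSLT for this conversion. The sketch below swaps in Python's ElementTree to show the same idea on a small scale: read the MeSH descriptor names from PubMed XML and emit one triple per heading that maps to a CUI. The heading-to-CUI table, the URI namespace, and the `hasSubjectCUI` predicate are all hypothetical; in practice the mapping would come from UMLS rows whose source is MeSH.

```python
import xml.etree.ElementTree as ET

# Hypothetical MeSH heading -> CUI table (placeholder values, not real CUIs).
MESH_TO_CUI = {"Example Heading A": "CUI-A", "Example Heading B": "CUI-B"}

BASE = "http://example.org/"  # hypothetical URI namespace


def mesh_cui_triples(xml_text, mesh_to_cui):
    """Emit '<article> <hasSubjectCUI> <cui> .' for every mapped MeSH heading."""
    root = ET.fromstring(xml_text)
    triples = []
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID")  # direct child only, not cited PMIDs
        for name in citation.iterfind("MeshHeadingList/MeshHeading/DescriptorName"):
            cui = mesh_to_cui.get(name.text)
            if cui is not None:  # headings without a mapping are skipped
                triples.append(
                    f"<{BASE}pubmed/{pmid}> <{BASE}hasSubjectCUI> <{BASE}cui/{cui}> ."
                )
    return triples
```

Articles without a MeshHeadingList simply produce no triples, which is why an article count with no MeSH is worth tracking.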
PubMed CUI Web Service
• Base address: http://link.informatics.stonybrook.edu
• Sample call: /weaver/pubmed2cuis?pmid=17952453
• Response format: JSON
• Response content: UMLS CUIs with labels
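A minimal client sketch for this service using only the standard library. The exact JSON layout of the response is not described on the slide, so the decoded object is returned as-is rather than unpacked into assumed field names; the `timeout` default is an arbitrary choice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_ADDRESS = "http://link.informatics.stonybrook.edu"


def pubmed2cuis_url(pmid):
    """Build the pubmed2cuis request URL for one PubMed ID."""
    return f"{BASE_ADDRESS}/weaver/pubmed2cuis?{urlencode({'pmid': pmid})}"


def fetch_pubmed_cuis(pmid, timeout=30):
    """Fetch and decode the JSON response; its structure is returned unchanged."""
    with urlopen(pubmed2cuis_url(pmid), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(pubmed2cuis_url(17952453))
```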
Linking subject areas to publications
Data facts
• UMLS RDF (2011AA release; English language; SRL=0)
– Number of triples: 110,415,427
– Number of different sources: 46
– Number of CUIs: 2,404,344
– Number of AUIs: 3,594,372
• REACH VIVO (data extracted 8/23/2011)
– Number of people: 684
– Number of triples: 448,112
• UMLS alignment of subject areas
– Number of subject areas: 425
– Number of UMLS CUIs generated: 899
– Number of distinct UMLS CUIs: 604
• PubMed alignment to REACH
– Number of UMLS CUIs generated: 192,450
– Number of distinct UMLS CUIs: 11,039
– Articles with no MeSH headings: 1,293 of 15,975
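A few ratios follow directly from the counts above. The sketch below computes them; the per-article figure assumes that all 192,450 generated CUIs come from the articles that do have MeSH headings, which the slide does not state explicitly.

```python
def derived_ratios():
    """Ratios derived from the data facts; all inputs are copied from the slide."""
    cuis, auis = 2_404_344, 3_594_372
    subject_areas, subject_cuis = 425, 899
    articles, no_mesh = 15_975, 1_293
    pubmed_cuis = 192_450
    with_mesh = articles - no_mesh
    return {
        "auis_per_cui": auis / cuis,
        "cuis_per_subject_area": subject_cuis / subject_areas,
        "share_without_mesh": no_mesh / articles,
        "articles_with_mesh": with_mesh,
        # Assumption: every generated CUI belongs to an article with MeSH headings.
        "cuis_per_article_with_mesh": pubmed_cuis / with_mesh,
    }


if __name__ == "__main__":
    for name, value in derived_ratios().items():
        print(name, round(value, 3))
```

Read this way, each UMLS concept carries about 1.5 atoms, each subject area aligns to about 2.1 CUIs, roughly 8.1% of the articles lack MeSH headings, and the remaining 14,682 articles average about 13.1 CUIs each.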
Links
• SPARQL endpoint:
– http://link.informatics.stonybrook.edu/sparql/
• CUI alignment tool:
– http://link.informatics.stonybrook.edu/MeaningLookup/
• Points to start browsing linked data:
– http://link.informatics.stonybrook.edu/umls/
– http://link.informatics.stonybrook.edu/umls/SAB
• Open source code developed at SBU:
– http://code.google.com/p/py-triple-simple/: native Python RDF utility
– http://code.google.com/p/sbu-mi-vivo-tools/: automated dumping of VIVO sites' RDF and alignment to UMLS and PubMed
– http://code.google.com/p/spyder-web/: faceted browser and lightweight web service for parameterized SPARQL queries
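A sketch of querying the SPARQL endpoint listed above. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter) and can return the SPARQL JSON results format; neither is confirmed here, and the query itself is only an example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://link.informatics.stonybrook.edu/sparql/"

# Example query: fetch a handful of arbitrary triples.
QUERY = """SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10"""


def sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint, query, timeout=30):
    """Run the query and flatten each solution to {variable: value}."""
    with urlopen(sparql_request(endpoint, query), timeout=timeout) as response:
        results = json.load(response)
    # In the SPARQL JSON results format, solutions live under results.bindings.
    return [{var: b[var]["value"] for var in b} for b in results["results"]["bindings"]]
```

Keeping request construction separate from execution makes it easy to swap in other parameterized queries without touching the HTTP code.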
Acknowledgements
• Supported through:
VIVO: Enabling National Networking of Scientists (NIH U24 RR029822)
• Original interactive CUI alignment tool created by Jakub Pezacki (SBU Class of 2010)