vivo mini-grant: integrating the umls ontology into vivo for linking biomedical scientists

Post on 12-Jul-2015

Stony Brook University

School of Medicine

8/25/2011

VIVO Mini-Grant: Integrating the UMLS Ontology into VIVO for Linking Biomedical Scientists

Moises Eisenberg* and Janos Hajagos*

Contributors:

• Erich Bremer*

• Jizu Zhi**

• Tammy DiPrima*

• Ann Gardner***

• Naresh Singh****

• Aniket Divecha****

Dept. of Medical Informatics* | OSA** | SOM*** | Dept. of Computer Science****

SUNY REACH

VIVO has an ontology

The Semantic Web starts simple with an RDF triple

Subject → Predicate → Object
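
As a minimal sketch, a triple can be modeled as a three-part record and written out as one N-Triples line. The `Triple` class, the `to_ntriples` helper, and all three URIs are hypothetical placeholders chosen for illustration; none of them come from the published data.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all URIs here)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialize a URI-only triple as a single N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# Hypothetical URIs, for illustration only.
person = "http://example.org/person/1"
has_area = "http://example.org/vocab/hasResearchArea"
concept = "http://example.org/concept/C0000000"

triple = Triple(person, has_area, concept)
line = to_ntriples(triple)
print(line)
```

Because the object of one triple can be the subject of another, a set of such lines already describes a graph of interlinked URIs.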

Builds into a more complex network of interlinked URIs

Source data from NCIt

The CUI: Concept Unique Identifier

UMLS as linked data

• Developed a tool to publish large databases as RDFS

• Published the 2011AA version of the UMLS and the corresponding RxNorm release

– Publicly available sources (SRL=0)

– RxNorm is linked to DrugBank

• Includes attributes, relationships, and semantic types from the UMLS

UMLS RDF in the wider world

Faceted browser view

UMLS CUI Alignment Tool

Algorithm for aligning free text

• Parse free text into component words

• Build phrases of different word lengths

• Query the UMLS to check whether each phrase exists

• Sort matches in descending order of word count

• Break ties by the number of occurrences in different source vocabularies

– Most widely used gets a higher rank
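
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the UMLS query is replaced by a small in-memory dictionary (`TOY_UMLS`), and the CUIs and source counts in it are invented.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a UMLS lookup: phrase -> [(CUI, number of source vocabularies)].
# The CUIs and counts are invented for illustration.
TOY_UMLS: Dict[str, List[Tuple[str, int]]] = {
    "pediatric hiv": [("C9000001", 2)],
    "hiv": [("C9000002", 12), ("C9000003", 3)],
    "pediatric": [("C9000004", 5)],
}


def build_phrases(words: List[str]) -> List[str]:
    """Return every contiguous run of words, of every length."""
    n = len(words)
    return [" ".join(words[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


def align(text: str) -> List[Tuple[str, str, int]]:
    """Return (phrase, CUI, source_count) matches, best first."""
    words = text.lower().split()
    matches = []
    for phrase in build_phrases(words):
        for cui, source_count in TOY_UMLS.get(phrase, []):
            matches.append((phrase, cui, source_count))
    # Longer phrases first; ties broken by how many sources use the concept.
    matches.sort(key=lambda m: (-len(m[0].split()), -m[2]))
    return matches


results = align("Pediatric HIV")
for phrase, cui, count in results:
    print(phrase, cui, count)
```

With this toy data the two-word phrase ranks first, and among the one-word matches the concept found in the most source vocabularies comes next.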

UMLS Web Service

• Base address: http://link.informatics.stonybrook.edu

• Sample call: /MeaningLookup/MlServiceServlet?textToProcess=Pediatric%20HIV&format=json

• Response format: JSON, N-Triples, RDF/XML

• Response content: Best choices and all choices for matching CUIs
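
A minimal sketch of how the sample call could be assembled from the base address and parameters. The helper name `meaning_lookup_url` is hypothetical, and only the `json` format value appears in the sample call; the values that select the other formats are not given here. The sketch builds the URL only and makes no request.

```python
from urllib.parse import quote, urlencode

BASE = "http://link.informatics.stonybrook.edu"
PATH = "/MeaningLookup/MlServiceServlet"


def meaning_lookup_url(text: str, fmt: str = "json") -> str:
    """Build the lookup URL; spaces are percent-encoded as %20."""
    query = urlencode({"textToProcess": text, "format": fmt}, quote_via=quote)
    return f"{BASE}{PATH}?{query}"


url = meaning_lookup_url("Pediatric HIV")
print(url)
```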

Alignment to the UMLS CUIs

PubMed RDF Conversion

• Started with XSLT published in 2008 by Pierre Lindenbaum

• A prototype project linked 2010 PubMed to the internal Health Sciences Library MARC holdings data (>800,000,000 triples)

• Allowed linked data search joining article data with holdings data

• PubMed XSLT updated to 2011 schema with MeSH aligned to the UMLS CUIs

• Current translation generated 1,973,880,813 triples

PubMed CUI Web Service

• Base address: http://link.informatics.stonybrook.edu

• Sample call: /weaver/pubmed2cuis?pmid=17952453

• Response format: JSON

• Response content: UMLS CUIs with labels
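
A minimal sketch of a client for this call. The function names are hypothetical, and because the layout of the JSON response is not described here, `fetch_cuis` returns the decoded document unchanged rather than assuming field names. Only the URL builder runs in the example; `fetch_cuis` needs the service to be reachable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://link.informatics.stonybrook.edu"


def pubmed2cuis_url(pmid: int) -> str:
    """Build the request URL for one PubMed identifier."""
    return f"{BASE}/weaver/pubmed2cuis?{urlencode({'pmid': pmid})}"


def fetch_cuis(pmid: int, timeout: float = 10.0):
    """Fetch and decode the JSON response, returned as-is."""
    with urlopen(pubmed2cuis_url(pmid), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = pubmed2cuis_url(17952453)
print(url)
```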

Linking subject areas to publications

Data facts

• UMLS RDF (2011AA release; English language; SRL=0)

– Number of triples: 110,415,427

– Number of different sources: 46

– Number of CUIs: 2,404,344

– Number of AUIs: 3,594,372

• REACH VIVO (data extracted 8/23/2011)

– Number of people: 684

– Number of triples: 448,112

• UMLS alignment of subject areas

– Number of subject areas: 425

– Number of UMLS CUIs generated: 899

– Number of distinct UMLS CUIs: 604

• PubMed alignment to REACH

– Number of UMLS CUIs generated: 192,450

– Number of distinct UMLS CUIs: 11,039

– Articles with no MeSH terms: 1,293 out of 15,975
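
A few ratios follow directly from the alignment counts above; the short calculation below derives them. The per-article average assumes that CUIs are generated only for articles that carry MeSH terms, which is an interpretation of the counts rather than something stated with them.

```python
# Counts copied from the alignment figures above.
subject_areas = 425
subject_cuis_generated = 899
articles_total = 15_975
articles_without_mesh = 1_293
pubmed_cuis_generated = 192_450

# Average number of CUIs assigned to each subject area.
cuis_per_subject_area = subject_cuis_generated / subject_areas

# Share of articles that could not be aligned because they lack MeSH terms.
share_without_mesh = articles_without_mesh / articles_total

# Assumption: only articles with MeSH terms contribute CUIs.
articles_with_mesh = articles_total - articles_without_mesh
cuis_per_indexed_article = pubmed_cuis_generated / articles_with_mesh

print(f"CUIs per subject area: {cuis_per_subject_area:.2f}")
print(f"Articles without MeSH: {share_without_mesh:.1%}")
print(f"CUIs per article with MeSH: {cuis_per_indexed_article:.2f}")
```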

Links

• SPARQL endpoint:

– http://link.informatics.stonybrook.edu/sparql/

• CUI alignment tool:

– http://link.informatics.stonybrook.edu/MeaningLookup/

• Points to start browsing linked data:

– http://link.informatics.stonybrook.edu/umls/

– http://link.informatics.stonybrook.edu/umls/SAB

• Open source code developed at SBU:

– http://code.google.com/p/py-triple-simple/ (native Python RDF utility)

– http://code.google.com/p/sbu-mi-vivo-tools/ (automated dumping of VIVO sites' RDF and alignment to UMLS and PubMed)

– http://code.google.com/p/spyder-web/ (faceted browser and lightweight web service for parameterized SPARQL queries)
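
A minimal sketch of how a query could be addressed to the SPARQL endpoint listed above, assuming it accepts the standard SPARQL protocol `query` parameter over GET (not confirmed here). The query is a generic ten-triple sample that uses no vocabulary from the published data, and the helper name `sparql_get_url` is hypothetical. The sketch builds the URL only and makes no request.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "http://link.informatics.stonybrook.edu/sparql/"

# Generic query: fetch any ten triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""".strip()


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a GET URL carrying the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = sparql_get_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL yields the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(url)
```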

Acknowledgements

• Supported through:

VIVO: Enabling National Networking of Scientists, NIH U24 RR029822

• Original interactive CUI alignment tool created by Jakub Pezacki (SBU Class of 2010)
