Barbara Bushman & Nancy Fallgren Technical Services Division National Library of Medicine National Institutes of Health U.S. Department of Health and Human Services ALCTS Metadata Interest Group ALA Midwinter February 1, 2015 Linked Data Initiatives at NLM


Barbara Bushman & Nancy Fallgren
Technical Services Division

National Library of Medicine
National Institutes of Health

U.S. Department of Health and Human Services

ALCTS Metadata Interest Group
ALA Midwinter

February 1, 2015

Linked Data Initiatives at NLM


Agenda

Background
NLM Linked Data Infrastructure Working Group
MeSH (Medical Subject Headings) RDF Pilot
Next Steps
Lessons Learned


Background

Existing NLM Linked Data Initiatives
  PubChem RDF
  BIBFRAME
  MeSH RDF Prototype

Existing 3rd party RDF versions of NLM datasets
  MeSH (6 different versions)
  LinkedCT (clinical trials data)


NLM Linked Data Infrastructure Working Group

Broad collaboration across NLM divisions
Develop and build infrastructure for transforming, storing and publishing NLM linked data
Research best practices in publishing linked data
Recommend NLM-wide policies and guidelines for linked data publishing
Document guidance for maintaining the established linked data infrastructure
Recommend processes for future data linking projects
Prioritize NLM datasets for publication as linked data


NLM Linked Data WG Process

Shared working environment
  SharePoint for administrative documentation
  GitHub private site for development
Develop a common level of understanding
Review existing linked data initiatives
  PubChem RDF
  MeSH RDF prototype


Pilot Project: MeSH RDF

Community impact
  Widely used in the health and medical community
  Ability to relate many disparate health and medical resources
Community interest evidenced by
  Multiple 3rd party versions published
  Requests stemming from BIBFRAME experimentation
Existing MeSH RDF prototype


Decisions

URI (id.nlm.nih.gov)
RDF vocabulary/Predicates (create our own vs. use existing)
License
Consultants


How to Provide the Linked Data

FTP
  XML, XSLT, RDF
SPARQL endpoint
  MeSH RDF files loaded into a graph
  Stored in Virtuoso triple store
  Accessible via Lodestar interface
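A SPARQL endpoint is normally queried over HTTP, with the query text passed in a `query` parameter as defined by the SPARQL 1.1 Protocol. The following is a minimal sketch of building such a request URL. The endpoint path `http://id.nlm.nih.gov/mesh/sparql` and the helper name `build_query_url` are assumptions for illustration; the slides give only the host `id.nlm.nih.gov` and the beta URL `http://id.nlm.nih.gov/mesh/`.

```python
from urllib.parse import urlencode

# Assumed endpoint location; the slides only name the host id.nlm.nih.gov.
ENDPOINT = "http://id.nlm.nih.gov/mesh/sparql"

# Ask for the label of one descriptor. The mesh: prefix follows the beta
# URL given in the slides.
QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mesh: <http://id.nlm.nih.gov/mesh/>
SELECT ?label WHERE { mesh:D015242 rdfs:label ?label }
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The sketch stops at building the URL so that it runs without network access; sending it with any HTTP client would be the next step.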


Creating MeSH RDF


Transformation of MeSH XML to MeSH RDF

[Diagram labels: NLM Internal, NLM Public, Users]
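The slides name XSLT as the transformation tool. As a rough illustration of the same idea in a different technique, the sketch below uses Python's standard XML parser to turn a descriptor record into an RDF statement in N-Triples syntax. The XML fragment is a hypothetical, heavily simplified record, and the function name `records_to_ntriples` is invented for this example; it is not NLM's transformation.

```python
import xml.etree.ElementTree as ET

MESH = "http://id.nlm.nih.gov/mesh/"  # from the beta URL in the slides
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical, simplified record; real MeSH XML has many more elements.
SAMPLE_XML = """
<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>D015242</DescriptorUI>
    <DescriptorName><String>Ofloxacin</String></DescriptorName>
  </DescriptorRecord>
</DescriptorRecordSet>
"""


def records_to_ntriples(xml_text: str) -> list[str]:
    """Emit one rdfs:label N-Triples line per descriptor record."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.iter("DescriptorRecord"):
        ui = record.findtext("DescriptorUI")
        name = record.findtext("DescriptorName/String")
        # Escape backslashes and quotes so the literal stays valid N-Triples.
        literal = name.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{MESH}{ui}> <{RDFS_LABEL}> "{literal}" .')
    return lines


for line in records_to_ntriples(SAMPLE_XML):
    print(line)
```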


MeSH in RDF

mesh:D015242  rdfs:label                   Ofloxacin
mesh:D015242  meshv:allowableQualifier     mesh:Q000009
mesh:D015242  meshv:pharmacologicalAction  mesh:D000900
mesh:D000900  rdfs:label                   Anti-Bacterial Agents

Subject  Predicate             Object
D015242  MeSH Heading          Ofloxacin
D015242  Allowable Qualifiers  AA AD AE AG AI AN BL CF CH CL CS CT DU EC HI IM IP ME PD PK PO RE SD ST TO TU UR
D015242  Pharm. Action         Anti-Bacterial Agents
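The Ofloxacin example can be held as plain subject, predicate, object tuples to show how a consumer follows a link from one resource to another. This is a minimal sketch: the pairing of subjects, predicates and objects is read from the slide, the `meshv:` namespace URI is an assumption, and the helper names are hypothetical.

```python
PREFIXES = {
    "mesh": "http://id.nlm.nih.gov/mesh/",         # from the beta URL in the slides
    "meshv": "http://id.nlm.nih.gov/mesh/vocab#",  # assumed vocabulary namespace
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# (subject, predicate, object); objects without a known prefix are literals.
TRIPLES = [
    ("mesh:D015242", "rdfs:label", "Ofloxacin"),
    ("mesh:D015242", "meshv:allowableQualifier", "mesh:Q000009"),
    ("mesh:D015242", "meshv:pharmacologicalAction", "mesh:D000900"),
    ("mesh:D000900", "rdfs:label", "Anti-Bacterial Agents"),
]


def expand(term: str) -> str:
    """Expand a prefixed name to a full URI; return literals unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


def objects(subject: str, predicate: str) -> list[str]:
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def label(subject: str) -> str:
    """Return the first rdfs:label stated for the subject."""
    return objects(subject, "rdfs:label")[0]


# Follow the pharmacological-action link, then look up the target's label.
action = objects("mesh:D015242", "meshv:pharmacologicalAction")[0]
print(expand(action), "->", label(action))
```

Because the pharmacological action is itself a resource with its own label, the lookup takes two steps, which is the kind of linking the graph on the slide depicts.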


XML2RDF Modeling Issues

Descriptor/Qualifier pairs
  Not in MeSH XML
  'Illegal' descriptor/qualifier combinations
Hierarchical relationships are not identified in MeSH XML
Transitive relationships are not always true between descriptors in multiple tree nodes


MeSH Trees for Eye


RDF Statements Must Always Be True

<Face> <has narrower term> <Eye>

<Eye> <has narrower term> <Eyebrows>

<Sense Organs> <has narrower term> <Eye>

<A01.456.505> <has narrower term> <A01.456.505.420>

<A01.456.505.420> <has narrower term> <A01.456.505.420.338>

<Eye> <has narrower term> <Eyebrows>

<A09> <has narrower term> <A09.371>

<A09.371> <has narrower term> <A01.456.505.420.338>


Using Only Transitive Relationships

Are Sense Organs really a broader term for Eyebrows?

[Diagram: nodes connected by meshv:broader links]


Using Transitive and Non-Transitive Relationships

[Diagram: the descriptors and tree numbers below, connected by meshv:treeNumber, meshv:broaderTransitive and meshv:broader links]

Descriptor          ID       Tree number(s)
Face                D005145  A01.456.505
Eye                 D005123  A01.456.505.420, A09.371
Eyebrows            D005138  A01.456.505.420.338
Sense Organs        D012679  A09
Oculomotor Muscles  D009801  A09.371.613
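The Eye example can be worked through in a few lines of code. The sketch below assumes, as the slide titles suggest, that the transitive relationship holds between tree numbers while the descriptor-level relationship is not transitive. The descriptor-to-tree-number assignments are read from the slides; all function names are hypothetical.

```python
# Tree numbers per descriptor, as read from the slides.
TREE_NUMBERS = {
    "Face": ["A01.456.505"],
    "Eye": ["A01.456.505.420", "A09.371"],
    "Eyebrows": ["A01.456.505.420.338"],
    "Sense Organs": ["A09"],
    "Oculomotor Muscles": ["A09.371.613"],
}
OWNER = {tn: name for name, tns in TREE_NUMBERS.items() for tn in tns}


def tree_ancestors(tree_number: str) -> list[str]:
    """All proper prefixes of a tree number, nearest first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def direct_broader(name: str) -> set[str]:
    """Descriptors that own the immediate parent of any of this descriptor's tree numbers."""
    parents = {tn.rpartition(".")[0] for tn in TREE_NUMBERS[name]}
    return {OWNER[p] for p in parents if p in OWNER}


def naive_closure(name: str) -> set[str]:
    """Treat descriptor-level 'broader' as transitive (the approach the slides question)."""
    found, todo = set(), [name]
    while todo:
        for broader in direct_broader(todo.pop()):
            if broader not in found:
                found.add(broader)
                todo.append(broader)
    return found


def true_ancestors(name: str) -> set[str]:
    """Ancestors reached only along the descriptor's own tree numbers."""
    return {
        OWNER[a]
        for tn in TREE_NUMBERS[name]
        for a in tree_ancestors(tn)
        if a in OWNER
    }


print(sorted(naive_closure("Eyebrows")))   # includes 'Sense Organs'
print(sorted(true_ancestors("Eyebrows")))  # does not include 'Sense Organs'
```

Chaining descriptor-level links pulls in Sense Organs for Eyebrows, because Eye sits in two trees. Walking the tree numbers instead stays inside the A01 branch, where Eyebrows actually lives.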


(Soft) Beta Launch

http://id.nlm.nih.gov
Launched Nov. 17, 2014 at the American Medical Informatics Association conference
Work in progress
  Still tweaking model and documentation
  No public news announcements
  No press release
  No direct link on NLM home page


Beta Evaluation

Feedback from partners and others
  Public GitHub site (https://github.com/HHS/meshrdf)
  Customer service
  Social media
Analytics
  Log files
  WebTrends


MeSH RDF Next Steps

Next release of MeSH RDF ca. May 2015
  Update to 2015 MeSH
  Resolve outstanding issues raised during beta
Updating/versioning
Review MeSH RDF elements
Contribute to revising MeSH XML


Lessons Learned

Have a flexible timeframe
Collaborate broadly
Document everything
Ask for help
Understand expectations and anticipated outcomes
Create an evaluation plan
Value community collaboration


MeSH RDF Beta

Demo
  Landing page
  Technical documentation
  GitHub
  Sample SPARQL query


Questions/Comments

Barbara Bushman

[email protected]

Nancy Fallgren

[email protected]

Beta MeSH RDF

http://id.nlm.nih.gov/mesh/