“Pushing Back”: Standards and Standard Organizations in a Semantic Web Enabled World
Keynote at SWAT4LS (Semantic Web Applications and Tools for Life Sciences) 2013
Kerstin Forsberg, Informatics Scientist, AstraZeneca, Mölndal, Sweden
Image: Flickr bitpuddle (Twitter @eric_d_hancock)
AZIT | R&D Information
Purpose
Encourage standard organisations to “Use Standards for Standards”
Agenda
• Standards for Data and Semantics
• Examples of standard organizations now looking into using the Semantic Web
• Provenance/Justification for Mappings
2 Kerstin Forsberg | SWAT4LS, Dec 10th 2013
Kerstin Forsberg (@kerfors)
• “Volvo Web Wave Project” 1995-1997; W3C conferences 1996 & 1999, Dublin Core, RDF
• “Extensible use of RDF in a business context”, paper presented at the W3C WWW9 conference, 2000, Amsterdam
• “Advancing translational research with the Semantic Web”, joint W3C HCLS paper in BMC Bioinformatics, 2007
• “Linked data, an opportunity to mitigate complexity in pharmaceutical research and development”, summary of experiences from LarKC and W3C HCLS, 2011, together with my colleague Bosse Andersson
“Information architect, semantic web and linked data enthusiast caring about clinical trial data.”
About AstraZeneca
• Alongside our own R&D, we partner with others, combining skills and resources to broaden the potential for successful innovation.
• We believe that only by working together with others who have a part to play in improving healthcare can real progress be made.
• We work closely with others in the healthcare community, including physicians and those who pay for healthcare, to understand their challenges and how we can combine skills and resources to achieve a common goal: improved health.
AstraZeneca’s view on “Semantics”
Enabling the hyperconnected enterprise
“We need to build a linked data architecture enabling us to ask questions and solve business problems across a heterogeneous information landscape extending beyond the traditional boundaries of the enterprise.”
semantics connects us all
Standards for Data and Semantics
Different types of standards
• Entity-based Ontologies
• Concept-based Terminologies/Code systems
• Code lists/Value sets/Term sets
• Data exchange (Tabulated data)
• Information Models
Standards for Data and Semantics
Examples
• Entity-based Ontologies
• Concept-based Terminologies/Code systems
• Code lists/Value sets/Term sets
• Data exchange (Tabulated data)
• Information Models
“Pushing back” – Use standards for standards
1. NCI (National Cancer Institute) Thesaurus
• Entity-based Ontologies
• Concept-based Terminologies/Code systems
• Code lists/Value sets/Term sets
• Data exchange (Tabulated data)
• Information Models
“Pushing back” – Use standards for standards
AZ Vocabulary Management team shared this with NCI EVS
• The NCI Thesaurus is an extensive medical vocabulary published by the US National Institutes of Health: http://ncit.nci.nih.gov/
• It is made available in several downloadable formats: http://evs.nci.nih.gov/ftp1/NCI_Thesaurus
• In order for us to use the thesaurus in our systems, we need to convert it to RDF, following the SKOS standard: http://www.w3.org/2004/02/skos/
Jim Morris, Informatics Scientist, AstraZeneca R&D Wilmington, USA
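As a minimal sketch of the kind of conversion described above, the following Python turns flat thesaurus records into SKOS triples serialized as Turtle. The record layout, the `ex:` namespace and the sample codes and labels are assumptions made for illustration; this is not the NCI Thesaurus download format, nor the converter the AstraZeneca team used.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/thesaurus/"  # hypothetical namespace, not an NCI URI

# Hypothetical flat records: (code, preferred label, parent code or None)
records = [
    ("X100", "Disease or Disorder", None),
    ("X200", "Cardiovascular Disorder", "X100"),
]


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_skos_turtle(rows):
    """Render each record as a skos:Concept with a notation, label and broader link."""
    lines = [f"@prefix skos: <{SKOS}> .", f"@prefix ex: <{EX}> .", ""]
    for code, label, parent in rows:
        lines.append(f"ex:{code} a skos:Concept ;")
        lines.append(f'    skos:notation "{escape(code)}" ;')
        if parent is not None:
            lines.append(f"    skos:broader ex:{parent} ;")
        lines.append(f'    skos:prefLabel "{escape(label)}"@en .')
    return "\n".join(lines)


turtle = to_skos_turtle(records)
print(turtle)
```

If the thesaurus were already published as SKOS, a conversion step like this would not be needed by each consumer, which is the point of "using standards for standards".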
“Pushing back” – Use standards for standards
2. MedDRA (Medical Dictionary for Regulatory Activities)
• Entity-based Ontologies
• Concept-based Terminologies/Code systems
• Code lists/Value sets/Term sets
• Data exchange (Tabulated data)
• Information Models
“Pushing back” – Use standards for standards
AZ Vocabulary Management team shared this with MedDRA MSSO
Courtland Yockey, Informatics Scientist, AstraZeneca R&D Wilmington, USA
A very simple SKOS rendering of MedDRA:
• term → skos:Concept
• hierarchy level → skos:ConceptScheme
• SMQ → skos:Collection
The approach should be augmented with a VoID representation of MedDRA versions and with term properties distinguishing active from inactive terms.
skos:Collection is likely not sufficient to support SMQ versioning or the context of terms in an SMQ (e.g. weight).
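The mapping on this slide can be sketched in a few lines of Python. The term codes, labels, the SMQ identifier and the `ex:` prefix below are all hypothetical placeholders, not real MedDRA content; only the three SKOS classes and the standard `skos:inScheme` and `skos:member` properties come from SKOS itself.

```python
# Hypothetical input: terms tagged with their hierarchy level, and one SMQ
terms = [
    {"code": "T1", "label": "Example preferred term", "level": "PT"},
    {"code": "T2", "label": "Example lowest level term", "level": "LLT"},
]
smqs = {"SMQ1": ["T1", "T2"]}


def render(terms, smqs):
    """Return (subject, predicate, object) triples following the slide's mapping."""
    triples = []
    # hierarchy level -> skos:ConceptScheme
    for level in sorted({t["level"] for t in terms}):
        triples.append((f"ex:{level}", "rdf:type", "skos:ConceptScheme"))
    # term -> skos:Concept, placed in the scheme of its level
    for t in terms:
        subject = f"ex:{t['code']}"
        triples.append((subject, "rdf:type", "skos:Concept"))
        triples.append((subject, "skos:prefLabel", f'"{t["label"]}"@en'))
        triples.append((subject, "skos:inScheme", f"ex:{t['level']}"))
    # SMQ -> skos:Collection with plain membership links
    for smq, members in smqs.items():
        triples.append((f"ex:{smq}", "rdf:type", "skos:Collection"))
        for code in members:
            triples.append((f"ex:{smq}", "skos:member", f"ex:{code}"))
    return triples


triples = render(terms, smqs)
for triple in triples:
    print(" ".join(triple), ".")
```

Note that `skos:member` is a plain link from collection to member, so there is nowhere to attach a per-term weight or a version, which illustrates the limitation noted on the slide.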
“Pushing back” – Use standards for standards
3. CDISC (Clinical Data Interchange Standards Consortium)
• Entity-based Ontologies
• Concept-based Terminologies / Code systems
• Code lists/Value sets/Term sets
• Data exchange (Tabulated data)
• Information Models
Standards for Data Exchange
Clinical Trial Data standardized “containers”
Trial Summary level
Patient level
Submission standards: SDTM, “designed so [FDA] reviewers with no tools other than perhaps the SAS Viewer would be able to open a dataset and browse it easily”.
Standards for Data Exchange
Documentation of standardized “containers”
Human-readable documentation in 200+ page PDFs and Excel files (and some in XML).
Standards for Data Exchange
Data in standardized “containers”
CDISC SDTM Implementation Guide (IG)
Humans can connect data to data standards.
Standards for Data Exchange
Documentation of Standard fragments
1. CDISC SDTM Model
2. CDISC SDTM Implementation Guide (IG)
3. CDISC SDTM Controlled Terminology
Humans can connect data to data standards and connect the different standard fragments to each other.
Standards for Data Exchange
Linked Clinical Data Standards
• CDISC2RDF started as a cross-pharma pre-competitive project with AstraZeneca, Roche, W3C et al. to showcase Semantic Web standards and Linked Data principles.
• It became part of the Semantic Technology project, an FDA/PhUSE working group for Emerging Technologies, with 30+ representatives from FDA, CDISC, pharma companies, CROs and software vendors.
• First phase: representing existing “container” standards (SDTM, CDASH, SEND, ADaM) in RDF.
Standards for Data Exchange
Linked Clinical Data Standards
Human-readable documentation in PDFs and Excel files (and some in XML)
Machine-processable linked data structured as RDF triples (160,000+)
Serializations of RDF triples in Turtle and XML …
https://github.com/phuse-org/rdf.cdisc.org
Standards for Data Exchange
Linked Clinical Data Standards
Human-readable documentation in PDFs and Excel files (and some in XML)
Import files: annotated Excel files from CDISC with classes and properties from the schemas, ready to transform to RDF triples using an off-the-shelf tool (TopBraid Composer from TopQuadrant)
Meta Model Schema (mms): based on the core ISO 11179 model (metadata for data elements and a few CDISC-specific classes and properties)
Machine-processable linked data structured as RDF triples (160,000+)
Serializations of RDF triples in Turtle and XML …
https://github.com/phuse-org/rdf.cdisc.org
Standards for Data Exchange
Annotating existing standards
Import files: annotated Excel files from CDISC with classes and properties from the schemas, ready to transform to RDF triples using an off-the-shelf tool (TopBraid Composer from TopQuadrant)
Meta Model Schema (mms): based on the core ISO 11179 model (metadata for data elements and a few CDISC-specific classes and properties)
This turned out to be a good way to help people who are knowledgeable in CDISC but new to RDF schemas understand the process of “triplification”.
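To make “triplification” concrete, here is a rough Python sketch of the idea: a spreadsheet whose header row has been annotated with a class and properties, and whose data rows are turned into triples. The sheet content, the property names in the header and the prefixes are illustrative assumptions, not the actual CDISC import files or the mms schema, and the project itself used an off-the-shelf tool rather than code like this.

```python
import csv
import io

# Hypothetical annotated sheet: the first column header names the class of the
# row subject, the remaining headers name the property each column maps to.
SHEET = """mms:DataElement,mms:dataElementName,mms:dataElementLabel,mms:context
sdtm:DM.AGE,AGE,Age,sdtm:DM
sdtm:DM.SEX,SEX,Sex,sdtm:DM
"""


def triplify(sheet_text):
    """Turn each row into one rdf:type triple plus one triple per property column."""
    reader = csv.reader(io.StringIO(sheet_text))
    header = next(reader)
    subject_class, properties = header[0], header[1:]
    triples = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        subject, values = row[0], row[1:]
        triples.append((subject, "rdf:type", subject_class))
        for prop, value in zip(properties, values):
            # Simplification: values containing a colon are treated as
            # prefixed names (resources), everything else as a plain literal.
            obj = value if ":" in value else f'"{value}"'
            triples.append((subject, prop, obj))
    return triples


triples = triplify(SHEET)
for triple in triples:
    print(" ".join(triple), ".")
```

Seeing familiar spreadsheet rows next to the triples they produce is one way to explain the process to people who know the standards but not RDF.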
CDISC and NCIT
Value sets are an issue
• Concept-based Terminologies / Code systems
• Code lists/Value sets/Term sets
• Data exchange (Tabulated data)
mms:PermissibleValue
mms:ValueDomain
mms:Dataset
mms:DataElement
mms:DataCollectionForm
Standards for Data Exchange
Cross standard review and mappings
Data Elements [SDTM, ADaM, CDASH] “haveSame” Value Domain (CT)
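Once the standards are linked data, a cross-standard review like this can be automated. The sketch below is one possible reading of the “haveSame” check: it groups data elements from different standards by variable name and reports those that point to more than one value domain. The element identifiers, the code list names and the choice to match elements by the name after the last dot are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical (data element, value domain) pairs across three standards
element_value_domain = [
    ("sdtm:DM.SEX", "ct:SexCodeList"),
    ("cdash:DM.SEX", "ct:SexCodeList"),
    ("adam:ADSL.SEX", "ct:OtherSexCodeList"),
    ("sdtm:DM.RACE", "ct:RaceCodeList"),
    ("cdash:DM.RACE", "ct:RaceCodeList"),
]


def variable_name(element):
    """Take the part after the last dot, e.g. 'sdtm:DM.SEX' -> 'SEX'."""
    return element.rsplit(".", 1)[-1]


def inconsistent_variables(pairs):
    """Return variables whose data elements point to more than one value domain."""
    domains = defaultdict(set)
    for element, domain in pairs:
        domains[variable_name(element)].add(domain)
    return {name: sorted(found) for name, found in domains.items() if len(found) > 1}


report = inconsistent_variables(element_value_domain)
print(report)
```

In this made-up data, SEX is reported because one standard points to a different code list, while RACE passes the check.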
Provenance/Justification for Mappings
Example from EU project SALUS for Post Market Safety Studies
The example shows the hierarchy of cardiac disorders in both the MedDRA and SNOMED-CT concept schemes, expressed using the skos:broader property. Mappings between similar concepts in both concept schemes are stated using the skos:exactMatch property.
From: SALUS Harmonized Ontology for Post Market Safety Studies
MedDRA:10028596 skos:exactMatch SNOMEDCT:22298006
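A small Python sketch shows how such a mapping triple can be used. Because skos:exactMatch is defined as a symmetric property in SKOS, the lookup below follows the link in both directions. The tuple representation and the function name are assumptions for illustration; only the triple itself comes from the slide.

```python
# The single mapping shown on the slide, as a (subject, predicate, object) triple
mappings = [("MedDRA:10028596", "skos:exactMatch", "SNOMEDCT:22298006")]


def exact_matches(concept, triples):
    """Return concepts linked to `concept` by skos:exactMatch in either direction."""
    found = set()
    for subject, predicate, obj in triples:
        if predicate != "skos:exactMatch":
            continue
        if subject == concept:
            found.add(obj)
        elif obj == concept:
            found.add(subject)
    return sorted(found)


print(exact_matches("MedDRA:10028596", mappings))
print(exact_matches("SNOMEDCT:22298006", mappings))
```

A bare triple like this says nothing about who asserted the mapping, when, or on what grounds, which is why provenance and justification need a home of their own.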
Provenance/Justification for Mappings
Alternative: Mappings as LinkSets
The Dataset Descriptions for the Open Pharmacological Space specification defines the metadata used to describe datasets and the LinkSets that relate them.
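As a hedged illustration of what a LinkSet description might look like, the Python below writes a small VoID description in Turtle. It uses only standard VoID and Dublin Core terms (void:Linkset, void:subjectsTarget, void:objectsTarget, void:linkPredicate, dcterms:creator, dcterms:created); the `ex:` identifiers, the creator and the date are placeholders, and any additional properties required by the Open Pharmacological Space specification are deliberately left out.

```python
def linkset_turtle(linkset, subjects, objects, predicate, creator, created):
    """Describe a set of mapping links between two datasets as a void:Linkset."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix ex: <http://example.org/> .",  # hypothetical namespace
        "",
        f"{linkset} a void:Linkset ;",
        f"    void:subjectsTarget {subjects} ;",
        f"    void:objectsTarget {objects} ;",
        f"    void:linkPredicate {predicate} ;",
        f"    dcterms:creator {creator} ;",
        f'    dcterms:created "{created}"^^xsd:date .',
    ]
    return "\n".join(lines)


# All identifiers and the date below are placeholders for illustration
description = linkset_turtle(
    "ex:meddra-snomedct-linkset",
    "ex:meddra",
    "ex:snomedct",
    "skos:exactMatch",
    "ex:some-mapping-team",
    "2013-12-10",
)
print(description)
```

Here the provenance is attached once to the whole set of links rather than to each individual mapping.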
Provenance/Justification for Mappings
Alternative: Mappings as Nanopublications
MedDRA:10028596 skos:exactMatch SNOMEDCT:22298006
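A nanopublication wraps a single assertion such as this one together with its provenance and publication information, each in its own named graph. The Python below sketches that structure as TriG text. The np: and prov: terms are from the nanopublication schema and W3C PROV; the `ex:` identifiers, the MedDRA and SNOMEDCT namespace URIs, the attributed agent and the timestamp are placeholders and not real publication data.

```python
def nanopub_trig(assertion, attributed_to, created):
    """Wrap one (subject, predicate, object) assertion in four named graphs."""
    subject, predicate, obj = assertion
    return "\n".join([
        "@prefix np: <http://www.nanopub.org/nschema#> .",
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix ex: <http://example.org/np1#> .",  # hypothetical
        "@prefix MedDRA: <http://example.org/meddra/> .",  # placeholder URI
        "@prefix SNOMEDCT: <http://example.org/snomedct/> .",  # placeholder URI
        "",
        "ex:head {",
        "    ex:pub a np:Nanopublication ;",
        "        np:hasAssertion ex:assertion ;",
        "        np:hasProvenance ex:provenance ;",
        "        np:hasPublicationInfo ex:pubinfo .",
        "}",
        "ex:assertion {",
        f"    {subject} {predicate} {obj} .",
        "}",
        "ex:provenance {",
        f"    ex:assertion prov:wasAttributedTo {attributed_to} .",
        "}",
        "ex:pubinfo {",
        f'    ex:pub prov:generatedAtTime "{created}"^^xsd:dateTime .',
        "}",
    ])


trig = nanopub_trig(
    ("MedDRA:10028596", "skos:exactMatch", "SNOMEDCT:22298006"),
    "ex:some-mapping-team",  # placeholder agent
    "2013-12-10T00:00:00Z",  # placeholder timestamp
)
print(trig)
```

Compared with a LinkSet, this puts provenance on each individual mapping, at the cost of more triples per mapping.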
Summary
Encourage standard organisations to “Use Standards for Standards”
for sustainability and trustability.
Think if …
semantics connects us all
Acknowledgements
AZ’s Semantic Web Community of Practice members: Tom Plasterer (lead), Jim Morris, Courtland Yockey, Sorana Popa, Rob Hernandez, Mike Westaway, Rajan Desai, Simon Rakov, Dana Crowley, Ian Dix, Johan Törnqvist
Collaborators and Advisors:
• Charlie Mead – IO Informatics
• Dean Allemang – Working Ontologist
• Frederik Malfait – IMOS consulting / Roche
• Phil Ashworth – TopQuadrant
Thank you!