Slide 1 IDO-WS 2010 Daniel Schober
Daniel Schober on behalf of DebugIT Community
Semantic integration of antibiotics resistance patterns
Slide 2
Healthcare Context
Slide 3
A need for ‘IT-biotics’
• DebugIT
  – Detecting and Eliminating Bacteria UsinG Information Technology
  – Using ‘semantic linked data’ to exploit distributed clinical data
• Acquire new knowledge
  – Through advanced data mining
• Apply knowledge in decision support
  – E.g. prescription choice
• Apply knowledge in monitoring
  – Analyze current & predict future trends
  – Discover patient safety patterns
Slide 4
The DebugIT project: general architecture
[Figure: architecture diagram]
• Repositories
  – Clinical Data Repository (aggregated, distributed)
  – Knowledge Repositories (aggregated, distributed)
• Components
  – Clinical Information System
  – Analysis
  – Reasoning engine
• Activities
  – 1 Collect Data
  – 2 Learn
  – 3 Store Knowledge
  – 4 Apply
• Flows: clinical data, knowledge
Slide 5
Using Ontologies in DebugIT
• Provide common semantic identifiers
  – Allow crosstalk within Interoperability Platform
  – SPARQL query to express research question
  – Provide formal meaning exploitable by logical & rule-based reasoners
• Integrate access to heterogeneous CIS
  – Normalization via terminologies and text mining
Slide 6
Data normalisation & ontology mapping (annotation)
[Figure: processing flow]
Raw clinical data
  – different encodings
  – different languages
→ De-identification, Text Mining, Ontologies →
Refined clinical data
  – uniform format & semantics
  – anonymized
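The normalisation step on this slide can be illustrated with a minimal sketch. It assumes a hypothetical lookup table from local sample-type strings to one shared code, and uses a salted hash as a stand-in for de-identification; the record fields, the local terms and the function names are all assumptions, not part of DebugIT. The code 122575003 is the SNOMED CT code the deck later names for "Urine specimen".

```python
import hashlib

# Hypothetical lookup from local sample-type strings (different languages
# and spellings) to one shared code. Only the code itself comes from the
# deck; the local terms are invented for illustration.
LOCAL_TERM_TO_CODE = {
    "urine": "122575003",
    "urin": "122575003",
    "urines": "122575003",
}


def pseudonymize(patient_id: str, salt: str) -> str:
    """Replace an identifier with a salted SHA-256 digest (a stand-in only)."""
    digest = hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()
    return digest[:16]


def normalize_record(raw: dict, salt: str) -> dict:
    """Turn one raw record into a uniform, de-identified shape."""
    term = raw["sample_type"].strip().lower()
    return {
        "patient": pseudonymize(raw["patient_id"], salt),
        # None signals a term that still needs a mapping.
        "sample_type_code": LOCAL_TERM_TO_CODE.get(term),
    }
```

Records that differ only in spelling or language of the sample type end up identical after this step, which is the property the later cross-site queries rely on.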
Slide 7
Data integration architecture
• ETL (2) populates local RDTBs in DMZ layer
• D2R conversion (3) allows SPARQL integration (4) via Ontologies (DCO, OO)
Slide 8
Linking data values to ontologies (via CVs)
1. Text mining links CIS data values to CVs
2. Create SKOS mappings from CV to Ontology (DCO)

Controlled vocabulary (CV)    | Domain
SNOMED CT findings            | Diseases
UniProt NEWT taxonomy         | Bacteria
WHO ATC codes                 | Drugs, antibiotics
Foundational Model of Anatomy | Human anatomy
…                             | …
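As a rough illustration of step 2, a SKOS-style exact-match mapping can be thought of as a lookup from a (vocabulary, code) pair to an ontology class. The sketch below is an assumption about how such a table might look in code. The three codes and the class names are taken from the queries later in this deck, while the vocabulary keys and the function name are hypothetical.

```python
from typing import Optional

# (vocabulary, notation) -> ontology class. The codes and class names appear
# in the deck's queries; the dictionary layout itself is an assumption.
EXACT_MATCH = {
    ("uniprot-taxonomy", "562"): "cao:EColi",
    ("atc", "J01EA01"): "dco:Trimethoprim",
    ("atc", "J01EE01"): "dco:SulfamethoxazoleAndTrimethoprim",
}


def to_ontology_class(vocabulary: str, notation: str) -> Optional[str]:
    """Return the mapped ontology class, or None when no mapping exists."""
    return EXACT_MATCH.get((vocabulary, notation.strip().upper()))
```

For example, `to_ontology_class("atc", "j01ea01")` resolves to `dco:Trimethoprim`, and an unknown code returns `None` so that unmapped values can be flagged rather than silently dropped.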
Slide 9
Ontology Layers within DebugIT
1 DebugIT Core Ontology (DCO)
  - Clinical domain of infectious diseases
  - OWL-DL
30 Operational ontologies (OO)
  - Implementation, module crosstalk, data mining
  - query building, statistics, analysis, evidences, maths, units, …
  - OWL-Full
7 Data Definition Ontologies (DDO)
  - Describing hospital specific CIS Data model
Slide 10
‘Female patient’ in different ontology layers
[Figure labels: Describing data; Describing real world (independent of data)]
Slide 11
Steps for solving a clinical analysis question
Clinician states clinical analysis question in natural language
1. Clinical Researcher
  – formulates the clinical analysis query via QueryBuilder & SPARQL (OOs & DCO)
2. Data Miners
  – write data set queries for each targeted CDR via SPARQL (DDOs)
3. Data Manager
  – maintains the N3 rule set that converts instances from the endpoint-specific DDO to OO & DCO
4. Data Miners
  – aggregate data set SPARQL result graphs in DCO using the needed conversion rule sets
  – perform the clinical analysis, e.g. using/creating N3 rules using OOs, DCO
  – formalize the clinical analysis result, using OOs & DCO
5. Clinical Researcher
  – validates the result & presents it to the Clinician, who validates the result.
Slide 12
Clinical Analysis SPARQL Query (construct)
“What percentage of Escherichia coli cases, cultured from urine samples, is resistant to the combination of trimethoprim/sulfamethoxazole (TMP/SMX) or trimethoprim in the period 2006-2010?”
CONSTRUCT {
  ?percentage quex:percentageOf ?total;
    quex:percentageThat ?part;
    quex:hasValue ?percentageValue;
    quex:hasUnit units:percent.
  ?total rdfs:subClassOf cao:EColi, [
    a owl:Restriction; owl:onProperty cao:culturedFrom;
    owl:someValuesFrom [
      rdfs:subClassOf dco:UrineSample;
      a owl:Restriction; owl:onProperty biotop:outcomeOf;
      owl:someValuesFrom [
        rdfs:subClassOf dco:UrineSampleCollection;
        a owl:Restriction; owl:onProperty event:during;
        owl:hasValue [
          dco:hasStartDateTime "2006-01-01T00:00:00"^^xsd:dateTime;
          dco:hasEndDateTime "2010-12-31T23:59:59"^^xsd:dateTime]]]].
  ?part rdfs:subClassOf ?total, [
    a owl:Restriction; owl:onProperty cao:resistantTo;
    owl:someValuesFrom [
      owl:unionOf (dco:Trimethoprim dco:SulfamethoxazoleAndTrimethoprim)]]}
Slide 13
Clinical Analysis SPARQL Query (where)
WHERE {
  ?percentage quex:percentageOf ?total;
    quex:percentageThat ?part;
    quex:hasValue ?percentageValue;
    quex:hasUnit units:percent.
  ?total rdfs:subClassOf cao:EColi, [
    a owl:Restriction; owl:onProperty cao:culturedFrom;
    owl:someValuesFrom [
      rdfs:subClassOf dco:UrineSample;
      a owl:Restriction; owl:onProperty biotop:outcomeOf;
      owl:someValuesFrom [
        rdfs:subClassOf dco:UrineSampleCollection;
        a owl:Restriction; owl:onProperty event:during;
        owl:hasValue [
          dco:hasStartDateTime "2006-01-01T00:00:00"^^xsd:dateTime;
          dco:hasEndDateTime "2010-12-31T23:59:59"^^xsd:dateTime]]]].
  ?part rdfs:subClassOf ?total, [
    a owl:Restriction; owl:onProperty cao:resistantTo;
    owl:someValuesFrom [
      owl:unionOf (dco:Trimethoprim dco:SulfamethoxazoleAndTrimethoprim)]]}
Slide 14
Data set SPARQL query (for HUG-DDO)
CONSTRUCT {
  ?antibiogram a ddo:Antibiogram;
    ddo:hasCulture ?culturing;
    ddo:hasIdentifiedBacterium [ddo:hasBacteriumCode "562"^^biosko:uniProtTaxonomyDT];
    ddo:hasTestedDrug [ddo:hasDrugCode ?atc];
    ddo:hasOutcome ?antibiogramResult.
  ?culturing ddo:hasSampleType ?sampleType;
    ddo:hasResultDate ?resultDate}
WHERE {
  ?antibiogram a ddo:Antibiogram;
    ddo:hasCulture ?culturing;
    ddo:hasIdentifiedBacterium [ddo:hasBacteriumCode "562"^^biosko:uniProtTaxonomyDT];
    ddo:hasTestedDrug [ddo:hasDrugCode ?atc];
    ddo:hasOutcome ?antibiogramResult.
  ?culturing ddo:hasSampleType ?sampleType;
    ddo:hasResultDate ?resultDate.
  FILTER (?atc = "J01EA01"^^clisko:atc20090101DT || ?atc = "J01EE01"^^clisko:atc20090101DT)
  FILTER ("2006-01-01T00:00:00"^^xsd:dateTime < ?resultDate && ?resultDate < "2010-12-31T23:59:59"^^xsd:dateTime)
  FILTER (?sampleType = "102866000"^^clisko:sct20080731DT)} # to be changed to 122575003 for "Urine specimen"
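To make the three FILTER clauses easier to read, the sketch below restates them as a plain predicate over already-fetched rows. It is only an illustration of the selection logic, not how the DebugIT endpoints evaluate the query; the row layout (`atc`, `result_date`, `sample_type` keys) and the function name are assumptions.

```python
from datetime import datetime

# ATC codes from the query: trimethoprim, and sulfamethoxazole and trimethoprim.
ATC_CODES = {"J01EA01", "J01EE01"}
START = datetime(2006, 1, 1, 0, 0, 0)
END = datetime(2010, 12, 31, 23, 59, 59)
# Sample-type code used in the query; the slide notes it should become 122575003.
SAMPLE_TYPE = "102866000"


def matches(row: dict) -> bool:
    """Mirror the query's three FILTERs for one result row."""
    return (
        row["atc"] in ATC_CODES
        and START < row["result_date"] < END  # strict, like the query's '<'
        and row["sample_type"] == SAMPLE_TYPE
    )
```

Note that both date bounds are exclusive, so a result stamped exactly 2006-01-01T00:00:00 would not be selected.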
Slide 15
DDO to DCO mapping via N3 rules
MAPPING FROM HUG-ddo:Culture
TO dco:BacterialCultureProcedure
{ ?culturing ddo:hasSampleType ?sample.
?Sample skos:exactMatch [skos:notation ?sample]}
=>
{ ?culturing biotop:precededBy [a dco:SampleCollection; biotop:hasOutcome [a ?Sample]]}.
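The effect of this rule can be sketched in ordinary code: for every `ddo:hasSampleType` triple whose code has an exact-match class, emit the four triples of the rule head. This is a simplified stand-in for an N3 reasoner, assuming triples are plain tuples and the SKOS mappings are a dictionary; the blank-node naming and the example mapping of 122575003 to `dco:UrineSample` are hypothetical.

```python
def apply_culture_rule(triples, exact_match):
    """Derive DCO-style triples from DDO sample-type triples.

    triples: iterable of (subject, predicate, object) tuples.
    exact_match: dict from a code notation to an ontology class.
    """
    derived = []
    for n, (s, p, o) in enumerate(triples):
        if p == "ddo:hasSampleType" and o in exact_match:
            # Fresh blank nodes for the bracketed parts of the rule head.
            collection = f"_:collection{n}"
            outcome = f"_:outcome{n}"
            derived += [
                (s, "biotop:precededBy", collection),
                (collection, "rdf:type", "dco:SampleCollection"),
                (collection, "biotop:hasOutcome", outcome),
                (outcome, "rdf:type", exact_match[o]),
            ]
    return derived
```

Triples with other predicates, or with codes that have no mapping, produce nothing, which matches the rule body requiring both patterns to hold.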
Slide 16
Cross-site integrated SPARQL result
2 instances out of a total result set of 1764
<https://babar.unige.ch:8443/cdr/resource/Culture/100320>
  a dco:AntimicrobialSusceptibilityTest, dco:BacterialAntibiogramAnalysis, dco:BacterialCultureProcedure;
  :hasOutcome [:encodes [:qualityLocated [a :SpeciesEscherichiaColiValueRegion]]],
              [:encodes [:qualityLocated [a dco:Sensitive]]];
  :hasParticipant [a dco:SulfamethoxazoleAndTrimethoprim];
  dco:hasResultDateTime "2006-11-03T09:57:00"^^xsd:dateTime.
<https://lincoln.imt.liu.se:8443/d2r-server/resource/culture/7219>
  a dco:AntimicrobialSusceptibilityTest, dco:BacterialAntibiogramAnalysis, dco:BacterialCultureProcedure;
  :hasOutcome [:encodes [:qualityLocated [a :SpeciesEscherichiaColiValueRegion]]],
              [:encodes [:qualityLocated [a dco:Sensitive]]];
  :hasParticipant [a dco:Trimethoprim];
  :precededBy [a dco:SampleCollection; :hasOutcome "abnormal urine"];
  dco:hasResultDateTime "2008-10-16T00:00:00"^^xsd:dateTime.
…
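Once results like these are aggregated, the percentage asked for in the clinical question reduces to simple arithmetic. The sketch below assumes the outcomes have already been reduced to one label per susceptibility test; the variable names echo `?total`, `?part` and `?percentageValue` from the analysis query, while the label "Resistant" and the function name are assumptions (the excerpt above only shows `dco:Sensitive`).

```python
def resistance_percentage(outcomes):
    """Percentage of tests labelled resistant, or None for an empty set.

    outcomes: list of outcome labels, one per susceptibility test.
    """
    total = len(outcomes)
    if total == 0:
        return None  # a percentage of nothing is undefined
    part = sum(1 for label in outcomes if label == "Resistant")
    return 100.0 * part / total
```

Applied to only the two instances shown above, both of which are sensitive, the function would return 0.0; the real figure depends on all 1764 results.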
Slide 17
DCO design principles
• OWL-DL
  – Reasoner for autoclassification & consistency checks during OE
  – Reasoner infers multiple parenthood
• Reusing BioTop
  – Ensures a rigid modeling view
  – Provides reusable constraints (bridges to all TLO)
• Concepts harvested from
  – Hospital CDR schemata
  – Competency questions from clinical use case
• Data-driven, bottom up
  – Domain terminologies in use
    • Via UMLS or OLS
    • Ontology modularisation tools (A. Rector)
    • HL7 v3 based
Slide 18
DCO content (statistics)
Ontology elements & axioms    | Overall | DCO  | BioTop
Classes                       | 1311    | 1014 | 375
Object Properties (relations) | 78      | 3    | 74
Datatype Properties           | 11      | 10   | 0
Subclass Axioms               | 1494    | 1050 | 444
Equivalent Class Axioms       | 197     | 98   | 99
Disjoint Axioms               | 76      | 1    | 75
Slide 19
A tripartite granular disease model (SDP pattern)
Slide 20
Inference of new facts (BloodSample is a BodyLiquidSample)
[Figure: stated facts (class definitions for BodyLiquid, BodyLiquidSample and BloodSample; the definitions themselves are not recoverable) are passed to a logics reasoner, which turns the asserted hierarchy (flat list) into an inferred hierarchy (more structure).]
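The idea behind this slide can be illustrated with a toy subsumption check. The class definitions on the slide did not survive, so the definitions below are purely illustrative assumptions of the form "genus and property some filler", as is the `derivedFrom` property and the asserted link from Blood to BodyLiquid. The check is a structural comparison for this one pattern only, not a description-logic reasoner.

```python
# Asserted primitive hierarchy (assumed for illustration).
SUBCLASS_OF = {"Blood": "BodyLiquid"}

# Defined classes as (genus, property, filler), read as
# "genus and (property some filler)". These definitions are hypothetical.
DEFINITIONS = {
    "BodyLiquidSample": ("Sample", "derivedFrom", "BodyLiquid"),
    "BloodSample": ("Sample", "derivedFrom", "Blood"),
}


def is_subclass(sub, sup):
    """Walk the asserted hierarchy upward from sub, looking for sup."""
    while sub is not None:
        if sub == sup:
            return True
        sub = SUBCLASS_OF.get(sub)
    return False


def subsumes(general, specific):
    """True if every instance of `specific` must also be a `general`."""
    g_genus, g_prop, g_filler = DEFINITIONS[general]
    s_genus, s_prop, s_filler = DEFINITIONS[specific]
    return (
        is_subclass(s_genus, g_genus)
        and s_prop == g_prop
        and is_subclass(s_filler, g_filler)
    )
```

Under these assumed definitions, `subsumes("BodyLiquidSample", "BloodSample")` holds even though nobody asserted it directly, which is the kind of inferred parenthood the slide describes.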
Slide 21
Use CNL for Ontology Evaluation
Slide 22
Next steps
• Enhance coverage
• Refinement of DCO structure
  – Addressing drug dosages & disease therapies
  – Use Rector's SNOMED CT modularisation algorithm to extract relevant SNOMED CT IDs from a DCO-provided seed list
• Publish and distribute
  – E.g. on BioPortal
Slide 23
DCO evaluation
• Ultimate overall evaluation
  • Can clinicians
    – run the overall system?
    – build queries and understand results?
  • Can data miners
    – create data set results?
    – do data mining and formalize quality criteria for results?
• DCO internal evaluation
  • Fitness for use tested by ability to answer CQ
  • Evaluate validity of assertions by
    – Reasoners
    – Graphical and textual representations to domain experts
    – Serialization of modules into Constrained Natural Languages (CNL)
Slide 24
(Preliminary) Conclusion
• Semantically rich application ontologies
• Successive query formalisations are complex
  … but approach scales over space & time
• Used in practice
  – Practical SPARQL query building
  – Data integration across 7 EU hospitals
• DL-reasoning helps ontology engineering
  – DL limitation justified for smaller ontologies
  – For larger models use rule-based reasoning
• As data is dirty, we need to cope with errors arising
Slide 25
Resources & Acknowledgements
Resources
• DebugIT project
  – http://www.DebugIT.eu
• Ontology sources
  – http://purl.org/imbi/dco/dco
• TermBrowser
  – http://www.imbi.uni-freiburg.de/~schober/dco_owlDoc/
Acknowledgements
• Hans Cools, Martin Boeker, Kristof Depraetere, Douglas Teodoro, Remy Choquet, Stefan Schulz, Ilinca Tudose, Maren Kechel, Giovanni Mels, Dirk Coalert, Dimitris Iakovidis, the DebugIT team
• Funded by grant agreement ICT-2007.5.2-217139
Slide 26
In the Hospital kitchen I was approached by a member of the feared ‘Antibiotics Resistance’ …