ontology innovations project-work-in-progress presentation 070214

Upload: titus-schleyer

Post on 12-Oct-2015

DESCRIPTION

In this project, we began to compare the process of retrieving and analyzing data for clinical research using a relational database with an ontology-based method. In relational databases, data are represented in tables that are linked through fields called "keys." Ontologies, on the other hand, represent domains through classes and their relationships. We used data extracted from the Indiana Network for Patient Care to identify patients with a diagnosis of breast cancer who were treated with one or more medications. In this presentation, we discuss early lessons learned from this process.

TRANSCRIPT

Slide 1

Pilot study for ontology-based analysis of INPC data: Final report
T. Schleyer, A. Ruttenberg, B. Duncan, F. Smith, A. Roberts

Original project goals
- Select several representative research questions that use INPC data
- Model the data needed for these questions in an ontology
- Replicate data retrieval/analysis using SPARQL and R
- Compare understandability, documentation, query complexity, workflow and extensibility

INPC data analysis: current workflow
Receive request for data
- Often a list of terms/criteria or a brief written description
- Sometimes a spreadsheet of codes (ICD9, CPT, etc.) to search for
Search for data
- Find any codes needed (e.g., look up medications by name or class)
- Map between coding systems (e.g., ICD9 to Regenstrief dictionary)
- Have requestor review codes
- Perform search across numerous tables, some of which duplicate information
- Iterative process: refine and re-run query
Return results to requestor

Challenges with current process
- Data managers act as a "Greek oracle"
- The relational database is a technical, idiosyncratic construct (e.g., naming constraints, normalization, performance)
- Meaningful, real-time interaction about data is difficult
- Little to no opportunity to leverage external data representation resources
- Problems are hard to detect

Relational databases and hidden meaning

sys_id is the coding system, such as ICD9, local codes, SNOMED or LOINC.
code is the actual code, such as ICD 920.1.
service_code is the question to which the record is the answer.
top_parent_service_code is the code of the parent question.
value_type indicates the type of the data (coded, numeric, etc.).

Sample query: Vending Machine:10 Breast cancer

    WHERE SERVICE_SYS_ID=1
      and ((SERVICE_CODE in ('189' /*DX and COMPLAINTS*/, '4569' /*E.R. DIAGNOSIS*/, '4966' /*HOSP DX*/,
                             '7076' /*DX LISTS*/, '7686' /*HOSP HX*/, '7909' /*DISCH DX*/,
                             '9950' /*REHAB DX*/, '9951' /*ORTHO DX*/, '9952' /*SURG DX*/,
                             '9953' /*ENT DX*/, '9954' /*EYE DX*/, '9955' /*DERM DX*/,
                             '9956' /*NEURO DX*/, '7909' /*Disch Dx*/, '14360' /*OB Discharge Diagnosis*/,
                             '36129' /*Axis IV Discharge Dx*/, '3871' /*Initial Dx*/, '16501' /*Discharge Dx/Prob*/,
                             '19825' /*Ekg.Cart.Dx*/, '21827' /*DIDS DX*/, '21669' /*ANDROLOGY DIAGNOSIS*/,
                             '22813' /*VISIT DIAGNOSIS*/, '21237' /*Primary Care Dx*/, '19788' /*Preoperative Diagnosis*/,
                             '37081' /*OB Triage Admission Diagnoses*/, '37086' /*OB Triage Discharge Diagnoses*/)
           and (upper(VALUE_TEXT_FOR_DISPLAY)='BREAST CA')))

What are ontologies?
- They represent domains through classes and their relationships.
- Each class in an ontology has a defined and unique meaning.
- Properties are semantic relationships among classes, e.g.:
  - simple: Patient has gender, age
  - complex: is_a, is_treated_by, etc.
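To make the "hidden meaning" of the sample breast cancer query concrete, here is a minimal Python sketch of what its WHERE clause does. It assumes a hypothetical `Observation` row type holding only the columns the query touches (plus a hypothetical `patient_id`), and it lists only a handful of the service codes. A record is found only if its question code was enumerated by hand and its free text matches exactly.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Observation:
    """Hypothetical row with just the columns the sample query uses (patient_id is assumed)."""
    patient_id: str
    service_sys_id: int
    service_code: str
    value_text_for_display: str


# A few of the diagnosis "questions" that the sample query enumerates by hand.
DIAGNOSIS_SERVICE_CODES = {
    "189",    # DX and COMPLAINTS
    "4569",   # E.R. DIAGNOSIS
    "4966",   # HOSP DX
    "7909",   # DISCH DX
    "22813",  # VISIT DIAGNOSIS
}


def breast_cancer_patients(rows: Iterable[Observation]) -> Set[str]:
    """Mirror the WHERE clause: coding system 1, a listed diagnosis question, text 'BREAST CA'."""
    return {
        r.patient_id
        for r in rows
        if r.service_sys_id == 1
        and r.service_code in DIAGNOSIS_SERVICE_CODES
        and r.value_text_for_display.upper() == "BREAST CA"
    }
```

A record filed under an unlisted question code, or entered as "breast cancer" instead of "BREAST CA", is silently skipped. Knowing which codes and strings to include lives outside the schema, which is one reading of the hidden meaning the slide refers to.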

Example: Oral Health and Disease Ontology

(http://code.google.com/p/ohd-ontology/, http://www.ontobee.org/browser/index.php?o=OHD)

OHD - Caries finding

OHD tooth restoration procedure

OHD tooth

Reusing other ontologies

Finding breast cancer drugs
First, we find cancer patients by querying for patients who:
- have an ICD9 code for a cancer diagnosis
- have a concept code in a clinical variable that identifies a cancer diagnosis
We found a total of ~1,500 patients in the one year of records we have.
We then search the pharmacy_order table for prescriptions to cancer patients:
- About 39,000 in total: 26,000 have NDC codes, 13,000 don't!
- The 13,000 prescriptions without NDC codes comprise ~400 prescription types.
- Examples include: MORPHINE SUL TAB 30MG ER, NAMENDA TAB 5MG, NITROFURANTN CAP 100MG
Note that queries done at Regenstrief typically will miss about 1/3 (13,000 of 39,000) of the prescriptions.
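The prescription counts above come down to partitioning orders by whether they carry an NDC code. A minimal Python sketch, assuming a hypothetical simplified `pharmacy_order` row of (patient_id, ndc_code or None, description); the real table has more columns and the helper names are illustrative only:

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical, simplified pharmacy_order row: (patient_id, ndc_code or None, description).
Order = Tuple[str, Optional[str], str]


def split_by_ndc(orders: List[Order]) -> Tuple[List[Order], List[Order]]:
    """Separate orders that carry an NDC code from those that do not."""
    with_ndc = [o for o in orders if o[1]]
    without_ndc = [o for o in orders if not o[1]]
    return with_ndc, without_ndc


def missed_fraction(orders: List[Order]) -> float:
    """Share of orders that a search relying only on NDC codes would not find."""
    if not orders:
        return 0.0
    _, without_ndc = split_by_ndc(orders)
    return len(without_ndc) / len(orders)


def types_without_ndc(orders: List[Order]) -> Counter:
    """Count free-text prescription types among orders that lack an NDC code."""
    _, without_ndc = split_by_ndc(orders)
    return Counter(description for _, _, description in without_ndc)
```

With the slide's totals, `missed_fraction` would be 13,000 / 39,000 = 1/3, and `types_without_ndc` would have roughly 400 distinct keys.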

Components
- Cancer patients
- Prescriptions for them
- Diagnoses of them
- ICD9 hierarchy
- NDF-RT OWL translation
- Mapping of NDC to RxNorm
- Mapping of RxNorm to NDF-RT

Representation choices
- Codes are information artifacts about the person or thing they code.
- Patients are actual patients.
- NDF-RT drugs are actual drugs.
- Prescriptions are directive information entities.
- OBO ontologies: OBI, IAO, OGMS, OMRSE
- Other ontologies/documents: NDF-RT, ICD9
- Web services: RxNorm API
- Store: OWLIM SE, Hoerst

Key leverage
- Use of NDF-RT hierarchies and relations: ingredients; physiological effects; therapeutic classes; Cause, May treat, Mechanism of Action
- Use of ICD9, limited as it is
- Leverage classification to be able to compute malignant neoplasms = neoplasms minus benign neoplasms
- Transparency of data artifacts: the data team has learned about the data structure in the process.
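The idea of computing malignant neoplasms as neoplasms minus benign neoplasms can be sketched as a set difference over is_a descendants. The hierarchy below is a hypothetical toy fragment with ICD9-style labels, not the ICD9 classification or the project's OWL model, and `descendants` is an illustrative helper; in the project itself an OWL reasoner over the store would do this work through class definitions.

```python
# Toy is_a hierarchy (child -> parent). Labels borrow ICD9 wording, but this is
# a hypothetical fragment, not the real ICD9 classification.
IS_A = {
    "benign neoplasm": "neoplasm",
    "benign neoplasm of breast (217)": "benign neoplasm",
    "malignant neoplasm of female breast (174)": "neoplasm",
    "malignant neoplasm of breast, unspecified (174.9)": "malignant neoplasm of female breast (174)",
}


def descendants(root: str, is_a: dict) -> set:
    """Return root plus every class that reaches root by following is_a edges upward."""
    result = {root}
    changed = True
    while changed:  # repeat until no new subclass is found (transitive closure)
        changed = False
        for child, parent in is_a.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


# "Malignant = neoplasms minus benign neoplasms", excluding the root class itself.
malignant = (descendants("neoplasm", IS_A) - {"neoplasm"}) - descendants("benign neoplasm", IS_A)
```

In the full ICD9 neoplasm chapter, this difference would also keep categories such as carcinoma in situ and neoplasms of uncertain behavior, so a real definition would need to exclude those as well.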

RxNorm to NDF-RT
- Restricted to cancer patients in 1 year
- Find all prescription NDC codes
- Use internal concept mapping to get 1037 RxNorm codes
- Use NDF-RT to get 47488 NDF-RT/RxNorm mappings using SPARQL against OWL NDF-RT

PREFIX rxcui: <…>
SELECT ?class ?rxnorm
WHERE { ?class rxcui: ?rxnorm . }
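Once the SPARQL query has returned (NDF-RT class, RxNorm code) pairs, the remaining step is an inversion followed by a lookup of the prescription codes. A minimal Python sketch, assuming the query results have already been fetched into a list of pairs; the function names and sample codes are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def invert_mapping(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn (ndfrt_class, rxnorm_code) query rows into rxnorm_code -> {ndfrt_class, ...}."""
    by_rxnorm: Dict[str, Set[str]] = defaultdict(set)
    for ndfrt_class, rxnorm in pairs:
        by_rxnorm[rxnorm].add(ndfrt_class)
    return dict(by_rxnorm)


def split_mapped(rxnorm_codes: List[str], by_rxnorm: Dict[str, Set[str]]):
    """Partition the prescription RxNorm codes into mapped and unmapped ones."""
    mapped = {c: by_rxnorm[c] for c in rxnorm_codes if c in by_rxnorm}
    unmapped = [c for c in rxnorm_codes if c not in by_rxnorm]
    return mapped, unmapped
```

Going by the slides, running this over the 1037 prescription RxNorm codes would leave 328 in `unmapped`; those are the codes passed on to the RxNorm web API fallback.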

RxNorm to NDF-RT
328 RxNorm codes were not in the NDF-RT-derived map. Use the RxNorm web API to find:
- a more general term
- or, a remapped term
- a more general term of the remapped term
- a remapped, remapped term
- a more general term of the remapped, remapped term
and add the mapping if found.
Leaving: 21 unmapped terms
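The fallback order on this slide reads as a short cascade that stops at the first hit. Below is a Python sketch in which `more_general` and `remapped` are hypothetical stand-ins for RxNorm web API lookups, modeled as plain functions that return a related code or None; no actual API endpoint or response format is assumed. `ndfrt_map` is assumed to map an RxNorm code to an NDF-RT class.

```python
from typing import Callable, Dict, Iterator, Optional

# Hypothetical lookup: given an RxNorm code, return a related code or None.
Lookup = Callable[[str], Optional[str]]


def fallback_candidates(code: str, more_general: Lookup, remapped: Lookup) -> Iterator[Optional[str]]:
    """Yield candidate codes lazily, in the order listed on the slide."""
    yield more_general(code)             # more general term
    r = remapped(code)
    yield r                              # remapped term
    if r is not None:
        yield more_general(r)            # more general term of remapped term
        rr = remapped(r)
        yield rr                         # remapped, remapped term
        if rr is not None:
            yield more_general(rr)       # more general term of remapped, remapped term


def resolve(code: str, ndfrt_map: Dict[str, str], more_general: Lookup, remapped: Lookup) -> Optional[str]:
    """Return the NDF-RT class for the first candidate found in ndfrt_map, else None."""
    for candidate in fallback_candidates(code, more_general, remapped):
        if candidate is not None and candidate in ndfrt_map:
            ndfrt_map[code] = ndfrt_map[candidate]  # "add mapping if found"
            return ndfrt_map[code]
    return None
```

Because the candidates come from a generator, later (and presumably more expensive) lookups are only made when the earlier ones fail.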

RxNorm to NDF-RT mapping
- 1037 tried
- 1016 successful
- 9 have RxNorm codes that can't be resolved: 207982, 309937, 311945, 314058, 314265, 404282, 562715, 845521, 966533
- 12 were not mapped:
  - 0.5 ML Influenza A virus vaccine, A-California-7-2009 (H1N1)-like virus 0.12 MG/ML / Influenza A virus vaccine, A-Victoria-361-2011 (H3N2)-like virus 0.12 MG/ML / Influenza B virus vaccine, B-Wisconsin-1-2010-like virus 0.12 MG/ML Prefilled Syringe [Fluzone High-Dose 2012-2013 Formula]
  - Coal Tar 200 MG/ML Topical Solution
  - Influenza A virus vaccine, A-California-7-2009 (H1N1)-like virus 0.03 MG/ML / Influenza A virus vaccine, A-Victoria-361-2011 (H3N2)-like virus 0.03 MG/ML / Influenza B virus vaccine, B-Wisconsin-1-2010-like virus 0.03 MG/ML Injectable Suspension [Fluzone 2012-2013 Formula]
  - Isopropyl Alcohol 0.7 ML/ML Medicated Pad [BD Alcohol]
  - Isopropyl Alcohol 0.7 ML/ML Medicated Pad
  - POLYETHYLENE GLYCOL 3350 105 MG/ML / Potassium Chloride 0.00497 MEQ/ML / Sodium Bicarbonate 0.017 MEQ/ML / Sodium Chloride 0.0479 MEQ/ML Oral Solution [NuLytely]
  - POLYETHYLENE GLYCOL 3350 105 MG/ML / Potassium Chloride 0.00497 MEQ/ML / Sodium Bicarbonate 0.017 MEQ/ML / Sodium Chloride 0.0479 MEQ/ML Oral Solution [TriLyte]
  - POLYETHYLENE GLYCOL 3350 59 MG/ML / Potassium Chloride 0.01 MEQ/ML / Sodium Bicarbonate 0.02 MEQ/ML / Sodium Chloride 0.025 MEQ/ML / sodium sulfate 0.04 MEQ/ML Oral Solution [Gaviltye-G]
  - POLYETHYLENE GLYCOL 3350 59 MG/ML / Potassium Chloride 0.01 MEQ/ML / Sodium Bicarbonate 0.02 MEQ/ML / Sodium Chloride 0.025 MEQ/ML / sodium sulfate 0.04 MEQ/ML Oral Solution [Golytely]
  - POLYETHYLENE GLYCOL 3350 59 MG/ML / Potassium Chloride 0.01 MEQ/ML / Sodium Bicarbonate 0.02 MEQ/ML / Sodium Chloride 0.025 MEQ/ML / sodium sulfate 0.04 MEQ/ML Oral Solution
  - Prednisone 10 MG Oral Tablet
  - hydrocortisone acetate 10 MG/ML / Pramoxine hydrochloride 10 MG/ML Topical Foam [Epifoam]

Lessons learned
- Discovery of data quality issues, such as missing results and data irregularities
- Maintaining classes is easier than maintaining queries and sets
- Leveraging other people's work reduces your own
- Transparency of the data discovery/query refinement process
- Inherent documentation in ontologies (as opposed to information in Faye's head)

Thank you for your attention.

Questions?