20 October 2011 SWWS-OTM2011 Copyright 2011 CHRONIOUS -- IMATI-CNR
An ontology driven module for accessing
chronic pathology literature
Joint work with Stephan Kiefer, Jochen Rauch, Marco Attene, Franca Giannini, Simone Marini, Luc Schneider, Carlos Mesquita, Xin Xing
Riccardo Albertoni, Institute for Applied Mathematics and Information Technology, C.N.R., Genova, Italy
Motivation, problem area
Chronic diseases are the leading causes of death and disability for a large number of people in most industrialized nations.
Chronic diseases have a deep impact on the costs borne by today's society:
1. Costs of medical care in relation to the diagnosis and treatment of disease
2. Loss of human resources caused by morbidity or premature death
3. Intangible costs, which capture the psychological dimensions of illness, including pain and anxiety
New technologies for acquiring and analyzing vital signals are arising
Motivation, problem area
New ways of monitoring and treating chronic patients are becoming possible
No established guidelines are available yet
Knowledge in the domain is rapidly evolving
Need for tools for indexing and retrieving well-focused documentation according to continuously evolving knowledge
Context: the Project
An Open, Ubiquitous and Adaptive Chronic Disease Management Platform for Chronic Obstructive Pulmonary Disease (COPD) and Chronic Kidney Disease (CKD)
FP7-ICT-2007-1-216461 CHRONIOUS, February 2008 – January 2012 (48 months) http://www.chronious.eu/
Context: The CHRONIOUS project
Literature search module: design requirements
Using explicit medical terminology (e.g., controlled vocabularies)
• Specific to the considered pathologies, but also terminology allowing different levels of granularity in search (searching by coarse- and fine-grained concepts)
Terminology as modular and extendable as possible
• CKD and COPD are a kind of test bed for CHRONIOUS, but other chronic diseases should eventually be pluggable
• Knowledge in these domains evolves, and so do the related terminologies: we should support keeping terminologies up to date
Offering multilingual capabilities
• Search must be possible in different languages, at least when well-established translations of terminologies are available
The developed system
[Architecture diagram. Components shown:]
Document Search: Conceptual search, Metadata search, Free Text search (NLP)
Document upload: Automatic upload tool, Manual upload tool
CKD and COPD ontology (OWL/RDF)
MeSH thesaurus (SKOS/RDF)
Mapping API
Indexer
Concept Associator
Knowledge cache
Document processing: Format Transformation, NLP
Ontology enrichment tool
Terminology to index scientific literature
Medical Subject Headings (MeSH) is a well-known controlled vocabulary used for indexing articles from MEDLINE/PubMed
• it is not specialized enough to cover the COPD and CKD domains in depth
Ontologies have been defined to cover COPD and CKD in greater depth (OWL, by IFOMIS)
However, MeSH is still required in CHRONIOUS
• The search is not always made at the same level of granularity: keyword search often moves back and forth between coarse and very disease-specialized concepts
• Multilingual support: some "certified" translations are available, for example in Italian, Portuguese and Spanish
• Terminological de facto standard: clinicians expect MeSH to be included
How to combine ontologies and MeSH in CHRONIOUS?
COPD, CKD Ontologies
Middle Layer Ontology for Clinical Care (MLOCC)
Open Biological and Biomedical Ontology (OBO) Foundry:
Basic Formal Ontology (BFO) + Relation Ontology (RO) +
Foundational Model of Anatomy Ontology (FMA)
The adopted approach
RDF URI as a kind of lingua franca
MeSH (provided by the US National Library of Medicine) has been encoded in SKOS/RDF (W3C)
Italian, Portuguese and Spanish translations of MeSH (provided by national authorities) have been encoded in SKOS/RDF
• We kept the RDF IDs consistent with the original MeSH descriptor identifiers
A semi-automatic mapping between MeSH in SKOS and the developed ontologies
• A script compares MeSH terms with the lexical representations of concepts from the OWL ontologies
• The suggested mappings are validated in a two-stage process by ontology engineers and clinicians
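The lexical comparison step of the semi-automatic mapping could look roughly like the sketch below. It is not the CHRONIOUS script: the identifiers, the labels, the `suggest_mappings` function and the 0.85 threshold are all illustrative assumptions, and Python's standard `difflib` stands in for whatever string comparison the project actually used. The output is only a list of candidates, which would still go through the validation by ontology engineers and clinicians.

```python
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase a label and undo an inverted heading ("B, A" becomes "a b")."""
    parts = [p.strip() for p in label.lower().split(",")]
    return " ".join(reversed(parts)) if len(parts) == 2 else " ".join(parts)


def suggest_mappings(mesh_terms, ontology_labels, threshold=0.85):
    """Return (mesh_id, concept_id, score) candidates for later human validation."""
    suggestions = []
    for mesh_id, mesh_label in mesh_terms.items():
        for concept_id, onto_label in ontology_labels.items():
            score = SequenceMatcher(
                None, normalize(mesh_label), normalize(onto_label)
            ).ratio()
            if score >= threshold:
                suggestions.append((mesh_id, concept_id, round(score, 2)))
    # Best-scoring candidates first.
    return sorted(suggestions, key=lambda s: -s[2])


# Hypothetical identifiers and labels, for illustration only.
mesh_terms = {
    "mesh:EX1": "Pulmonary Disease, Chronic Obstructive",
    "mesh:EX2": "Spirometry",
}
ontology_labels = {
    "copd:ChronicObstructivePulmonaryDisease": "chronic obstructive pulmonary disease",
    "copd:PostBronchodilatorSpirometry": "post-bronchodilator spirometry",
}
candidates = suggest_mappings(mesh_terms, ontology_labels)
```

A high threshold keeps the candidate list short at the cost of missing looser matches; since every suggestion is reviewed by hand anyway, a lower threshold is a reasonable alternative when recall matters more.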
Natural Language Processing
Based on the General Architecture for Text Engineering (GATE) framework
• Open source Java suite, originally developed at the University of Sheffield beginning in 1995
• used worldwide by a wide community of scientists, companies, teachers and students for all sorts of natural language processing tasks, including information extraction in many languages
Default processes applied to extract the headwords of a text:
• Sentence splitter, Tokeniser, Part-of-speech tagger, Morphological analyzer
Modules included for the Ontology Enrichment Tool and the Indexer
• OntoRoot Gazetteer: a GATE plug-in that produces ontology-aware annotations for extracted terms;
• Shallow Parser: it identifies word groups such as "chronic diseases" and "lung function";
• RegEx-Pattern Matcher: it matches the lemma of a token against word patterns defined as regular expressions;
• Thesaurus Matcher: it matches the lemma of a token to a domain thesaurus; a JAPE resource has been developed to access MeSH and the mapped ontology concepts through the MeSH Mapping API
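The idea behind the last two modules can be illustrated with a toy sketch in plain Python. It does not use GATE or JAPE: the lemma table, the word pattern and the thesaurus entries are invented for illustration and merely stand in for the morphological analyzer, the RegEx-Pattern Matcher and the MeSH-backed Thesaurus Matcher.

```python
import re

# Hypothetical stand-ins: a tiny lemma table, one word pattern, a two-entry thesaurus.
LEMMAS = {
    "inhalers": "inhaler",
    "diseases": "disease",
    "bronchodilators": "bronchodilator",
}
PATTERNS = {"drug-like suffix": re.compile(r".*(dilator|steroid)$")}
THESAURUS = {"inhaler": "mesh:EX-inhaler", "disease": "mesh:EX-disease"}


def annotate(text):
    """Tokenise, lemmatise, then tag each lemma with pattern and thesaurus matches."""
    annotations = []
    for token in re.findall(r"[A-Za-z]+", text.lower()):
        lemma = LEMMAS.get(token, token)  # fall back to the surface form
        tags = [name for name, pattern in PATTERNS.items() if pattern.match(lemma)]
        if lemma in THESAURUS:
            tags.append(THESAURUS[lemma])
        if tags:  # keep only tokens that matched something
            annotations.append((lemma, tags))
    return annotations


result = annotate("Bronchodilators and inhalers for chronic diseases")
```

Matching on lemmas rather than surface forms is what lets a single pattern or thesaurus entry cover both singular and plural occurrences.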
Ontology enrichment
Semi-automatic process: candidate concepts are rated according to
• Corpus relevance: determined by the candidate's average Term Frequency-Inverse Document Frequency (TF-IDF) value with respect to the whole document corpus;
• Concept co-occurrences: the average distance in the text between the candidate concept and a concept within the corpus is calculated as a benchmark;
• Domain relevance: matching with a common dictionary (WordNet), the domain thesaurus (MeSH) and regular expression patterns;
• Subclass-of relations: extraction of vertical relations (linguistic patterns or dictionary hypernyms).
Candidate concepts are marked as "new", "to validate", "postponed", "accepted" or "rejected" by ontology engineers and clinicians
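The corpus relevance criterion can be made concrete with a small example. The slides do not say which TF-IDF variant is used, so this sketch assumes term frequency normalized by document length and idf = log(N / df), averaged over every document in the corpus; the function name and the four-document corpus are hypothetical.

```python
import math


def average_tfidf(term, corpus):
    """Average TF-IDF of `term` over all documents (each a non-empty token list)."""
    n_docs = len(corpus)
    doc_freq = sum(1 for doc in corpus if term in doc)
    if doc_freq == 0:
        return 0.0  # the term never occurs, so it has no corpus relevance
    idf = math.log(n_docs / doc_freq)
    tf_values = [doc.count(term) / len(doc) for doc in corpus]
    return idf * sum(tf_values) / n_docs


# Toy corpus, already tokenised; invented for illustration.
corpus = [
    "spirometry measures lung function".split(),
    "lung function declines in copd".split(),
    "dialysis supports kidney function".split(),
    "spirometry after bronchodilator".split(),
]
scores = {t: average_tfidf(t, corpus) for t in ("spirometry", "function", "dialysis")}
```

Under these assumptions a word such as "function", which appears in most documents, scores lower than a more distinctive candidate such as "spirometry", which is the behaviour one wants when ranking candidates for review.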
Ontology Enrichment Tool – Relevance details
Search Interface
Search results (black-box testing): initial comparison with PubMed
[Plots for the queries "Inhaler Device" and "PostBronchodilator Spirometry". Horizontal axis: number of considered/retrieved documents. Vertical axis: F-measure.]
Glass-box testing has also been performed to ensure that the ontologies represent the right concepts
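A curve of this kind (F-measure against the number of retrieved documents) can be computed as in the sketch below. The slides do not state which F-measure was used or how relevance was judged, so this assumes the balanced F1 score and a known set of relevant documents; the document identifiers are made up.

```python
def f_measure_at(k, ranked_ids, relevant_ids):
    """Balanced F1 of the top-k retrieved documents against the relevant set."""
    retrieved = ranked_ids[:k]
    hits = sum(1 for doc_id in retrieved if doc_id in relevant_ids)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant_ids)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranking returned by a search engine and hypothetical relevance judgments.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1", "d8"}
curve = [round(f_measure_at(k, ranked, relevant), 3) for k in range(1, len(ranked) + 1)]
# curve == [0.5, 0.4, 0.667, 0.571, 0.5]
```

Computing the same curve for two systems on the same query and relevance set gives a side-by-side comparison of the kind plotted on the slide.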
Conclusions
The CHRONIOUS search system can become a specialized indexing and search system for the hospital:
It can manage internal hospital documents to be indexed into the database
It can cover other medical domains and languages using MeSH
It is already specialized in COPD and CKD by using specific ontologies
It provides tools for ontology maintenance and is thus well suited to domains characterized by rapidly evolving knowledge
Critical and open issues for future work
User notifications
• about changes in Enrichment Tool data (e.g. if new documents with extracted candidate concepts are available)
• supporting the collaboration among clinicians and ontology engineers
Re-indexing of documents
• what happens when there is a new ontology version?
• some incremental indexing should be provided
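One possible shape for incremental indexing is sketched below. This is purely a thought experiment, not something the project describes: it assumes each ontology version can be reduced to a mapping from concept identifier to label, and that the index maps each concept to the documents annotated with it. All names and data are hypothetical.

```python
def documents_to_reindex(old_concepts, new_concepts, index):
    """Return (stale documents, newly added concepts) between two ontology versions.

    A concept counts as changed if its label differs or it was removed; only
    documents annotated with changed concepts are considered stale.
    """
    changed = {c for c in old_concepts if old_concepts[c] != new_concepts.get(c)}
    added = set(new_concepts) - set(old_concepts)
    stale_docs = {doc for c in changed for doc in index.get(c, ())}
    return stale_docs, added


# Hypothetical ontology versions (concept id -> label) and concept-to-document index.
old = {"copd:Spirometry": "spirometry", "copd:Inhaler": "inhaler", "ckd:Dialysis": "dialysis"}
new = {"copd:Spirometry": "spirometry", "copd:Inhaler": "inhaler device", "ckd:Hemodialysis": "hemodialysis"}
index = {
    "copd:Spirometry": {"doc1", "doc2"},
    "copd:Inhaler": {"doc2", "doc3"},
    "ckd:Dialysis": {"doc4"},
}
stale, added = documents_to_reindex(old, new, index)
```

Newly added concepts are returned separately because no existing annotation points to them: they would need a search over the whole corpus, but only for the new labels rather than a full re-index.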