Enabling faster analysis of vaccine adverse event reports with ontology support

Post on 18-Nov-2014

Category:

Health & Medicine

DESCRIPTION

A description of my PhD project, which aims to use ontologies to support automated analysis of adverse event reports in the immunization domain

TRANSCRIPT

ENABLING FASTER ANALYSIS OF VACCINE ADVERSE EVENT REPORTS WITH ONTOLOGY SUPPORT

Mélanie Courtot, Ph.D. candidate, Brinkman lab

Knowledge Translation Seminar, March 21st 2013

Outline

• Problem statement and significance
• The Adverse Event Reporting Ontology (AERO) for adverse event reports analysis
  • Clinical standard
  • Logical encoding
  • Classified dataset
  • Classification using MedDRA annotations and text mining
• AERO for data integration
  • The Semantic Web
  • VAERS as linked data

Problem statement

• Importance of monitoring adverse events
  • Long-term effects, various demographics
  • Detection of abnormal events in the population leads to withdrawals, etc.
• Current adverse events following immunization (AEFIs) reporting systems use different standards (if any) to encode reports
• The resultant lack of consistency limits the ability to query and assess potential safety issues
• Reports are manually assessed: time and money consuming
• Inability to assess all reports carefully

Goal and significance of my work

• Goal: Improve safety signal detection in vaccine AEFI reports
  • Step 1: Augment existing standards with logically formalized elements
  • Step 2: Perform automatic case classification
  • Step 3: Test classification utility to detect safety signals
• Significance: Increase the timeliness and cost-effectiveness of reliable adverse event signal detection

4 steps to automated classification

1. We agree on a standard to describe adverse events

2. We encode that standard in a computer-amenable format

3. We map the clinical standard to current adverse events annotations

4. We classify reports of adverse events according to established guidelines

Existing standard: the Brighton Collaboration

• https://brightoncollaboration.org
• Provides case definitions and guidelines to standardize reporting
• Well-established network (adopted as standard in Canada in 2009)
• Benefits of working with Brighton:
  • Existing software tool
  • Extensive network of collaborators, shared vision

What is missing?

Strategy for encoding adverse event reports

• Model the domain using an ontology
• Ontologies typically have two distinct components:
  • Names for important concepts in the domain
    • Prokaryotic cells
    • Eukaryotic cells
  • Background knowledge/constraints on the domain
    • Nothing can be both a prokaryotic and a eukaryotic cell

Strategy for encoding adverse event reports

• Ontology encoded using the Web Ontology Language (OWL 2)

• Open Biological and Biomedical Ontology (OBO) Foundry helps with quality, interoperability and avoiding redundant work
  • More than 100 biomedical ontologies in the suite, e.g., Gene Ontology (GO)

• Reuse of resources (ontologies and tools)

Reasoning is critical

• Prokaryotic cell and Eukaryotic cell are declared disjoint

• Fungal cell is a Eukaryotic cell

• Spore is a Fungal cell and a Prokaryotic cell

=> inconsistency

doi:10.1371/journal.pone.0022006.g003
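
The three axioms on this slide can be replayed with a toy checker. This is a minimal sketch, not an OWL 2 reasoner: the names `SUBCLASS_OF`, `DISJOINT`, `ancestors` and `unsatisfiable_classes` are made up for illustration, and only subclass and disjointness axioms are handled. It flags any class that ends up below two classes declared disjoint, which is the situation a reasoner reports for Spore.

```python
# Toy illustration only: real OWL 2 reasoners handle far richer semantics.
# SUBCLASS_OF and DISJOINT are hypothetical names chosen for this sketch.

SUBCLASS_OF = {
    "Fungal cell": {"Eukaryotic cell"},
    "Spore": {"Fungal cell", "Prokaryotic cell"},
}

# Pairs of classes declared disjoint (nothing can belong to both).
DISJOINT = {frozenset({"Prokaryotic cell", "Eukaryotic cell"})}


def ancestors(cls):
    """Return cls plus every class reachable through subclass axioms."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def unsatisfiable_classes():
    """List classes that inherit from two classes declared disjoint."""
    problems = []
    for cls in sorted(SUBCLASS_OF):
        supers = ancestors(cls)
        for pair in DISJOINT:
            if pair <= supers:
                problems.append((cls, tuple(sorted(pair))))
    return problems


if __name__ == "__main__":
    for cls, pair in unsatisfiable_classes():
        print(f"{cls} is a subclass of both {pair[0]} and {pair[1]}")
```

The check walks the subclass hierarchy transitively, so the conflict is found even though Spore is only declared a Fungal cell and never directly a Eukaryotic cell.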

Clinical guideline in AERO

• Goal: provide a pattern to encode adverse event following immunization guidelines
  • This pattern should be applicable to any type of clinical guideline
  • Enable the reports to be annotated with a diagnosis according to a specific guideline (and keep track of which guideline was used)
• We want to:
  • Encode the guideline in OWL
  • Be able to infer correct classification (i.e., perform accurate diagnosis)

Current status

• Pattern implemented in the OWL file for anaphylaxis

• Has been successfully used to model the WHO malaria clinical guidelines
  • Paper submitted (yay)
  • Need to add other guidelines

Jie Zheng, UPenn

VAERS dataset

• VAERS = Vaccine Adverse Event Reporting System
• Co-managed by the Centers for Disease Control and Prevention (CDC) and the Food and Drug Administration (FDA) in the United States
• Spontaneous reporting system
  • Issues with underreporting, quality of reporting
• Uses MedDRA annotations (Medical Dictionary for Regulatory Activities)

Example VAERS report

Classified VAERS data

• Unclassified files available publicly
• Classified dataset available upon request (in this case H1N1 dataset)
• Cleanup
  • No default NULL value: “none”, “null”, “”…
  • Multiple languages: encoding issue with Spanish
  • 5 MedDRA terms per report, or duplicates
• Pre-processing required
  • Load into database
  • Match to public records
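
The cleanup issues listed above could be handled along the following lines. This is a sketch under stated assumptions rather than the pipeline actually used: the helper names are hypothetical, the set of empty spellings contains only the three shown on the slide, and the Latin-1 fallback is a guess at the nature of the Spanish encoding issue.

```python
# Hypothetical cleanup helpers; the slides do not describe the real
# pre-processing beyond the issues listed above.

# Spellings treated as "no value" (taken from the slide).
NULL_LIKE = {"", "none", "null"}


def normalize_null(value):
    """Map the various 'empty' spellings to a single None."""
    if value is None:
        return None
    text = value.strip()
    return None if text.lower() in NULL_LIKE else text


def clean_symptoms(raw_terms):
    """Drop empty entries and duplicates while keeping first-seen order."""
    seen = set()
    cleaned = []
    for term in raw_terms:
        term = normalize_null(term)
        if term is None:
            continue
        key = term.lower()
        if key not in seen:
            seen.add(key)
            cleaned.append(term)
    return cleaned


def decode_text(raw_bytes):
    """Try UTF-8 first, then fall back to Latin-1 (an assumed encoding)."""
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return raw_bytes.decode("latin-1")
```

For example, `clean_symptoms(["Rash", "", "rash", "null", "Pyrexia"])` keeps only `"Rash"` and `"Pyrexia"`, so a report split across several five-term rows can be merged without repeated annotations.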

Classification using MedDRA annotations

• Goal is to map the current Brighton terms in AERO to their MedDRA counterpart

• Then try to classify the MedDRA-annotated reports using the Brighton criteria

• Compare that with classification done by medical experts

Mapping to MedDRA

• Translate, as well as possible, MedDRA annotations to Brighton symptoms
  • Import selected MedDRA terms into OWL, following the general strategy of Minimum Information to Reference an External Ontology Term (MIREOT; Courtot et al. 2011)

• Standardized MedDRA Queries provide useful documentation on how to interpret MedDRA

• OWL used to define Brighton symptoms in terms of MedDRA terms (this will be only approximate)
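
The idea of defining a Brighton symptom in terms of MedDRA terms can be illustrated outside OWL with a plain lookup table. This is only a sketch: the dictionary `BRIGHTON_TO_MEDDRA`, its criterion labels and its term lists are placeholders chosen for illustration and are not a validated mapping; in AERO the equivalent definitions are written as OWL axioms.

```python
# Illustrative only: the criterion labels and term lists below are
# placeholders, not a validated Brighton-to-MedDRA mapping.
BRIGHTON_TO_MEDDRA = {
    "major dermatological criterion": {"Urticaria", "Angioedema"},
    "major respiratory criterion": {"Stridor", "Bronchospasm"},
}


def brighton_criteria_for(meddra_terms):
    """Return the Brighton criteria matched by a report's MedDRA annotations."""
    annotated = {term.strip().lower() for term in meddra_terms}
    matched = set()
    for criterion, terms in BRIGHTON_TO_MEDDRA.items():
        # A criterion is met when at least one of its mapped terms is present.
        if annotated & {term.lower() for term in terms}:
            matched.add(criterion)
    return matched
```

A report annotated with `["urticaria", "Pyrexia"]` would match only the dermatological criterion here. Because several MedDRA terms can stand in for one Brighton symptom, and some have no counterpart at all, any such mapping stays approximate.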

Classification using text

• In collaboration with Seeker Solutions, a Victoria-based company
• Goal is to use the text part of the reports to classify them
• Process:
  • Training data: a set of reports that have been manually classified
  • Machine learning algorithm learns the patterns leading to correct classification
  • The model is applied to new testing data
• 2 types of classification tested:
  • Likelihood
  • Topic modeling
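
The train-then-apply process above can be sketched with a small text classifier. The slides do not name the algorithm used, so multinomial naive Bayes with add-one smoothing is used here purely as a stand-in; the class `NaiveBayes`, the helper `tokenize` and the four training narratives are invented for illustration and are not real VAERS reports.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a report narrative and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (stand-in model)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        """Return an unnormalized log-probability per class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy narratives, not real VAERS reports.
TRAIN_TEXTS = [
    "hives and wheezing minutes after injection",
    "throat swelling hives difficulty breathing",
    "sore arm at injection site",
    "mild fever and sore arm next day",
]
TRAIN_LABELS = ["anaphylaxis", "anaphylaxis", "other", "other"]

model = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
```

Calling `model.predict(...)` on a new narrative returns the higher-scoring label, and the per-class values from `log_scores` could be used to rank reports so that the most likely cases are reviewed first.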

Likelihood ordering

Topic modeling

Current status

• Testing classification with the MedDRA terms

• Need to work on the MedDRA mapping

• Test classification with AERO (and compare with the one with MedDRA)

• Refine text classification
  • Using the ontology to guide clustering
  • Using Canadian dataset

AERO for data integration

The semantic web

• From a web of documents to a web of data
• HTML pages can’t be understood by machines; humans have to manually follow hyperlinks
• Semantic web uses standards for data representation, querying, and vocabularies to link data behind the scenes
• Use of Uniform Resource Identifiers (URIs) and Resource Description Framework (RDF)

RDF and URIs

• RDF: a language used to represent information about resources on the web
  • RDF statement: subject, predicate, object
• URI: unique identifiers for things
  • http://purl.obolibrary.org/obo/AERO_0000244: major dermatological criterion for anaphylaxis according to Brighton
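
A single RDF statement about the AERO term above can be written out as one line of N-Triples. The sketch below uses only string formatting; the helper names are hypothetical, and using the slide's description as the value of `rdfs:label` is an assumption (the label stored in AERO itself may be worded differently).

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
AERO_TERM = "http://purl.obolibrary.org/obo/AERO_0000244"


def literal(text):
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def to_ntriple(subject, predicate, obj):
    """Serialize one (subject, predicate, object) statement."""
    return f"{subject} {predicate} {obj} ."


# Assumed label text: copied from the slide, not looked up in AERO.
statement = to_ntriple(
    uri(AERO_TERM),
    uri(RDFS_LABEL),
    literal("major dermatological criterion for anaphylaxis according to Brighton"),
)
print(statement)
```

The subject and predicate are both URIs, while the object here is a literal; an object can equally be another URI, which is what lets statements from different datasets link to each other.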

Linked Open Data cloud

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

VAERS as linked data

• Transform the VAERS dataset into RDF to enable better integration with existing resources
  • No need to worry about resources’ structure (CSV, databases, XML)
  • Each report is an instance of a VAERS report
• System will also provide technical infrastructure to test classification
  • RDF automatically generated from the database containing VAERS data
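
Generating RDF from a database of reports can be sketched as a loop over rows that emits one typing triple per report plus one triple per populated column. Everything specific here is assumed: the `http://example.org/vaers/` namespace, the `report` table with its two columns, and the two demo rows are invented, and the real system's schema is not shown in the slides.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/vaers/"  # hypothetical namespace


def build_demo_db():
    """Create an in-memory database with two invented reports."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE report (vaers_id INTEGER PRIMARY KEY, state TEXT)")
    conn.executemany("INSERT INTO report VALUES (?, ?)", [(1, "WA"), (2, None)])
    return conn


def report_triples(conn):
    """Yield N-Triples lines for every row in the report table."""
    rows = conn.execute("SELECT vaers_id, state FROM report ORDER BY vaers_id")
    for vaers_id, state in rows:
        subject = f"<{BASE}report/{vaers_id}>"
        # Each row becomes an instance of the (hypothetical) VAERSReport class.
        yield f"{subject} <{RDF_TYPE}> <{BASE}VAERSReport> ."
        if state is not None:  # skip missing values rather than emit empty literals
            yield f'{subject} <{BASE}state> "{state}" .'


lines = list(report_triples(build_demo_db()))
```

Because each report gets its own URI, later statements (MedDRA annotations, a classification result) can be attached to the same subject without changing the database schema.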

VAERS as linked data: Report 117893

VAERS as linked data

Querying across linked data

• URIs (or mappings between URIs) to link different resources
• Querying on the VAERS dataset
  • E.g., are there differences in the type of adverse events between a live attenuated flu vaccine and a trivalent inactivated one?
• Querying across multiple datasets
  • Identify drugs in text (e.g. Benadryl) and infer they are anti-allergic agents via DrugBank
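
The cross-dataset query can be pictured as a join on a shared drug identifier. In practice such a query would typically be written in SPARQL against real endpoints; the pure-Python sketch below only shows the shape of the join. All identifiers (`vaers:`, `drug:`), both triple lists and the function names are hypothetical and invented for illustration.

```python
# All identifiers below are hypothetical stand-ins for real VAERS and
# DrugBank URIs; the triples are invented for illustration.
VAERS_TRIPLES = [
    ("vaers:report/1", "vaers:mentionsDrug", "drug:Benadryl"),
    ("vaers:report/2", "vaers:mentionsDrug", "drug:ExampleDrug"),
]
DRUG_TRIPLES = [
    ("drug:Benadryl", "drug:category", "Anti-Allergic Agents"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def reports_mentioning_category(category):
    """Join the two datasets on the shared drug identifier."""
    drugs = {s for s, _, _ in match(DRUG_TRIPLES, p="drug:category", o=category)}
    return sorted(
        s for s, _, o in match(VAERS_TRIPLES, p="vaers:mentionsDrug") if o in drugs
    )
```

Here `reports_mentioning_category("Anti-Allergic Agents")` returns only the first report. The join works because both datasets refer to the drug by the same identifier, which is the role URIs (or mappings between URIs) play in linked data.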

Example: link the state code in VAERS to state info in DBpedia, then pass the result to the Google Visualization API

Acknowledgements

• Alan Ruttenberg, Ryan Brinkman
• Oliver He, Yu Lin, Lindsay Cowell, Barry Smith, Ryan Brinkman, Peter d’Eustachio, Albert Goldfain
• Julie Lafleche, Lauren McDonald, Robert Pless, Barbara Law, Jan Bonhoeffer, Jean-Paul Collet
• Brinkman lab
