EMR Data Mining for Drug Safety: Challenges and Opportunities

Zhaohui (John) Cai, MD, PhD, Director, Biomedical Informatics, AstraZeneca
PRISM SIG 2010, La Jolla, CA, Oct 19, 2010

Proprietary and Confidential © AstraZeneca 2009. FOR INTERNAL USE ONLY



Outline

• Introduction to EMR data for drug safety research
  • Data sources and limitations

• Challenges

• EMR safety data mining methods

• Proposing an interdisciplinary approach

• One AZ example

• EMR data interacting with drug development data: enabling two-way translation between clinical research and practice


EMR, EHR, PHR

• NAHIT definitions of EMR and EHR
  • EMR: The electronic record of health-related information on an individual that is created, gathered, managed, and consulted by licensed clinicians and staff from a single organization who are involved in the individual’s health and care.
  • EHR: The aggregate electronic record of health-related information on an individual that is created and gathered cumulatively across more than one health care organization and is managed and consulted by licensed clinicians and staff involved in the individual’s health and care.
• By these definitions, an EHR is an EMR with interoperability
• In practice, the two terms are commonly used interchangeably

• PHR (Personal Health Record):
  • A personal health record is a digital health record that is owned, updated, and controlled by the consumer. It contains a summary of health information from throughout an individual's entire lifetime.
  • Examples of information contained in a PHR include a record of immunizations, family health history, personal health history (e.g., significant illnesses and surgical procedures), significant diagnostic procedures and dates (such as mammograms), a list of health problems, a current medication list, allergies, contact information for physicians seen on a routine basis, and a physician visit history.


What are we talking about?

• EHR/EMR can be:
  • Hospital Information System
  • Departmental Systems (laboratory, radiology, pharmacy, materials management)
  • Computerized Physician Order Entry Systems
  • ePrescribing Systems
  • Administrative/Financial/Billing Systems
  • Ambulatory Systems
  • Specialty Systems (Cardiology, OBGYN, Pediatrics, Nephrology, …)
  • Data Warehouse
  • Research Database

• Ideally, data should be integrated across the continuum of care, and all EHRs should be interoperable in the sense that they can be overlaid and all data can be shared when and as needed


EMR vs. SRS for drug safety research

• SRS (spontaneous reporting systems): databases containing millions of voluntarily submitted reports of suspected adverse drug events (ADEs) occurring during routine clinical practice
  • Current mainstay of pharmacovigilance: typically mined for statistical drug-event associations to screen for unknown potential ADEs, which are then clinically validated and flagged for continued monitoring
  • Major SRSs: FDA AERS and the WHO Programme for International Drug Monitoring
  • Well-recognized limitations: under-reporting, over-reporting due to media influence, subjective diagnoses by the reporter of the event, uneven levels of granularity in how drugs and events are described or encoded, duplicate reports for the same patient and event, missing data, typographical errors, confounding, and lack of a denominator (no information on exposure)

• EMR/EHR
  • Advantages: earlier detection of ADEs; potential for active, real-time surveillance; absence of most of the reporting biases attributed to SRS; knowledge of the number of patients exposed (in the dataset), which provides a denominator and allows co-morbidity, co-medication, etc. to be assessed
  • Challenges:
    • Unstructured narratives are unsuitable for direct application to pharmacovigilance.
    • Confounding requires examining a much wider range of possible drug-event associations, most of which are completely unrelated or associated only because of confounding
    • The same biases as in SRS still exist to some extent


EMR vs. Claims data

• Medical and pharmacy claims
  • Benefits: capture real-world utilization patterns; encompass a wealth of variables, and analyses of these data can be used for benchmarking
  • Challenges: lag time in the availability of information about new therapies; do not capture clinical experience; data limited to patients with adjudicated claims; data limited to insured populations
• EHR/EMR data
  • Benefits: data reflect a more complete care experience; data can be analyzed on an ongoing basis for populations under care; may improve the depth and breadth of outcomes studies; when used with e-prescribing, can reduce adverse drug events, medical errors, and redundant tests
  • Challenges: converting paper-based systems to electronic ones; collecting and storing data in a standardized format; certification to ensure the security and privacy of EMR systems; interoperability; slow adoption; limited populations (e.g., only general-practitioner or only hospital data)


EMR vs. PMS systems

• PMS (Practice Management System): a software program and database system that processes billing and scheduling information for physicians and hospitals
  • Administrative data sources created primarily to support reimbursement
  • There are many well-recognized limitations to using administrative data sets compared with comprehensive EMR data
  • The widespread availability of administrative data makes them the most widely used source of internal and comparative quality indicators
• EHR/EMR systems: a software program and database system that stores medical record information about a patient’s health
  • Contain detailed clinical data that are not contained in administrative data sets. The availability of more clinically relevant data in an electronic, queryable format represents a new data source that can be leveraged without the expense of manual chart abstraction.
  • The American Recovery and Reinvestment Act contains explicit language linking the ‘meaningful EHR user’ to the ability to capture and report clinical quality measures.


PMS data content

• Demographics – As with EHR demographics, gender, year of birth, and city (or 3-digit postal code) are commonly captured. Race is very rarely captured.
• ICD9 (International Classification of Diseases, 9th Revision) – PMS systems routinely capture ICD9 codes for billing insurance companies, Medicare, and Medicaid.
  • Codes may frequently be exaggerated to obtain higher reimbursement.
  • Some diagnoses are deliberately omitted to protect the patient from insurance company blacklisting.
  • When working with PMS ICD9 codes alone, the research team should be vigilant about data abnormalities and irregularities and treat the data accordingly.
• CPT4 (Current Procedural Terminology, 4th Edition) – All ambulatory PMS systems use the CPT4 coding system for medical billing.
  • Codes which services, and what level of service, the provider performed
  • Highly accurate, as the codes directly reflect the work done by the provider


EHR/EMR data content

• Demographics – Critical data include gender, year of birth (without month or day, for privacy reasons), and city or the first three digits of the postal code. Race is very valuable if captured.
• ICD9 – Very useful data when entered as part of EHR data capture
• CPT4 – Secondary data, not often captured. Valuable for determining what was ordered (labs, pathology, and radiology) and what procedures were performed
• Vitals – Height and weight (from which BMI can be derived) are highly desirable. Blood pressure readings are also of primary importance
• Problems – “free-form” problems such as “constipation”, as written by the nurse or in the patient’s own words. Hard to standardize unless an ICD9 code is used
• Lab Results – Extremely important in pharmacovigilance and readily available in huge quantities
• Lab Orders – Can be derived from HL7 lab data (the “ORC” segment) or, alternatively, obtained from the EHR order entry system
• Pathology Results – Results reported as “positive” or “negative” are most valuable to pharmacovigilance. Verbose narrative results are hard to incorporate into data analysis unless NLP is employed
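The point that lab orders and results can be taken from HL7 lab feeds is easiest to see on a concrete message. The sketch below is a minimal illustration under stated assumptions: the message, patient ID, and order number are invented; it assumes HL7 v2 pipe-delimited encoding with segments separated by carriage returns; and it reads ORC-1 (order control), ORC-2 (placer order number), and the OBX result fields OBX-3 (observation identifier), OBX-5 (value), OBX-6 (units), and OBX-8 (abnormal flag). It deliberately ignores escape sequences, field repetitions, and other details a real HL7 parser must handle.

```python
# Minimal HL7 v2 sketch: extract lab orders (ORC) and results (OBX).
# Assumptions: segments separated by "\r", fields by "|", components by "^";
# escape sequences and field repetitions are NOT handled. Sample data is invented.

SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|201010190830||ORU^R01|MSG0001|P|2.3",
    "PID|1||PAT123",
    "ORC|RE|ORD456",
    "OBR|1|ORD456||LFT^Liver function tests",
    "OBX|1|NM|ALT^Alanine aminotransferase||182|U/L|7-56|H",
    "OBX|2|NM|TBIL^Total bilirubin||0.9|mg/dL|0.1-1.2|N",
])


def parse_segments(message: str) -> list[list[str]]:
    """Split a message into segments; each segment is a list of fields (index 0 = segment ID)."""
    return [seg.split("|") for seg in message.split("\r") if seg.strip()]


def field(fields: list[str], position: int) -> str:
    """Return field N (1-based after the segment ID, non-MSH segments), or '' if absent."""
    return fields[position] if position < len(fields) else ""


def extract_orders_and_results(message: str) -> tuple[list[dict], list[dict]]:
    orders, results = [], []
    for fields in parse_segments(message):
        if fields[0] == "ORC":
            orders.append({
                "order_control": field(fields, 1),
                "placer_order_number": field(fields, 2),
            })
        elif fields[0] == "OBX":
            code, _, name = field(fields, 3).partition("^")
            results.append({
                "code": code,
                "name": name,
                "value": field(fields, 5),
                "units": field(fields, 6),
                "abnormal_flag": field(fields, 8),
            })
    return orders, results


if __name__ == "__main__":
    orders, results = extract_orders_and_results(SAMPLE_MESSAGE)
    print(orders)
    for r in results:
        print(r["code"], r["value"], r["units"], r["abnormal_flag"])
```

In practice a dedicated HL7 library, plus mapping of local observation codes to a standard terminology, would replace this hand-rolled splitting.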


EMR/EHR data content -- continued

• Radiology Results
  • Text – Very valuable only when processed through NLP to render discrete nomenclature values
  • Images – Not useful by themselves; useful only when drilling down to study significant adverse events detected by signal detection
• Immunizations – The administration of each vaccine or immunization, recorded as a binary event (given/not given), is the key element of value in these data
• History
  • Text – The most common form; valuable only when processed through NLP to render discrete nomenclature values
  • Structured Data from Templates – Tremendously valuable in pharmacovigilance, but only if the discrete values can be captured and interpreted correctly
• Chart Notes
  • Text – Same rule as “History, Text”
  • Structured Data from Templates – Same consideration as “History, Structured Data from Templates”
• Attachments – Attachments are typically stored as (1) images or (2) documents with searchable text. Documents can be processed through NLP and rendered into discrete nomenclature points, which would be very valuable for pharmacovigilance
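The recurring point that narrative text becomes useful only once NLP renders it into discrete values can be illustrated with a toy extractor. This is not MedLEE or any production NLP system: the lexicon, concept labels, negation cues, and five-word window are all hypothetical choices, loosely in the spirit of NegEx-style negation detection. The sketch maps phrases in a note to concepts and marks each one as asserted (True) or negated (False).

```python
import re

# Hypothetical mini-lexicon: surface phrase -> illustrative concept label.
LEXICON = {
    "jaundice": "JAUNDICE",
    "hepatomegaly": "HEPATOMEGALY",
    "abdominal pain": "ABDOMINAL_PAIN",
    "rash": "RASH",
}

# A few negation cues loosely inspired by NegEx (not its full trigger set).
NEGATION_CUES = ("no", "denies", "without", "negative for")
NEGATION_WINDOW = 5  # number of preceding words a cue may govern (assumption)


def extract_findings(text: str) -> dict[str, bool]:
    """Map each lexicon concept found in the text to True (asserted) or False (negated)."""
    findings: dict[str, bool] = {}
    for sentence in re.split(r"[.;\n]", text.lower()):
        for phrase, concept in LEXICON.items():
            match = re.search(r"\b" + re.escape(phrase) + r"\b", sentence)
            if not match:
                continue
            # Look only at the few words immediately before the finding.
            window = " ".join(sentence[: match.start()].split()[-NEGATION_WINDOW:])
            negated = any(
                re.search(r"\b" + re.escape(cue) + r"\b", window) for cue in NEGATION_CUES
            )
            # A concept asserted anywhere in the note counts as present.
            findings[concept] = findings.get(concept, False) or not negated
    return findings


if __name__ == "__main__":
    print(extract_findings("Patient presents with jaundice and abdominal pain. No rash."))
```

A real pipeline would also need synonym handling, scope termination (so that “no fever but jaundice” is not read as negating jaundice), and mapping of the output to a standard terminology.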


Challenges for safety data mining

• Data source limitations
  • Small populations in some cases (depending on vendors or individual systems)
  • Legacy data migration is a major bottleneck
  • Missing data types (e.g., no lab results in claims databases)
• Text data needs to be converted to discrete data
  • NLP (Natural Language Processing): converting text from a History and Physical or Discharge Summary dictation note into discrete data values is of significant interest in signal detection
  • Example: MedLEE™ (Medical Language Extraction and Encoding)
• The data itself
  • Varied quality (e.g., ICD9 codes from a PMS system)
  • Lack of data standards
  • Different coding systems/nomenclatures (e.g., National Drug Code, SNOMED, UMLS, MEDCIN)
• Hypothesis generation vs. testing
  • What is a signal, and how should signal thresholds be defined?
  • What is a drug-related AE?
  • Statistical significance vs. clinical significance
  • Confounding factors: co-medication, co-morbidities, indication, or any combination of the three
  • Which analysis methods to use


Safety data mining methods at individual level

• Identifying individual ADEs
  • Case report and case review
  • Rule-based: ICD-9 classification rules, allergy rules, drug–laboratory rules (Honigman et al., 2001)
  • NLP/text mining: events were identified by searching for all possible combinations of drug names and adverse effects, using a controlled vocabulary of medical concepts and drug terminology that allows multiple relationships between multiple medical terms and events (Honigman et al., 2001)
  • Deviation detection: looks for outliers, i.e., values that deviate from the norm (e.g., the 2009 FDA guidance on DILI), which can be assessed graphically or statistically (e.g., the FDA eDISH tool)
  • An individual patient timeline (i.e., the temporal relationship between exposure and event) is needed to establish drug-event pairs for all of the above
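The individual-level ideas above (a drug–laboratory rule, deviation detection against reference limits, and the need for a temporal relationship) can be combined in a short sketch. It assumes a hypothetical 90-day window after drug start, expresses each lab value as a multiple of its upper limit of normal (ULN), and classifies the peak values against the eDISH-style reference lines ALT > 3×ULN and total bilirubin > 2×ULN used in the 2009 FDA DILI guidance. The `LabResult` structure, test codes, window, and quadrant labels are illustrative assumptions; a real Hy's law assessment also considers alkaline phosphatase and alternative causes, which this sketch does not.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class LabResult:
    test: str      # "ALT" or "TBIL" in this sketch (hypothetical test codes)
    value: float
    uln: float     # upper limit of normal reported with the result
    taken: date


def peaks_in_window(labs: list[LabResult], drug_start: date,
                    window_days: int = 90) -> dict[str, float]:
    """Peak value/ULN per test, using only results taken on or after drug start
    and within window_days of it (the temporal drug-event requirement)."""
    window_end = drug_start + timedelta(days=window_days)
    peaks: dict[str, float] = {}
    for lab in labs:
        if drug_start <= lab.taken <= window_end:
            ratio = lab.value / lab.uln
            peaks[lab.test] = max(peaks.get(lab.test, 0.0), ratio)
    return peaks


def edish_quadrant(peaks: dict[str, float], alt_cut: float = 3.0,
                   tbil_cut: float = 2.0) -> str:
    """Classify peak ALT/ULN and TBIL/ULN against eDISH-style reference lines."""
    high_alt = peaks.get("ALT", 0.0) > alt_cut
    high_tbil = peaks.get("TBIL", 0.0) > tbil_cut
    if high_alt and high_tbil:
        return "potential Hy's law"
    if high_alt:
        return "ALT elevation only"
    if high_tbil:
        return "bilirubin elevation only"
    return "within reference lines"


if __name__ == "__main__":
    start = date(2010, 1, 1)
    labs = [
        LabResult("ALT", 400, 40, date(2009, 12, 1)),   # before exposure: ignored
        LabResult("ALT", 40, 40, date(2010, 1, 5)),
        LabResult("ALT", 200, 40, date(2010, 2, 1)),
        LabResult("TBIL", 2.5, 1.0, date(2010, 2, 1)),
    ]
    peaks = peaks_in_window(labs, start)
    print(peaks, edish_quadrant(peaks))
```

As in eDISH, the peak ALT and peak bilirubin need not come from the same draw; a stricter rule could require them to fall within a short interval of each other.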


Safety data mining methods at population level

• Measures of disproportionality
  • Frequentist methods: the disproportionality signal threshold should be considered against epidemiology-based rates
    • Proportional reporting ratio (PRR) and reporting odds ratio (ROR)
    • The most commonly published threshold is PRR > 2 with a number of reports or cases N > 3. A chi-square criterion can be added; if used, it is often set at χ² > 4 (Deshpande, Gogolak et al.)
    • Most useful for initial assessments, particularly with newer drugs, and for monitoring changes in the proportional reporting rate over time

Wilson et al., 2004 Br J Clin Pharmacol 57:2
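To make the frequentist measures concrete, the sketch below computes PRR, ROR, and a chi-square statistic from a 2×2 table of drug vs. event counts and applies the threshold rule quoted above. The cell labels a–d and the `TwoByTwo` type are naming assumptions for illustration; the chi-square is the plain Pearson statistic without Yates correction, and because published variants of the rule use ≥ rather than >, the cut-offs are parameters.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    a: int  # reports/patients with the drug AND the event
    b: int  # with the drug, without the event
    c: int  # without the drug, with the event
    d: int  # without the drug, without the event


def prr(t: TwoByTwo) -> float:
    """Proportional reporting ratio: [a/(a+b)] / [c/(c+d)]."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ror(t: TwoByTwo) -> float:
    """Reporting odds ratio: (a/b) / (c/d) = ad / bc. Undefined if b or c is 0."""
    return (t.a * t.d) / (t.b * t.c)


def chi_square(t: TwoByTwo) -> float:
    """Pearson chi-square for a 2x2 table (no Yates continuity correction)."""
    n = t.a + t.b + t.c + t.d
    numerator = n * (t.a * t.d - t.b * t.c) ** 2
    denominator = (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d)
    return numerator / denominator


def is_signal(t: TwoByTwo, prr_min: float = 2.0, n_min: int = 3,
              chi2_min: float = 4.0) -> bool:
    """Threshold rule as stated on the slide: PRR > 2, N > 3 and chi-square > 4."""
    return prr(t) > prr_min and t.a > n_min and chi_square(t) > chi2_min


if __name__ == "__main__":
    table = TwoByTwo(a=20, b=980, c=100, d=98900)   # invented counts
    print(prr(table), ror(table), chi_square(table), is_signal(table))
```

With these invented counts the PRR is 19.8 and the rule flags a signal; with only three co-reports the N criterion alone blocks it.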


Safety data mining methods at population level

• Bayesian methods
  • Bayesian Confidence Propagation Neural Network (BCPNN) – WHO UMC
  • Empirical Bayes Geometric Mean (EBGM), based on the Multi-item Gamma-Poisson Shrinker (MGPS) – FDA
  • These methods have very good specificity and can be configured to evaluate drug-drug interactions and complex drug-induced medical syndromes, but they have lower sensitivity.
  • In retrospective comparisons of different methods, the frequentist methods almost always triggered an alert sooner than the Bayesian methods (Hauben and Reich 2004; Hauben, Reich et al. 2006; Chen, Guo et al. 2008).
• Statistical testing
  • Statistical hypothesis tests such as the chi-squared test and Fisher’s exact test
  • Used to test the hypothesis of independence between a drug and an event
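The sketch below illustrates two of the ideas on this slide with plain Python. The first function is a shrunk observed-to-expected ratio on a log2 scale, log2((O + 0.5)/(E + 0.5)), a commonly used simplified form in the spirit of BCPNN's information component; it is not the full WHO UMC BCPNN, and MGPS/EBGM is not shown. The second is a one-sided Fisher's exact test of independence for the same 2×2 table, computed from hypergeometric probabilities. The cell convention (a = drug with event, b = drug without event, c = event without drug, d = neither) is an assumption carried over for illustration.

```python
import math


def information_component(a: int, b: int, c: int, d: int) -> float:
    """Shrunk observed-to-expected ratio on a log2 scale.

    O = a, E = (a + b)(a + c) / N; the +0.5 terms pull small counts toward 0.
    A simplified sketch in the spirit of BCPNN's IC, not the full method.
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return math.log2((a + 0.5) / (expected + 0.5))


def _log_comb(n: int, k: int) -> float:
    """log of the binomial coefficient C(n, k), via lgamma to avoid huge integers."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value: probability, under independence with the
    table margins fixed, of observing a or more drug-event co-occurrences."""
    row1, col1, n = a + b, a + c, a + b + c + d
    log_total = _log_comb(n, row1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        # Hypergeometric probability of k in the top-left cell given the margins.
        p += math.exp(_log_comb(col1, k) + _log_comb(n - col1, row1 - k) - log_total)
    return min(p, 1.0)


if __name__ == "__main__":
    print(information_component(20, 980, 100, 98900))
    print(fisher_exact_greater(3, 1, 1, 3))   # classic small example
```

A positive IC means the pair is reported more often than expected under independence; the shrinkage keeps single chance co-reports from producing extreme values, which is one reason such methods tend to alert later than the raw PRR.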


Other methods

• Predictive modeling
  • Classification: develop a model that relates a dependent variable (AE) to a set of independent variables (drug, dose, age, gender, etc.) and predicts group membership (with or without the AE) of new records based on their characteristics (the independent variables)
  • Regression: predicts the value of a continuous dependent variable from a set of independent variables
• Clustering
  • Reduces a large sample of records to a smaller set of homogeneous subgroups (clusters) without losing much information about the whole sample
  • Hypothesis generation (clustering by symptoms or diagnoses to see whether there is a drug association)
• Both are designed to handle very large data sets that contain many more variables (predictors) than observations, as in safety databases or EMR systems, where thousands of drugs, conditions, or events can be analyzed simultaneously for drug-event associations
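As an illustration of the classification idea, the sketch below fits a logistic regression relating a binary AE outcome to two predictors by batch gradient descent on the log-loss. Everything here is a hypothetical toy: the eight records, the two features (drug exposure, age over 65), the learning rate, and the epoch count. It is written in plain Python only to stay self-contained; a statistics or machine-learning library (e.g., statsmodels or scikit-learn) would normally be used, with regularization when predictors are numerous.

```python
import math

# Hypothetical toy records: features = [drug_exposed, age_over_65]; label 1 = AE.
X_TOY = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
Y_TOY = [1, 1, 0, 1, 0, 1, 0, 0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: list[list[float]], y: list[int],
                 lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Fit logistic regression by batch gradient descent; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi          # derivative of log-loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_proba(w: list[float], x: list[float]) -> float:
    """Predicted probability of the AE for a new record x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


if __name__ == "__main__":
    w = fit_logistic(X_TOY, Y_TOY)
    print("intercept, drug, age:", w)
    print("P(AE | exposed, <65):", predict_proba(w, [1, 0]))
```

Because both predictors enter the same model, the drug coefficient is estimated while holding age fixed, which is the property the next slide relies on for handling confounding.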


Approaches to address confounding

• Stratification approaches
  • Stratification and Mantel-Haenszel test statistics
  • Effective for addressing confounding when sample sizes are large and the number of confounding variables is small
  • In other cases (e.g., safety databases or EHR systems) they are less effective
• Regression (McNamee 2005) or classification models
  • Allow several risk factors to be evaluated simultaneously
  • Incorporating potential confounding variables into the model: the value of a dependent variable (e.g., the presence of an AE) is explained by a set of predictor variables (e.g., different drugs or conditions), each with its own degree of contribution
  • Controlling for possible confounders: the influence of the confounding variables on the predictor variables can be assessed to determine whether the relationship between the dependent and predictor variables is affected by the confounders
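Stratification can be illustrated with the Mantel-Haenszel pooled odds ratio, the estimator that accompanies the Mantel-Haenszel test (the test statistic itself is not computed here). The two strata below are invented to show confounding: within each age stratum the drug has no association with the event (OR = 1), yet the crude table pooled over strata suggests an odds ratio of about 4, because older patients are both more often exposed and more often affected.

```python
# Each stratum is (a, b, c, d):
#   a = exposed with event,   b = exposed without event,
#   c = unexposed with event, d = unexposed without event.
# Hypothetical strata: older patients, then younger patients.
STRATA_TOY = [(40, 60, 4, 6), (1, 9, 10, 90)]


def mantel_haenszel_or(strata: list[tuple[int, int, int, int]]) -> float:
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def crude_or(strata: list[tuple[int, int, int, int]]) -> float:
    """Odds ratio after collapsing all strata into one 2x2 table (ignores the confounder)."""
    a, b, c, d = (sum(column) for column in zip(*strata))
    return (a * d) / (b * c)


if __name__ == "__main__":
    print("crude OR:", crude_or(STRATA_TOY))                 # inflated by confounding
    print("Mantel-Haenszel OR:", mantel_haenszel_or(STRATA_TOY))
```

The example also shows the limitation noted on the slide: every additional confounder multiplies the number of strata, so cells quickly become sparse in safety databases or EHR systems, which is where regression models become preferable.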


Proposing an inter-disciplinary approach

Workflow:
• Define appropriate questions (by PSE)
• Identify appropriate data sources (by PSE, Ix)
  • Choose EMR data*
  • Choose claims databases*, PMS*, or SRS
  • Map vocabularies, standardize the data format, and integrate databases
• Hypothesis generation (by PSE)
• Hypothesis testing (by PSE, Epi, Stats…): RCT or observational studies

Analysis methods:
• NLP/text mining (by Ix)
• Disproportionality measures (by Epi, Stats): frequentist methods, Bayesian methods
• Statistical testing (by Epi, Stats): chi-square, Fisher’s exact
• Modeling (by Stats, Ix): regression/classification

* One or more external data partners are needed, and they can be used in combination

PSE: patient/drug safety experts (physicians or scientists), including pharmacovigilance scientists; Stats: statisticians; Ix: informatics scientists; Epi: epidemiology scientists
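The data-preparation steps named in this workflow (mapping vocabularies, standardizing the data format, integrating databases) can be sketched for lab data from two sources. The source names, local codes, and concept labels are hypothetical; the only domain fact used is the bilirubin conversion of roughly 17.1 µmol/L per mg/dL (UK labs commonly report µmol/L, US labs mg/dL). Unmapped codes or unsupported units raise `KeyError` on purpose, so gaps in the mapping are surfaced rather than silently dropped.

```python
from dataclasses import dataclass

# Hypothetical mapping of source-specific lab codes to a shared concept.
CODE_MAP = {
    ("gp_system", "BILI_T"): "TOTAL_BILIRUBIN",
    ("hospital_lab", "TBIL"): "TOTAL_BILIRUBIN",
    ("gp_system", "ALT"): "ALT",
    ("hospital_lab", "SGPT"): "ALT",   # SGPT is an older name for ALT
}

# Target unit per concept, and multiplicative factors (source unit -> target unit).
TARGET_UNIT = {"TOTAL_BILIRUBIN": "mg/dL", "ALT": "U/L"}
UNIT_FACTORS = {
    ("umol/L", "mg/dL"): 1 / 17.1,   # bilirubin: 1 mg/dL is about 17.1 umol/L
    ("mg/dL", "mg/dL"): 1.0,
    ("U/L", "U/L"): 1.0,
}


@dataclass
class Observation:
    patient_id: str
    concept: str
    value: float
    unit: str


def harmonize(source: str, patient_id: str, code: str,
              value: float, unit: str) -> Observation:
    """Map a raw lab row to a shared concept and convert it to the target unit."""
    concept = CODE_MAP[(source, code)]         # KeyError flags an unmapped code
    target = TARGET_UNIT[concept]
    factor = UNIT_FACTORS[(unit, target)]      # KeyError flags an unsupported unit
    return Observation(patient_id, concept, value * factor, target)


def integrate(*sources: list[tuple]) -> list[Observation]:
    """Merge raw (source, patient_id, code, value, unit) rows into one harmonized list."""
    return [harmonize(*row) for rows in sources for row in rows]


if __name__ == "__main__":
    gp_rows = [("gp_system", "p1", "BILI_T", 34.2, "umol/L")]
    lab_rows = [("hospital_lab", "p1", "TBIL", 1.5, "mg/dL"),
                ("hospital_lab", "p2", "SGPT", 80.0, "U/L")]
    for obs in integrate(gp_rows, lab_rows):
        print(obs)
```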


AZ-NWeH collaboration

• NWeH
  • Formed as a collaboration between:
    • University of Manchester
    • Salford Royal Foundation Trust
    • Salford Primary Care Trust
  • Combines the University's strength in bio-health informatics and technology innovation with Salford NHS's strength in front-line clinical informatics and the integration of primary and secondary care
  • DILI study team working with AZ: physicians, informatics scientists, statisticians, etc.
• AZ multi-disciplinary team:
  • Informatics: Clinical Informatics and Discovery Information
  • Hepatotoxicity Knowledge Group (safety physicians and scientists)
  • Statistics
  • Epidemiology


AZ-NWeH DILI Study: aim and objectives

Aim

• To explore the feasibility of using naturalistic cohort data from the NHS, through linkage of liver function test (LFT) records, to study drug-induced liver injury (DILI) at the population level

Objectives 

• Identification of eligible data-sources

• Linkage of health records

• Identification of case-cohorts

• Detailed analysis of a case-cohort

• Process Improvement for data collection, collation and analysis within a healthcare firewall


AZ-NWeH DILI Study: challenges & opportunities

Challenges
• Infrastructure/systems building: EHRs, research databases
• Data standards and controlled vocabularies: need to review the rubrics in the Salford GP data for diagnoses, lab tests, etc.
• Observation period selection: depends on data availability (number of patient-years) and lab test frequency
• Characterising the Salford data in order to understand what questions can be meaningfully addressed

Opportunities
• Process improvement for data collection, collation and analysis within a healthcare firewall
• Building large longitudinal datasets from primary care for research purposes, combining diagnoses, lab tests, and prescriptions relevant to liver signals
• Developing enhanced metrics and tools to “appraise” a data set, for example an information score that summarizes the frequency and regularity of testing as a more general measure of the “longitudinal strength” of an EHR dataset
• Establishing “baseline” incidence rates of liver signals for a few real-world disease populations and examining how they change with prescriptions


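Two-way translation between clinical research and practice

• Drug development: Discovery, Preclinical Development, Early Clinical Development, Late Clinical Development, Product LCM (lifecycle management)
• Mining of real-world data:
  • CER (comparative effectiveness research, including comparative safety)
  • HTA (health technology assessment)
  • PHC
  • Pharmacovigilance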


Acknowledgements

• Anders Ottosson, Patient Safety, AstraZeneca

• Kaushal Desai, Biomedical Informatics, AstraZeneca

• James Weatherall, Biomedical Informatics, AstraZeneca
