UCLA
MII
A Formal Representation for Numerical Data Presented in
Published Clinical Trial Reports
Maurine Tong BS, William Hsu PhD, Ricky K Taira PhD
Medical Imaging Informatics Group
University of California, Los Angeles
Problem: Querying Free Text CTRs
Clinical Trial Reports (CTRs)
Patient Recruitment
Internal/External Validity Testing
Disease Modeling
Query Processor
Informatics Applications
Representation
Why Focus on Numerical Info
• Predictive disease modeling
  • Ex: Bayesian belief networks
• Key to identifying trial quality
  • Hypothesis testing context and measures
• Key to synthesizing evidence
  • What is the context for reported probabilities?
  • P(effect | cause, context)
Internal Validity
Disease Modeling
Patient Recruitment
Background and Prior Work
• Ontologies for experiments and clinical trials
  • Ontology of Clinical Research (OCRe), Sim et al.
  • Ontology of Scientific Experiments (EXPO), Soldatova et al.
• Standardizing and sharing clinical trial data
  • BRIDG, CDISC, SNOMED CT
• Representing individual sections of a clinical trial report
  • Eligibility criteria: EliXR, Weng et al.
  • Scientific claims: Blake et al.
These systems primarily help to improve patient recruitment. Our focus is on modeling numerical information for quality assessment and disease modeling.
Problem: Fragmentation
Methods: Requirements Analysis
• What are the queries to be supported by the representation?
Query categories: Study Quality, Disease Modeling
Methods: Requirements Analysis
• Study quality queries
  • What is the p-value (population parameter) associated with the hypothesis?
  • What is the statistical test used to calculate the p-value?
  • What is the power at the sample size tested?
  • …
Study Quality: consulted textbooks and experts
James Sayre, PhD, Biostatistician
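As an illustration of the power query, the sketch below approximates the power of a two-sided, two-sample z-test for a difference in means. It is a minimal example under stated assumptions: the function name `two_sample_power`, the effect size, the standard deviation, and the sample sizes are hypothetical and are not taken from any trial report in these slides.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(delta, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta: assumed true difference in means (hypothetical input)
    sd: assumed common standard deviation (hypothetical input)
    n_per_arm: number of participants in each arm
    """
    z = NormalDist()
    # Standard error of the difference between two arm means.
    se = sd * sqrt(2.0 / n_per_arm)
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(delta) / se
    # Probability of landing in either rejection region under the alternative.
    return (1.0 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical inputs: half a standard deviation of effect, 64 per arm.
print(round(two_sample_power(delta=0.5, sd=1.0, n_per_arm=64), 3))
```

With no true effect (`delta=0`) the function returns `alpha`, which is a quick sanity check on the formula.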
Methods: Requirements Analysis
• Disease modeling queries
  • What are the prior probabilities?
  • Can we estimate posterior probabilities from p-values or other reported information?
  • …
Disease Modeling: consulted experts, textbooks, and literature
Thomas Belin, PhD, Biostatistician
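The prior and posterior queries can be made concrete with Bayes' rule for a binary hypothesis. This is a minimal sketch, not the authors' method: the function name `posterior` and all numbers are hypothetical. Note that a p-value is the probability of data at least as extreme as observed under the null hypothesis, so it is not itself a posterior; a prior and likelihoods under both hypotheses are needed as extra inputs.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' rule for a binary hypothesis H given observed data.

    prior: P(H) before seeing the data (hypothetical input)
    p_data_given_h: P(data | H)
    p_data_given_not_h: P(data | not H)
    """
    evidence = prior * p_data_given_h + (1.0 - prior) * p_data_given_not_h
    if evidence == 0.0:
        raise ValueError("data has zero probability under both hypotheses")
    return prior * p_data_given_h / evidence


# Hypothetical numbers: prior of 0.5; the data are four times as likely
# when H holds (0.8) as when it does not (0.2).
print(posterior(0.5, 0.8, 0.2))
```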
Methods: Initial Design
• Conceptual model of representation
• Domain: Metastatic Melanoma
Flaherty KT. et al. N Engl J Med. 2010 Aug 26;363(9):809-19
[Figure: conceptual model of the representation. Only the cell text survives in this transcript.]
Column groups: Pop. Stats, Sample Pop., Intervention, Baseline Measurements
Variables (dose groups): <240 mg, 240 mg, 320 / 360 mg, 720 mg
Prevalence of MAP kinase pathway mutation: 40-60%
Age: 23-86
Confirmed histology refractory to standard treatment
0:5, 1:16, 2:5, >2:23
PLX4032 formulation:
  Crystalline: n=3/6, n=3/6, n=3/6, n=3/6
  Microprecipitated bulk powder: n=34, n=34, n=34, n=34
Plasma samples (uM x hr): 100 +/- 50, 350 +/- 78, 650 +/- 100, 1500 +/- 1000
CT studies:
  Total response rate: 100%, 34%, 67%, 80%
  Partial response: 02,6 02,4
…
Conceptual model, part A: Process Model
Conceptual model, part B: Global Variable List
Conceptual model, part C: Variable Characterization
Conceptual model, part D: Statistical Hypothesis Testing
Results: Implementation
Example 1: Capturing context
• Demonstration of how the representation captures context for the observations of an intervention group.
• Query
  • Domain: Lung Cancer
  • In Johnson et al., what is the context (e.g., intervention, population characteristics, measurement methodology) associated with progression free survival (PFS) in the high dose group (HDG)?
Johnson DH. et al. J Clin Oncol. 2004 Jun 1;22(11):2184-91.
Steps to Capture Context
1. Find the node in the process model
2. Find corresponding column
3. Find variable of interest
4. Backtrack through the process model to obtain context for the observations, and get the data associated with each backtracked node
5. Construct logical representation of context
6. Repeat steps 4-5 until the start node is reached
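The steps above can be sketched as a backward walk over a directed graph. This is a minimal sketch under stated assumptions: the node numbers and labels echo later slides, but the links between nodes, the dictionary layout `PROCESS_MODEL`, and the function name `capture_context` are hypothetical and are not the authors' implementation.

```python
from collections import deque

# Hypothetical process model: node id -> (label, ids of preceding nodes).
# Node ids and labels echo the slides; the links between them are assumed.
PROCESS_MODEL = {
    847: ("Eligibility: Stage 3 recurrent NSCLC", []),
    628: ("Eligibility: no prior chemotherapy", [847]),
    748: ("Eligibility: other criteria", [628]),
    222: ("Baseline characteristics of the patient", [748]),
    474: ("Intervention: Bevacizumab", [222]),
    740: ("Observation of interest", [474]),
}


def capture_context(model, target):
    """Walk backwards from `target` to the start node(s), collecting context.

    Returns (node id, label) pairs in the order they are reached,
    nearest predecessor first.
    """
    context = []
    seen = {target}
    queue = deque(model[target][1])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # a node reachable by two paths is reported once
        seen.add(node)
        label, parents = model[node]
        context.append((node, label))
        queue.extend(parents)
    return context


for node_id, label in capture_context(PROCESS_MODEL, 740):
    print(node_id, label)
```

The walk stops on its own when it reaches nodes with no predecessors, which corresponds to step 6.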
Step 1: Find the node in process model
This node represents the progression free survival time point for the high dose group.
Step 2: Find corresponding column
This column represents the numerical data and data elements associated with this node
Step 3: Find variable of interest
Step 4: Backtrack & Obtain Data
Obtain context by looking at linked nodes in process model
Step 5: Construct logical context
Data modeling follows directly from the semantics of the process model's links and nodes
Cell name: Bevacizumab
Cell Location #: 474
Drug: Bevacizumab
Dose: 15 mg/kg
How was it administered:
  Vehicle: Intravenous infusion
  Duration: Over 90 minutes
  Cycle: 3 weeks
  Maximum dose: 18 doses
  Exception: Well tolerated
  Resulting Action: New duration
  Duration: 30-60 minutes
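One way to hold the fields of this cell in a typed record is sketched below. The field values come from the slide; the class names, attribute names, and the predicate-style string produced by `as_logic` are assumptions for illustration, since the slides do not specify the logical syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Administration:
    vehicle: str
    duration: str
    cycle: str
    maximum_doses: int
    # Optional rule: when `exception` holds, `resulting_action` applies.
    exception: Optional[str] = None
    resulting_action: Optional[str] = None
    revised_duration: Optional[str] = None


@dataclass
class InterventionCell:
    name: str
    location: int  # node number in the process model
    drug: str
    dose: str
    administration: Administration

    def as_logic(self) -> str:
        """Render the cell as a predicate-style string (hypothetical syntax)."""
        a = self.administration
        return (f"administered(drug={self.drug!r}, dose={self.dose!r}, "
                f"vehicle={a.vehicle!r}, cycle={a.cycle!r})")


cell = InterventionCell(
    name="Bevacizumab", location=474, drug="Bevacizumab", dose="15 mg/kg",
    administration=Administration(
        vehicle="Intravenous infusion", duration="Over 90 minutes",
        cycle="3 weeks", maximum_doses=18, exception="Well tolerated",
        resulting_action="New duration", revised_duration="30-60 minutes"))
print(cell.as_logic())
```

Keeping the exception and its resulting action as separate fields preserves the conditional nature of the shortened infusion time rather than overwriting the original duration.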
Step 6: Repeat steps 4-5 until start
• Continue backtracking through process model
• Aggregate associated data
• Repeat until first node
Context for Adverse Event (Node #740):
• Name of n847
Example 1: Capturing context
• Demonstration of how the representation captures context for the observations of an intervention group.
• Query
  • What is the context (e.g., intervention, population characteristics, measurement methodology) associated with progression free survival (PFS) in the high dose group?
Example 1: Capturing context
• Data:
• Associated context:
Context for Adverse Event (Node #740):
1) INTERVENTION:
   Bevacizumab (Node #474)
2) POPULATION CHARACTERISTICS:
   High Dose Bev (Arm #3)
   Eligibility Criteria:
     Stage 3 Recurrent NSCLC (Node #847)
     No Prior Chemotherapy (Node #628)
     Other criteria (Node #748)
   Baseline characteristics of the patient (Node #222)
3) METHODS:
   Progression Free Survival
Example 2: Comparisons
• Comparison of outcomes in the intervention vs. control arms
• Query
  • Compare PFS for the intervention and control arms
• Context from two nodes can be placed on the same chart
Example 3: Analyses
• How was the p-value calculated?
• Visualization includes:
  • Data
  • Test Statistics
  • P-value
  • Statement
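The four items in the visualization (data, test statistic, p-value, statement) could be held together in one record, as in the sketch below. The slides do not say which test each trial used; the two-proportion z-test, the function name `two_proportion_test`, and the counts are assumptions chosen only to show the shape of such a record.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(x1, n1, x2, n2, alpha=0.05):
    """Two-sided, pooled two-proportion z-test, returned as a record.

    x1, x2: number of events in each arm; n1, n2: arm sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("test is undefined when the pooled proportion is 0 or 1")
    z = (p1 - p2) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return {
        "data": {"arm_1": (x1, n1), "arm_2": (x2, n2)},
        "test": "two-proportion z-test",
        "statistic": z,
        "p_value": p_value,
        "statement": "reject H0" if p_value < alpha else "fail to reject H0",
    }


# Hypothetical counts, not taken from any of the cited trials.
record = two_proportion_test(30, 40, 10, 40)
print(record["statement"], record["p_value"])
```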
Pilot Evaluation
• Can the representation answer user queries from the requirements analysis?
• Preliminary evaluation questions
  • Characteristics of the trial
  • Quality of the trial
  • Significance of the science
Evaluation: Objectives
• Objective 1
  • Utility of the representation to accurately identify numerical data to support key contributions made by a clinical trial report
• Objective 2
  • Intuitiveness of the representation through reproducibility of the visualization by different users
Evaluation: Study Design
• Study design
  • 2-arm study
  • Status quo group using paper copy
  • Intervention group using proposed representation
• Participants (n=6)
  • Graduate students in biology, biostatistics, informatics, or engineering
• Statistical methods
  • Student's paired t-test
• Gold standard
  • Established by graduate student supervised by domain expert
• 4 clinical trial papers in NSCLC
  • J Clin Oncol. 2004 Jun 1;22(11):2184-91.
  • J Clin Oncol. 2008 May 20;26(15):2442-9.
  • Lancet Oncol. 2012 Jan;13(1):33-42.
  • J Clin Oncol. 2011 Nov 1;29(31):4113-20.
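For reference, the paired t statistic named under statistical methods can be computed with the standard library alone. This is a minimal sketch: the function name `paired_t` and the scores are hypothetical and are not the study's data. The p-value needs the t distribution, which the standard library does not provide; SciPy's `scipy.stats.ttest_rel` is one option for that.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for matched measurements.

    first, second: equal-length sequences; element i of each comes from
    the same participant or item.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical scores for four matched items.
t_stat, dof = paired_t([1, 2, 3, 4], [2, 4, 3, 7])
print(round(t_stat, 3), dof)
```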
Evaluation: Questions
• What is the purpose of this trial?
• What is the sample size for each experimental arm?
• How was the primary outcome assessed?
• How many patients experienced positive outcomes in this trial?
• How was the data analyzed?
Evaluation: Results
• Users of the representation were able to accurately identify numerical data that support key contributions, as compared with the status quo
• User visualizations were reproducible
  • 68.1% ± 6.45% of the gold standard was reproduced by users

                 Accuracy   SD    Time   SD
Representation   79%        18%   30     9%
Status Quo       76%        9%    34     7%
Discussion
• Our work supports queries related to study quality and disease modeling
• We developed a representation to associate appropriate context with numerical data within clinical trial reports
• The pilot evaluation suggests that the utility of the representation is promising
• To extend this work:
  • Instantiate using automatic methods and capture numerical data using NLP methods
  • Develop an interface to support frequently asked queries for specific clinical trial reports
  • Test in journal club setting
Conclusion
• We are establishing a systematic way of extracting information from clinical trial reports in a machine-understandable way
• The overarching objective is to have a computer reason on this representation to facilitate clinical decision making
Acknowledgements
• James Sayre, PhD, Biostatistician
• Domain experts
• Research participants
• NLM Training Grant
• NLM R01-LM009961
THANK YOU