Data Intensive Science and the Transformation of Knowledge
ISMPP Conference April 30, 2013
Carol J. McCall, FSA, MAAA Chief Strategy Officer, GNS Healthcare
@CarolMcCall
The Coming Era of Value-Based Healthcare
Reform is creating the most comprehensive set of changes in US healthcare since Medicare in 1965
- Re-design healthcare to reward value over volume and outcomes over activity
- Driving unprecedented innovation
- Creating entirely new notions of value
- A fundamental shift – indeed, a new paradigm – whose scope cannot be overstated
Reform is Tough Medicine
Bringing unprecedented challenges
- Exposing fundamental gaps in capabilities and knowledge
- Threatening long-standing areas of competitive advantages
- Re-writing underlying business models
- Shifting the balance of power and creating entirely new players
- Holds the potential to redraw the entire competitive landscape
Need to aggressively adapt or risk long-term viability
“In ten years, the pharma industry will be paid on outcomes and we have no idea how to get there”
– CEO, Pharmaceutical Company
• Crushing economics that threaten our entire economy
- $2.7T annually (~18% of GDP and growing)
- 30+% of care doesn’t create better outcomes
- Entering the ‘boomer wave’ (8k people per day turn age 65)
On the Eve of Crisis USA Inc.
A Wanamaker Problem We lack the detailed evidence we need for value-based healthcare
An Example of Staggering Differences
“We are not seeing dramatically higher survival rates at any age in the U.S., notwithstanding much greater expenditures.”
“The US has no well-defined strategy for how to deal with this and that often leads to a lot of unnecessary care.”
U.S. health care costs for the aged are sky high December 13, 2009
By Mark Roth / Pittsburgh Post-Gazette
It's a startling graph.
* Similar per capita expenditures as Germany or UK would reduce total US healthcare costs by 40%
Annual Per Capita Costs by Age in Different Countries
Life Expectancy and Costs in Different Countries
Mistakes in Scientific Studies Surge WSJ August, 2011
When a study is retracted, it can be hard to make its effects go away. In a sign of the times, a blog called "Retraction Watch" has popped up to monitor the flow.
Suggested theories for the backpedaling:
• Journals better at detecting errors
• Easier to uncover plagiarism
• Competition / temptation for fraud
In the Race for Evidence, Knowing Things is Hard Retractions are on the rise
In the Race for Evidence, Knowing Things is Hard We Turn Out to Be Just Plain Wrong
Two recent studies analyzed landmark research on clinical effectiveness
Only ~50% have stood the test of time
The remainder have been
• Reversed outright
• Supported, but to a lesser degree
• Inconclusive (or still unchallenged)
1. Prasad V, Gall V, Cifu A. The Frequency of Medical Reversal. Arch Intern Med. 2011;171(18):1675-1676.
2. Ioannidis JP. Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. JAMA. 2005;294(2):218-228.
Studies of Studies Show We Get Things Wrong The Guardian, July 2011
“Half of what you’ll learn in medical school will be shown to be either dead wrong or out of date within five years of graduation.”
Dr. David Sackett
These findings suggest that
• There's NEVER an excuse to stop monitoring outcomes
• Such medical reversals, if we pursued them, could be common
To do that, we need to:
• Create ways to find what we’re NOT actually looking for
• Get better at Being Wrong
Mark Twain was Right
It ain't what you don't know that gets you into trouble. It's what you know ‘for sure’ that just ain't so.
- Mark Twain
Preparing for Surprise
A fascinating tour of human fallibility and a new way of looking at wrongness
• Schulz sees our capacity to err as inseparable from our imagination
• She links error to human creativity and, in particular, to how we generate and revise our beliefs about the world
• With new ways to do this, we can get better at Being Wrong and, just perhaps, unleash our creativity in healthcare
Can Big Data fix healthcare? Is our system too broken, or does it need something different? Can it help us find what we weren’t looking for?
The New Gold Rush: Big Data
• Massive data generation
• Maturing technologies and plummeting costs
• Hot topic in business
• Called The Next Frontier for innovation and competition
The New Gold Rush: Big Data Big Data is a ‘Hot Topic’
“Data generation and storage is no longer the issue. The bottleneck is now the analytics to turn healthcare data into actionable knowledge to match health interventions to patients.”
- Participant @ Strata Rx conference October 17, 2012
A Cautionary Note: Correlation vs. Causation Getting it wrong…
Correlation: Answers the question ‘What happens when I see?’
• Traditional statistics as well as data mining & pattern matching fall in this category
• Valuable for many things, but can be misleading
Causation: Answers the question ‘What happens when I do?’
• Healthcare demands we know causation (i.e. actions, events or processes that bring about specific effects)
• Predominantly established through RCTs
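The gap between "seeing" and "doing" can be illustrated with a small simulation. This is only a sketch under stated assumptions: the toy world, the hidden factor `z`, and all probabilities are invented for illustration and do not come from the talk. In this world `z` drives both the exposure `x` and the outcome `y`, while `x` itself has no effect on `y`.

```python
import random

def simulate(n, intervene=None, seed=0):
    """Toy world (hypothetical): hidden z causes both x and y; x does not cause y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                      # hidden common cause
        if intervene is None:
            x = rng.random() < (0.8 if z else 0.2)  # "seeing": x follows z
        else:
            x = intervene                           # "doing": x is set by us
        y = rng.random() < (0.7 if z else 0.1)      # y depends on z only
        rows.append((x, y))
    return rows

def outcome_rate(rows, x_value):
    """Share of rows with y == True among rows where x == x_value."""
    ys = [y for x, y in rows if x == x_value]
    return sum(ys) / len(ys)

# What happens when I see x? Compare outcomes between observed groups.
observed = simulate(100_000)
see_gap = outcome_rate(observed, True) - outcome_rate(observed, False)

# What happens when I do x? Force x on or off, as a randomized trial would.
forced_on = simulate(100_000, intervene=True, seed=1)
forced_off = simulate(100_000, intervene=False, seed=2)
do_gap = outcome_rate(forced_on, True) - outcome_rate(forced_off, False)

print(f"difference when seeing: {see_gap:.2f}")
print(f"difference when doing:  {do_gap:.2f}")
```

Under these assumptions the observed groups differ substantially, yet forcing the exposure changes nothing, which is why a correlation alone cannot say what an intervention will do.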
‘Overweight In Dogs Related To Overweight Owners’ Public Health Nutrition; June 2009
“…these [lead] to a third change: a move away from the age-old search for causality”
“There is a treasure hunt underway, driven by insights to be extracted [and] the dormant value that can be unleashed by a shift from causation to correlation.”
A Cautionary Note in the Gold Rush
• Only incidentally about potential side effects of a treatment
• Real target was the observational study - and whether it could be trusted
• Issue is of paramount importance
- Becoming more common
- Fast becoming a toweringly important type of investigation
• Big Data actually makes spurious correlations more common (not less)
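A quick way to see why more data can mean more spurious correlations is to count "significant" pairs among variables that are pure noise. The sketch below is illustrative only: the variable counts, the sample size of 50, and the cutoff of 0.279 (roughly the two-sided p < 0.05 threshold for a correlation on 50 samples) are assumptions chosen for the example.

```python
import math
import random
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def count_spurious(n_vars, n_samples=50, threshold=0.279, seed=0):
    """Count variable pairs whose |r| exceeds the threshold.

    Every variable is independent random noise, so every hit is spurious.
    """
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_vars)]
    return sum(1 for a, b in combinations(cols, 2)
               if abs(pearson(a, b)) > threshold)

# More variables means more pairs to test, so more chance hits.
counts = {k: count_spurious(k) for k in (10, 40, 160)}
for k, hits in counts.items():
    pairs = k * (k - 1) // 2
    print(f"{k:>4} noise variables, {pairs:>6} pairs, {hits} spurious correlations")
```

The number of pairs grows with the square of the number of variables, so the count of chance correlations grows with it even though no real relationship exists anywhere in the data.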
Judea Pearl 2011 Turing Award
Dr. Pearl was recently honored for a body of work that developed and synthesized two branches of calculus
#1: The de facto standard for reasoning under uncertainty (used everywhere, from voice recognition to self-driving cars)
#2: A calculus for determining cause-and-effect relationships directly from data
- A mathematical language for expressing concepts explicitly
- Precision and computational benefits of a formal logic
- Ability to transfer knowledge reliably (and computationally)
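One well-known identity from causal calculus is the backdoor adjustment: if a measured variable Z accounts for all common causes of X and Y, then P(Y | do(X=x)) equals the sum over z of P(Y | X=x, Z=z) times P(Z=z). The sketch below applies it to a made-up table of counts. The numbers, and the assumption that Z is the only confounder, are hypothetical and are not taken from the talk.

```python
# counts[(z, x)] = (number of patients, number with the outcome)
# Hypothetical observational data: z is a measured confounder.
counts = {
    (0, 0): (800, 80),
    (0, 1): (200, 20),
    (1, 0): (200, 140),
    (1, 1): (800, 560),
}

def naive_rate(x):
    """P(Y=1 | X=x): outcome rate among those observed with X=x."""
    n = sum(counts[(z, x)][0] for z in (0, 1))
    k = sum(counts[(z, x)][1] for z in (0, 1))
    return k / n

def adjusted_rate(x):
    """P(Y=1 | do(X=x)) via backdoor adjustment over z."""
    total = sum(n for n, _ in counts.values())
    rate = 0.0
    for z in (0, 1):
        p_z = sum(counts[(z, xv)][0] for xv in (0, 1)) / total  # P(Z=z)
        n, k = counts[(z, x)]
        rate += (k / n) * p_z                                   # P(Y=1 | x, z) * P(z)
    return rate

naive_gap = naive_rate(1) - naive_rate(0)
adjusted_gap = adjusted_rate(1) - adjusted_rate(0)

print(f"naive difference:    {naive_gap:.2f}")
print(f"adjusted difference: {adjusted_gap:.2f}")
```

In this invented table the naive comparison suggests a large effect, while the adjusted estimate shows none, because within each level of Z the outcome rate is the same with or without X.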
A New Paradigm in Analytics Re-inventing the Science of Evidence
Rapid-learning directly from data
Number theory & RSA encryption algorithms Causal mathematics
World Wide Web
A New Paradigm in Analytics Re-inventing the Science of Evidence
Hypothesis-free discovery of cause-and-effect relationships directly and at scale from observational data
GNS Healthcare
Big Data, Causal Mathematics, Machine Learning, Models
An Example of Discovery @ Scale Planning for Surprise
The Setting: Innovative Healthcare Company
• National research reputation, a portfolio of publications and rich data assets
• Recently published on an important drug-drug interaction
The Goal: Expand Their Ability to Discover Important Results
• Frustrated by time required; concerned about questions they weren’t asking
• Test GNS approach – Reproduce their finding and explore evidence of other (unasked) impacts
The Data: 3 Years of Detailed Claims Data
• Details with ICD-9, CPT-4 and NDC codes
• Patients relevant to their earlier finding
GNS Challenge: Reproduce Their Finding (while blindfolded)
• Identify causal links between drugs and outcomes
• Data completely blinded (all codes were dummies)
Big Data?
# Patients 111,641
# Transaction Records 58,181,059
# Diagnosis Codes 12,241
# Procedure Codes 11,174
# Drug Codes (NDC level) 24,447
Big Data!
# Hypotheses with Biasing Driver Variables 44,690,959,998,504,000 (~45 quadrillion hypotheses)
A Penny for Your Hypothesis…
The Hypothesis Space You need 44 more of these…
1 quadrillion pennies
Challenges
The Approach
• Exhaustive search of hypotheses
• Modeled time-ordering & interplay of events and exposures
• Automatically identified causal drivers and adjusted for bias
• Preserved uncertainty (probabilistic causality)
• Distributed computational load for fast results (hours)
• Reduced the space to the meaningful few
• Reproduced the finding
• Found things we weren’t looking for, including a notable surprise:
– Possible adverse effect for a commonly prescribed drug
– Initially replicated in (2) out-of-sample datasets
– Pursuing additional validation (no blindfolds this time)
The Results
                                   Adverse Effects    Beneficial Effects
# Total Hypotheses                 44,690,959,998,504,000
# Detected Correlations*           31,481,043         42,471,231
# Detected Causal Relationships*   248                151
* Statistically significant at p=.05
Figure labels: Hypotheses (45x), Correlations, Causal Relationships
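The scale of the reduction in the results above can be checked with simple arithmetic. The figures are taken from the slide; the variable names are just labels for this sketch, and summing the adverse and beneficial columns is an assumption about how the totals combine.

```python
# Figures as reported on the results slide.
total_hypotheses = 44_690_959_998_504_000
correlations = 31_481_043 + 42_471_231   # adverse + beneficial
causal = 248 + 151                       # adverse + beneficial

# Size of the hypothesis space in quadrillions (1 quadrillion = 10**15).
quadrillions = total_hypotheses / 10**15

# How much each stage narrows the field.
hypotheses_per_correlation = total_hypotheses // correlations
correlations_per_causal = correlations // causal

print(f"hypothesis space: ~{quadrillions:.1f} quadrillion")
print(f"detected correlations: {correlations:,}")
print(f"detected causal relationships: {causal}")
print(f"hypotheses per detected correlation: ~{hypotheses_per_correlation:,}")
print(f"correlations per causal relationship: ~{correlations_per_causal:,}")
```

On these numbers, roughly 600 million hypotheses were examined for every detected correlation, and about 185 thousand correlations for every relationship flagged as causal.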
The Beginning of the Fourth Paradigm Data-Intensive Science and the Scientific Record
The scientific record
• Communicates findings
• Organizes and collects related works
• Documents and manages controversies
• Establishes precedence
• Ensures confidence and trust
• Supports reproducibility
How does this change as we enter data-intensive discovery?
The Evolution of the Scientific Record Today (the 3rd Paradigm)
Much more complicated and technology-mediated
- Data is no longer fully documented, only summarized
- The link between evidence and writings is more complex
- Computation (and software) integral to reproducibility
- Reproducibility itself extends beyond data access and understanding methods
- Literature has become huge (tools to handle sheer scale)
- Affordances of a scientific record based on print and physical artifacts offer small relief
The Paradigm of Data-Intensive Science
We Have Reached a Janus Moment
Janus Moments – the moment where a new norm is established. They may be planned or unplanned, predictable or not, good or bad; but they affect what is considered “normal” in society, technology, economics, and politics – on a personal and macro level
“With the arrival of the data-intensive computing paradigm, the scientific record and the supporting system of communication and publication has reached a Janus moment, where we are both looking forward and backward.” – Clifford Lynch
Data and Software
• First-class objects
• Need systematic management and curation in their own right
Scientific Journals
• Bid (slow) farewell to storage & delivery that are essentially images of the printed page
• Papers will become computational windows to actively understand, reproduce and extend results
Reference Data
• Collections will become an integral part (computed upon rather than read)
• When updated, will trigger new computations, lead to new or reassessed results
Scientific Record
• Will become a major object of ongoing computation itself – THE central reference collection
The Scientific Record in the Fourth Paradigm
In the Small
• Go beyond the paper, with computational tools that engage underlying science and data
• Move between papers and reference data with great ease and flexibility
• Integrate with collaborative environments with tools for annotation, authoring, simulation, and analysis
In the Large
• As a large corpus of text and interlinked data resources using a wide range of computational tools
• Will identify relevant papers of interest, suggest hypotheses that can be tested elsewhere, or allow production of new data or results
Engaging the Data-Intensive Scientific Record
Data-intensive science will ultimately transform both scientific culture and publishing practice, including
- Views on open access
- Applications of markups and choice of authoring tools
- Disciplinary norms about data curation, data sharing and overall data lifecycle
I urge you to take on the mantle of stewardship for helping make data-intensive science a reality
Implications for Publication
“In the practice of data-intensive science, one set of data will, over time, figure prominently, persistently, and ubiquitously in scientific work: the scientific record itself”
– Clifford Lynch
Thank you