data intensive science and the transformation of knowledge · the beginning of the fourth paradigm...

Data Intensive Science and the Transformation of Knowledge ISMPP Conference April 30, 2013 Carol J. McCall, FSA, MAAA Chief Strategy Officer, GNS Healthcare @CarolMcCall


Page 1: Data Intensive Science and the Transformation of Knowledge · The Beginning of the Fourth Paradigm Data-Intensive Science and the Scientific Record The scientific record • Communicates

Data Intensive Science and the Transformation of Knowledge

ISMPP Conference April 30, 2013

Carol J. McCall, FSA, MAAA Chief Strategy Officer, GNS Healthcare

@CarolMcCall

Page 2

The Coming Era of Value-Based Healthcare

Reform is creating the most comprehensive set of changes in US healthcare since Medicare in 1965

- Re-design healthcare to reward value over volume and outcomes over activity

- Driving unprecedented innovation

- Creating entirely new notions of value

- A fundamental shift – indeed, a new paradigm – whose scope cannot be overstated

Page 3

Reform is Tough Medicine

Bringing unprecedented challenges

- Exposing fundamental gaps in capabilities and knowledge

- Threatening long-standing areas of competitive advantage

- Re-writing underlying business models

- Shifting the balance of power and creating entirely new players

- Holds the potential to redraw the entire competitive landscape

Need to aggressively adapt or risk long-term viability

“In ten years, the pharma industry will be paid on outcomes and we have no idea how to get there”

– CEO, Pharmaceutical Company

Page 4

• Crushing economics that threaten our entire economy

- $2.7T annually (~18% of GDP and growing)

- 30+% of care doesn’t create better outcomes

- Entering the ‘boomer wave’ (8k people per day turn age 65)

On the Eve of Crisis USA Inc.

A Wanamaker Problem

We lack the detailed evidence we need for value-based healthcare

Page 5

An Example of Staggering Differences

“We are not seeing dramatically higher survival rates at any age in the U.S., notwithstanding much greater expenditures.”

“The US has no well-defined strategy for how to deal with this and that often leads to a lot of unnecessary care.”

U.S. health care costs for the aged are sky high

December 13, 2009

By Mark Roth / Pittsburgh Post-Gazette

It's a startling graph.

* Per capita expenditures similar to those of Germany or the UK would reduce total US healthcare costs by 40%

Annual Per Capita Costs by Age in Different Countries

Life Expectancy and Costs in Different Countries

Page 6

Mistakes in Scientific Studies Surge

WSJ, August 2011

When a study is retracted, it can be hard to make its effects go away. In a sign of the times, a blog called "Retraction Watch" has popped up to monitor the flow.

Theories suggested for the backpedaling:

• Journals better at detecting errors

• Easier to uncover plagiarism

• Competition / temptation for fraud

In the Race for Evidence, Knowing Things is Hard

Retractions are on the rise

Page 7

In the Race for Evidence, Knowing Things is Hard

We Turn Out to Be Just Plain Wrong

Two recent studies analyzed landmark research on clinical effectiveness

Only ~50% have stood the test of time

The remainder have been

• Reversed outright

• Supported, but to a lesser degree

• Inconclusive (or still unchallenged)

1. Prasad V, Gall V, Cifu A. The Frequency of Medical Reversal. Arch Intern Med. 2011;171(18):1675-1676.

2. Ioannidis JP. Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. JAMA. 2005;294(2):218-228.

Studies of Studies Show We Get Things Wrong

The Guardian, July 2011

“Half of what you’ll learn in medical school will be shown to be either dead wrong or out of date within five years of graduation.”

Dr. David Sackett

Page 8

These findings suggest that

• There's NEVER an excuse to stop monitoring outcomes

• Such medical reversals, if we pursued them, could be common

To do that, we need to:

• Create ways to find what we’re NOT actually looking for

• Get better at Being Wrong

Mark Twain was Right

It ain't what you don't know that gets you into trouble. It's what you know ‘for sure’ that just ain't so.

- Mark Twain

Page 9

Preparing for Surprise

A fascinating tour of human fallibility and a new way of looking at wrongness.

Schulz sees our capacity to err as inseparable from our imagination. She links error to human creativity, and in particular, to how we generate and revise our beliefs about the world.

With new ways to do this, we can get better at Being Wrong and, just perhaps, unleash our creativity in healthcare.

Page 10

Can Big Data fix healthcare? Is our system too broken, or does it need something different? Can it help us find what we weren’t looking for?

The New Gold Rush: Big Data

Page 11

• Massive data generation

• Maturing technologies and plummeting costs

• Hot topic in business

• Called The Next Frontier for innovation and competition

The New Gold Rush: Big Data

Big Data is a ‘Hot Topic’

Page 12

“Data generation and storage is no longer the issue. The bottleneck is now the analytics to turn healthcare data into actionable knowledge to match health interventions to patients.”

- Participant @ Strata Rx conference October 17, 2012

Page 13

A Cautionary Note: Correlation vs. Causation

Getting it wrong…

Correlation: Answers the question ‘What happens when I see?’

• Traditional statistics as well as data mining & pattern matching fall in this category

• Valuable for many things, but can be misleading

Causation: Answers the question ‘What happens when I do?’

• Healthcare demands we know causation (i.e. actions, events or processes that bring about specific effects)

• Predominantly established through RCTs

‘Overweight In Dogs Related To Overweight Owners’ Public Health Nutrition; June 2009
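The gap between "seeing" and "doing" can be illustrated with a small simulation. This is a hedged sketch, not something from the talk: the variable names and all probabilities are invented for illustration. A common cause `z` drives both an exposure `x` and an outcome `y`, while `x` has no effect on `y` by construction. Observing the data shows a strong association; intervening on `x` (as an RCT would) shows none; and stratifying on `z` recovers the interventional answer from the observational data, but only because `z` happens to be recorded here.

```python
import random


def observe(n=200_000, seed=0):
    """Observational data: z influences both x and y; x does NOT affect y.

    All probabilities are made-up illustration values.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        z = rng.random() < 0.5                   # common cause (confounder)
        x = rng.random() < (0.8 if z else 0.2)   # exposure depends on z
        y = rng.random() < (0.7 if z else 0.1)   # outcome depends on z only
        records.append((z, x, y))
    return records


def outcome_rate(records):
    records = list(records)
    return sum(1 for _, _, y in records if y) / len(records)


def seeing(records):
    """'What happens when I see?': P(y | x=1) - P(y | x=0)."""
    exposed = [r for r in records if r[1]]
    unexposed = [r for r in records if not r[1]]
    return outcome_rate(exposed) - outcome_rate(unexposed)


def doing(n=200_000, seed=1):
    """'What happens when I do?': x is assigned by coin flip, as in an RCT."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.random() < 0.5
        x = rng.random() < 0.5                   # assignment ignores z
        y = rng.random() < (0.7 if z else 0.1)
        (treated if x else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)


def adjusted(records):
    """Stratify on z and weight each stratum's difference by P(z)."""
    total = 0.0
    for z_value in (False, True):
        stratum = [r for r in records if r[0] == z_value]
        total += (len(stratum) / len(records)) * seeing(stratum)
    return total


if __name__ == "__main__":
    data = observe()
    print(f"seeing:   {seeing(data):+.3f}")
    print(f"doing:    {doing():+.3f}")
    print(f"adjusted: {adjusted(data):+.3f}")
```

With these invented parameters the observational difference is large while the interventional and adjusted differences sit near zero, which is the sense in which correlation alone can mislead.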

Page 14

“…these [lead] to a third change: a move away from the age-old search for causality”

“There is a treasure hunt underway, driven by insights to be extracted [and] the dormant value that can be unleashed by a shift from causation to correlation.”

A Cautionary Note in the Gold Rush

Page 15

A Cautionary Note in the Gold Rush

• Only incidentally about potential side effects of a treatment

• Real target was the observational study - and whether it could be trusted

• Issue is of paramount importance

- Becoming more common

- Fast becoming a toweringly important type of investigation

• Big Data actually makes spurious correlations more common (not less)
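One way to see why more data dimensions produce more spurious correlations: the number of variable pairs grows quadratically, and at a fixed significance level a roughly constant fraction of unrelated pairs will still pass the test. The sketch below is illustrative only and is not from the talk. It fills columns with pure random noise and counts pairs whose Pearson correlation clears an approximate two-sided p < .05 cutoff derived from the Fisher z-transformation; the function names and sizes are assumptions chosen for the example.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def count_spurious(num_vars, n_rows=100, seed=0):
    """Return (pairs tested, pairs 'significant') for pure-noise columns."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)]
               for _ in range(num_vars)]
    # Approximate |r| cutoff for two-sided p < .05 (Fisher z approximation).
    r_crit = math.tanh(1.96 / math.sqrt(n_rows - 3))
    pairs = hits = 0
    for i in range(num_vars):
        for j in range(i + 1, num_vars):
            pairs += 1
            if abs(pearson(columns[i], columns[j])) > r_crit:
                hits += 1
    return pairs, hits


if __name__ == "__main__":
    for k in (10, 40):
        pairs, hits = count_spurious(k)
        print(f"{k} noise variables: {pairs} pairs, {hits} pass p < .05")
```

Every hit here is spurious by construction, since the columns are independent noise. Roughly 5% of pairs are expected to pass, so the absolute count climbs as variables are added.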

Page 16

Judea Pearl, 2011 Turing Award

Dr. Pearl recently received the award for his body of work developing and synthesizing two branches of calculus

# 1: The de facto standard for reasoning under uncertainty (used everywhere, from voice recognition to self-driving cars)

# 2: A calculus for determining cause-and-effect relationships directly from data

- A mathematical language for expressing concepts explicitly

- Precision and computational benefits of a formal logic

- Ability to transfer knowledge reliably (and computationally)

A New Paradigm in Analytics

Re-inventing the Science of Evidence

Page 17

Rapid-learning directly from data

Number theory & RSA encryption algorithms Causal mathematics

World Wide Web

A New Paradigm in Analytics

Re-inventing the Science of Evidence

Page 18

Hypothesis-free discovery of cause-and-effect relationships

directly and at scale from observational data

GNS Healthcare

[Diagram labels: Big Data, Causal Mathematics, Machine Learning Models]

Page 19

An Example of Discovery @ Scale

Planning for Surprise

The Setting: Innovative Healthcare Company

• National research reputation, a portfolio of publications and rich data assets

• Recently published on an important drug-drug interaction

The Goal: Expand Their Ability to Discover Important Results

• Frustrated by time required; concerned about questions they weren’t asking

• Test GNS approach – Reproduce their finding and explore evidence of other (unasked) impacts

The Data: 3 Years of Detailed Claims Data

• Details with ICD-9, CPT-4 and NDC codes

• Patients relevant to their earlier finding

GNS Challenge: Reproduce Their Finding (while blindfolded)

• Identify causal links between drugs and outcomes

• Data completely blinded (all codes were dummies)

Page 20

Big Data?

# Patients 111,641

# Transaction Records 58,181,059

# Diagnosis Codes 12,241

# Procedure Codes 11,174

# Drug Codes (NDC level) 24,447

Page 21

Big Data!

# Patients 111,641

# Transaction Records 58,181,059

# Diagnosis Codes 12,241

# Procedure Codes 11,174

# Drug Codes (NDC level) 24,447

# Hypotheses with Biasing Driver Variables

44,690,959,998,504,000

~45 quadrillion hypotheses

Page 22

A Penny for Your Hypothesis…

Page 23

The Hypothesis Space

You need 44 more of these…

1 quadrillion pennies
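The "44 more" figure follows from simple arithmetic on the hypothesis count quoted earlier in the deck. The sketch below only restates that arithmetic; the constant names are hypothetical, and the dollar conversion is an added illustration rather than a figure from the talk.

```python
import math

# Hypothesis count as quoted on the "Big Data!" slide.
TOTAL_HYPOTHESES = 44_690_959_998_504_000
# One pictured block holds one quadrillion pennies (one penny per hypothesis).
PENNIES_PER_BLOCK = 10**15

blocks_needed = math.ceil(TOTAL_HYPOTHESES / PENNIES_PER_BLOCK)
extra_blocks = blocks_needed - 1          # blocks beyond the one pictured
dollars = TOTAL_HYPOTHESES // 100         # a penny per hypothesis, in dollars

print(f"blocks needed: {blocks_needed} (the one shown plus {extra_blocks} more)")
print(f"value at a penny each: ${dollars:,}")
```

The count rounds up to 45 blocks of one quadrillion, which matches the "~45 quadrillion hypotheses" wording and the slide's "44 more of these".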

Page 24

Challenges

The Approach

• Exhaustive search of hypotheses

• Modeled time-ordering & interplay of events and exposures

• Automatically identified causal drivers and adjusted for bias

• Preserved uncertainty (probabilistic causality)

• Distributed computational load for fast results (hours)


Page 25

• Reduced the space to the meaningful few

• Reproduced the finding

• Found things we weren’t looking for, including a notable surprise:

– Possible adverse effect for a commonly prescribed drug

– Initially replicated in two out-of-sample datasets

– Pursuing additional validation (no blindfolds this time)


The Results

                                    Adverse Effects    Beneficial Effects
# Total Hypotheses                       44,690,959,998,504,000
# Detected Correlations*                 31,481,043            42,471,231
# Detected Causal Relationships*                248                   151

* Statistically significant at p=.05

[Figure labels: Hypotheses (45x), Correlations, Causal Relationships]
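The scale of the reduction reported in the results can be worked out directly from the figures above. The following is a minimal sketch that only does arithmetic on those reported numbers; the variable names are hypothetical and nothing here reflects how GNS computed its results.

```python
# Figures as reported in the results table.
total_hypotheses = 44_690_959_998_504_000
correlations = {"adverse": 31_481_043, "beneficial": 42_471_231}
causal = {"adverse": 248, "beneficial": 151}

all_correlations = sum(correlations.values())
all_causal = sum(causal.values())

# Share of the hypothesis space that showed a detected correlation.
correlation_share = all_correlations / total_hypotheses
# How many detected correlations there were per detected causal relationship.
correlations_per_causal = all_correlations / all_causal

print(f"detected correlations:         {all_correlations:,}")
print(f"detected causal relationships: {all_causal:,}")
print(f"correlations / hypotheses:     {correlation_share:.2e}")
print(f"correlations per causal link:  {correlations_per_causal:,.0f}")
```

On these numbers, fewer than two in a billion hypotheses showed a detected correlation, and only about one detected correlation in 185,000 was retained as a causal relationship.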

Page 26

The Beginning of the Fourth Paradigm

Data-Intensive Science and the Scientific Record

The scientific record

• Communicates findings

• Organizes and collects related works

• Documents and manages controversies

• Establishes precedence

• Ensures confidence and trust

• Supports reproducibility

How does this change as we enter data-intensive discovery?

Page 27

The Evolution of the Scientific Record

Today (the 3rd Paradigm)

Much more complicated and technology-mediated

- Data is no longer fully documented, only summarized

- The link between evidence and writings is more complex

- Computation (and software) integral to reproducibility

- Reproducibility itself extends beyond data access and understanding methods

- Literature has become huge (tools to handle sheer scale)

- Affordances of a scientific record based on print and physical artifacts offer small relief

Page 28

The Paradigm of Data-Intensive Science

We Have Reached a Janus Moment

Janus Moments – the moments when a new norm is established. They may be planned or unplanned, predictable or not, good or bad; but they affect what is considered “normal” in society, technology, economics, and politics – on a personal and macro level

“With the arrival of the data-intensive computing paradigm, the scientific record and the supporting system of communication and publication have reached a Janus moment, where we are both looking forward and backward.” – Clifford Lynch

Page 29

Data and Software

• First-class objects

• Need systematic management and curation in their own right

Scientific Journals

• Bid (slow) farewell to storage & delivery that are essentially images of the printed page

• Papers will become computational windows to actively understand, reproduce and extend results

Reference Data

• Collections will become an integral part (computed upon rather than read)

• When updated, will trigger new computations, lead to new or reassessed results

Scientific Record

• Will become a major object of ongoing computation itself – THE central reference collection

The Scientific Record in the Fourth Paradigm

Page 30

In the Small

• Go beyond the paper, with computational tools that engage underlying science and data

• Move between papers and reference data with great ease and flexibility

• Integrate with collaborative environments with tools for annotation, authoring, simulation, and analysis

In the Large

• As a large corpus of text and interlinked data resources using a wide range of computational tools

• Will identify relevant papers of interest, suggest hypotheses that can be tested elsewhere, or allow production of new data or results

Engaging the Data-Intensive Scientific Record

Page 31

Data-intensive science will ultimately transform both scientific culture and publishing practice, including

- Views on open access

- Applications of markups and choice of authoring tools

- Disciplinary norms about data curation, data sharing and overall data lifecycle

I urge you to take on the mantle of stewardship for helping make data-intensive science a reality

Implications for Publication

“In the practice of data-intensive science, one set of data will, over time, figure prominently, persistently, and ubiquitously in scientific work: the scientific record itself.”

– Clifford Lynch

Page 32

Thank you

Carol J. McCall, FSA, MAAA Chief Strategy Officer, GNS Healthcare

@CarolMcCall