tal zarsky, "correlation v. causation in health-related big data analysis: the role of reason...


Background

“Just Correlation” and predictive analytics in the medical and other contexts:
• The Age of Big Data
• Data-Driven Processes and Results
• Putting the information to use
• Reliance on “mere” correlations

Roadmap

• The rise of “Big Health Data”
• What does mere reliance on correlation mean (examples)
• Possible options, alternatives and outcomes
• Pros and Cons of “Just Causation”
• Reliance on other disciplines
• Law and Policy implications and “hooks”

“Big Health Data”

Health and medical data are now held by new players, because of:
• Definition change
• New practices, sources and business models
At times, these new players are startups.

Change reflected in some new legislation [GDPR in the EU]. Regulating health data calls for unique balancing: strong privacy preference vs. public benefits.

Example (1): Credit Data

“All data is credit data, we just don’t know how to use it yet.”

ZestFinance and others – provide methods for credit ranking of the “underbanked”.

Most likely rely on correlations between attributes, factors and behaviors – and rates of payment or default.

These insights are used for prospective credit applicants.

Example (2) Health Data & IoT

Wearables: gadgets affixed to the body which collect biometric and behavioral data. Fitbit products provided to employees (for free!).

Possible future uses: calculating insurance premiums. Similar processes are carried out by smartphone applications.

Again, firms rely on “mere” correlations found in the data when making health-related recommendations and judgments.

What Do We Mean by “Just Correlation”

Five possible variations of Big Data uses, relying upon:

1. Mere correlations
2. Correlation + statistical proof of causation
3. Correlation + experimental evidence of causation (natural or artificial manipulation)
4. Correlation + reasonable mechanism hypothesis
5. Correlation + scientifically proven mechanism found

“Mechanism” – term of art; an explanation of a phenomenon.
• Provides additional proof as to the existence of a causal relationship
• Provides scientific knowledge

“Just Correlation” – What Can Go Wrong?

Possible outcomes when a correlation between Factor “A” and “B” was found:

(i) A (indeed) causes B.
(ii) A does not cause B. The data is wrong.
(iii) A does not cause B. The correlation is spurious.
(iv) A does not cause B. B causes A.
(v) A does not cause B. C causes both A and B.
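
Outcome (v) can be illustrated with a small simulation. The sketch below is not from the talk: it assumes a hypothetical hidden factor C that drives both A and B with independent Gaussian noise, and the helper names `pearson` and `partial_corr` are made up for the illustration. A and B end up clearly correlated even though neither influences the other, and the correlation largely disappears once C is controlled for.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical setup: C causes both A and B; A and B never touch each other.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

r_ab = pearson(a, b)                   # substantial, despite no A -> B link
r_ab_given_c = partial_corr(a, b, c)   # close to zero once C is held fixed

print(f"corr(A, B)     = {r_ab:.2f}")
print(f"corr(A, B | C) = {r_ab_given_c:.2f}")
```

In practice the confounder is often unmeasured, which is one reason a mechanism hypothesis can help: it suggests which C to look for.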

The Benefits of “Just Correlation”

1. The need for speed.
2. Low costs.
3. Does not compromise precision.
4. Does not steer science towards existing knowledge and theory: limited bias against unexplainable findings.

Just Correlation: Problems (1)

Causation as a “Quality Check”:
• Assists in the removal of noise.
• Protects us from “over-fitting”.
• Do we need a “mechanism”, or does statistical causation suffice? Mechanisms assist in revealing confounders.

Having a theory enables generalization of findings.
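
The noise and over-fitting concern can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not anything presented in the talk: it generates 200 hypothetical attributes that are pure noise, picks the one most correlated with an equally random outcome in a small sample, and then checks that same attribute on fresh data. The sample sizes and the helper name `pearson` are arbitrary choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


rng = random.Random(1)  # fixed seed so the run is repeatable
n_train, n_test, n_features = 50, 2000, 200
n_rows = n_train + n_test

# Every attribute and the outcome are independent noise: no real signal exists.
outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]

# "Mine" the small training sample for the strongest correlation.
train_r = [abs(pearson(f[:n_train], outcome[:n_train])) for f in features]
best = max(range(n_features), key=lambda i: train_r[i])
best_train_r = train_r[best]

# Re-check the selected attribute on data it was not selected on.
best_test_r = abs(pearson(features[best][n_train:], outcome[n_train:]))

print(f"best |r| among {n_features} noise attributes (n={n_train}): {best_train_r:.2f}")
print(f"same attribute on fresh data (n={n_test}): {best_test_r:.2f}")
```

With enough attributes and few observations, some correlation will look impressive by chance alone; a holdout check or a plausible mechanism is what separates it from a finding that generalizes.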

Just Correlation: Problems (2)

Understanding mechanisms alerts us to possible side effects. An important factor in the health context.

Seeking mechanisms leads to positive externalities – knowledge about nature and society.

In Conclusion: Causation provides important benefits and is essential in the health context. A context-specific analysis is required to establish whether mechanisms are always mandated.

Legal Hooks and Responses

Law should not intervene, because:
• Markets will still self-correct if mere correlation is error-ridden (but…).
• Intervention might undermine innovation.
• Law should not meddle with science: it might serve self-interests, or get things wrong.

But… Different rules should be applied when government is the source of data – could require or restrict uses.

Specific interventions might be called for to protect the interests of investors, data subjects and those affected by the process.

Investors

Protect investors from the executive’s reckless conduct – mere reliance on correlation.

But investors should look after their own interests. Assure disclosure pertaining to this specific matter.

Data Subjects

Prediction often involves personal data.

Compromises privacy rights and involves balancing.

Possible questions: Was the data de-identified? Was consent provided? Should processing be allowed even without consent?

The privacy balance should consider overall benefits – and these require causation. This balance will impact the legal findings as to whether data usage should be permitted.

Impacted Individuals (1)

Correlations lead, at times, to negative treatment. With health data, secondary effects might also follow (such as stigma).

Can those negatively impacted by a “mere” correlation bring action against a firm? Are such actions and outcomes “unfair”? If a prediction proves wrong, equality is compromised.

Equals are not treated equally (FTC report). However, private firms are not necessarily subjected to such a fairness requirement. Protected groups might not be implicated. Mitigation via competition (over time).

Impacted Individuals (2)

When might the fear of unfair outcomes render “just correlation” unjust?
• Government (higher fairness standard)
• And also highly regulated industries…
• “Socially meaningful” industries: health-care, insurance, credit.
• Monopoly (no mitigating competition)

In sum: the higher standard would often apply in the health and medical context.

Thank you!

Comments are welcome: [email protected]