
PHPM631 (Kum) 1/13/2020

Health Information Management Systems: Data Science in Health

Hye-Chung Kum

Associate Professor, Population Informatics Lab (https://pinformatics.org/)

Course URL: http://pinformatics.org/phpm631

License: Health Information Technology by Hye-Chung Kum is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Outline: Health Care Data Use

Operational vs. Decision Support Systems

What is Data Science/Business Intelligence

o What is Data Science?

o What is Big Data?

o Overview of Data Mining

Understanding Data


Operational vs Decision Support Systems

Operational Systems

o Support day-to-day transactions

o Contain current, “up-to-date” data

o Examples: EMR, customer orders, inventory levels, payroll, bank account balances

Decision Support Systems

o Support strategic decision making

o Contain historical, “summarized” data

o Examples:

• Clinical support: what treatment is best?

• Population health

• Management support: performance summary, market segmentation
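The contrast between the two kinds of system can be sketched in SQL. This is a minimal illustration, not taken from the course materials: it assumes a hypothetical `visits` table in an in-memory SQLite database (Python's built-in `sqlite3` module). The UPDATE stands in for an operational, day-to-day transaction; the GROUP BY query stands in for a decision-support summary of historical data.

```python
import sqlite3

# Hypothetical example: one in-memory database stands in for both kinds of system.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Operational side: each row is one transaction (a clinic visit).
cur.execute(
    "CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, patient_id TEXT, "
    "visit_date TEXT, clinic TEXT, charge REAL)"
)
cur.executemany(
    "INSERT INTO visits (patient_id, visit_date, clinic, charge) VALUES (?, ?, ?, ?)",
    [
        ("P1", "2019-01-05", "North", 120.0),
        ("P2", "2019-01-07", "North", 80.0),
        ("P1", "2019-02-11", "South", 200.0),
        ("P3", "2019-02-20", "South", 150.0),
    ],
)
# Day-to-day transaction: correct one charge in place so the data stays up to date.
cur.execute("UPDATE visits SET charge = ? WHERE visit_id = ?", (90.0, 2))
conn.commit()

# Decision support: summarize the historical transactions by month and clinic.
summary = cur.execute(
    "SELECT substr(visit_date, 1, 7) AS month, clinic, "
    "COUNT(*) AS n_visits, SUM(charge) AS total_charge "
    "FROM visits GROUP BY month, clinic ORDER BY month, clinic"
).fetchall()
for row in summary:
    print(row)
```

In practice the two workloads usually live in separate systems (for example, an EMR and a data warehouse) so that heavy summary queries do not slow down day-to-day transactions.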


Operational Application: EMR


DSS (Reports)

PricePoint

o for consumers

http://www.txpricepoint.org/


Outline: Health Care Data Use

Operational vs. Decision Support Systems

What is Data Science/Business Intelligence

o What is Data Science?

o What is Big Data?

o Overview of Data Mining

Understanding Data


What is Data Science?

Also known as:

o Knowledge Discovery & Data mining (KDD)

o Business Intelligence / Business Analytics

Collecting and refining information from many sources

Analyzing and presenting the information in useful ways

So people can make better business decisions


Data Science: Knowledge Discovery & Data Mining (KDD)

Big Data: operational data

KDD: Clean, Merge, Reprocess

Human-consumable, valid, novel, potentially useful, and ultimately understandable information



Data to Decision


KDD Process

Operational Data

• Data cleaning & integration

EDW (enterprise data warehouse)

• Feature selection (what variables?)

Task-Specific Data

• Analysis / data mining

Results

• Validation / evaluation

Information Presentation

• Action
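The KDD steps can be sketched as a small pipeline of functions. This is a toy illustration under stated assumptions: the two record lists, their field names (`id`, `ed_visits`, `total_paid`, and so on), and the "two or more ED visits" grouping are all hypothetical, and a real EDW would be a database rather than a Python list.

```python
# Hypothetical raw operational records from two source systems.
emr = [
    {"id": " p1", "age": 67, "zip": "77840", "ed_visits": 3},
    {"id": "P2 ", "age": 34, "zip": "77801", "ed_visits": 0},
    {"id": "p3", "age": None, "zip": "77843", "ed_visits": 1},  # missing age
    {"id": "P4", "age": 72, "zip": "77845", "ed_visits": 2},
]
claims = [
    {"member": "P1", "total_paid": 5400.0},
    {"member": "P2", "total_paid": 300.0},
    {"member": "P4", "total_paid": 2500.0},
]


def clean_and_integrate(emr_rows, claim_rows):
    """Cleaning & integration: normalize IDs, drop unusable rows, merge sources (the 'EDW')."""
    paid = {c["member"].strip().upper(): c["total_paid"] for c in claim_rows}
    warehouse = []
    for row in emr_rows:
        pid = row["id"].strip().upper()
        if row["age"] is None or pid not in paid:
            continue  # incomplete or unmatched record
        warehouse.append({**row, "id": pid, "total_paid": paid[pid]})
    return warehouse


def select_features(warehouse):
    """Feature selection: keep only the variables this task needs."""
    return [(row["ed_visits"], row["total_paid"]) for row in warehouse]


def analyze(task_data):
    """Analysis: mean amount paid for frequent (2+ visits) vs. other ED users."""
    groups = {"frequent": [], "other": []}
    for ed_visits, paid in task_data:
        groups["frequent" if ed_visits >= 2 else "other"].append(paid)
    return {name: sum(vals) / len(vals) for name, vals in groups.items() if vals}


def validate(task_data, results):
    """Validation: basic sanity checks before anyone acts on the numbers."""
    if not task_data:
        raise ValueError("no usable records")
    if any(value < 0 for value in results.values()):
        raise ValueError("negative payment totals")


edw = clean_and_integrate(emr, claims)
task_data = select_features(edw)
results = analyze(task_data)
validate(task_data, results)

# Information presentation: a readable summary someone can act on.
for group, mean_paid in sorted(results.items()):
    print(f"{group}: mean paid ${mean_paid:,.2f}")
```

Each step is a separate function so that it can be checked on its own; the final "Action" step is a human decision and is not modeled here.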


Video

TX Mental Health Landscape (2:46)

o https://www.youtube.com/watch?v=8dPqQt0yXJA

Wealth Inequality (1:30)

o https://www.youtube.com/watch?v=QPKKQnijnsM

Good managers know how to build the data story!


Outline: Health Care Data Use

Operational vs. Decision Support Systems

What is Data Science/Business Intelligence

o What is Data Science?

o What is Big Data?

o Overview of Data Mining

Understanding Data


Properties of Big Data: the 4 Vs

Volume: lots of data, constantly being generated

Velocity: constantly changing

Variety: expressed in many ways

Veracity: lots of errors

(Value)

EXAMPLE: the INTERNET! What do you do to find information/knowledge on the Internet?


The Big Data Problem, Nutshelled (Michael Franklin, UC Berkeley)

Massive, diverse, and growing data

o Time

o Money

o Quality (precision)

Something’s gotta give.
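One common way to let quality (precision) give in exchange for time and money is to compute on a random sample instead of the full dataset. The sketch below is an illustration added here, not a method from the slides: the lengths of stay are simulated (exponentially distributed with an assumed mean of 4.5 days), and the 1% sample size is arbitrary. The standard error shows how much precision the shortcut costs.

```python
import math
import random
import statistics

random.seed(631)  # fixed seed so the illustration is reproducible

# Hypothetical "big" dataset: 200,000 simulated lengths of stay, in days.
full_data = [random.expovariate(1 / 4.5) for _ in range(200_000)]

# Exact answer: reads every record (the most time and money).
full_mean = statistics.fmean(full_data)

# Approximate answer: a 1% simple random sample (cheaper, less precise).
sample = random.sample(full_data, k=2_000)
sample_mean = statistics.fmean(sample)

# The standard error of the mean measures the precision given up.
std_err = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"full data: mean = {full_mean:.3f} days (n = {len(full_data):,})")
print(f"1% sample: mean = {sample_mean:.3f} ± {1.96 * std_err:.3f} days (approx. 95% CI)")
```

Because the standard error shrinks with the square root of the sample size, quadrupling the sample only halves the uncertainty; that is the time-versus-precision trade-off in numbers.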


AMPLab: Integrating Three Key Resources

Algorithms

• Machine Learning, Statistical Methods

• Prediction, Business Intelligence

Machines

• Clusters and Clouds

• Warehouse Scale Computing

People

• Crowdsourcing, Human Computation

• Data Scientists, Analysts


Outline: Health Care Data Use

Operational vs. Decision Support Systems

What is Data Science/Business Intelligence

o What is Data Science?

o What is Big Data?

o Overview of Data Mining

Understanding Data


What is Data Mining?

Using a combination of artificial intelligence, machine learning, and statistical analysis to analyze data and discover useful patterns that are “hidden” there


Business uses of data mining: essentially five tasks

Classification: Group data into predetermined categories

o Classify credit applicants as low, medium, or high risk

o Classify insurance claims as normal or suspicious

Estimation: Estimate the probability of an event using models built from previous data

o Estimate the probability of a direct mailing response

o Estimate the potential cohort size for a clinical trial

Prediction: Predict an outcome from input data using models built from previous data

o Predict which customers will leave within six months

o Predict which patients will return to the ED

Affinity Grouping: Group people based on similar characteristics

o Find out what books to recommend to Amazon.com users

o Find treatment regimens that were successful for similar patients

Description

o Help understand large volumes of data by uncovering interesting, useful, and actionable patterns
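Three of these tasks (classification, estimation, and affinity grouping) can be illustrated with a few lines of Python. Everything here is hypothetical: the 3x threshold for a "suspicious" claim, the past mailing outcomes, the patients and regimens, and the use of Jaccard similarity on diagnosis sets are assumptions chosen for brevity. Real models are learned from much larger historical datasets.

```python
# Every record, threshold, and similarity measure below is hypothetical.

# Classification: put each claim into a predetermined category with a simple rule.
def classify_claim(amount, typical_amount):
    """Return 'suspicious' if a claim exceeds 3x the typical amount (assumed rule)."""
    return "suspicious" if amount > 3 * typical_amount else "normal"


# Estimation: estimate a probability from previous outcomes (the observed rate).
past_mailings = [True, False, False, True, False, False, False, True, False, False]
p_response = sum(past_mailings) / len(past_mailings)


# Affinity grouping: find the past patient most similar to a new one,
# using Jaccard similarity between sets of diagnoses.
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


past_patients = {
    "A": ({"diabetes", "hypertension"}, "regimen 1"),
    "B": ({"asthma"}, "regimen 2"),
    "C": ({"diabetes", "hypertension", "obesity"}, "regimen 3"),
}
new_patient = {"diabetes", "hypertension", "obesity"}
most_similar = max(
    past_patients, key=lambda pid: jaccard(new_patient, past_patients[pid][0])
)

print(classify_claim(1800.0, typical_amount=400.0))
print(f"estimated response probability: {p_response:.2f}")
print(f"most similar past patient: {most_similar} ({past_patients[most_similar][1]})")
```

The affinity-grouping step suggests, rather than prescribes, the regimen that worked for the closest past patient; a clinician would still review it.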


Job market for data scientists

Statisticians will be the next sexy job

o Google Chief Economist Hal Varian

Shortage of up to 190,000 data scientists (people with deep analytical skills) in the U.S. by 2018

o McKinsey Global Institute


Outline: Health Care Data Use

Operational vs. Decision Support Systems

What is Data Science/Business Intelligence

o What is Data Science?

o What is Big Data?

o Overview of Data Mining

Understanding Data


Applications in Health

A March 2014 poll from MeriTalk and EMC found that 63 percent of healthcare executives in the federal government believe that big data will improve population health management

Examples

o Manage population health

• Accountable Care Organizations (ACO)

o Clinical decision support

o Cohort identification for clinical trials

o Medical fraud detection


Bias and Variance (http://scott.fortmann-roe.com/docs/BiasVariance.html)

Precise but not valid?

What is real data like?

Adjust for bias

Take into account variance
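The "precise but not valid" case can be made concrete with repeated measurements of a known quantity. In this hypothetical sketch, scale A reads consistently about 2 kg high (low variance, high bias), while scale B is centered on the truth but scattered (no bias, higher variance). Bias is computed as the mean error and variance as the population variance of the readings; the weights and the calibration step are assumptions for illustration.

```python
import statistics

TRUE_WEIGHT = 70.0  # hypothetical true value, in kg

# Two hypothetical scales, each weighing the same person five times.
scale_a = [72.1, 72.0, 71.9, 72.0, 72.0]  # tightly clustered but off target
scale_b = [68.5, 71.5, 70.0, 69.0, 71.0]  # centered on target but spread out


def bias_and_variance(readings, truth):
    """Bias = mean reading minus truth; variance = spread around the readings' own mean."""
    bias = statistics.fmean(readings) - truth
    variance = statistics.pvariance(readings)
    return bias, variance


for name, readings in [("A", scale_a), ("B", scale_b)]:
    bias, variance = bias_and_variance(readings, TRUE_WEIGHT)
    print(f"scale {name}: bias = {bias:+.2f} kg, variance = {variance:.3f} kg^2")

# "Adjust for bias": if calibration against a known reference revealed
# scale A's bias, subtracting it re-centers the readings; the variance is unchanged.
bias_a, _ = bias_and_variance(scale_a, TRUE_WEIGHT)
adjusted_a = [reading - bias_a for reading in scale_a]
```

Subtracting a known bias fixes validity but not precision; reducing variance instead requires more measurements or a better instrument.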


Numerical Data: Distribution

Mean

Standard Deviation

o How dispersed the values are

Range: Max/Min

Median (percentile)

Scatter Plot: 2 variables
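The summary statistics listed above can be computed with Python's standard `statistics` module. The blood pressure and age values are made up for illustration; the correlation helper (Pearson's r) is written out by hand as a numeric companion to the scatter plot, not a replacement for looking at one.

```python
import math
import statistics

# Hypothetical data: systolic blood pressure (mmHg) and age for ten patients.
sbp = [118, 122, 135, 141, 128, 150, 112, 133, 126, 145]
age = [34, 41, 55, 62, 47, 70, 29, 58, 45, 66]

mean_sbp = statistics.fmean(sbp)
sd_sbp = statistics.stdev(sbp)                  # sample SD: how dispersed the values are
range_sbp = (min(sbp), max(sbp))                # min and max
median_sbp = statistics.median(sbp)             # the 50th percentile
quartiles_sbp = statistics.quantiles(sbp, n=4)  # 25th, 50th, 75th percentiles


def pearson_r(x, y):
    """Pearson correlation: one-number summary of a linear two-variable relationship."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(f"mean = {mean_sbp:.1f}, SD = {sd_sbp:.1f}, range = {range_sbp}, median = {median_sbp}")
print(f"quartiles = {quartiles_sbp}")
print(f"r(age, SBP) = {pearson_r(age, sbp):.2f}")
```

Reporting the median and quartiles alongside the mean helps when the distribution is skewed, since a few extreme values move the mean but barely move the median.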


Categorical Data

Tabulation

Cross tabulation

o 2 variables

GIS: maps
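Tabulation and cross tabulation are counts of one category, or of pairs of categories. A minimal sketch with `collections.Counter`, assuming hypothetical encounter records that each carry an insurance type and a visit type:

```python
from collections import Counter

# Hypothetical encounter records: (insurance type, visit type).
records = [
    ("Medicaid", "ED"), ("Private", "Clinic"), ("Medicare", "Clinic"),
    ("Medicaid", "ED"), ("Private", "ED"), ("Medicaid", "Clinic"),
    ("Medicare", "ED"), ("Private", "Clinic"),
]

# Tabulation: frequency of one categorical variable.
insurance_counts = Counter(ins for ins, _ in records)

# Cross tabulation: joint frequencies of two categorical variables.
crosstab = Counter(records)
rows = sorted({ins for ins, _ in records})
cols = sorted({visit for _, visit in records})

# Print the cross tabulation as a small table (missing pairs count as 0).
print("Insurance".ljust(10) + "".join(c.rjust(8) for c in cols))
for r in rows:
    print(r.ljust(10) + "".join(str(crosstab[(r, c)]).rjust(8) for c in cols))
```

With pandas, `pandas.crosstab` builds the same kind of table directly from two columns.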


Take Away I: What is Data Science? KDD Process

Operational Data

• Data cleaning & integration

EDW (enterprise data warehouse)

• Feature selection (what variables?)

Task-Specific Data

• Analysis / data mining

Results

• Validation / evaluation

Information Presentation

• Action


Take Away II: What is Big Data?

4 Vs of Big Data

o Volume: lots of data

o Velocity: constantly generating & changing

o Variety: expressed in many ways

o Veracity: lots of errors

o (Value)

Big Data Problems

o Time

o Money

o Quality (Precision)

Three Resources: AMP

o Algorithms

o Machines

o People


Take Away III: Business uses of data mining: essentially five tasks

Classification

o Classify credit applicants as low, medium, or high risk

o Classify insurance claims as normal or suspicious

Estimation

o Estimate the probability of a direct mailing response

o Estimate the potential cohort size for a clinical trial

Prediction

o Predict which customers will leave within six months

o Predict which patients will return to the ED

Affinity Grouping

o Find out what books to recommend to Amazon.com users

o Find treatment regimens that were successful for similar patients

Description

o Help understand large volumes of data by uncovering interesting, useful, and actionable patterns


How do you get good with data?

Sorry, no shortcuts. Build experience.

In this course, we start you out with:

o Tableau / Excel

o SQL

o Assignment 1

o Labs


Reminder: due in the next two weeks

Lab 1: most of you should be done during class

Assignment 1: submit on E-campus the day before class

o Week one: progress report

o Week two: final tutorial

Readings: Chapters 1 & 2

Quiz 1 (E-campus: posted on Tuesday)

o Practice quiz

Group presentation emails
