
Semantic Analysis of Online Health Information Seeking for Cardiovascular Diseases. Ashutosh Jadhav, [email protected], AMIA 2014 Annual Symposium, Washington, DC

Uploaded by knoesis-center-wright-state-university on 04-Jul-2015


DESCRIPTION

Our paper presented at the AMIA 2014 Annual Symposium. The paper is available at: http://www.knoesis.org/library/resource.php?id=2002

Citation: Ashutosh Jadhav, Amit Sheth, Jyotishman Pathak. 'Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal'. AMIA Annual Symposium 2014, Washington, DC, November 15-19, 2014.

TRANSCRIPT

Page 1: Semantic Analysis of Online Health Information Seeking for Cardiovascular Diseases, Ashutosh Jadhav, AMIA 2014 Annual Symposium

Semantic Analysis of Online Health Information Seeking for Cardiovascular Diseases

Ashutosh Jadhav

[email protected]

AMIA 2014 Annual Symposium

Washington, DC

Page 2

Disclosure

• The speaker discloses that he has no relationships with commercial interests.

Page 3

Collaborators

Prof. Amit Sheth (PhD Advisor)

Kno.e.sis Center, Wright State University, OH, USA

Dr. Jyotishman Pathak (Mentor)

Mayo Clinic, Rochester, MN, USA

Page 4

Internet Users in the World

• Around 3 billion people (40% of the world population)

• Around 300 million people (87% of the US population)

Source: http://www.internetlivestats.com/internet-users/

Page 5

Online Health Information Seeking


Page 6

Online Health Resources


Page 7

Online Health Information Seeking

According to a Pew survey, approximately 8 in 10 online health inquiries begin at a search engine.

Fox S, Duggan M. Health Online 2013. Pew Internet & American Life Project; 2013.

Page 8

Use-case: Cardiovascular Diseases (CVD)

• According to the Centers for Disease Control and Prevention, in the United States:

– CVD is one of the most common chronic diseases

– CVD is the leading cause of death (1 in every 4 deaths)

• CVD is common across all socioeconomic groups and demographics

• Online health resources are a “significant information supplement” for patients with chronic conditions

Page 9

Motivation

• Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated what CVD-related information users search for online and how they search for it

• Such knowledge can be applied to improve the online health search experience and to develop more advanced, next-generation knowledge and content delivery systems

Page 10


Methods Overview

Page 11

Dataset Creation

• Data:

– CVD-related search queries (SQ)

– Limited to the United States

• Data timeframe:

– September 2011 to August 2013

• Data collection tool:

– IBM NetInsight On Demand (web analytics tool)

• Dataset size:

– 10 million CVD-related search queries

– A very large dataset for a single class of diseases

Page 12

Top CVD Search Queries

1. heart attack symptom

2. blood pressure chart

3. how to lower blood pressure

4. heart rate

5. broken heart syndrome

6. congestive heart failure

7. low blood pressure

8. stroke symptoms

9. normal blood pressure

10. high blood pressure symptoms

Page 13

Health Categories

• We selected 14 “consumer-oriented” health categories representing health information needs

• Methods used to select the categories:

– Focus group study (published in JMIR)

– Online health information seeking literature

– Empirical data analysis

– Health categories on popular health websites

• The health categories and the classification scheme were reviewed and validated by Mayo Clinic clinicians and domain experts.

Page 14

Health Categories

1. Symptoms

2. Causes

3. Risks & Complications

4. Drugs and Medications

5. Treatments

6. Tests and Diagnosis

7. Food and Diet

8. Living with

9. Prevention

10. Side effects

11. Medical devices

12. Diseases and conditions

13. Age-group References

14. Vital signs

Example queries for Drugs and Medications: tylenol raise blood pressure, ibuprofen heart rate, dextromethorphan blood pressure, medications pulmonary hypertension

Page 15

Health Categories Example

Search query → Health categories

• Heart palpitations with headache → Symptoms

• Tylenol and blood pressure → Medication, Vital sign

• Pump for pulmonary hypertension → Medical device, Disease

• Red wine heart disease → Food, Disease

• Bypass surgery → Treatment

Page 16

Classification: Possible Approaches

• Statistical machine learning algorithms

– Require training data

– For a multi-label classification problem with 14 classes, a lot of training data is needed

– Training data:

• is expensive to create, as it must be created manually by domain experts

• will have limited coverage

– Do not consider the semantics of queries

Page 17

Domain Constraint

A classifier trained for one disease may not work for other diseases, as symptoms, treatments, drugs and medications vary by disease

Page 18

Background Knowledge

• UMLS (Unified Medical Language System)

– Comprises over 1 million biomedical concepts and 5 million concept names

– Incorporates a variety of medical vocabularies and concepts, and maps each concept to semantic types

– Contains the Consumer Health Vocabulary (CHV), which links consumer terms to medical terms

• e.g., hair loss => alopecia

– Updated quarterly with new concepts

Page 19

Semantic Analysis

• UMLS Semantic Types

– Example: sign or symptom, disease or syndrome

• UMLS Concepts

– Example: blood pressure, heart rate

• UMLS MetaMap

– Tool for recognizing UMLS concepts in text

Page 20

MetaMap Usage Challenge and Solution

Hadoop MapReduce framework with 16 nodes

Figure: Functional overview of a mapper
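The slide shows the mapper only as a figure. As a rough, single-machine sketch of the same map/reduce pattern (assuming each mapper receives a chunk of queries and emits concept annotations for them), the code below uses Python's `multiprocessing.Pool` as a stand-in for a Hadoop cluster. `annotate_query` and `TOY_LEXICON` are hypothetical placeholders for a MetaMap call, not MetaMap's interface, and grouping queries by concept in the reduce step is only one possible choice.

```python
from collections import defaultdict
from multiprocessing import Pool

# Hypothetical toy lexicon standing in for UMLS; a real mapper would call MetaMap.
TOY_LEXICON = {"blood pressure": "Blood Pressure", "heart rate": "Heart rate"}


def annotate_query(query: str) -> list[str]:
    """Placeholder for a MetaMap call: return the concept names found in a query."""
    text = query.lower()
    return [concept for phrase, concept in TOY_LEXICON.items() if phrase in text]


def map_chunk(queries: list[str]) -> list[tuple[str, str]]:
    """Map step: emit a (concept, query) pair for every concept found."""
    return [(concept, q) for q in queries for concept in annotate_query(q)]


def reduce_pairs(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Reduce step: group the queries that mention each concept."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for concept, query in pairs:
        grouped[concept].append(query)
    return dict(grouped)


def run(queries: list[str], workers: int = 4, chunk_size: int = 1000) -> dict[str, list[str]]:
    """Split queries into chunks, map the chunks in parallel, then reduce."""
    chunks = [queries[i:i + chunk_size] for i in range(0, len(queries), chunk_size)]
    with Pool(workers) as pool:
        mapped = pool.map(map_chunk, chunks)  # one list of pairs per chunk, in input order
    return reduce_pairs([pair for chunk in mapped for pair in chunk])


if __name__ == "__main__":
    print(run(["normal blood pressure", "heart rate", "low blood pressure"], workers=2, chunk_size=2))
```

Because MetaMap processes each query independently, the work splits cleanly into chunks; on a real cluster, the chunks would correspond to Hadoop input splits spread over the nodes.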

Page 21

Gold Standard Dataset Creation

• Randomly selected 2000 search queries from the analysis dataset.

• Two domain experts manually annotated the 2000 search queries, labeling each query with zero, one, or more health categories.

• The annotators first discussed and agreed upon the annotation scheme.

• To reduce human error and subjectivity, the two annotators discussed and annotated each query together, creating a gold standard dataset of 2000 search queries.

• The gold standard dataset was further divided into training and testing datasets of 1000 search queries each.
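The slides do not say how the 2000 annotated queries were divided. A minimal sketch, assuming a random shuffle with a fixed seed so the split is reproducible; `split_gold_standard` and the placeholder query strings are hypothetical.

```python
import random


def split_gold_standard(queries: list[str], seed: int = 0) -> tuple[list[str], list[str]]:
    """Shuffle the annotated queries and split them into two equal halves."""
    if len(queries) % 2:
        raise ValueError("expected an even number of queries")
    shuffled = list(queries)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical stand-ins for the 2000 annotated queries.
gold = [f"query {i}" for i in range(2000)]
train, test = split_gold_standard(gold)
print(len(train), len(test))  # 1000 1000
```

A fixed seed keeps the training and testing halves stable across runs, which matters when the training half is used to tune hand-written rules.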

Page 22

Health Category: Drugs and Medications

Categorization Rule:

• ST: ORCH|PHSU, CLND, PHSU

• CC: medication, medicine, drugs, dose, dosage, tablet, pill

• KW: meds

• without CC: alcohol, caffeine, fruit, prevent

Examples:

• Tylenol raise blood pressure

• Medications pulmonary hypertension

• ibuprofen heart rate

• Dextromethorphan blood pressure

Page 23

Intent classes and their rules, defined by UMLS Semantic Types (ST), UMLS Concepts (CC) and Keywords (KW):

• Symptoms: ST: SOSY; CC: symptoms, signs

• Causes: CC: cause, reason

• Risks & Complications: CC: risk, complications

• Drugs and Medications: ST: ORCH|PHSU, CLND, PHSU; CC: medication, medicine, drugs, dose, dosage, tablet, pill; KW: meds (without CC: alcohol, caffeine, fruit, prevent)

• Treatments: ST: TOPP, FTCN (treatment, surgery), CNCE (treatment); CC: remedy, remediate (without CC: prevention, and without ‘Drugs and Medication’ queries)

• Tests and Diagnosis: ST: DIAP, LBPR, LBTR; CC: test, diagnosis (without ST: DIAP|TOPP; without CC: alcohol, blood caffeine)

• Food and Diet: ST: FOOD; CC: caffeine, recipe, meal, menu, diet, eat, breakfast, lunch, dinner, alcohol, drink

• Living with: CC: control, manage, reduce, lower, coping, cure, recover; KW: living with, bring down, low down

• Prevention: CC: prevent, avoidance, low risk

• Side effects: CC: side effect; KW: side effect

• Medical devices: ST: MEDD

• Diseases and conditions: ST: DSYN

• Age-group References: ST: AGGP

• Vital signs: CC: blood pressure, heart rate, pulse rate, temperature, heart beat, blood glucose (excluding high/low blood pressure, which we considered under ‘Diseases and Conditions’)
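The rules above are given as a table, not as code. Below is a minimal sketch of how such rules could be applied. It assumes each query has already been annotated (for example, by MetaMap) with concepts and their semantic types, and it reads `ORCH|PHSU` as a concept tagged with both types. `Concept`, `Rule`, `RULES` and `classify` are hypothetical names, only five categories are encoded, and treating a "without" clause as a veto is an interpretation of the table.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A UMLS concept recognized in a query (hypothetical shape of MetaMap output)."""
    name: str                 # normalized, lowercased concept name
    semtypes: frozenset[str]  # UMLS semantic type abbreviations, e.g. {"ORCH", "PHSU"}


@dataclass
class Rule:
    category: str
    semtypes: list[frozenset[str]] = field(default_factory=list)  # ST: exact type combinations
    concepts: set[str] = field(default_factory=set)               # CC: trigger concept names
    keywords: set[str] = field(default_factory=set)               # KW: trigger words in the raw query
    exclude_concepts: set[str] = field(default_factory=set)       # "without CC: ..."

    def matches(self, query: str, found: list[Concept]) -> bool:
        names = {c.name for c in found}
        if names & self.exclude_concepts:
            return False  # an excluded concept vetoes this category
        text = query.lower()
        return (
            any(c.semtypes in self.semtypes for c in found)
            or bool(names & self.concepts)
            or any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in self.keywords)
        )


# A subset of the rules in the table. ORCH = Organic Chemical, PHSU = Pharmacologic
# Substance, CLND = Clinical Drug, SOSY = Sign or Symptom, MEDD = Medical Device,
# DSYN = Disease or Syndrome.
RULES = [
    Rule("Symptoms", semtypes=[frozenset({"SOSY"})], concepts={"symptoms", "signs"}),
    Rule(
        "Drugs and Medications",
        semtypes=[frozenset({"ORCH", "PHSU"}), frozenset({"CLND"}), frozenset({"PHSU"})],
        concepts={"medication", "medicine", "drugs", "dose", "dosage", "tablet", "pill"},
        keywords={"meds"},
        exclude_concepts={"alcohol", "caffeine", "fruit", "prevent"},
    ),
    Rule("Medical devices", semtypes=[frozenset({"MEDD"})]),
    Rule("Diseases and conditions", semtypes=[frozenset({"DSYN"})]),
    Rule(
        "Vital signs",
        concepts={"blood pressure", "heart rate", "pulse rate", "temperature",
                  "heart beat", "blood glucose"},
    ),
]


def classify(query: str, found: list[Concept]) -> list[str]:
    """Return every category whose rule fires; an empty list means no category."""
    return [rule.category for rule in RULES if rule.matches(query, found)]


# Hand-written annotations for illustration; real MetaMap output would differ.
print(classify("tylenol raise blood pressure", [
    Concept("tylenol", frozenset({"ORCH", "PHSU"})),
    Concept("blood pressure", frozenset({"ORGF"})),
]))  # ['Drugs and Medications', 'Vital signs']
```

Because every rule is checked independently, a query can receive zero, one or several categories, which matches the multi-label annotation of the gold standard dataset.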

Page 24

Evaluation: Micro-Average Precision and Recall

• We classified the 1000 search queries in the testing dataset using the rule-based classifier

• The classification approach achieved the following micro-average scores:

– Precision: 0.8842

– Recall: 0.8642

– F-score: 0.8723
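The slides report micro-averaged scores without giving the formulas. Under the standard definition (pool true positives, false positives and false negatives over all queries and categories, then divide), a sketch for the multi-label setting looks like this. Each query has a set of gold categories and a set of predicted categories. `micro_prf` is a hypothetical helper, and the toy data is invented, not taken from the study.

```python
def micro_prf(gold: list[set[str]], pred: list[set[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score for multi-label predictions.

    Counts are pooled over all queries and all categories before dividing.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have one entry per query")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # categories both predicted and annotated
        fp += len(p - g)  # predicted but not annotated
        fn += len(g - p)  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


# Toy example: 4 true positives, 1 false positive, 1 false negative,
# so precision and recall are both 4/5.
gold = [{"Symptoms"}, {"Drugs and Medications", "Vital signs"},
        {"Medical devices", "Diseases and conditions"}]
pred = [{"Symptoms"}, {"Drugs and Medications"},
        {"Medical devices", "Diseases and conditions", "Vital signs"}]
print(micro_prf(gold, pred))
```

Micro-averaging weights every individual label decision equally, so frequent categories such as Diseases and Vital signs dominate the score more than rare ones such as Side effects.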

Page 25

Evaluation: Precision and Recall Analysis for Each Health Category

Page 26

Results: Distribution of Search Queries by Intent Class

No. Intent class: total queries (percentage)

1. Diseases: 4,232,398 (40.66%)

2. Vital signs: 3,455,809 (33.20%)

3. Symptoms: 1,422,826 (13.67%)

4. Living with: 1,178,756 (11.32%)

5. Treatments: 955,701 (9.18%)

6. Food and Diet: 779,949 (7.49%)

7. Med Devices: 665,484 (6.39%)

8. Drugs and Medications: 603,905 (5.80%)

9. Causes: 599,895 (5.76%)

10. Tests & Diagnosis: 344,747 (3.31%)

11. Risks and Complication: 277,294 (2.66%)

12. Prevention: 136,428 (1.31%)

13. Age-group References: 87,929 (0.84%)

14. Side effects: 25,655 (0.25%)

Total search queries: 10,408,921 (100%). Because a query can be assigned to more than one intent class, the per-class percentages sum to more than 100%.
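Since a query can fall into several intent classes, the class counts add up to more than the total number of queries. The short check below recomputes each percentage from the counts in the table, assuming that percentages are relative to the 10,408,921 total queries. The variable names are illustrative.

```python
# Query counts per intent class, copied from the table.
counts = {
    "Diseases": 4_232_398,
    "Vital signs": 3_455_809,
    "Symptoms": 1_422_826,
    "Living with": 1_178_756,
    "Treatments": 955_701,
    "Food and Diet": 779_949,
    "Med Devices": 665_484,
    "Drugs and Medications": 603_905,
    "Causes": 599_895,
    "Tests & Diagnosis": 344_747,
    "Risks and Complication": 277_294,
    "Prevention": 136_428,
    "Age-group References": 87_929,
    "Side effects": 25_655,
}
TOTAL_QUERIES = 10_408_921  # total number of CVD search queries

# Percentage of all queries that fall into each class, rounded as in the table.
shares = {name: round(100 * n / TOTAL_QUERIES, 2) for name, n in counts.items()}

print(shares["Diseases"])                  # 40.66
print(sum(counts.values()) > TOTAL_QUERIES)  # True: multi-label counts overlap
```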

Page 27

Results

Distribution of search queries by the number of intent classes into which they are categorized:

• 0 classes: 8%

• 1 class: 48%

• 2 classes: 40%

• 3 classes: 4%

• 4 and 5 classes: 0%
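Given the category sets assigned to each query, this distribution is a simple count over set sizes. Here is a minimal sketch; `label_count_distribution` is a hypothetical helper, and the toy input is invented.

```python
from collections import Counter


def label_count_distribution(labels_per_query: list[set[str]]) -> dict[int, float]:
    """Percentage of queries assigned 0, 1, 2, ... categories."""
    if not labels_per_query:
        return {}
    n = len(labels_per_query)
    sizes = Counter(len(labels) for labels in labels_per_query)
    return {k: 100 * v / n for k, v in sorted(sizes.items())}


# Toy input: one unclassified query, two single-label queries, one two-label query.
print(label_count_distribution([set(), {"Symptoms"}, {"Symptoms", "Causes"}, {"Vital signs"}]))
# {0: 25.0, 1: 50.0, 2: 25.0}
```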

Page 28


Data Analysis Results

Page 29

Data Analysis Results

• The average CVD search query is 3.88 words and 22.22 characters long

• Around 80% of CVD search queries have 3 or more words

• CVD search queries are longer than previously reported non-medical and medical search queries
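The slides do not define how words and characters were counted. Below is a minimal sketch of the three statistics, assuming queries are already trimmed, words are whitespace-separated tokens, and characters include the spaces between words. `length_stats` and the sample queries are illustrative only, so it will not reproduce the study's figures.

```python
def length_stats(queries: list[str]) -> tuple[float, float, float]:
    """Return (mean words, mean characters, fraction of queries with >= 3 words)."""
    if not queries:
        raise ValueError("need at least one query")
    word_counts = [len(q.split()) for q in queries]
    char_counts = [len(q) for q in queries]  # includes spaces between words
    n = len(queries)
    return (
        sum(word_counts) / n,
        sum(char_counts) / n,
        sum(1 for w in word_counts if w >= 3) / n,
    )


# Three queries with 2, 3 and 5 words and 10, 20 and 27 characters.
print(length_stats(["heart rate", "heart attack symptom", "how to lower blood pressure"]))
```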

Page 30

Discussion and Conclusion

• We found the use of MetaMap and UMLS concepts/semantic types to be an effective approach for customized health categorization

• The most searched health categories for CVD are ‘Diseases and Conditions’, ‘Vital Signs’, ‘Symptoms’, and ‘Living with’.

• Most queries (around 88%) are categorized into one or two health categories.

• To the best of our knowledge, there has been little research on understanding online health information searching for chronic diseases, especially CVD.

• This study addresses this knowledge gap and extends our understanding of online health information search behavior.

Page 31

Thanks!

Ashutosh Jadhav

[email protected]