Predicting Norovirus with Twitter

Social Media Analytics Review and Innovation Group 30/09/2015 Callum Staff

Upload: david-millson

Post on 16-Feb-2017


Page 1: Predicting Norovirus with Twitter

Social Media Analytics Review and

Innovation Group

30/09/2015 Callum Staff

Page 2: Predicting Norovirus with Twitter

© 2015 Food Standards Agency

Agenda

1. Welcome, Introductions and Purpose

2. Governance Arrangements

3. Predicting Norovirus from Twitter

4. Social Media Research Project Guidance

5. Government Social Research Social Media Research Ethics Guidance

6. Relationships between Consumers and Food Business Operators 

7. Value of Social Media Data in Policy

8. Future Meetings: Frequency and Content

 

Page 3: Predicting Norovirus with Twitter

Predicting Norovirus Rises with Twitter

30/09/2015 Callum Staff

Page 4: Predicting Norovirus with Twitter

Contents

• Background

• Method

• Model Applications

• Take Home Points

• Using Social Media Data

• Next Steps

Page 5: Predicting Norovirus with Twitter

BACKGROUND

Page 6: Predicting Norovirus with Twitter

BACKGROUND: Social Media As An Analytical Tool

• Does this new data source add value to our current knowledge?

• Public Health England – syndromic surveillance

• FSA Social Media Team – human observed monitoring

• Added value = knowing early there is a rise in cases

• Earlier we know = earlier we can intervene

Page 7: Predicting Norovirus with Twitter

METHODS

Page 8: Predicting Norovirus with Twitter

METHOD: Crowd-Sourcing Keywords

Page 9: Predicting Norovirus with Twitter

METHOD: Crowd-Sourcing Keywords

Page 10: Predicting Norovirus with Twitter

METHOD: Correlating

[Figure: Lab Reports and Sickness Tweets over time, Jan-11 to May-13. Axes: Lab Reports, Tweets.]

Page 11: Predicting Norovirus with Twitter

METHOD: Correlating – Raw Values or Changes?

• Correlations between raw values are not indicative of whether a rise is going to occur

• Raw values give stronger correlations than week-to-week changes

• Changes are calculated between fortnights, not weeks, because week-to-week changes are too small

Correlations for #sicknessbug

Raw Values    1 Week Changes    2 Week Changes
0.50          0.29              0.43
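As a rough illustration of the comparison above, the sketch below computes a Pearson correlation on raw values, on 1-week changes and on 2-week changes. The weekly counts are made up, and the helper names `pearson` and `changes` are hypothetical; the slides do not say how the original correlations were calculated.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def changes(series, step):
    """Difference between each value and the value `step` weeks earlier."""
    return [series[i] - series[i - step] for i in range(step, len(series))]

# Made-up weekly counts, for illustration only (not the FSA data).
tweets = [120, 150, 130, 200, 260, 240, 310, 400, 380, 300]
lab_reports = [40, 55, 50, 70, 95, 90, 120, 150, 140, 110]

r_raw = pearson(tweets, lab_reports)
r_1wk = pearson(changes(tweets, 1), changes(lab_reports, 1))
r_2wk = pearson(changes(tweets, 2), changes(lab_reports, 2))
print(f"raw={r_raw:.2f}  1-week={r_1wk:.2f}  2-week={r_2wk:.2f}")
```

Correlating changes rather than raw values asks whether a rise in one series accompanies a rise in the other, which is closer to the early-warning question than agreement in overall level.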

Page 12: Predicting Norovirus with Twitter

METHOD: Lagging the Data

Tweets

Lab Reports

Page 13: Predicting Norovirus with Twitter

METHOD: Lagging the Data

Tweets

Lab Reports
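One way to picture lagging is to pair Tweets in week t with lab reports in week t + lag and see which lag gives the strongest correlation. This is a minimal sketch under that assumption; the function names and the data are hypothetical, and the lag used in the original analysis is not stated on these slides.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def lagged_pairs(tweets, lab_reports, lag):
    """Pair Tweets in week t with lab reports in week t + lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    # zip stops at the shorter sequence, so trailing Tweets are dropped.
    return list(zip(tweets, lab_reports[lag:]))

def best_lag(tweets, lab_reports, max_lag):
    """Return the lag in 0..max_lag with the highest correlation."""
    scores = {}
    for lag in range(max_lag + 1):
        pairs = lagged_pairs(tweets, lab_reports, lag)
        scores[lag] = pearson([t for t, _ in pairs], [r for _, r in pairs])
    return max(scores, key=scores.get), scores

# Made-up data in which lab reports echo Tweets two weeks later.
tweets = [10, 20, 16, 40, 80, 60, 30, 20, 26, 50]
lab_reports = [7, 6, 5, 10, 8, 20, 40, 30, 15, 10]

lag, scores = best_lag(tweets, lab_reports, max_lag=4)
print("best lag (weeks):", lag)
```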

Page 14: Predicting Norovirus with Twitter

METHOD: What’s a Significant Change?

• Practically – any rise which is outside the normal noise

• On the model – any change in the top quartile

• This cut-off is arbitrary

• Could use machine learning to find which definition of significant change makes the model most accurate
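The top-quartile rule can be sketched as follows: compute the fortnightly changes in lab reports, take the third quartile as the cut-off, and label anything above it as significant. The numbers are made up and `label_significant` is a hypothetical helper; the exact quartile method used in the original model is an assumption here.

```python
from statistics import quantiles

def label_significant(changes):
    """Label each change 1 if it lies above the third quartile, else 0."""
    # quantiles(..., n=4) returns the three quartile cut points.
    cutoff = quantiles(changes, n=4)[2]
    return [1 if c > cutoff else 0 for c in changes], cutoff

# Made-up fortnightly changes in lab reports, for illustration only.
lab_changes = [5, -3, 12, 40, -10, 8, 55, 2, -20, 30, 1, 70]

labels, cutoff = label_significant(lab_changes)
print("cut-off:", cutoff)
print("labels: ", labels)
```

Moving the cut-off (for example to the top decile) changes which weeks count as significant, which is why the slide describes the choice as arbitrary.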

Page 15: Predicting Norovirus with Twitter

METHOD: What’s a Significant Change?

[Figure: Lab Reports over time, Jan-12 to Oct-14. Series: Lab Reports, Actual Sig. Change. Y-axis: Lab Reports, 0 to 600.]

Page 16: Predicting Norovirus with Twitter

METHOD: Logistic Regression Model

Given changes in Tweet volumes between weeks 1 and 3, is the change in lab reports between weeks 4 and 6 significant?

• Significant Change = 1, Non-Significant Change = 0

• Uses an exponential (logistic) formula with Tweet volumes as predictors to give a probability

• The probability is assigned to one of the two binary categories using a predefined threshold (typically 0.5)
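A minimal sketch of the idea, assuming a single predictor x (the change in Tweet volume between weeks 1 and 3, here in made-up units) and the standard logistic form p = 1 / (1 + exp(-(b0 + b1 * x))). The coefficients are fitted by plain gradient descent for illustration only; the slides do not say which software or fitting method was used, and all names below are hypothetical.

```python
from math import exp

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p = sigmoid(b0 + b1 * x) by gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y   # gradient of log-loss w.r.t. z
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def classify(p, threshold=0.5):
    """Assign a probability to Significant (1) or Non-Significant (0)."""
    return 1 if p >= threshold else 0

# Made-up training data: Tweet volume change vs. significant lab change.
tweet_change = [-1.0, -0.5, 0.0, 0.2, 0.5, 0.8, 1.0, 1.5, 2.0, 2.5]
significant = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(tweet_change, significant)
p_new = sigmoid(b0 + b1 * 1.8)
print(f"P(significant rise) = {p_new:.2f}, class = {classify(p_new)}")
```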

Page 17: Predicting Norovirus with Twitter

METHOD: Adjusting for Project Requirements

• Receiver Operating Characteristic Curve

• Adjusting the threshold = adjusting the number of true/false positives and true/false negatives

• Want to increase the number of true positives in order to achieve early detection

• Willing to accept the model picking up false positives in other places

• Early warning system, not a call to arms
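The trade-off above can be made concrete by sweeping the threshold and recomputing sensitivity (true positive rate) and specificity (true negative rate) each time; plotting sensitivity against 1 - specificity for every threshold gives the ROC curve. The probabilities and labels below are made up, and the function names are hypothetical.

```python
def confusion(probs, labels, threshold):
    """Count true/false positives and negatives at a given threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def sensitivity_specificity(probs, labels, threshold):
    """Return (sensitivity, specificity) at a given threshold."""
    tp, fp, tn, fn = confusion(probs, labels, threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up model probabilities and true outcomes, for illustration only.
probs = [0.1, 0.2, 0.35, 0.4, 0.45, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

for threshold in (0.5, 0.3):
    sens, spec = sensitivity_specificity(probs, labels, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy data, lowering the threshold from 0.5 to 0.3 catches every true rise at the cost of more false alarms, which matches the "early warning system, not a call to arms" stance.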

Page 18: Predicting Norovirus with Twitter

METHOD: Adjusting for Project Requirements

[Figure: two ROC plots, Sensitivity against 1-Specificity, both axes 0 to 1. Annotations: Specificity, Sensitivity.]

Page 19: Predicting Norovirus with Twitter

METHOD: Adjusting for Project Requirements

[Figure: ROC plot, Sensitivity against 1-Specificity, both axes 0 to 1. Annotations: Specificity, Sensitivity.]

Page 20: Predicting Norovirus with Twitter

MODEL APPLICATIONS

Page 21: Predicting Norovirus with Twitter

MODEL APPLICATIONS: Training and Testing

• Difficulty: only two and a half years of data were available

• Periods within this dataset had Twitter drop-outs or discontinuous lab reports

• Test set is current real time Tweeting

• Will review at the end of the Norovirus season (April)

Page 22: Predicting Norovirus with Twitter

MODEL APPLICATIONS: Final Predictive Model

Page 23: Predicting Norovirus with Twitter

MODEL APPLICATIONS: The Intervention

• Higher-risk project, so the intervention needs low resource intensity

• Needs to be easily deployable, to match the volatile nature of social media

• Using delivery partners:
  – NHS Choices – Elderly in hospitals/Care homes
  – Department for Education – Schools
  – FSA Comms Team – Food handlers

• Social/online media and contact with advocates in above sectors

Page 24: Predicting Norovirus with Twitter

USING SOCIAL MEDIA DATA

Page 25: Predicting Norovirus with Twitter

USING SOCIAL MEDIA DATA: Representativeness

• Tweeting Population versus Affected Population

Page 26: Predicting Norovirus with Twitter

TAKE HOME POINTS & NEXT STEPS

Page 27: Predicting Norovirus with Twitter

TAKE HOME POINTS: Analytical/Comms Trade Off

• Variable correlations versus giving comms time to act

• Model accuracy versus early warning

• Choice of datasets

Page 28: Predicting Norovirus with Twitter

NEXT STEPS: Geotagging

Page 29: Predicting Norovirus with Twitter

[email protected]