Predicting Norovirus with Twitter
Social Media Analytics Review and Innovation Group
30/09/2015 Callum Staff
© 2015 Food Standards Agency
Agenda
1. Welcome, Introductions and Purpose
2. Governance Arrangements
3. Predicting Norovirus from Twitter
4. Social Media Research Project Guidance
5. Government Social Research Social Media Research Ethics Guidance
6. Relationships between Consumers and Food Business Operators
7. Value of Social Media Data in Policy
8. Future Meetings: Frequency and Content
Predicting Norovirus Rises with Twitter
30/09/2015 Callum Staff
Contents
• Background
• Method
• Model Applications
• Take Home Points
• Using Social Media Data
• Next Steps
BACKGROUND
BACKGROUND: Social Media As An Analytical Tool
• Does this new data source add value to our current knowledge?
• Public Health England – syndromic surveillance
• FSA Social Media Team – human observed monitoring
• Added value = knowing early that cases are rising
• The earlier we know, the earlier we can intervene
METHODS
METHOD: Crowd-Sourcing Keywords
METHOD: Correlating
[Figure: Lab Reports and Sickness Tweets over time, Jan-11 to May-13; two vertical axes, labelled Lab Reports and Tweets]
METHOD: Correlating – Raw Values or Changes?
• Correlations between raw values do not indicate whether a rise is about to occur
• Raw values give stronger correlations than week-to-week changes
• Changes are calculated between fortnights, not weeks, because week-to-week changes are too small
Correlations for #sicknessbug:
Raw Values        1 Week Changes    2 Week Changes
0.50              0.29              0.43
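The comparison above can be sketched in a few lines. This is a minimal illustration, not the method used in the project: the weekly counts are invented, and the helper names `pearson` and `changes` are hypothetical. It computes a Pearson correlation on the raw series, then on 1-week and 2-week changes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def changes(series, step):
    """Difference between each value and the value `step` weeks earlier."""
    return [series[i] - series[i - step] for i in range(step, len(series))]


# Hypothetical weekly counts, invented purely for illustration.
tweets = [120, 135, 160, 210, 300, 280, 190, 150]
lab_reports = [40, 42, 55, 70, 110, 105, 80, 60]

raw_r = pearson(tweets, lab_reports)
one_week_r = pearson(changes(tweets, 1), changes(lab_reports, 1))
two_week_r = pearson(changes(tweets, 2), changes(lab_reports, 2))

print(f"raw: {raw_r:.2f}, 1-week: {one_week_r:.2f}, 2-week: {two_week_r:.2f}")
```

Differencing removes the shared seasonal level, which is one reason correlations on changes tend to be lower than correlations on raw values.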
METHOD: Lagging the Data
[Figure: Tweets and Lab Reports series]
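The slides do not state how the lag was chosen. One common approach, sketched here as an assumption, is to shift the Tweet series against the lab reports and keep the shift with the strongest correlation. The data and the helper names `pearson` and `lagged_pairs` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_pairs(leading, trailing, lag):
    """Pair leading[t] with trailing[t + lag], dropping unmatched ends."""
    if lag == 0:
        return list(leading), list(trailing)
    return leading[:-lag], trailing[lag:]


# Hypothetical weekly counts in which lab reports follow Tweets by one week.
tweets = [10, 30, 60, 30, 10, 20, 50, 20]
lab_reports = [5, 5, 12, 22, 12, 5, 8, 18]

# Correlation at each candidate lag (in weeks).
lag_correlations = {
    lag: pearson(*lagged_pairs(tweets, lab_reports, lag)) for lag in range(4)
}
best_lag = max(lag_correlations, key=lag_correlations.get)
print(best_lag, round(lag_correlations[best_lag], 3))
```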
METHOD: What’s a Significant Change?
• Practically – any rise that is outside the normal noise
• In the model – any change in the top quartile
• This cut-off is arbitrary
• Could use machine learning to find which definition of significant change makes the model most accurate
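The top-quartile rule can be sketched as follows. This is an illustration under assumptions: the fortnightly changes are invented, and `statistics.quantiles` with its default method stands in for whatever quartile calculation the project used.

```python
from statistics import quantiles

# Hypothetical fortnightly changes in lab reports, invented for illustration.
lab_report_changes = [5, -10, 40, 0, 80, -30, 15, 10]

# quantiles(..., n=4) returns the three quartile cut points; the last is Q3.
upper_quartile = quantiles(lab_report_changes, n=4)[2]

# Significant Change = 1 if the change lies in the top quartile, else 0.
significant = [1 if change > upper_quartile else 0 for change in lab_report_changes]
print(upper_quartile, significant)
```

Because the cut-off is arbitrary, it could be treated as a tunable setting and compared across several values.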
[Figure: Lab Reports over time, Jan-12 to Oct-14, with actual significant changes marked; vertical axis Lab Reports, 0 to 600]
METHOD: Logistic Regression Model
Given the change in Tweet volumes between weeks 1 and 3, is the change in lab reports between weeks 4 and 6 significant?
• Significant Change = 1, Non-Significant Change = 0
• Uses the logistic (exponential) formula, with Tweet volumes as inputs, to give a probability
• The probability is assigned to one of the two binary categories using a predefined threshold (typically 0.5)
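A minimal sketch of the idea above, assuming a single predictor (the change in Tweet volume between weeks 1 and 3) and a plain gradient-descent fit. The data, the function names and the fitting settings are all hypothetical; the actual model and its inputs are not given in the slides.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=2000):
    """Fit p = sigmoid(w * x + b) by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


def classify(probability, threshold=0.5):
    """Significant Change = 1 if the probability reaches the threshold."""
    return 1 if probability >= threshold else 0


# Hypothetical data: Tweet change over weeks 1 to 3 (in hundreds of Tweets)
# and whether the lab report change over weeks 4 to 6 was significant.
tweet_changes = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
significant = [0, 0, 0, 0, 1, 0, 1, 1]

w, b = fit_logistic(tweet_changes, significant)
probability = sigmoid(w * 2.5 + b)  # probability for a new change of +250 Tweets
print(round(probability, 2), classify(probability))
```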
METHOD: Adjusting for Project Requirements
• Receiver Operating Characteristic Curve
• Adjusting the threshold = adjusting the number of true/false positives and true/false negatives
• Want to increase the number of true positives in order to achieve early detection
• Willing to accept the model picking up more false positives elsewhere as the cost
• Early warning system, not a call to arms
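The trade-off above can be made concrete with a small sketch. The predicted probabilities and labels are invented for illustration, and the function names are hypothetical. Each threshold gives one point on the ROC curve.

```python
def confusion_counts(probabilities, labels, threshold):
    """Count true/false positives and negatives at a given threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(probabilities, labels, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(probabilities, labels, threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model outputs and true labels (1 = significant change).
probabilities = [0.10, 0.20, 0.35, 0.40, 0.45, 0.60, 0.70, 0.90]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

for threshold in (0.5, 0.3):
    sens, spec = sensitivity_specificity(probabilities, labels, threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy data, lowering the threshold from 0.5 to 0.3 catches every true rise but doubles the false positives, which is the trade an early warning system accepts.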
[Figure: ROC curves, Sensitivity against 1-Specificity, both axes 0 to 1]
MODEL APPLICATIONS
MODEL APPLICATIONS: Training and Testing
• Difficulty: only two and a half years of data were available
• Periods within this dataset had Twitter dropouts or discontinuous lab reports
• The test set is current, real-time Tweeting
• Will review at the end of the Norovirus season (April)
MODEL APPLICATIONS: Final Predictive Model
MODEL APPLICATIONS: The Intervention
• The project is higher risk, so the intervention must have low resource intensity
• Needs to be easily deployable, to match the volatile nature of social media
• Using delivery partners:
  – NHS Choices: elderly in hospitals/care homes
  – Department for Education: schools
  – FSA Comms Team: food handlers
• Social/online media and contact with advocates in the above sectors
USING SOCIAL MEDIA DATA
USING SOCIAL MEDIA DATA: Representativeness
• Tweeting Population versus Affected Population
TAKE HOME POINTS & NEXT STEPS
TAKE HOME POINTS: Analytical/Comms Trade Off
• Variable correlations versus giving comms time to act
• Model accuracy versus early warning
• Choice of datasets
NEXT STEPS: Geotagging