observational studies in social media
![Page 1: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/1.jpg)
Observational Studies
Class: Data Mining Technology for Business and Society
Program: M. Sc. Data Science
University: Sapienza University of Rome
Semester: Spring 2016
Lecturer: Carlos Castillo http://chato.cl/
Sources:
● Multiple papers, see beginning of each section.
![Page 2: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/2.jpg)
Matching is a popular technique
Randomized controlled experiment
1. Response of subjects assigned to treatment compared to response of subjects assigned to control
2. Assignment of subjects to groups is done using a randomization device
3. Treatment is under the control of a researcher
Matching observational study
1. Response of subjects assigned to treatment compared to response of subjects assigned to control
2. Assignment of subjects to control is done by matching the characteristics and size of the treatment group
3. Treatment is not under the control of a researcher
![Page 3: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/3.jpg)
Matching design: hurricanes and online friendships
Phan, Tuan Q., and Edoardo M. Airoldi. "A natural experiment of social network formation and dynamics." Proceedings of the National Academy of Sciences 112.21 (2015): 6595-6600.
![Page 4: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/4.jpg)
Example: US universities and Hurricane Ike in 2008
Phan, Tuan Q., and Edoardo M. Airoldi. "A natural experiment of social network formation and dynamics." Proceedings of the National Academy of Sciences 112.21 (2015): 6595-6600.
Treatment n=5
Control n=10
Study group n=130
![Page 5: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/5.jpg)
Selection of control group
● Facebook posts from 1.5M students in 130 universities
● Matched 5 affected with 10 unaffected:
– Similar: size, college ranking according to USNews, whether these colleges are public or private institutions, tuition fees, and other regional factors
![Page 6: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/6.jpg)
Results (red = treatment, blue = control)
Both undergo densification
Treatment has larger clustering coefficient (more triangles)
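To make "clustering coefficient (more triangles)" concrete, here is a minimal sketch of the average local clustering coefficient on an undirected friendship graph. The adjacency-dictionary representation and the two toy graphs are illustrative assumptions, not data or code from the study.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no triangle is possible
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical toy graphs: a closed triangle and a star with no triangles.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
```

In the triangle every pair of neighbors is connected, so the coefficient is at its maximum; in the star no two neighbors of the hub know each other, so it is at its minimum.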
![Page 7: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/7.jpg)
Matching design: exercise and stress as reflected on Twitter
Dos Reis, Virgile Landeiro, and Aron Culotta: Using matched samples to estimate the effects of exercise on mental health from Twitter. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.
![Page 8: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/8.jpg)
[Diagram: Exercise → Mood]
![Page 9: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/9.jpg)
Design
1) Detect exercise at time t1
– Post a message containing hashtag #runkeeper, #nikeplus, #runtastic, #endomondo, #mapmyrun, #strava, #cyclemeter, #fitstats, #mapmyfitness, or #runmeter.
2) Measure mood at time t2 > t1
– Automatic classifier of mood, three important classes: hostility (or anger), depression (or dejection), anxiety (or tension)
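The two design steps can be sketched as follows: find the first post carrying one of the listed hashtags (time t1), then collect the posts published afterwards, which would be handed to the mood classifier. The `(timestamp, text)` timeline format and the function names are assumptions made for illustration; the mood classifier itself is not reproduced here.

```python
import re

# Hashtags listed on the slide as evidence of exercise.
EXERCISE_TAGS = {
    "#runkeeper", "#nikeplus", "#runtastic", "#endomondo", "#mapmyrun",
    "#strava", "#cyclemeter", "#fitstats", "#mapmyfitness", "#runmeter",
}

HASHTAG = re.compile(r"#\w+")


def is_exercise_post(text):
    """True if the post contains any of the exercise hashtags (case-insensitive)."""
    return any(tag.lower() in EXERCISE_TAGS for tag in HASHTAG.findall(text))


def first_exercise_time(timeline):
    """timeline: list of (timestamp, text). Returns the earliest t1, or None."""
    times = [t for t, text in timeline if is_exercise_post(text)]
    return min(times) if times else None


def posts_after(timeline, t1):
    """Posts at times t2 > t1, i.e. the ones whose mood would be measured."""
    return [text for t, text in timeline if t > t1]


# Hypothetical timeline with integer timestamps.
timeline = [
    (1, "hello world"),
    (2, "Just completed a 15.72km run! #RunKeeper"),
    (3, "nervous for Monday"),
]
```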
![Page 10: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/10.jpg)
Mood classifier
● Hostility (or anger)
– “shut your freaking yaphole”
● Depression (or dejection)
– “such a horrible day”
● Anxiety (or tension)
– “nervous for Monday”
![Page 11: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/11.jpg)
Control = Random users (same country and language)
[Bar chart: percent change after exercising. Hostility: -21.1, Dejection: -5.4, Anxiety: -7.9]
![Page 12: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/12.jpg)
Problem: missing variables
[Diagram: Demographics as a possible common cause of both Exercise and Mood]
![Page 13: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/13.jpg)
Matching method ...
● For each user in treatment, find another user that:
– Is a reciprocal friend of the user
– In same city/state
– With same gender
– Closest number of followers, followees, tweets
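A minimal sketch of this matching rule, under stated assumptions: users are plain dictionaries, the candidate pool contains only non-exercising users, and "closest" is measured as the sum of absolute differences of log-scaled counts. The slide does not say how closeness is measured, so the distance function is a hypothetical choice.

```python
import math

COUNT_FIELDS = ("followers", "followees", "tweets")


def distance(a, b):
    """Assumed closeness measure: differences of log-scaled activity counts."""
    return sum(abs(math.log1p(a[f]) - math.log1p(b[f])) for f in COUNT_FIELDS)


def find_match(user, pool):
    """Closest reciprocal friend with the same location and gender, or None."""
    eligible = [
        c for c in pool
        if c["id"] in user["reciprocal_friends"]
        and c["location"] == user["location"]
        and c["gender"] == user["gender"]
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: distance(user, c))


# Hypothetical treated user and candidate controls.
user = {"id": 1, "reciprocal_friends": {2, 3, 4}, "location": "CA",
        "gender": "F", "followers": 100, "followees": 200, "tweets": 1000}
pool = [
    {"id": 2, "location": "CA", "gender": "F",
     "followers": 110, "followees": 190, "tweets": 900},
    {"id": 3, "location": "CA", "gender": "F",
     "followers": 5000, "followees": 10, "tweets": 50},
    {"id": 4, "location": "NY", "gender": "F",
     "followers": 100, "followees": 200, "tweets": 1000},
    {"id": 5, "location": "CA", "gender": "F",
     "followers": 100, "followees": 200, "tweets": 1000},
]
```

User 5 has identical counts but is not a reciprocal friend, and user 4 lives elsewhere, so both are excluded before closeness is considered.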
![Page 14: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/14.jpg)
Control = Matched users (blue)
[Bar chart: percent change after exercising, by control group]

| Control | Hostility | Dejection | Anxiety |
|---|---|---|---|
| Random users | -21.1 | -5.4 | -7.9 |
| Matched users | 0.9 | -3.9 | -2.7 |
![Page 15: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/15.jpg)
Can you guess a possible explanation?
[Bar charts comparing random control, matched control, and treatment: (1) %Female and %from CA, axis from 0 to 60; (2) #Followers and #Friends, axis from 0 to 500]
![Page 16: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/16.jpg)
Matching design and difference-in-differences: question answering sites
Hüseyin Oktay, Brian J. Taylor, and David D. Jensen. 2010. Causal discovery in social media using quasi-experimental designs. SOMA 2010.
![Page 17: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/17.jpg)
Stack Overflow
![Page 18: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/18.jpg)
Research question
● What happens after an answer is accepted?
● Does this inhibit people from answering?
![Page 19: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/19.jpg)
The lifetime of a question
● Most answers are received shortly after a question is posed
● Over time, fewer answers are received
● At some point, an answer might be accepted by the asker
![Page 20: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/20.jpg)
Measurement
● Rate after
● Rate before
● Answer rate change
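The formulas on this slide did not survive extraction. Borrowing the notation that appears on the later matching-design slide (Nt/t and Nt+Δt/Δt), one plausible reading is: rate before = answers received up to the acceptance time t, divided by t; rate after = answers received in the following window Δt, divided by Δt; answer rate change = rate after minus rate before. The sketch below encodes that reading; the function name and the interpretation of the counts are assumptions.

```python
def answer_rate_change(answer_times, t, dt):
    """Assumed definition: (answers in (t, t+dt]) / dt - (answers in [0, t]) / t.

    answer_times: times (since the question was posted) at which answers arrived
    t:            time at which an answer was accepted (must be > 0)
    dt:           length of the observation window after acceptance (must be > 0)
    """
    n_before = sum(1 for a in answer_times if a <= t)
    n_after = sum(1 for a in answer_times if t < a <= t + dt)
    rate_before = n_before / t
    rate_after = n_after / dt
    return rate_after - rate_before


# Hypothetical question: four answers in the first 4 hours, one in the next 8.
change = answer_rate_change([1, 2, 3, 4, 12], t=4, dt=8)
```

Because most answers arrive shortly after a question is posed, this quantity tends to be negative for almost any question, whether or not an answer was accepted.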
![Page 21: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/21.jpg)
Results
● Results indicate that the average answer rate change is negative, i.e. there are fewer answers after an answer is selected
What is suspicious about this?
![Page 22: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/22.jpg)
Matching design
Treatment Control (matched)
The matched question: (1) has no accepted answer by t+Δt, (2) has a similar N_t/t, and (3) has a similar N_{t+Δt}/Δt
![Page 23: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/23.jpg)
Difference-in-differences
● Difference-in-differences:
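The formula on this slide was lost in extraction. The standard difference-in-differences estimate subtracts the change observed in the control group from the change observed in the treatment group; the symbols below (r for answer rate, T for treatment, C for matched control) are notation introduced here, not taken from the slide.

```latex
\[
\mathrm{DiD} =
\left(r^{\text{after}}_{T} - r^{\text{before}}_{T}\right)
-
\left(r^{\text{after}}_{C} - r^{\text{before}}_{C}\right)
\]
```

With made-up numbers: if the answer rate of a treated question changes by -0.2 and that of its matched control changes by -0.5, then DiD = -0.2 - (-0.5) = +0.3. Both rates fall, but the treated one falls less.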
![Page 24: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/24.jpg)
Results
● The matching design shows that the answering rate change is more positive for treatment questions (those having a selected answer)
● Having a selected answer slows down the reduction in answering rate = more answers!
![Page 25: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/25.jpg)
Propensity score matching: actions and outcomes
Alexandra Olteanu, Onur Varol, and Emre Kıcıman, Towards an Open-Domain Framework for Distilling the Outcomes of Personal Experiences from Social Media Timelines, in International Conference on Web and Social Media (ICWSM), AAAI - Association for the Advancement of Artificial Intelligence, 17 May 2016. [link]
All the slides in this section are from the authors' talks.
![Page 26: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/26.jpg)
Have a question? Ask the Internet!
should i go to law school
should i take a multivitamin
should i text her or wait for her to text me
should i join the military
should i leave my husband
should i get married
should i pop a burn blister
should i see a doctor
should i consolidate my student loans
should i do cardio before or after weights
should i get a tattoo
![Page 27: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/27.jpg)
Idea
● Open-domain system to extract Situation → Action → Outcomes from social media
● Assume there will be many mistakes
● Attempt the best possible design
![Page 28: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/28.jpg)
Example
T1: “I got a kitten! We named her Versace :-)”
![Page 29: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/29.jpg)
Example
T1: “I got a kitten! We named her Versace :-)”
T2: “No sleep because the damn kitten is nuts!”
![Page 30: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/30.jpg)
Basic operations
(1) Extract timelines
(2) Match events
(3) Precedents and subsequents
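Step (3) can be sketched as follows, assuming steps (1) and (2) have already produced a timeline of canonicalized events as `(timestamp, event)` pairs. The function name and data layout are hypothetical.

```python
def split_around(timeline, target):
    """Return (precedents, subsequents) around the first occurrence of `target`.

    timeline: list of (timestamp, event) pairs, in any order
    Returns None if the target event never occurs in the timeline.
    """
    ordered = sorted(timeline, key=lambda item: item[0])
    for i, (_, event) in enumerate(ordered):
        if event == target:
            precedents = [e for _, e in ordered[:i]]
            subsequents = [e for _, e in ordered[i + 1:]]
            return precedents, subsequents
    return None


# Hypothetical timeline of already-canonicalized events.
timeline = [(3, "no sleep"), (1, "visited shelter"), (2, "got a kitten")]
result = split_around(timeline, "got a kitten")
```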
![Page 31: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/31.jpg)
Many sub-problems
● Identification of experiential messages
● Timestamping event occurrences
● Recognition and canonicalization of events
● Identification of precedent and subsequent events
● Identification of positive and negative valence of events
![Page 32: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/32.jpg)
Experiential messages classifier

| Personal Experiences | Other (news, 3rd person, etc.) |
|---|---|
| Just completed a 15.72km run with @RunKeeper. Check it out! <URL> #RunKeeper | New campaign to protect children from second hand smoke launched <URL> |
| Just to set the mood I brought some Marvin Gaye and Chardonnay | Whoa. The kid from Cincinnati just suffered a horrible injury. Not good. |
| Lacrosse is so much fun why didn’t I start earlier lol | @Bob I hear you. |
| Oh yeah guys we got a new puppy | @Charlie did you enjoy your night at the club? |

● Naïve-Bayes classifier
● Features = collocated tokens
● 10k labeled tweets
● Fleiss’ kappa = 0.325

● 26% of tweets mention personal experiences
● 8% mention goals/desires
● 66% are news/3rd person or other tweets
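The Fleiss' kappa reported for the labeled tweets measures agreement among several annotators beyond what chance would produce. As a reminder of how it is computed, here is a small self-contained sketch; the input layout (one row per item, one column per category, each cell counting the raters who chose that category) and the toy ratings are assumptions for illustration, not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = raters assigning item i to category j.

    Every item must be rated by the same number of raters (at least 2),
    and the raters must not all pick a single category for every item.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Share of all assignments that went to each category.
    p = [sum(row[j] for row in counts) / (n_items * n_raters)
         for j in range(n_cats)]
    # Observed agreement on each item.
    per_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                for row in counts]
    observed = sum(per_item) / n_items
    expected = sum(x * x for x in p)
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: 3 raters, 2 categories (personal experience / other).
perfect = [[3, 0], [0, 3]]
noisy = [[2, 1], [1, 2]]
```

On the commonly used Landis and Koch scale, values between 0.21 and 0.40 are described as "fair" agreement, which suggests that deciding whether a tweet reports a personal experience is a hard labeling task.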
![Page 33: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/33.jpg)
Event identification
“I got a new kitten and he has blue eyes and stripes and I need a good name but nothing that’s normal”
Extracted phrases:
● I got a new kitten (== got a cat, got a new cat, …)
● he has blue eyes
● stripes
● I need a good name
● but nothing that’s normal
Kıcıman, Emre, and Matthew Richardson. "Towards decision support and goal achievement: Identifying action-outcome relationships from social media." KDD 2015. [link]
![Page 34: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/34.jpg)
Alignment
![Page 35: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/35.jpg)
Alignment and matching
![Page 36: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/36.jpg)
Compare with both neighboring quadrants
![Page 37: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/37.jpg)
Example subsequents

| | Event | Example | PosNeg |
|---|---|---|---|
| Pros | cat named | We just got a cat and named it Versace | 0.70 |
| Pros | I’ve got a cat | I’ve got a kitten asleep on my lap, and my heart has softened. | 0.67 |
| Pros | Love my new kitten | I love my new kitten | 0.88 |
| Cons | Ran upstairs | But I ran upstairs and fell and now my head hurts | 0.20 |
| Cons | Damn kitten | … no sleep because the damn kitten kept going nuts… | 0.22 |
| Cons | Cat is literally | My cat is literally the devil | 0.31 |
![Page 38: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/38.jpg)
Example precedents
● Event: “personal record” in marathon
[Chart: x-axis = days before marathon]
![Page 39: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/39.jpg)
Improving matching
● Matching should ideally take many elements into account
● Can we take all the elements we know?
– Yes!
● Propensity matching matches by the propensity score P(T=1 | X), the probability of receiving the treatment given the known covariates X
![Page 40: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/40.jpg)
Propensity matching stratification
![Page 41: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/41.jpg)
Propensity matching stratification
Features of a user are all of their past events
Propensity score (PS) estimator trained with the averaged perceptron learning algorithm; extracted timelines are the training data.
Decile stratification
![Page 42: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/42.jpg)
Propensity score matching
● You got a kitten
● According to what's known about your past, your probability of getting a kitten was x
● You will be matched with someone whose probability of getting a kitten was also x
– But who did not get a kitten
● Every stratum has a different imbalance
– Which is predictable
![Page 43: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/43.jpg)
Matching design
● 39 situations in 9 groups
● Outcome is binary variable
● Average effect: P(outcome|T) - P(outcome|C)
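The effect P(outcome|T) - P(outcome|C) for a binary outcome is a difference of two proportions. The sketch below computes it for one group and then combines propensity strata with a size-weighted average. The weighting scheme is an assumption made for illustration; the slides do not state how strata are combined.

```python
def effect(treated, control):
    """P(outcome | T) - P(outcome | C) for lists of 0/1 outcomes."""
    return sum(treated) / len(treated) - sum(control) / len(control)


def stratified_effect(strata):
    """Size-weighted average of per-stratum effects (assumed weighting).

    strata: list of (treated_outcomes, control_outcomes) pairs.
    Strata lacking either treated or control users are skipped.
    """
    total = 0.0
    weight = 0
    for treated, control in strata:
        if not treated or not control:
            continue
        size = len(treated) + len(control)
        total += size * effect(treated, control)
        weight += size
    return total / weight if weight else None


# Hypothetical outcomes (1 = outcome mentioned, 0 = not) in two strata.
strata = [([1, 1, 0, 0], [0, 0, 0, 1]), ([1], [1, 0, 0])]
```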
![Page 44: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/44.jpg)
Example: having high triglycerides level

| Outcome | Count | Absolute Increase | Z-Score |
|---|---|---|---|
| Your_risk | 46 | 24.8% | 18.12 |
| Statin | 48 | 23.1% | 17.69 |
| Lower | 120 | 35.9% | 17.18 |
| Cardiovascular | 54 | 23.0% | 16.72 |
| Healthy_diet | 55 | 19.3% | 16.54 |
| Fatty_acid | 29 | 18.3% | 16.37 |
| Help_prevent | 73 | 26.9% | 16.01 |
| Risk_factor | 33 | 18.3% | 15.55 |
| Fish_oil | 48 | 24.4% | 15.42 |
| inflammation | 78 | 25.1% | 15.30 |
![Page 45: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/45.jpg)
Example: having belly fat

| Outcome | Count | Absolute Increase | Z-Score |
|---|---|---|---|
| Burn | 156 | 62.2% | 8.96 |
| Ab_workout | 13 | 8.5% | 5.82 |
| Workout_lose | 13 | 8.5% | 5.82 |
| Help_burn | 8 | 11.1% | 5.82 |
| add_video | 26 | 14.0% | 5.75 |
| url_playlist | 26 | 14.0% | 5.75 |
| Fitness | 39 | 18.6% | 5.51 |
| Ab | 43 | 19.1% | 5.51 |
| Playlist_mention | 30 | 15.3% | 5.39 |
| Biceps | 7 | 4.7% | 4.74 |
![Page 46: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/46.jpg)
![Page 47: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/47.jpg)
Evaluation
Labeling by non-experts (Mechanical Turk workers)
Usual measures: precision and recall
![Page 48: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/48.jpg)
Precision @ Rank
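Precision at rank k is the fraction of the top k ranked outcomes that annotators judged relevant. A minimal sketch, assuming the outcomes are already sorted by the system's score and each carries a boolean label from the annotators (names and data are illustrative):

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the first k ranked items that are labeled relevant.

    ranked_labels: list of booleans, best-ranked item first
    k:             cutoff rank, with 1 <= k <= len(ranked_labels)
    """
    if not 1 <= k <= len(ranked_labels):
        raise ValueError("k must be between 1 and the number of ranked items")
    return sum(1 for label in ranked_labels[:k] if label) / k


# Hypothetical labels for the five top-ranked outcomes of one situation.
labels = [True, True, False, True, False]
```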
![Page 49: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/49.jpg)
Summary
● No matching
– Requires randomization into treatment and control groups
● Matching
– Ideally is done on all known variables
● Propensity score matching
– Powerful tool to combine known variables
● Be very skeptical about your results!
![Page 50: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/50.jpg)
Event: Install net to keep cat inside the house
![Page 51: Observational studies in social media](https://reader031.vdocuments.net/reader031/viewer/2022030307/58ea33b11a28ab61358b5315/html5/thumbnails/51.jpg)
Outcome: Learning that cats do whatever they want