Screening Twitter Users for Depression and PTSD with Lexical Decision Lists Ted Pedersen University of Minnesota, Duluth [email protected]

Post on 03-Aug-2015


Page 1: Screening Twitter Users for Depression and PTSD

Screening Twitter Users for Depression and PTSD with Lexical Decision Lists

Ted Pedersen
University of Minnesota, Duluth

[email protected]

Page 2: Screening Twitter Users for Depression and PTSD

Motivations
● Interesting classification task
● Even more interesting to identify vocabulary that indicates depression or PTSD
● Or tendency to self-report?

● Focused on decision lists, a simple machine learning method that learns a human-interpretable model

Page 3: Screening Twitter Users for Depression and PTSD

Decision Lists
● All tweets for each user kept on a single line (to avoid splitting)

● Text is lowercased, anything not alpha-numeric is removed
● Randomly shuffled

● Ngram features learned from first 8 million words in training data for each condition
● Ngrams may be bigrams or any length 1-6
● Ngrams made up of stopwords removed (or not)
● Ngrams weighted by frequency (or binary)

● Eight different decision lists learned
● System2 most accurate: Ngrams 1-6, stoplist, and binary weighting
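The preprocessing and Ngram extraction steps listed above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function names and the tiny `STOPWORDS` set are hypothetical, non-alphanumeric characters are assumed to be replaced by spaces (which would explain features such as "http t co" and "don t"), and "Ngrams made up of stopwords" is assumed to mean Ngrams consisting entirely of stopwords. The random shuffle and the 8-million-word cap are omitted.

```python
import re
from collections import Counter

# Hypothetical mini stoplist for illustration; the slides do not give the real one.
STOPWORDS = {"the", "a", "of", "to", "and", "in", "is", "it"}


def normalize(text):
    """Lowercase, turn every run of non-alphanumeric characters into a space, tokenize."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def extract_ngrams(tokens, min_n=1, max_n=6, use_stoplist=True):
    """Return all Ngrams of length min_n..max_n as space-joined strings.

    With use_stoplist=True, Ngrams made up only of stopwords are dropped
    (an assumed reading of the slide).
    """
    out = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if use_stoplist and all(t in STOPWORDS for t in gram):
                continue
            out.append(" ".join(gram))
    return out


def count_ngrams(user_lines, **kwargs):
    """Count Ngrams over an iterable of lines, one line holding all tweets of one user."""
    counts = Counter()
    for line in user_lines:
        counts.update(extract_ngrams(normalize(line), **kwargs))
    return counts
```

Under these assumptions, `extract_ngrams(tokens, min_n=2, max_n=2)` gives the bigram-only variant and the defaults give the 1-6 variant; toggling `use_stoplist` gives the stoplist choice.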

Page 4: Screening Twitter Users for Depression and PTSD

Decision Lists
● Any Ngram that meets previous three conditions and occurs at least 50 times more often in one condition than the other is selected as a feature

● Since conditions are binary (DvC, PvC, DvP), frequency in one condition is positive while the other is negative

● Ngrams that occur about the same number of times in both conditions are not especially indicative or interesting
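The selection rule above might look like the sketch below. It assumes "50 times more often" means a difference of at least 50 in raw counts (the slide could also be read as a ratio), and it stores the signed difference as the feature weight. `learn_decision_list` and `min_diff` are hypothetical names, not taken from the slides.

```python
from collections import Counter


def learn_decision_list(counts_a, counts_b, min_diff=50):
    """Select Ngrams whose counts differ by at least min_diff between two conditions.

    The weight is count_a - count_b, so Ngrams favoring condition A are
    positive and Ngrams favoring condition B are negative.
    """
    features = {}
    for ngram in set(counts_a) | set(counts_b):
        diff = counts_a[ngram] - counts_b[ngram]  # Counter returns 0 for unseen Ngrams
        if abs(diff) >= min_diff:
            features[ngram] = diff
    return features


# Toy counts, invented for illustration only.
depression = Counter({"love": 120, "http": 10, "lol": 5})
control = Counter({"love": 40, "http": 90, "lol": 8})
decision_list = learn_decision_list(depression, control)
# "love" gets weight 80, "http" gets -80, and "lol" (difference 3) is dropped.
```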

Page 5: Screening Twitter Users for Depression and PTSD

Running Decision List
● For each Ngram in tweet, check to see if it is in decision list
  ● If using frequency weight, add value (positive or negative) of the Ngram to an overall score
  ● If using binary weight, add 1 or -1 to overall score

● Do this for all tweets for a user; if overall score > 0 then one class, <= 0 the other
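The scoring procedure above can be sketched as follows, assuming the decision list is a dictionary from Ngram string to signed weight and that a user's tweets are already tokenized. The names `score_user` and `classify` and the toy weights are hypothetical.

```python
def score_user(tokens, decision_list, binary=True, max_n=6):
    """Sum the votes of every Ngram in the user's text that appears in the decision list.

    binary=True adds +1 or -1 per match; binary=False adds the stored
    (frequency-based) weight instead.
    """
    score = 0
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            weight = decision_list.get(" ".join(tokens[i:i + n]))
            if weight is None:
                continue
            if binary:
                score += 1 if weight > 0 else -1
            else:
                score += weight
    return score


def classify(tokens, decision_list, positive, negative, binary=True):
    """Overall score > 0 gives the positive class, <= 0 the negative class."""
    return positive if score_user(tokens, decision_list, binary) > 0 else negative


# Toy decision list, invented for illustration only.
toy_list = {"love": 80, "love you": 60, "http": -200}
toy_tokens = "i love you http".split()
binary_score = score_user(toy_tokens, toy_list, binary=True)      # +1 +1 -1 = 1
frequency_score = score_user(toy_tokens, toy_list, binary=False)  # 80 + 60 - 200 = -60
```

In this toy case the two weighting schemes disagree: a single high-frequency Ngram outweighs two matches under frequency weighting, while binary weighting counts each match equally.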

Page 6: Screening Twitter Users for Depression and PTSD

Decision List
● Decision lists often make a classification after finding the most indicative feature
● Elected to use all features found in user tweets to provide more nuanced decision
● System2 decision list has
  ● 18,617 features (DvC)
  ● 21,145 features (DvP)
  ● 17,936 features (PvC)

Page 7: Screening Twitter Users for Depression and PTSD

Results?

         DvP   DvC   PvC
System2  .769  .736  .720
System1  .760  .731  .721
Random   .471  .492  .489

● System2 and System1 are identical except that 2 uses a stoplist while 1 does not
● Both use Ngrams 1-6 and binary weighting

Page 8: Screening Twitter Users for Depression and PTSD

Top 10 Features
● DvC
  ● Depression : ud83c, please, love, follow, ufe0f, re, f*cking, love you, im, udf38
  ● Control : http, http t co, http t, co, t co, ud83d, lol, u2764 u2764 -, u2764 u2764 u2764, u2764 u2764 u2764 u2764

● PvC
  ● PTSD : u2026, co, t co, u043e, u0430, u0435, thank, thank you, please, u0438
  ● Control : ud83d, rt, ude02, ud83d ude02, gt, u2764 -, lol, u201c, ude02 ud83d -, ud83d ude02 ud83d

● DvP
  ● Depression : ud83d, ud83c, rt, love, ude02, ud83d ude02, im, follow, don t, don, love you
  ● PTSD : co, t co, http -, http t, http t co, u2026, amp, news, thanks, answer
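Features such as "ud83d ude02" or "u2764" look like \uXXXX escape sequences whose backslash was dropped when non-alphanumeric characters were stripped. That is an inference from the token shapes, not something the slides state. Under that assumption, the sketch below maps such tokens back to characters by treating each one as a UTF-16 code unit. `decode_escape_tokens` is a hypothetical helper.

```python
import re


def decode_escape_tokens(tokens):
    """Decode tokens like ["ud83d", "ude02"] as a sequence of UTF-16 code units.

    Raises ValueError for tokens that are not 'u' followed by four hex digits,
    and UnicodeDecodeError for an unpaired surrogate such as a lone "ud83d".
    """
    units = []
    for tok in tokens:
        if not re.fullmatch(r"u[0-9a-f]{4}", tok):
            raise ValueError(f"not an escape-like token: {tok!r}")
        units.append(int(tok[1:], 16))
    raw = b"".join(unit.to_bytes(2, "big") for unit in units)
    return raw.decode("utf-16-be")


joy = decode_escape_tokens(["ud83d", "ude02"])      # U+1F602, face with tears of joy
heart = decode_escape_tokens(["u2764"])             # U+2764, heavy black heart
blossom = decode_escape_tokens(["ud83c", "udf38"])  # U+1F338, cherry blossom
cyrillic_o = decode_escape_tokens(["u043e"])        # U+043E, Cyrillic small letter o
```

Read this way, a lone "ud83d" or "ud83c" is the first half of a surrogate pair shared by many emoji, so it signals emoji use in general rather than one particular symbol. The tokens u043e, u0430, u0435 and u0438 are Cyrillic letters.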

Page 9: Screening Twitter Users for Depression and PTSD

Lessons
● Standard machine learning algorithms can perform well at this task
  ● Even very simple ones like our decision lists

● Emoticons and Emoji are often strong indicators
● Ngrams of varying length combined with binary weights attained best results
  ● Frequency weighting very poor
  ● Stoplist has minimal impact

Page 10: Screening Twitter Users for Depression and PTSD

Discussion
● How typical is it to self-report depression or PTSD?
  ● Is desire to self-report an indicator of something else?
  ● Do untreated / undiagnosed users look different?

● How common are these conditions?
  ● PTSD : 7-8% (www.ptsd.va.gov)
  ● Depression : 17% (www.adaa.org)

● Typical to have multiple diagnoses
  ● PTSD + Depression
  ● Anxiety + Depression

Page 11: Screening Twitter Users for Depression and PTSD

A case of self-reporting

Which is worse, cancer or depression? The answer is clear. Depression is worse: depression makes you want to die and cancer doesn’t.

I’ve spent all my adult life with depression lurking. I haven’t mentioned it to very many people at all. For the first ten years I talked about it to nobody at all, for the next decade only Gill and therapists ...

Page 12: Screening Twitter Users for Depression and PTSD

Adam Kilgarriff

● Posted to blog May 3, 2015. Died May 16 at age 55.
● https://blog.kilgarriff.co.uk/?p=101