a dream of predicting elections and trading stocks using twitter - yelena mejova, qatar computing...

A Dream of Predicting Elections and Trading Stocks using Twitter @yelenamm Yelena Mejova Yet Another Conference Moscow Nov 30 2014

Upload: yandex

Post on 15-Jun-2015


Page 1: A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mejova, Qatar Computing Research Institute

A Dream of Predicting Elections and Trading Stocks using Twitter

@yelenamm Yelena Mejova

Yet Another Conference, Moscow, Nov 30 2014

Page 3:

Money and Power

Financial Indexes
• Movie box office sales
• Consumer confidence
• Dow Jones Industrial Average
• Individual stocks

Political Opinion
• Political leaning
• Polarization
• User classification
• Predicting elections!

Page 4:

More…

CIKM 2013 Tutorial: TWITTER AND THE REAL WORLD

with Ingmar Weber

https://sites.google.com/site/twitterandtherealworld/home

Finance, Politics, Public Health, Event Detection

Page 5:

Can I get rich on the stock market?

Page 6:

• Efficient Market Hypothesis:
– Financial markets are information efficient: prices fully reflect all available information
– Cannot be predicted

JUST AS WELL

Answer: NO

Page 7:

• Behavioral Economics: overconfidence, overreaction, information bias…

• Insider trading, governmental manipulation…

• Speculative bubbles: information be damned!

• Bitcoin: where is the value? – pure bubble

A non-random walk down Wall Street (1999) Lo & MacKinlay

Answer: NO MAYBE?

Page 8:

http://www.caymanatlantic.com/
Self-reported Gains

http://dataminr.com/
http://nymag.com/daily/intelligencer/2013/04/bloombergs-vip-terminal-tweeters.html

http://gnip.com/

1. content providers
2. specialized providers
3. data analytics
4. traders

Page 9:

Movies

Hollywood Stock Exchange

Predicting the Future with Social Media @sitaramasur Asur, Huberman @ WI-IAT 2010

• 2.89 million tweets
• 24 movies

Correl (tweet rate & box office gross) = 0.90

Using previous week’s tweets to predict weekend box office gross: Adj R2 = 0.973
…and sentiment (positive/negative) score to predict second weekend box office gross: Adj R2 = 0.94

Least squares linear regression using previous week’s HSX scores to predict weekend box office gross: Adj R2 = 0.967
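
A minimal sketch of the kind of fit the slide reports: a one-predictor least squares regression and its adjusted R2. The function names and the toy numbers are hypothetical and are not taken from Asur and Huberman's data; the paper's own models may use additional predictors.

```python
from statistics import mean


def fit_line(x, y):
    """Least squares fit of y ≈ a + b*x for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def adjusted_r2(x, y, a, b, n_predictors=1):
    """R^2 penalized for the number of predictors used."""
    my = mean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    n = len(y)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical toy data, not from the paper.
tweet_rate = [10.0, 20.0, 30.0, 40.0, 50.0]   # tweets per hour before release
gross = [12.0, 19.0, 33.0, 41.0, 48.0]        # opening weekend gross, $M
a, b = fit_line(tweet_rate, gross)
adj = adjusted_r2(tweet_rate, gross, a, b)
print(f"slope={b:.3f} intercept={a:.3f} adjusted R2={adj:.3f}")
```

With only 24 movies, the adjustment for the number of predictors matters: each extra predictor raises plain R2 but is penalized in the adjusted value.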

Page 10:

Consumer Confidence

• Index of Consumer Sentiment (ICS) (Reuters/UMich)
• Economic Confidence Index (ECI) (Gallup)

• Subjectivity Lexicon: Opinion Finder

[some figures from authors’ original slides]

From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series @brendan642 O’Connor, Balasubramanyan, Routledge, Smith @ ICWSM (2011)

• High day-to-day volatility.
• Average last k days.
• Keyword “jobs”

k = 1, 7, 30
• @ k=15 correlates with ECI (Gallup) at r = 0.731
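
The smoothing step above can be sketched as a trailing k-day average followed by a Pearson correlation against the poll series. This is an illustration under assumptions: the daily sentiment values and poll values below are invented, and the helper names are hypothetical rather than the authors' code.

```python
from math import sqrt


def moving_average(series, k):
    """Average each day with the k-1 days before it (shorter window at the start)."""
    smoothed = []
    for i in range(len(series)):
        window = series[max(0, i - k + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily sentiment scores for tweets containing "jobs",
# and a hypothetical poll index for the same days.
daily_sentiment = [1.2, 0.7, 1.5, 0.9, 1.8, 1.1, 2.0, 1.4]
poll_index = [-30, -29, -29, -27, -26, -26, -24, -23]

for k in (1, 3, 5):
    r = pearson(moving_average(daily_sentiment, k), poll_index)
    print(f"k={k}: r={r:.3f}")
```

Larger k removes more of the day-to-day noise, at the cost of reacting more slowly to real changes.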

Page 11:

• Predicting 1 month in the future using previous 15 days

• Correlation with Gallup poll:
– Twitter model: 77.5%
– Poll model: 80.4%

• As Twitter grows, so does its accuracy

Consumer Confidence
From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series

@brendan642 O’Connor, Balasubramanyan, Routledge, Smith @ ICWSM (2011)

Page 12:

Twitter mood predicts the stock market
@jlbollen Bollen, Mao, Zeng @ Journal of Computational Science (2011)

• Opinion Finder: positive / negative
• GPOMS: calm, alert, sure, vital, kind and happy

Twitter 2008 (~10M tweets)

[some figures from authors’ original slides]

DJIA
888 citations!

Slight correlation only with Calm GPOMS mood (0.065 at 6 day lag)

Page 13:

• Tracking stocks $STOCK

Stocks
Tweets and trades: The information content of stock microblogs
@timmsprenger Sprenger, Tumasjan, Sandner, Welpe @ European Financial Management (2013)

Page 14:

Stocks

• Tweets: Jan 1 – Jun 30, 2010
• S&P100 companies using $STOCK (price change & volume)
• Naïve Bayes classifier trained on 2,500 tweets (buy/sell/hold): 81.2% accuracy

Tweets and trades: The information content of stock microblogs
@timmsprenger Sprenger, Tumasjan, Sandner, Welpe @ European Financial Management (2013)

DOMINATED BY FEW “EXPERTS”
1.5% posted 53.7% of all messages
– Their quality is not much better!

BULLISH / STOCK RETURNS: -0.022 (p<0.05); 0.091 (p<0.001)
VOLUME / TRADING VOLUME: 0.073 (p<0.001); 0.312 (p<0.001)
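
A concentration figure such as "1.5% posted 53.7% of all messages" can be computed from per-user message counts. The sketch below is illustrative only; `top_share` and the toy counts are hypothetical and unrelated to the paper's dataset.

```python
from math import ceil


def top_share(messages_per_user, fraction):
    """Share of all messages posted by the most active `fraction` of users."""
    counts = sorted(messages_per_user.values(), reverse=True)
    k = max(1, ceil(fraction * len(counts)))   # number of users in the top group
    return sum(counts[:k]) / sum(counts)


# Hypothetical message counts per user.
counts = {"u1": 50, "u2": 30, "u3": 10, "u4": 10}
print(top_share(counts, 0.25))   # share posted by the top 25% of users
```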

Page 15:

Stocks

• Twitter: Jan 1 – Jun 30, 2010
• 150 (randomly selected) companies in S&P 500
– Daily relative price change
– Traded volume normalized by mean traded volume for that company for entire time period

Correlating financial time series with micro-blogging activity
Ruiz, Hristidis, Castillo, Gionis, Jaimes @ WSDM (2012)

[some figures from authors’ original slides]

represent tweets as a GRAPH

constrain graph to a company and a time window

+ similarity nodes connecting very similar tweets (RTs) using Jaccard distance
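
A simplified sketch of the graph idea: link near-duplicate tweets by Jaccard similarity of their tokens (Jaccard distance is 1 minus this value), then count connected components. This is an assumption-laden reduction; the graph on the slide may also contain other node and edge types, and the threshold, tokenization and function names here are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def count_components(tweets, threshold=0.8):
    """Link tweets with similarity >= threshold, then count connected components."""
    parent = list(range(len(tweets)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tokens = [t.lower().split() for t in tweets]
    for i in range(len(tweets)):
        for j in range(i + 1, len(tweets)):
            if jaccard(tokens[i], tokens[j]) >= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(tweets))})


# Hypothetical tweets about one company within one time window.
window = ["buy $AAPL now", "RT buy $AAPL now", "sell $MSFT today"]
print(count_components(window, threshold=0.7))
```

Computed per company and per time window, a count like this yields a daily feature that can be compared with price change or traded volume.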

Page 16:

Trading Simulation

[some figures from authors’ original slides]

Page 17:

Correlating financial time series with micro-blogging activity
Ruiz, Hristidis, Castillo, Gionis, Jaimes @ WSDM (2012)

• Best performance for vector auto-regression with the number of connected components proposed
• The only one that obtains a profit, during a period in which the Dow Jones fell 5.8%

Page 18:

Don’t fire your stock broker yet

High-Speed Trading No Longer Hurtling Forward
http://www.nytimes.com/interactive/2012/10/15/business/Declining-US-High-Frequency-Trading.html?ref=business

Computer Flaws Get Wry Smile From Humans Displaced
http://dealbook.nytimes.com/2013/09/19/computer-flaws-get-wry-smile-from-humans-displaced/?ref=highfrequencyalgorithmictrading

How a Trading Algorithm Went Awry
http://online.wsj.com/article/SB10001424052748704029304575526390131916792.html

Page 19:

Can we track & predict political sentiment?

Page 20:

Elections
“the crowning of the Internet as the king of all political media”
“the beginning of the Internet presidency”
- on Obama's 2008 victory, Mitch Wagner, InformationWeek

Transparency
“Instantaneous tweeting of shady government practices -- and the resulting uproar -- means that public bodies are more responsive than ever”.
- Wesley Donehue, CNN

Mobilization
“This exercise of power has produced a template for political action on a massive scale fueled by social media.”
- on PIPA and SOPA, Vivek Wadhwa, Washington Post

bloggeruniversity.wordpress.com

Page 21:

US politics

• Most of the research presented here
• Clear left/right distinction
• Popular political figures
• High(ish) Twitter engagement

REPUBLICAN (right)
DEMOCRAT (left)

Page 22:

• Sampling Twitter for political speech
– general keywords: #current
– event keywords: #debate08, #tweetdebate
– people: obama, romney, merkel
– parties: democrat, republican, pirate
– accounts: wefollow, twellow
– news stories, known URL retweets

• Caveats
– requires expert knowledge
– known best after the event
– selection bias (who do you want to ignore?)

let’s talk politics

Page 23:

1. Text (text classification)
2. Network (label propagation)

political leaning classification

Page 24:

political leaning classification

• Bootstrapped hashtag-based sample of political discussion
• Gardenhose Sep 14 - Nov 4, 2010
• Classes: right, left, ambiguous

TEXT-BASED
• remove stopwords, hashtags, mentions, urls, all words occurring once in the corpus
• TFIDF weighting:

HASHTAG-BASED
• remove hashtags used by only one user

Predicting the political alignment of twitter users @vagabondjack Conover, Gonçalves, Ratkiewicz, Flammini, Menczer @ SocialCom (2011)
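
The TFIDF formula itself did not survive in this transcript. As a hedged illustration only, here is one common variant (relative term frequency times log of inverse document frequency), treating each user's cleaned tweets as one document. It should not be read as the exact weighting used by Conover et al.; the function name and toy tokens are hypothetical.

```python
from collections import Counter
from math import log


def tfidf(docs):
    """docs: list of token lists, one per user. Returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))          # count each term once per document
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append({
            term: (count / len(doc)) * log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


# Hypothetical token lists after removing stopwords, hashtags, mentions and URLs.
users = [["tax", "tax", "vote"], ["vote", "health"]]
for vector in tfidf(users):
    print(vector)
```

Terms that every user mentions get weight zero, so the resulting vectors emphasize vocabulary that separates users, which is what a downstream classifier needs.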

Page 25:

NETWORK-BASED

• Label propagation
– Initialize cluster membership arbitrarily
– Iteratively update each node’s label according to the majority of its neighbors
– Ties are broken randomly
• Cluster assignment by majority cluster label (using manually labeled data)

political leaning classification

retweet network

Predicting the political alignment of twitter users @vagabondjack Conover, Gonçalves, Ratkiewicz, Flammini, Menczer @ SocialCom (2011)
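
The label propagation steps listed on this slide can be sketched as follows. This is a minimal version under assumptions: the graph is an undirected adjacency dict, a node keeps its current label when that label is already among the most common ones (a stopping convenience not stated on the slide), and all names and the toy graph are hypothetical.

```python
import random
from collections import Counter


def label_propagation(adj, max_sweeps=100, seed=0):
    """adj maps each node to a list of its neighbors (undirected graph)."""
    rng = random.Random(seed)
    labels = {node: node for node in adj}   # arbitrary start: one label per node
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            counts = Counter(labels[nb] for nb in adj[node])
            if not counts:
                continue                    # isolated node keeps its label
            top = max(counts.values())
            best = sorted(l for l, c in counts.items() if c == top)
            if labels[node] not in best:
                labels[node] = rng.choice(best)   # ties are broken randomly
                changed = True
        if not changed:
            break
    return labels


def name_clusters(labels, known):
    """Name each cluster by the majority leaning of its manually labeled members."""
    votes = {}
    for user, leaning in known.items():
        votes.setdefault(labels[user], Counter())[leaning] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in votes.items()}


# Hypothetical retweet network: two tightly knit groups with no links between them.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "d": ["e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
labels = label_propagation(graph)
names = name_clusters(labels, {"a": "left", "b": "left", "d": "right"})
print({user: names[labels[user]] for user in graph})
```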

Page 26:

• Classifier: Support Vector Machine

political leaning classification

network-based method

Predicting the political alignment of twitter users @vagabondjack Conover, Gonçalves, Ratkiewicz, Flammini, Menczer @ SocialCom (2011)

Page 27:

SEED-BASED (highly precise)
1. Start with few seed users of known leaning
2. The leaning of their followers is determined by which side they retweet more
3. Propagate users’ leaning to their tweets/hashtags/etc

hashtag accuracy: 98.6%, 93%, 90% (by source)

political leaning classification
Political hashtag hijacking in the US
Hadgu, Garimella, Weber @ WWW (2013)
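
The three seed-based steps can be sketched as below. The data layout, account names and function names are hypothetical, and dropping users or hashtags with an exact tie is an assumption of this sketch rather than something the slide specifies.

```python
from collections import Counter, defaultdict


def user_leanings(retweets, seeds):
    """retweets: (user, retweeted_account) pairs; seeds: account -> 'left' or 'right'."""
    sides = defaultdict(Counter)
    for user, account in retweets:
        if account in seeds:
            sides[user][seeds[account]] += 1
    # Keep only users who retweet one side strictly more than the other.
    return {
        user: "left" if c["left"] > c["right"] else "right"
        for user, c in sides.items()
        if c["left"] != c["right"]
    }


def hashtag_leanings(tagged_tweets, leanings):
    """tagged_tweets: (user, [hashtags]) pairs; returns hashtag -> majority leaning."""
    votes = defaultdict(Counter)
    for user, hashtags in tagged_tweets:
        if user in leanings:
            for tag in hashtags:
                votes[tag][leanings[user]] += 1
    return {
        tag: "left" if c["left"] > c["right"] else "right"
        for tag, c in votes.items()
        if c["left"] != c["right"]
    }


# Hypothetical seed accounts and activity.
seeds = {"@seedL": "left", "@seedR": "right"}
retweets = [("u1", "@seedL"), ("u1", "@seedL"), ("u1", "@seedR"), ("u2", "@seedR")]
leanings = user_leanings(retweets, seeds)
print(leanings)
print(hashtag_leanings([("u1", ["#a"]), ("u2", ["#b"])], leanings))
```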

Page 28:

Visualizing media bias through Twitter
@JisunAn An, Cha, Gummadi, Crowcroft, Quercia @ AAAI (2012)

• Position news sources in leaning by considering the overlap in common audience (followers on Twitter)

distance between two media: Jaccard similarity of their audience (co-subscribers)

Correlates with ADA (Americans for Democratic Action) score:
– Spearman rank order correlation: .44
– Pearson product-moment correlation coefficient: .51

political leaning classification

Page 29:

• Nov 21, 2013 – Feb 26, 2014
• Classifier labeled to identify pro- and anti-protest sentiment
• Twitter, blogs, news, forums, Facebook

political leaning classification
Russia, Ukraine, and the West: Social Media Sentiment in the Euromaidan Protests
@bretling Etling @ Berkman Center Research (2014)

[figure panels: Ukraine, Russia, US & UK]

Does it reflect the overall sentiment of the people?

Page 30:

look who’s talking

• 2010 US Senate special election in Massachusetts

• Silent majority & vocal minority tweet differently (different agendas?)

• Spamming, fake grassroots movements

Vocal Minority versus Silent Majority: Discovering the Opinions of the Long Tail @enimust Mustafaraj, Finn, Whitlock, Metaxas @ SocialCom (2011)

number of tweets per user

Page 31:

• Truthiness is a quality characterizing a "truth" that a person making an argument or assertion claims to know intuitively "from the gut" or because it "feels right" without regard to evidence, logic, intellectual examination, or facts.

look who’s talking
Detecting and Tracking Political Abuse in Social Media
Ratkiewicz, Conover, Meiss, Goncalves, Flammini, Menczer @ ICWSM (2011)

Classifying memes for astroturf

Truthy project by Indiana University

Page 32:

look who’s talking

TRUTHY: #ampat, @PeaceKaren_25 & @HopeMarie_25, gopleader.gov, Chris Coons

LEGITIMATE: #Truthy, @senjohnmccain, on.cnn.com/aVMu5y, “Obama said…”

Detecting and Tracking Political Abuse in Social Media Ratkiewicz, Conover, Meiss, Goncalves, Flammini, Menczer @ ICWSM (2011)

Page 33:

• 2009 German federal elections

elections
Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment
Tumasjan, Sprenger, Sandner, Welpe @ AAAI (2010)

sentiment profiles of leading candidates in tweets mentioning them (using LIWC2007)

“The mere number of tweets reflects voter preferences and comes close to traditional election polls”

CONTROVERSY!

638 citations!

Page 34:

elections
Why the Pirate Party won the German election of 2009 or the trouble with predictions: A response to Tumasjan, Sprenger, Sander, & Welpe, "Predicting elections with twitter: What 140 characters reveal about political sentiment"
@ajungherr Jungherr, Jürgens, Schoen @ SSCR V30/N2 (2012)

“show that the results of TSSW are contingent on arbitrary choices of the authors”

Choice of Parties
If results of polls played a role in deciding upon the inclusion of particular parties, the TSSW method is dependent on public opinion surveys

Choice of Dates
prediction analysis […] between [13.9] and [27.9], the day of the election, produces a MAE of 2.13, significantly higher than the MAE for TSSW
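
To make the "choice of parties" objection concrete, the sketch below predicts vote share from each party's share of mentions and scores it with mean absolute error (MAE), first with three parties and then with one party left out. All numbers and party names are invented for illustration; none come from TSSW or from Jungherr et al.

```python
def shares(counts):
    """Convert raw counts to percentage shares that sum to 100."""
    total = sum(counts.values())
    return {party: 100.0 * n / total for party, n in counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, over the parties in `actual`."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)


# Hypothetical mention counts and vote counts (in thousands).
mentions = {"A": 300, "B": 500, "C": 200}
votes = {"A": 350, "B": 450, "C": 200}

mae_all = mean_absolute_error(shares(mentions), shares(votes))

# Same data, but party C is excluded from the analysis.
kept = ("A", "B")
mae_two = mean_absolute_error(
    shares({p: mentions[p] for p in kept}),
    shares({p: votes[p] for p in kept}),
)
print(f"MAE with A, B, C: {mae_all:.2f}; MAE with A, B only: {mae_two:.2f}")
```

The error changes with the party list even though the tweets and votes are identical, which is why the inclusion rule has to be fixed before the data are examined.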

Page 35:

elections

• Non-US elections:

– Irish: On using twitter to monitor political sentiment and predict election results, Bermingham, Smeaton (2011)
  • “Our approach however has demonstrated an error which is not competitive with the traditional polling methods.”

– Dutch: Predicting the 2011 Dutch senate election results with twitter, Sang, Bos (2012)
  • Uses polls for demographic imbalances, yet performance still below traditional polls

– Singapore: Tweets and votes: A study of the 2011 singapore general election, Skoric, Poor, Achananuparp, Lim, Jiang (2012)
  • Not as accurate as traditional polls, performance at local government levels

– New Zealand: Can Social Media Predict Election Results? Evidence from New Zealand, Michael P. Cameron (2013)
  • “the size of the effect is small and it appears that social media presence will therefore only make a difference in closely contested elections”

– many more coming out each day!

Check out Gayo-Avello’s literature surveys!

Page 36:

Metaxas et al. @ SocialCom (2011)

Page 37:

• A method of prediction should be an algorithm finalized before the election
– specify data collection, cleaning, analysis, interpretation…

• Data from social media are fundamentally different than data from natural phenomena
– people change their behavior next time around
– spammers & activists will try to take advantage

• Form a testable theory on why and when it predicts (avoid self-deception!)

• (maybe) Learn from professional pollsters
– tweet ≠ user
– user ≠ eligible voter
– eligible voter ≠ voter

How (Not) To Predict Elections @takis_metaxas Metaxas et al. @ SocialCom (2011)

elections

[from authors’ original slides]

Page 38:

What now?

Now-casting, Fore-casting
Show improvement over baseline or that you could make money / a difference

Publish a paper: let us know!
(or go to Wall Street / Political Thinktank)

Page 39:

thank you

Yelena Mejova @yelenamm

[email protected]

Page 41:

1. Bullishness is affected more strongly by returns than vice versa
2. Message volume predicts trading volume
3. … but high trading volume and volatility predict message volume more
4. Agreement among traders leads to lower trading volumes

day of the week, market index

Fixed-effects panel regressions at 1 and 2 day lags