language of politics on twitter - 02 twitter

Post on 06-Aug-2015

Language of Politics on Twitter
Summer School in AI
American University Beirut
June 16, 2015

Yelena Mejova
@yelenamm
Social Computing Group
Qatar Computing Research Institute, HBKU

political

twitter

analysis

Users
• individuals
• news
• organizations
• bots
• …

#hashtags
word or phrase preceded by a hash mark (#), used within a message to identify a keyword or topic of interest and facilitate a search for it

links
all links are shortened by Twitter to form t.co/…
• shorter
• control for spam, malware, phishing
• collect clickthrough information

MEME
an idea, behavior, or style that spreads from person to person within a culture

Richard Dawkins

MEME

Monthly active users: 302 million (4/28/2015)
Total number of Twitter registered users: “about a billion” (9/16/13)
Unique monthly visitors to Twitter.com (desktop): 36 million (10/3/13)
Daily active Twitter users: 100 million (10/3/13)
Number of Twitter accounts that have ever sent a tweet: 550 million (4/14/14)

TWITTER

TWITTER RESEARCH

Google Trends

users

tweets

relationships

Twitter API

https://dev.twitter.com/overview/documentation

users

try it yourself

• go to https://apigee.com/console/twitter
• select OAuth1 from Authentication and log in using your Twitter account
• select api.twitter.com/1.1 from Service
• click on the [icon] on the left to see a list of API methods
• select [method]
• enter your Twitter handle into screen_name and click [button]

http://jsonviewer.stack.hu/

http://www.faceplusplus.com/demo-detect/

More info from picture

questions

where are you from?
are you male or female?
what job do you have?
when did you join?
how active are you?
what do you look like?
are you a bot?
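Two of these questions ("when did you join?" and "how active are you?") can be answered directly from the fields of a user object. The following is a minimal sketch, not the method used in the talk: the function name account_activity is hypothetical, and the field values are copied from the user object in the sample tweet shown later in these slides.

```python
from datetime import datetime, timezone

# Layout of Twitter's created_at field, e.g. "Thu Feb 22 11:11:01 +0000 2007"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def account_activity(user, now):
    """Return (account age in days, average tweets per day) for a user object."""
    joined = datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT)
    age_days = (now - joined).days
    # Guard against division by zero for accounts created today
    tweets_per_day = user["statuses_count"] / max(age_days, 1)
    return age_days, tweets_per_day


# Field values taken from the sample tweet's "user" object
user = {
    "screen_name": "mohsin",
    "created_at": "Thu Feb 22 11:11:01 +0000 2007",
    "statuses_count": 10756,
}
# Time at which the sample tweet was posted
observed = datetime(2015, 5, 13, 11, 44, 24, tzinfo=timezone.utc)

age_days, tweets_per_day = account_activity(user, observed)
print(age_days, round(tweets_per_day, 2))
```

The average is only a rough activity signal, since it ignores bursts and long silences.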

tweets

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

# Go to http://dev.twitter.com and create an app.
# The consumer key and secret will be generated for you afterwards.
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'

# After the step above, you will be redirected to your app's page.
# Create an access token under the "Your access token" section.
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

class StdOutListener(StreamListener):
    """ A listener handles tweets that are received from the stream.
    This is a basic listener that just prints received tweets to stdout.
    """

    def on_data(self, data):
        # Drop the trailing newline and print the raw JSON, one tweet per line
        print(data[:-1])
        return True

    def on_error(self, status):
        print(status)

Querying public stream using python (1)
https://tinyurl.com/aiss15-gettweets

def auto_restart_stream(auth, listener, l_keywords):
    # Reconnect whenever the stream fails; Ctrl+C still stops the script
    while True:
        try:
            sapi = Stream(auth, listener)
            sapi.filter(track=l_keywords)
        except Exception:
            # print('Restarting ;)')
            continue

if __name__ == '__main__':
    # "Qatar" in different languages and scripts.
    # Non-Latin entries were damaged in extraction and are restored on a
    # best-effort basis; verify them before use.
    keywords = [u'Cátar', u'Catar', u'Katar', u'Katara', u'Kataras', u'Katari',
                u'Kataro', u'Qadar', u'Qatar', u'कतर', u'ਕਤਰ', u'卡塔尔', u'قطر',
                u'卡塔爾', u'카타르', u'קטאר', u'कतार', u'કતાર', u'కతర్', u'ກາຕາ',
                u'カタール', u'Κατάρ', u'Катар', u'Қатар', u'Կատար', u'কাতার',
                u'ಕತಾರ್', u'ഖത്തർ', u'කටාර්', u'กาตาร์', u'קַאטַאר', u'கத்தார்',
                u'ប្រទេសកាតា', u'ကာတာနိုင်ငံ']
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    auto_restart_stream(auth, l, keywords)

Querying public stream using python (2)
https://tinyurl.com/aiss15-gettweets

{"created_at":"Wed May 13 11:44:24 +0000 2015","id":598453736839598080,"id_str":"598453736839598080","text":"Don't get star struck often but I like this guy @Mo_Farah you the man boss! Much respect to you! #Doha #qatar http:\/\/t.co\/wf8nc0C527","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":788413,"id_str":"788413","name":"Mohsin Ali","screen_name":"mohsin","location":"Doha, Qatar","url":"http:\/\/mohsinali.com","description":"Digital story telling, infogrpahics, interactives, R&D, Emerging Technologies, Future Trends, Innovation @ajlabs, Global Nomad, Likes Maps. LBA, DHA, BHA, DOH","protected":false,"verified":false,"followers_count":2422,"friends_count":645,"listed_count":69,"favourites_count":889,"statuses_count":10756,"created_at":"Thu Feb 22 11:11:01 +0000 2007","utc_offset":10800,"time_zone":"Riyadh","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/462946198211407873\/xWaKYtpF.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/462946198211407873\/xWaKYtpF.jpeg","profile_background_tile":true,"profile_link_color":"0084B4","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/1249217364\/n504379828_3076_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/1249217364\/n504379828_3076_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/788413\/1399210132","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":{"type":"Point","coordinates":[25.316197,51.498302]},"coordinates":{"type":"Point","coordinates":[51.498302,25.316197]},"place":{"id":"0181f32937df0de8","url":"https:\/\/api.twitter.com\/1.1\/geo\/id\/0181f32937df0de8.json","place_type":"admin","name":"Doha","full_name":"Doha, Qatar","country_code":"QA","country":"\u062f\u0648\u0644\u0629 \u0642\u0637\u0631","bounding_box":{"type":"Polygon","coordinates":[[[51.4477039,25.2216],[51.4477039,25.4263938],[51.630581,25.4263938],[51.630581,25.2216]]]},"attributes":{}},"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"Doha","indices":[97,102]},{"text":"qatar","indices":[103,109]}],"trends":[],"urls":[],"user_mentions":[{"screen_name":"Mo_Farah","name":"Mo Farah","id":83855918,"id_str":"83855918","indices":[48,57]}],"symbols":[],"media":[{"id":598453717596119040,"id_str":"598453717596119040","indices":[110,132],"media_url":"http:\/\/pbs.twimg.com\/media\/CE4ifEPUIAAhCsG.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/CE4ifEPUIAAhCsG.jpg","url":"http:\/\/t.co\/wf8nc0C527","display_url":"pic.twitter.com\/wf8nc0C527","expanded_url":"http:\/\/twitter.com\/mohsin\/status\/598453736839598080\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":453,"resize":"fit"},"medium":{"w":600,"h":800,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":768,"h":1024,"resize":"fit"}}}]},"extended_entities":{"media":[{"id":598453717596119040,"id_str":"598453717596119040","indices":[110,132],"media_url":"http:\/\/pbs.twimg.com\/media\/CE4ifEPUIAAhCsG.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/CE4ifEPUIAAhCsG.jpg","url":"http:\/\/t.co\/wf8nc0C527","display_url":"pic.twitter.com\/wf8nc0C527","expanded_url":"http:\/\/twitter.com\/mohsin\/status\/598453736839598080\/photo\/1","type":"photo","sizes":{"small":{"w":340,"h":453,"resize":"fit"},"medium":{"w":600,"h":800,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":768,"h":1024,"resize":"fit"}}}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1431517464252"}

https://tinyurl.com/aiss15-tweetjson

http://jsonviewer.stack.hu/

JSON Tweet Object
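The tweet object already carries its hashtags and mentions in pre-parsed form under the "entities" key, so they do not need to be extracted from the text. A minimal sketch of reading them, assuming an abridged copy of the sample tweet above (only the fields used here are kept) and a hypothetical helper name extract_entities:

```python
import json

# Abridged copy of the sample tweet: only the fields read below are kept
raw = """
{"id": 598453736839598080,
 "user": {"screen_name": "mohsin"},
 "entities": {
   "hashtags": [{"text": "Doha", "indices": [97, 102]},
                {"text": "qatar", "indices": [103, 109]}],
   "user_mentions": [{"screen_name": "Mo_Farah", "indices": [48, 57]}]
 }}
"""


def extract_entities(tweet):
    """Return (lowercased hashtags, mentioned screen names) of one tweet."""
    entities = tweet.get("entities", {})
    hashtags = [h["text"].lower() for h in entities.get("hashtags", [])]
    mentions = [m["screen_name"] for m in entities.get("user_mentions", [])]
    return hashtags, mentions


tweet = json.loads(raw)
hashtags, mentions = extract_entities(tweet)
print(hashtags, mentions)
```

Lowercasing the hashtags is a design choice made here so that #Qatar and #qatar count as the same topic.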


import codecs
import json

# Input: one JSON tweet per line. Output: one tab-separated line per tweet.
fin = codecs.open("rawTweets.txt", 'r', encoding='utf-8')
fout = codecs.open("parsedTweets.txt", 'w', encoding='utf-8')

for line in fin:
    line = line.strip()
    if not line:
        continue  # skip blank lines instead of stopping at the first one
    jdict = json.loads(line)

    # Keep only tweets that carry a location (exact coordinates or a place)
    if jdict.get('coordinates') is None and jdict.get('place') is None:
        continue

    # Coordinates are stored as [longitude, latitude]; both columns stay
    # empty when only a place is given, so the columns remain aligned
    longitude, latitude = '', ''
    if jdict.get('coordinates') is not None:
        longitude = jdict['coordinates']['coordinates'][0]
        latitude = jdict['coordinates']['coordinates'][1]

    fields = [
        str(longitude),
        str(latitude),
        str(jdict['id']),                # Tweet id
        jdict['user']['screen_name'],    # User screen name
        str(jdict['timestamp_ms']),      # Timestamp
        jdict['user']['lang'],           # User's language
        " ".join(jdict['text'].split())  # Text, with tabs and newlines collapsed to spaces
    ]
    fout.write(u'\t'.join(fields) + u'\n')

fin.close()
fout.close()

Extracting individual fields from JSON
https://tinyurl.com/aiss15-cleanjson

Tab Separated Value (TSV) format
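Reading the TSV file back is straightforward with Python's csv module. The sketch below assumes the column order written by the extraction script (longitude, latitude, tweet id, screen name, timestamp, language, text); the column names and the function read_tweets_tsv are illustrative choices, not part of the original slides.

```python
import csv
import io

# Assumed column order of parsedTweets.txt
COLUMNS = ["longitude", "latitude", "tweet_id", "screen_name",
           "timestamp_ms", "lang", "text"]


def read_tweets_tsv(handle):
    """Parse tab-separated tweet rows into dictionaries keyed by COLUMNS."""
    # QUOTE_NONE: tweet text may contain quote characters that are not quoting
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for values in reader:
        if len(values) != len(COLUMNS):
            continue  # skip malformed rows
        row = dict(zip(COLUMNS, values))
        # Coordinates may be empty when only a place was attached to the tweet
        row["longitude"] = float(row["longitude"]) if row["longitude"] else None
        row["latitude"] = float(row["latitude"]) if row["latitude"] else None
        rows.append(row)
    return rows


# One row built from the sample tweet, with the text shortened
sample_row = "\t".join([
    "51.498302", "25.316197", "598453736839598080", "mohsin",
    "1431517464252", "en", "Much respect to you! #Doha #qatar",
])
rows = read_tweets_tsv(io.StringIO(sample_row + "\n"))
print(rows[0]["screen_name"], rows[0]["longitude"], rows[0]["latitude"])
```

For a real file, pass an open file handle (for example one opened with encoding="utf-8") instead of the in-memory sample.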

Language Model

http://tweetcloud.icodeforlove.com/

workshop 25
twitter 20
religion 17
interaction 12
online 12
dyad 9
research 9
accepted 7
…
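A word-count list like the one above is the simplest form of a language model: a table of how often each word appears in a user's tweets. A minimal sketch of how such counts could be produced follows; the two example tweets are invented for illustration, and the tokenization (lowercasing, dropping links, splitting on non-word characters) is an assumption rather than what the tweet cloud tool does.

```python
import re
from collections import Counter


def word_counts(texts):
    """Count lowercase word tokens across a list of tweet texts."""
    counts = Counter()
    for text in texts:
        # Remove links first so that fragments of URLs are not counted as words
        text = re.sub(r"https?://\S+", " ", text.lower())
        counts.update(re.findall(r"\w+", text))
    return counts


# Hypothetical tweets, for illustration only
tweets = [
    "Workshop on Twitter research accepted!",
    "Twitter workshop: religion online http://t.co/example",
]
counts = word_counts(tweets)
for word, count in counts.most_common(3):
    print(word, count)
```

In practice one would usually also remove very common function words before ranking.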

Activity

http://www.tweetails.com/

Mentions

http://www.tweetails.com/

questions

what are you interested in?
how do you eat/sleep/work/hang out?
how happy are you?
what political opinions do you have?
what outside sources do you link to?
what new emerging topics are you mentioning?
how do you behave?
are you a bot?

network

network
nodes
edges

User Network

Follower Network

Mention Network

Mention Network for hashtags
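A mention network can be assembled from tweet objects alone: each user is a node, and each mention adds a directed edge from the author to the mentioned user. The sketch below assumes tweets shaped like the JSON tweet object shown earlier; the first tweet mirrors the sample, while the users "alice" and "bob" and the function names are hypothetical.

```python
from collections import Counter


def mention_edges(tweets):
    """Return a Counter mapping (author, mentioned user) to number of mentions."""
    edges = Counter()
    for tweet in tweets:
        author = tweet["user"]["screen_name"]
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            edges[(author, mention["screen_name"])] += 1
    return edges


def in_degree(edges):
    """Total mentions received by each user, a crude indicator of attention."""
    received = Counter()
    for (_, target), weight in edges.items():
        received[target] += weight
    return received


sample_tweets = [
    {"user": {"screen_name": "mohsin"},
     "entities": {"user_mentions": [{"screen_name": "Mo_Farah"}]}},
    {"user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "Mo_Farah"},
                                    {"screen_name": "bob"}]}},
    {"user": {"screen_name": "bob"},
     "entities": {"user_mentions": []}},
]
edges = mention_edges(sample_tweets)
received = in_degree(edges)
print(received.most_common())
```

The weighted edge list can be handed to a graph library for layout or community detection; a plain Counter is used here to keep the example dependency-free.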

questions

how influential are you?
how influential are your connections?
who influences you?
what are people around you like?
do you bring together different communities?
how fast will you know about a piece of news?
are you an opinion leader?
are you a bot?

resources

https://dev.twitter.com/overview/documentation

https://apigee.com

try it in your favorite language

https://dev.twitter.com/overview/api/twitter-libraries

next

using Twitter data for real-world political speech mining
