harvesting data from twitter workshop: hands-on experience

Harvesting Data from Twitter: Hands on Experience Dr. Nora alTwairesh, Ms. Tarfa alBuhairi, Ms. Mawaheb alTuwaijri, and Ms. Afnan alMoammar

Upload: asagroup

Post on 14-Apr-2017

TRANSCRIPT

Page 1: Harvesting Data from Twitter Workshop: Hands-on Experience

Harvesting Data from Twitter: Hands-on Experience

Dr. Nora alTwairesh, Ms. Tarfa alBuhairi, Ms. Mawaheb alTuwaijri, and Ms. Afnan alMoammar

Page 2: Harvesting Data from Twitter Workshop: Hands-on Experience

Content

• Introduction to the Twitter API
• Some ready-to-use tools (no programming)
• Comparison between R and Python
• R
• Python

Page 3: Harvesting Data from Twitter Workshop: Hands-on Experience

WHY

TWITTER?!

Page 4: Harvesting Data from Twitter Workshop: Hands-on Experience

Why Twitter

• Twitter has become a mass information hub that can be used to study the evolution of any issue or subject matter: a "revolutionary machine".
• Research disciplines that study Twitter data span computer science, information science, communications, business, economics, education, medicine, political science, and sociology.

Page 5: Harvesting Data from Twitter Workshop: Hands-on Experience

Why Twitter

• Recent studies show that 60% of daily Arabic tweets are from Saudi Arabia.

Hamdy Mubarak and Kareem Darwish. 2014. Using Twitter to collect a multi-dialectal corpus of Arabic. ANLP 2014:1.

Page 6: Harvesting Data from Twitter Workshop: Hands-on Experience

Twitter API

• Free access to the tweets posted in the last 7 days, within a certain rate limit.
• Tweets older than 7 days are considered historical tweets and must be purchased through third-party providers.
• The Twitter API provides three interfaces for tweet collection: the Streaming API, the REST API and the Search API.

Page 7: Harvesting Data from Twitter Workshop: Hands-on Experience

Streaming API

• The Streaming API provides real-time tweets as a live feed.
• With the Streaming API, requested tweets flow continuously as they are posted on Twitter. The stream is delivered at three bandwidth levels: "spritzer" (1%), "gardenhose" (10%) and "firehose" (100% of all tweets posted on Twitter).
• A regular user wanting to collect tweets is granted spritzer access.
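A minimal sketch of reading from the sample stream in Python, assuming the `twitter` package (Python Twitter Tools) that is installed later in this workshop. The helper names `take_tweets` and `open_sample_stream` are hypothetical, and the sample endpoint may not be available to every account or API version. A stream also carries non-tweet messages (delete notices, keep-alives), so the helper keeps only messages that have a "text" field.

```python
def take_tweets(messages, limit):
    """Collect the text of up to `limit` tweets from an iterable of stream messages."""
    texts = []
    if limit <= 0:
        return texts
    for message in messages:
        # Skip anything that is not a tweet (delete notices, None, timeouts).
        if isinstance(message, dict) and "text" in message:
            texts.append(message["text"])
            if len(texts) == limit:
                break  # stop reading as soon as we have enough
    return texts


def open_sample_stream(consumer_key, consumer_secret,
                       access_token, access_token_secret):
    """Open the public sample stream (network access and valid keys required)."""
    # Imported here so the helper above works without the package installed.
    from twitter import OAuth, TwitterStream
    auth = OAuth(access_token, access_token_secret,
                 consumer_key, consumer_secret)
    return TwitterStream(auth=auth).statuses.sample()
```

Usage, once you have your four keys: `take_tweets(open_sample_stream(ck, cs, at, ats), 10)` returns the text of the first ten tweets received.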

Page 8: Harvesting Data from Twitter Workshop: Hands-on Experience

REST API

• The REST API was designed specifically for programmatic access to read and write Twitter data.
• Third-party applications that interact with Twitter are built on the large set of methods the REST API provides.
• Access to the REST API is also rate-limited: the limit is 150 requests per hour.
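To stay under a rate limit, a script can space its requests evenly. The sketch below takes the 150 requests per hour figure quoted above as given (actual limits depend on the endpoint and the API version) and works out the pause between requests. The names `min_interval_seconds` and `Pacer` are hypothetical helpers, not part of any Twitter library.

```python
import time


def min_interval_seconds(requests_per_window, window_seconds=3600):
    """Smallest even spacing between requests that respects the limit."""
    return float(window_seconds) / requests_per_window


class Pacer(object):
    """Blocks in wait() so that calls are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.time, sleep=time.sleep):
        self.interval = interval
        self.clock = clock      # injectable so the pacer can be tested
        self.sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        now = self.clock()
        if self._next_allowed > now:
            # Too early: sleep until the next allowed time.
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Usage: create `pacer = Pacer(min_interval_seconds(150))` once, then call `pacer.wait()` before every API request. With 150 requests per hour this means one request every 24 seconds.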

Page 9: Harvesting Data from Twitter Workshop: Hands-on Experience

Search API

• Like the REST API, the Search API is pull-based. It replicates the search functionality provided on the Twitter website. However, the tweets retrieved are restricted to the past 7 days.
• The Search API is not appropriate for high-throughput, real-time data acquisition. As such, Twitter Inc. discourages its use and plans to discontinue it in the future.
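A pull-based search can be sketched as follows, again assuming the `twitter` package and the v1.1 search endpoint, whose response holds the matching tweets under a "statuses" key. The helper names are hypothetical, and `auth` is assumed to be an OAuth object built from your app's keys. The count is capped at 100 because that is the most the endpoint returns per request.

```python
def build_search_params(query, count=100, lang=None):
    """Build keyword arguments for one search request."""
    params = {"q": query, "count": max(1, min(count, 100))}
    if lang:
        params["lang"] = lang  # e.g. "ar" for Arabic tweets
    return params


def extract_texts(response):
    """Pull the tweet texts out of a search response."""
    return [status["text"] for status in response.get("statuses", [])]


def search_recent(auth, query, lang=None):
    """Run one search request (network access and valid keys required)."""
    # Imported here so the helpers above work without the package installed.
    from twitter import Twitter
    client = Twitter(auth=auth)
    response = client.search.tweets(**build_search_params(query, lang=lang))
    return extract_texts(response)
```

Each call is a single request, so to collect tweets over time the script has to ask again at intervals, unlike the Streaming API where tweets arrive on their own.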

Page 10: Harvesting Data from Twitter Workshop: Hands-on Experience

Create a Twitter App

• To access the Twitter API you need to create a Twitter app. Follow this simple tutorial to do so: https://iag.me/socialmedia/how-to-create-a-twitter-app-in-8-easy-steps/
• You will use the OAuth settings in both R and Python:
  • Consumer Key
  • Consumer Secret
  • OAuth Access Token
  • OAuth Access Token Secret
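One way to keep the four OAuth values out of your scripts is to read them from environment variables. This is only a sketch: the variable names (`TWITTER_CONSUMER_KEY` and so on) and the helpers `load_credentials` and `make_auth` are assumptions made for illustration, and `make_auth` assumes the `twitter` package is installed.

```python
import os

# Hypothetical environment variable names, one per OAuth setting.
REQUIRED = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=None):
    """Return the four OAuth settings, or raise if any is missing or empty."""
    if environ is None:
        environ = os.environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise ValueError("Missing Twitter credentials: " + ", ".join(missing))
    return dict((name, environ[name]) for name in REQUIRED)


def make_auth(creds):
    """Build an OAuth object for the `twitter` package from the settings."""
    from twitter import OAuth  # imported lazily; needs `pip install twitter`
    # Note the argument order: access token and secret come first.
    return OAuth(creds["TWITTER_ACCESS_TOKEN"],
                 creds["TWITTER_ACCESS_TOKEN_SECRET"],
                 creds["TWITTER_CONSUMER_KEY"],
                 creds["TWITTER_CONSUMER_SECRET"])
```

Failing early with the names of the missing settings is easier to debug than an authentication error from the API.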

Page 11: Harvesting Data from Twitter Workshop: Hands-on Experience

Tools to Collect Tweets

• NodeXL: https://nodexl.codeplex.com/
• Tweet Archivist: https://www.tweetarchivist.com/
• Twitter Archiving Google Spreadsheet (TAGS): https://tags.hawksey.info/

Page 12: Harvesting Data from Twitter Workshop: Hands-on Experience
Page 13: Harvesting Data from Twitter Workshop: Hands-on Experience

What is R?

• Created by Ross Ihaka and Robert Gentleman (Ross & Robert).

Page 14: Harvesting Data from Twitter Workshop: Hands-on Experience

Why R?

Statistics

Machine Learning

Data Analysis

Page 15: Harvesting Data from Twitter Workshop: Hands-on Experience

Why R?

Statistics

Machine Learning

Data Analysis

Also: Programming Language

Page 16: Harvesting Data from Twitter Workshop: Hands-on Experience

R allows you to integrate with

Page 17: Harvesting Data from Twitter Workshop: Hands-on Experience

[Figure: code blocks labeled C++, Java, Python and R]

Page 18: Harvesting Data from Twitter Workshop: Hands-on Experience

Fastest-growing language

https://www.r-bloggers.com/r-is-the-fastest-growing-language-on-stackoverflow/

Page 19: Harvesting Data from Twitter Workshop: Hands-on Experience

fastest-growing language

Page 20: Harvesting Data from Twitter Workshop: Hands-on Experience

Examples

Page 21: Harvesting Data from Twitter Workshop: Hands-on Experience

Now ..

Open your laptop, please

Page 22: Harvesting Data from Twitter Workshop: Hands-on Experience

Steps to install R

1: Install R:
• Windows: https://cran.r-project.org/bin/windows/base/
• macOS: http://cran.r-project.org/bin/macosx/

2: Install RStudio (after installing R):
• https://www.rstudio.com/products/rstudio/download3/

3: Install these packages (see the user manual):
• streamR, ROAuth, RJSONIO, RTextTools, e1071, SparseM

User manual:
• http://www.devchakraborty.com/RunningRJafroc.pdf

R packages list:
• https://cran.r-project.org/web/packages/available_packages_by_date.html

Developing packages with RStudio:
• https://support.rstudio.com/hc/en-us/articles/200486488?version=0.99.903&mode=desktop
• https://cran.r-project.org/doc/manuals/R-exts.html

Page 24: Harvesting Data from Twitter Workshop: Hands-on Experience

Python

• Two versions: 2.7 and 3.x
• Twitter packages: twitter, tweepy
• IDE: Anaconda, IPython notebook, Jupyter

Page 25: Harvesting Data from Twitter Workshop: Hands-on Experience

Installing Python

• Install Anaconda from here: https://www.continuum.io/downloads (choose the Python 2.7 version, only for this tutorial)
• Install the twitter package: from the command line (terminal), type: pip install twitter
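After installing, a quick check from Python can confirm that the package is importable. This is a small sketch with a hypothetical helper name, `is_installed`; it only tries the import and reports the result.

```python
import importlib


def is_installed(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "twitter" is the module installed by `pip install twitter`.
    if is_installed("twitter"):
        print("twitter package found")
    else:
        print("twitter package missing: run `pip install twitter`")
```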

Page 26: Harvesting Data from Twitter Workshop: Hands-on Experience

Comparison between R and Python

• https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis#gs.GuXGfAc
• http://blog.udacity.com/2015/01/python-vs-r-learn-first.html
• http://www.dataschool.io/python-or-r-for-data-science/

Page 27: Harvesting Data from Twitter Workshop: Hands-on Experience

Contact Us

ASA Research Group

Twitter: @ASA__IU
Email: [email protected]
Website: http://asa.imamu.edu.sa/

IWAN Research Group

Twitter: @IWAN_RG
Email: [email protected]
Website: http://iwan.ksu.edu.sa

Page 28: Harvesting Data from Twitter Workshop: Hands-on Experience

Thank you,

See you later …

THE END ..