Twitter Big Data Logging
Jonathan Durda and Shashank Kumar Kalakuntla
Under the guidance of Dr. Sunnie Chung
Cleveland State University, Fall 2014 CIS 612
How to get data from Twitter?
• Neat, structured data from providers such as Gnip
• Problem? Big $$$!
How to get data from Twitter?
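• Twitter gives free access to real-time tweets through its Streaming API, authenticated with OAuth
• Register an app with Twitter, which provides a unique consumer key and access token (each with a matching secret)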
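What the app's credentials are for: every API request is signed with them. Flume performs this signing internally once the credentials are in its config, so the following is only an illustration of the mechanism. It is a minimal sketch of OAuth 1.0a HMAC-SHA1 signing as described in RFC 5849. The function names, placeholder credentials, nonce, and timestamp are hypothetical, and the base URL is assumed to be already normalized.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value: str) -> str:
    """RFC 3986 percent-encoding: only A-Z a-z 0-9 - . _ ~ are left unescaped."""
    return quote(value, safe="")


def signature_base_string(method: str, base_url: str, params: dict) -> str:
    """Build the OAuth 1.0a signature base string (RFC 5849, section 3.4.1).

    base_url is assumed to be normalized already (lowercase scheme and host,
    no query string); query and body parameters go into params instead.
    """
    # Percent-encode every key and value, then sort by key (ties broken by value).
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join(
        [method.upper(), percent_encode(base_url), percent_encode(param_string)]
    )


def hmac_sha1_signature(base_string: str, consumer_secret: str, token_secret: str) -> str:
    """Sign the base string; the HMAC key is '<consumer secret>&<access token secret>'."""
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Placeholder credentials; a real client generates a fresh nonce and timestamp per request.
    params = {
        "oauth_consumer_key": "YOUR_CONSUMER_KEY",
        "oauth_nonce": "abc123",
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": "1414800000",
        "oauth_token": "YOUR_ACCESS_TOKEN",
        "oauth_version": "1.0",
        "track": "bigdata",
    }
    base = signature_base_string(
        "POST", "https://stream.twitter.com/1.1/statuses/filter.json", params
    )
    print(hmac_sha1_signature(base, "YOUR_CONSUMER_SECRET", "YOUR_ACCESS_TOKEN_SECRET"))
```

The resulting string becomes the `oauth_signature` value in the request's `Authorization` header. This sketch does not assemble that header.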
Setting up the data stream
• Use Apache Flume to get a stream of tweets
• Configure Flume with the consumer key and access token (plus their secrets)
• Store the tweets in JSON format in HDFS
• Issues: config file not pointing to the correct HDFS location; access token not entered
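Both issues can be caught before the agent is started. The following is a minimal sketch of a pre-flight check. It assumes a Flume properties file in the usual `agent.sources.<name>.<key> = value` layout, with the Twitter source keys `consumerKey`, `consumerSecret`, `accessToken`, and `accessTokenSecret`, and an HDFS sink configured through `hdfs.path`. The agent, source, and sink names (`TwitterAgent`, `Twitter`, `HDFS`), the `check_flume_config` helper, and the expected HDFS prefix are hypothetical. Adjust them to match your own config.

```python
REQUIRED_TWITTER_KEYS = ("consumerKey", "consumerSecret", "accessToken", "accessTokenSecret")


def parse_properties(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and '#' comments.

    Simplified: Java-properties extras such as ':' separators and
    backslash line continuations are not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_flume_config(text: str, expected_hdfs_prefix: str,
                       agent: str = "TwitterAgent", source: str = "Twitter",
                       sink: str = "HDFS") -> list:
    """Return a list of problems found; an empty list means every check passed."""
    props = parse_properties(text)
    problems = []
    # Issue 1: Twitter credentials missing or left blank.
    for key in REQUIRED_TWITTER_KEYS:
        full_key = f"{agent}.sources.{source}.{key}"
        if not props.get(full_key):
            problems.append(f"{full_key} is missing or empty")
    # Issue 2: HDFS sink path absent or not under the expected location.
    path_key = f"{agent}.sinks.{sink}.hdfs.path"
    path = props.get(path_key, "")
    if not path:
        problems.append(f"{path_key} is missing or empty")
    elif not path.startswith(expected_hdfs_prefix):
        problems.append(f"{path_key} = {path} is not under {expected_hdfs_prefix}")
    return problems


if __name__ == "__main__":
    # Hypothetical config with a blank access token and the wrong NameNode host.
    sample = """
TwitterAgent.sources.Twitter.consumerKey = abc
TwitterAgent.sources.Twitter.consumerSecret = def
TwitterAgent.sources.Twitter.accessToken =
TwitterAgent.sources.Twitter.accessTokenSecret = ghi
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://wronghost:8020/user/flume/tweets/
"""
    for issue in check_flume_config(sample, "hdfs://localhost:8020/user/flume/tweets"):
        print(issue)
```

The helper deliberately checks only for presence and location, not for valid credentials. A wrong but non-empty token will still show up only when Twitter rejects the connection.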
Data in HDFS
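It can help to look at a record or two before defining a Hive table over these files. The following is a small sketch that pulls a few fields out of one stored tweet. It assumes one JSON object per line and Twitter's v1.1 field names (`id_str`, `created_at`, `text`, and the nested `user.screen_name`). The `summarize_tweet` helper and the sample record are made up for illustration.

```python
import json


def summarize_tweet(line: str) -> dict:
    """Pull a few fields out of one raw tweet (one JSON object per line assumed)."""
    tweet = json.loads(line)
    user = tweet.get("user") or {}  # nested object describing the author
    return {
        "id": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "screen_name": user.get("screen_name"),
        "text": tweet.get("text"),
    }


if __name__ == "__main__":
    # Made-up record trimmed to the fields used above; real tweets carry many more.
    sample = (
        '{"id_str": "123", "created_at": "Sat Nov 01 12:00:00 +0000 2014", '
        '"text": "Loading tweets into HDFS", "user": {"screen_name": "example_user"}}'
    )
    print(summarize_tweet(sample))
```

Missing fields come back as `None` rather than raising, which is convenient when records vary in shape.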
What to use to analyze the data
• Use Hive to analyze our raw data
• Why Hive?
  • Readability: the commands are familiar to anyone who knows SQL
  • Persistence: Hive tables point to data in HDFS, so the tables still exist after quitting and restarting Hive
  • Maintenance: Hive is easy to maintain
Run Analysis on Data
• Now that the data is loaded into a Hive table, we can run queries to analyze it
• How many tweets have I downloaded to work with? Let's find out!
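In Hive, this question is a single aggregate such as `SELECT COUNT(*) FROM tweets;` (the table name `tweets` is hypothetical). The following sketch offers an independent sanity check outside Hive. It counts records in Flume output files copied to local disk (for example with `hdfs dfs -get`), assuming one JSON tweet per line. Blank lines are ignored, and lines that are not JSON objects are reported as skipped instead of being counted. The helper names are hypothetical.

```python
import json


def count_records(lines):
    """Count JSON-object lines; return (tweets, skipped). Blank lines are ignored."""
    tweets = skipped = 0
    for line in lines:
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # truncated or corrupted line
            continue
        if isinstance(record, dict):
            tweets += 1
        else:
            skipped += 1  # valid JSON, but not an object
    return tweets, skipped


def count_tweets_in_files(paths):
    """Sum count_records over several local files (e.g. copied out of HDFS)."""
    total_tweets = total_skipped = 0
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            tweets, skipped = count_records(handle)
        total_tweets += tweets
        total_skipped += skipped
    return total_tweets, total_skipped


if __name__ == "__main__":
    demo = ['{"id_str": "1", "text": "a"}', "", '{"id_str": "2", "text": "b"}', "{broken"]
    print(count_records(demo))  # (2, 1)
```

A non-zero skipped count is a hint that some files were cut off mid-write, which is worth knowing before comparing against the Hive result.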
Conclusion
• Questions?
• Thank you for listening!