Twitter Big Data Logging Jonathan Durda and Shashank Kumar Kalakuntla Under the guidance of Dr. Sunnie Chung Cleveland State University, Fall 2014 CIS 612

Twitter Big Data Logging

Jonathan Durda and Shashank Kumar Kalakuntla

Under the guidance of Dr. Sunnie Chung

Cleveland State University, Fall 2014 CIS 612

How to get data from Twitter?

• Neat, structured data from providers such as Gnip

• Problem? Big $$$!

How to get data from Twitter?

• Twitter allows access to real-time tweets through OAuth

• Create an app, which provides unique access tokens
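
To make the role of the app's keys and tokens concrete, here is a minimal sketch of how a request could be signed. It assumes OAuth 1.0a with HMAC-SHA1, which the slides do not specify. The URL, key names, and credential values are hypothetical placeholders, not values from the project.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value: str) -> str:
    """Percent-encode a string, leaving only RFC 3986 unreserved characters."""
    return quote(value, safe="~")


def sign_request(method, url, params, consumer_secret, token_secret):
    """Return a base64 HMAC-SHA1 signature in the style of OAuth 1.0a."""
    # Encode every key and value, then sort so parameter order does not matter.
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    # Signature base string: METHOD & encoded URL & encoded parameter string.
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(param_string)]
    )
    # The signing key combines the app's secret with the token's secret.
    signing_key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(
        signing_key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical placeholder values; real ones come from the app's settings page.
example_params = {
    "oauth_consumer_key": "CONSUMER_KEY",
    "oauth_token": "ACCESS_TOKEN",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1400000000",
    "oauth_nonce": "abc123",
    "oauth_version": "1.0",
}
signature = sign_request(
    "GET",
    "https://example.com/stream",  # placeholder, not a real Twitter endpoint
    example_params,
    "CONSUMER_SECRET",
    "ACCESS_TOKEN_SECRET",
)
```

In practice a client library or a tool such as Flume does this signing for us; the sketch only shows why both the consumer credentials and the access token are needed.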

Setting up data stream

• Use Apache Flume to get a stream of tweets

• Authenticate with the consumer key and access token

• Store tweets in JSON format in HDFS

• Issues encountered: the config file did not point to the correct HDFS location, and the access token was not entered
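
Both issues above can be caught before starting the agent. The following is a rough sketch of such a check over a Flume-style properties file. The agent, source, and sink names are hypothetical, the property names are assumed to follow Flume's Twitter source and HDFS sink conventions, and treating a path without an `hdfs://` scheme as a problem is a simplification of this sketch.

```python
REQUIRED_CREDENTIALS = (
    "consumerKey",
    "consumerSecret",
    "accessToken",
    "accessTokenSecret",
)


def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def find_problems(props, source_prefix, sink_prefix):
    """Return a list of human-readable problems found in the properties."""
    problems = []
    for name in REQUIRED_CREDENTIALS:
        # An absent key and an empty value are both treated as "not entered".
        if not props.get(f"{source_prefix}.{name}"):
            problems.append(f"missing credential: {name}")
    path = props.get(f"{sink_prefix}.hdfs.path", "")
    if not path.startswith("hdfs://"):
        problems.append("hdfs.path is not an hdfs:// URI")
    return problems


# Hypothetical config with the two mistakes described on the slide.
sample = """
# Agent, source, and sink names are made up for this example
TwitterAgent.sources.Twitter.consumerKey = KEY
TwitterAgent.sources.Twitter.consumerSecret = SECRET
TwitterAgent.sources.Twitter.accessToken =
TwitterAgent.sources.Twitter.accessTokenSecret = TOKEN_SECRET
TwitterAgent.sinks.HDFS.hdfs.path = /user/flume/tweets
"""
problems = find_problems(
    parse_properties(sample),
    "TwitterAgent.sources.Twitter",
    "TwitterAgent.sinks.HDFS",
)
```

For the sample above, `problems` lists the empty access token and the path that lacks an `hdfs://` scheme.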

Data in HDFS

What to use to analyze data

• Use Hive to analyze our raw data

• Why Hive?

  ◦ Readability: the commands are familiar from SQL

  ◦ Persistence: Hive tables point to data in HDFS, so the tables survive quitting and restarting

  ◦ Maintenance: Hive is very easy to maintain

Run Analysis on Data

• Now that the data is imported into a table created in Hive, we can run queries to analyze it

• How many tweets have I downloaded to work with? Let's find out!
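
As an illustration of what such a count computes, here is a small Python sketch over tweets stored one JSON object per line. It is not the Hive query from the project. The sample records and the field names `id` and `text` are made up, and skipping malformed lines is a choice of this sketch.

```python
import json


def count_tweets(lines):
    """Count the lines that parse as JSON objects, one tweet per line."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed records
        if isinstance(record, dict):
            total += 1
    return total


# Hypothetical sample: two valid tweets, one blank line, one truncated record.
sample_lines = [
    '{"id": 1, "text": "hello"}',
    "",
    '{"id": 2, "text": "big data"}',
    '{"id": 3, "text": "trunc',
]
tweet_count = count_tweets(sample_lines)
```

Here `tweet_count` is 2, since the blank line and the truncated record are skipped.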

Conclusion

• Questions?

• Thank you for listening!