yun yuan week4.0_demo

Reddit Story: What will happen as the comments on a post keep coming (Yun Yuan)

Upload: yun-yuan

Post on 09-Jan-2017



TRANSCRIPT

Page 1: Yun yuan week4.0_demo

Reddit Story

What will happen as the comments on a post keep coming

Yun Yuan

Page 2: Yun yuan week4.0_demo

Motivation for Project

• Interested in Social News: All topics under the sun

• From tree structure of comments to timeline structure of comments

• See how opinions evolve as time flows

Page 3: Yun yuan week4.0_demo

Input and Output

Data Input

• Reddit Comments from S3 Data Dump (JSON files)

• Reddit Posts Info from Reddit API (JSON files)

Data Output

• For each post, organize comments on a timestamp basis with some significant attributes, and show the hottest comments for that post

• Web App Presentation (Link): Graph and Short Texts

• Demo: Under Construction
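
The output described on this slide can be sketched in plain Python as follows. This is only an illustration under assumptions, not the project's actual code: the sample records are made up, the field names (`link_id`, `created_utc`, `score`) are assumed to match the Reddit comment dump, and "hottest" is assumed to mean highest score. The helper names `build_timelines` and `hottest` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records; field names are assumed to follow the Reddit comment dump.
comments = [
    {"id": "c1", "link_id": "t3_a", "created_utc": 1483920000, "score": 5, "body": "first"},
    {"id": "c2", "link_id": "t3_a", "created_utc": 1483916400, "score": 42, "body": "earlier but hotter"},
    {"id": "c3", "link_id": "t3_b", "created_utc": 1483923600, "score": 1, "body": "other post"},
]

def build_timelines(comments):
    """Group comments by post and sort each group by creation time (oldest first)."""
    by_post = defaultdict(list)
    for c in comments:
        by_post[c["link_id"]].append(c)
    return {post: sorted(cs, key=lambda c: c["created_utc"]) for post, cs in by_post.items()}

def hottest(timeline, n=1):
    """Return the n highest-scoring comments of one post (assumption: hot = high score)."""
    return sorted(timeline, key=lambda c: c["score"], reverse=True)[:n]

timelines = build_timelines(comments)
```

On a cluster the same grouping and sorting would likely be expressed as a distributed job, but the per-post logic stays the same.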

Page 4: Yun yuan week4.0_demo

Tentative Pipeline and Data Flows

Comments JSON + Posts JSON

Post -> Trends -> Hot Comments

Challenges Encountered:

• Null values in JSON fields during ingestion

• Comment trends vary
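
One possible way to deal with the null-value challenge at ingestion time is sketched below. It is an assumption, not the approach taken in the project: the split into required and optional fields, the default values, and the helper name `parse_comment` are all hypothetical.

```python
import json

# Hypothetical defaults for optional fields that may be null or missing.
DEFAULTS = {"score": 0, "body": "", "author": "[unknown]"}
# Hypothetical set of fields a record must have to be usable downstream.
REQUIRED = ("id", "link_id", "created_utc")

def parse_comment(line):
    """Parse one JSON line; return a cleaned dict, or None if the record is unusable."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    # Drop records whose key fields are missing or null.
    if any(record.get(key) is None for key in REQUIRED):
        return None
    # Replace null or missing optional fields with defaults.
    for key, default in DEFAULTS.items():
        if record.get(key) is None:
            record[key] = default
    return record

lines = [
    '{"id": "c1", "link_id": "t3_a", "created_utc": 1483920000, "score": null, "body": "hi"}',
    '{"id": "c2", "link_id": null, "created_utc": 1483920060, "score": 3}',
    'not json',
]
cleaned = [r for r in map(parse_comment, lines) if r is not None]
```

Dropping a record versus filling a default is a design choice: a comment without a post id or timestamp cannot be placed on a timeline, while a missing score can reasonably be treated as zero.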

Page 5: Yun yuan week4.0_demo

Distributed Clusters

1 Cluster: 4 Nodes of m4.large (Hadoop/HDFS, Kafka/Zookeeper, Cassandra)

1 Node of t2.micro (Flask)

1 Cluster: 4 Nodes of m4.large (Spark)

~$400 per month

Page 6: Yun yuan week4.0_demo

About me: Yun Yuan