yun yuan week4.0_demo
Reddit Story
What happens as the comments on a post keep coming in
Yun Yuan
Motivation for Project
• Interested in Social News: All topics under the sun
• From tree structure of comments to timeline structure of comments
• See how opinions evolve as time flows
Input and Output
Data Input
• Reddit Comments from S3 Data Dump (JSON files)
• Reddit Posts Info from Reddit API (JSON files)
Data Output
• For each post, organize comments in timestamp-based order with some significant attributes, and show the hottest comments for that post
• Web App Presentation (Link): Graph and Short Texts
• Demo: Under Construction
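The per-post output described above can be sketched in a few lines of plain Python. This is only an illustration, not the project's actual Spark job. It assumes each comment record carries `link_id` (the post it belongs to), `created_utc`, `score`, and `body` fields, and it assumes "hottest" means highest score. The function names `build_timelines` and `hottest` are hypothetical.

```python
from collections import defaultdict
import heapq


def build_timelines(comments):
    """Group comments by post and sort each group by creation time."""
    timelines = defaultdict(list)
    for comment in comments:
        # Assumption: "link_id" identifies the post a comment belongs to.
        timelines[comment["link_id"]].append(comment)
    for post_comments in timelines.values():
        post_comments.sort(key=lambda c: c["created_utc"])
    return dict(timelines)


def hottest(timeline, k=3):
    """Return the k highest-scoring comments of one post.

    Assumption: "hottest" is read here as "highest score".
    """
    return heapq.nlargest(k, timeline, key=lambda c: c["score"])


if __name__ == "__main__":
    sample = [
        {"link_id": "t3_a", "created_utc": 30, "score": 5, "body": "third"},
        {"link_id": "t3_a", "created_utc": 10, "score": 1, "body": "first"},
        {"link_id": "t3_b", "created_utc": 5, "score": 2, "body": "other post"},
        {"link_id": "t3_a", "created_utc": 20, "score": 9, "body": "second"},
    ]
    timelines = build_timelines(sample)
    for comment in timelines["t3_a"]:
        print(comment["created_utc"], comment["body"])
    print([c["body"] for c in hottest(timelines["t3_a"], k=2)])
```

Sorting by timestamp turns the comment tree into a flat timeline per post, while the top-k selection is kept separate so a different notion of "hot" could be swapped in.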
Tentative Pipeline and Data Flows
Comments JSON + Posts JSON
Post -> Trends -> Hot Comments
Challenges Encountered:
• Null field values in the JSON during ingestion
• Comment trends vary
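One common way to deal with null fields at ingestion time is to validate each JSON record before it enters the pipeline. The sketch below is a minimal illustration under assumptions, not the approach used in this project: the choice of required fields (`link_id`, `created_utc`, `body`), the default score of 0, and the function name `parse_comment` are all hypothetical.

```python
import json

# Assumption: these fields are needed downstream; a record without them is dropped.
REQUIRED_FIELDS = ("link_id", "created_utc", "body")


def parse_comment(line):
    """Parse one JSON line; return a cleaned dict, or None if the record is unusable."""
    try:
        record = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(record, dict):
        return None
    # Drop records whose required fields are missing or null.
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return None
    try:
        created = int(record["created_utc"])  # accepts both 100 and "100"
    except (ValueError, TypeError):
        return None
    return {
        "link_id": record["link_id"],
        "created_utc": created,
        "body": record["body"],
        # Assumption: a missing or null score is treated as 0.
        "score": record.get("score") or 0,
    }
```

Returning None for bad records lets the caller filter them out in one pass, for example `[c for c in map(parse_comment, lines) if c is not None]`.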
Distributed Clusters
1 Cluster: 4 Nodes of m4.large
• Hadoop/HDFS
• Kafka/Zookeeper
• Cassandra
1 Node of t2.micro
• Flask
1 Cluster: 4 Nodes of m4.large
• Spark
~$400 per month
About me: Yun Yuan