Insight Data Engineering Project
RedditR: Your Personalized Gateway to Reddit.com
Aravind Kumar Ramesh
Insight Data Engineering Fellow, New York
Motivation
In 2015: 82.54 billion pageviews, 73.15 million submissions, 725.85 million comments
What’s trending? Maximize content engagement
Personalized recommendation
Solutions
◉ Generating recommendations using ALS (Alternating Least Squares)
- ALS is compute-intensive.
- Generating recommendations using a user graph
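A minimal sketch of graph-based recommendation, under assumptions that are not from the deck: two users are treated as neighbours in the user graph when they have commented in at least one common subreddit, and each neighbour adds a weight equal to the number of shared subreddits to every subreddit they visited that the target user has not. The `(author, subreddit)` input pairs and the function names `build_subreddit_index` and `recommend` are hypothetical; at Reddit scale this would presumably run as a distributed job rather than in one Python process.

```python
from collections import Counter, defaultdict


def build_subreddit_index(comments):
    """Map each author to the set of subreddits they commented in.

    `comments` is an iterable of hypothetical (author, subreddit) pairs.
    """
    user_subs = defaultdict(set)
    for author, subreddit in comments:
        user_subs[author].add(subreddit)
    return user_subs


def recommend(user, user_subs, top_n=3):
    """Score subreddits visited by the user's neighbours in the user graph."""
    mine = user_subs.get(user, set())
    scores = Counter()
    for other, subs in user_subs.items():
        if other == user:
            continue
        overlap = len(mine & subs)
        if overlap == 0:
            continue  # no shared subreddit, so not a neighbour
        for sub in subs - mine:
            scores[sub] += overlap  # weight by number of shared subreddits
    # Highest score first; ties broken alphabetically for deterministic output
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sub for sub, _ in ranked[:top_n]]
```

For example, if alice commented in python and datascience, bob in python and rust, and carol in datascience, python and MachineLearning, then `recommend("alice", build_subreddit_index(comments))` returns `["MachineLearning", "rust"]`: carol shares two subreddits with alice and bob shares one. Unlike ALS, this needs no iterative matrix factorization, only set intersections over each user's neighbourhood.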
Use Parquet
Original dataset: 1084.5 GB
Compressed Parquet: 187.8 GB
Queries ran 3x faster on Parquet.
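The storage figures above work out as follows (plain arithmetic on the two sizes; the 3x query speedup is a separate measurement and is not derived from them):

```latex
\text{compression ratio} = \frac{1084.5\ \text{GB}}{187.8\ \text{GB}} \approx 5.77,
\qquad
\text{space saved} = 1 - \frac{187.8}{1084.5} \approx 0.827 \;(\approx 82.7\%)
```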
Solution
Table Design
PRIMARY KEY (author, created_utc)) WITH CLUSTERING ORDER BY (created_utc ASC)
Secondary index: CREATE INDEX subreddit ON subredditinfo (subreddit);
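To illustrate what the primary key above provides, here is a minimal in-memory model. It is a hypothetical `ClusteredTable` class, not Cassandra code: `author` plays the partition key, so all of one author's rows are stored together, and within that partition rows stay sorted by `created_utc` in ascending order. That ordering is what turns "one author's activity in a time window" into two binary searches instead of a scan. The row payloads are placeholders.

```python
import bisect
from collections import defaultdict


class ClusteredTable:
    """Hypothetical model of PRIMARY KEY (author, created_utc)
    with CLUSTERING ORDER BY (created_utc ASC)."""

    def __init__(self):
        # Partition key (author) -> clustering keys (created_utc), sorted ascending
        self._times = defaultdict(list)
        # Partition key (author) -> row payloads, aligned index-for-index with _times
        self._rows = defaultdict(list)

    def insert(self, author, created_utc, row):
        times, rows = self._times[author], self._rows[author]
        i = bisect.bisect_left(times, created_utc)
        if i < len(times) and times[i] == created_utc:
            # Same full primary key: overwrite, as a Cassandra upsert would
            rows[i] = row
        else:
            times.insert(i, created_utc)
            rows.insert(i, row)

    def range_query(self, author, start, end):
        """Rows for one author with start <= created_utc < end, oldest first."""
        times = self._times.get(author, [])
        rows = self._rows.get(author, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_left(times, end)
        return rows[lo:hi]
```

`range_query` mirrors a CQL query restricted to a single `author` with a range condition on `created_utc`; filtering on `subreddit` alone is a different access pattern, which is the kind of lookup the secondary index on `subredditinfo` is for.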
I am Aravind. I am here because I love data engineering and working with large-scale data. You can find me @aravindk1992.
About Me
Bachelor’s in Telecommunication Engineering
Master’s in Computer Science from the State University of New York at Buffalo, New York