netflix recommendations using spark + cassandra (prasanna padmanabhan & roopa tangirala,...
![Page 1: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/1.jpg)
Netflix Recommendations using Spark + Cassandra
Prasanna Padmanabhan, Roopa Tangirala
![Page 2: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/2.jpg)
Turn on Netflix and the absolute best content for you would automatically start playing
![Page 3: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/3.jpg)
Netflix Recommendations
![Page 4: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/4.jpg)
Netflix Recommendations
![Page 5: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/5.jpg)
Everything is a Recommendation
Rows
Ranking
Over 80% of what members watch comes from our recommendations
Recommendations are driven by Machine Learning Algorithms
![Page 6: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/6.jpg)
Data Driven
Offline Experiment using Historical Data
Online A/B Testing
Rollout Feature to ALL members
Success Success
Fail
Algorithmic Page Generation
Trending Now
![Page 7: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/7.jpg)
Offline Experimentation
![Page 8: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/8.jpg)
Algorithmic Page Generation
Personalizing the ordering of rows on the homepage
![Page 9: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/9.jpg)
Algorithmic Page Generation
Without Algorithmic Page Generation | With Algorithmic Page Generation
Diversity of the Page
Affinity for specific rows
Drawbacks
![Page 10: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/10.jpg)
Algorithmic Page Generation
Production
![Page 11: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/11.jpg)
Algorithmic Page Generation
Production Variant 1
![Page 12: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/12.jpg)
Algorithmic Page Generation
Production Variant 1 Variant 2
Row Distribution
TV/Movie Ratio
![Page 13: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/13.jpg)
Algorithmic Page Generation
Production Variant 1 Variant 2
Evaluate best variant based on the plays
Actual Plays:
![Page 14: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/14.jpg)
Algorithmic Page Generation
Production Variant 1 Variant 2
Evaluate best variant based on the plays
Actual Plays:
![Page 15: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/15.jpg)
Algorithmic Page Generation
Production Variant 1 Variant 2
Evaluate best variant based on the plays
Actual Plays:
![Page 16: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/16.jpg)
Algorithmic Page Generation
Production Variant 1 Variant 2
Evaluate best variant based on the plays
Actual Plays:
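The comparison on these slides (pages generated by Production, Variant 1 and Variant 2, judged against what members actually played) can be sketched roughly as follows. This is a minimal illustration, not the metric used in the talk: the scoring rule (a play counts more when its row sits higher on the page) and the names `score_page` and `best_variant` are assumptions made for the example.

```python
from typing import Dict, List, Set

# A page is an ordered list of rows; each row is a list of video ids.
Page = List[List[str]]


def score_page(page: Page, actual_plays: Set[str]) -> float:
    """Hypothetical metric: reward plays the page surfaced, weighting higher rows more."""
    if not actual_plays:
        return 0.0
    score = 0.0
    for video in actual_plays:
        for row_index, row in enumerate(page):
            if video in row:
                # Only the first (highest) row containing the video counts.
                score += 1.0 / (row_index + 1)
                break
    return score / len(actual_plays)


def best_variant(pages: Dict[str, Page], actual_plays: Set[str]) -> str:
    """Return the name of the variant whose page scores highest against the plays."""
    return max(pages, key=lambda name: score_page(pages[name], actual_plays))


# Illustrative data only: three orderings of the same rows and two actual plays.
pages = {
    "production": [["a", "b"], ["c", "d"], ["e", "f"]],
    "variant_1": [["e", "f"], ["a", "b"], ["c", "d"]],
    "variant_2": [["c", "d"], ["e", "f"], ["a", "b"]],
}
actual_plays = {"c", "e"}
winner = best_variant(pages, actual_plays)
```

Because the pages are regenerated from historical snapshots, a comparison like this can be run offline before any variant reaches an A/B test.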
![Page 17: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/17.jpg)
Offline Experiment Architecture
Member Selection
Runs once a day
Ratings Service
S3
Snapshot Snapshot Store
Snapshot Forklift
Viewing History Service
MyList Service
Data Snapshots
Evaluate Metrics
Generate Pages
… …
A/B Test
![Page 18: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/18.jpg)
Data Model - Requirements
• Need for historical service data
• Optimize for Batch Writes and Point Reads
![Page 19: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/19.jpg)
Data Model
COLUMN FAMILY: MYLIST

| Row key (DATE_MEMBER_ID) | Column |
| --- | --- |
| 20161009_1001 | MyList (BLOB) |
| 20161009_1002 | MyList (BLOB) |
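The DATE_MEMBER_ID row key supports the stated requirement of batch writes and point reads: one snapshot run writes a row per member, and a later experiment fetches a single member's data for a single day by key. A minimal sketch, assuming an in-memory dictionary as a stand-in for the column family; `SnapshotStore` and its methods are hypothetical names, not part of the system described in the talk.

```python
from datetime import date
from typing import Dict, Optional


def snapshot_row_key(snapshot_date: date, member_id: int) -> str:
    """Build a DATE_MEMBER_ID row key such as 20161009_1001."""
    return f"{snapshot_date:%Y%m%d}_{member_id}"


class SnapshotStore:
    """In-memory stand-in for one column family (illustration only)."""

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def write_batch(self, snapshot_date: date, blobs_by_member: Dict[int, bytes]) -> None:
        # One daily snapshot run writes a row for every selected member.
        for member_id, blob in blobs_by_member.items():
            self._rows[snapshot_row_key(snapshot_date, member_id)] = blob

    def point_read(self, snapshot_date: date, member_id: int) -> Optional[bytes]:
        # A single key lookup returns that member's data as of that day.
        return self._rows.get(snapshot_row_key(snapshot_date, member_id))


store = SnapshotStore()
store.write_batch(date(2016, 10, 9), {1001: b"mylist-1001", 1002: b"mylist-1002"})
blob = store.point_read(date(2016, 10, 9), 1001)
```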
![Page 20: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/20.jpg)
Data Model
COLUMN FAMILY: VIEWING-HISTORY

| Row key (DATE_MEMBER_ID) | Column |
| --- | --- |
| 20161009_1001 | ViewingData (BLOB) |
| 20161009_1002 | ViewingData (BLOB) |
![Page 21: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/21.jpg)
Data Model
COLUMN FAMILY: VIEWING-HISTORY

| Row key (DATE_MEMBERID_IDX) | Column |
| --- | --- |
| 20161009_1001_0 | ViewingData (BLOB) |
| 20161009_1001_1 | ViewingData (BLOB) |
| 20161009_1001_2 | ViewingData (BLOB) |
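The slide does not spell out why the key gains an IDX suffix. One plausible reading, treated here purely as an assumption, is that a member's viewing data is split across several rows so that no single row holds an oversized blob. Under that assumption, writing and reading could look like this; `split_blob`, `join_blob` and the chunk size are hypothetical.

```python
from typing import Dict


def split_blob(row_prefix: str, blob: bytes, chunk_size: int) -> Dict[str, bytes]:
    """Split one blob into rows keyed PREFIX_0, PREFIX_1, ... (assumed scheme)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]
    if not chunks:
        chunks = [b""]  # keep an empty blob readable as a single empty row
    return {f"{row_prefix}_{idx}": chunk for idx, chunk in enumerate(chunks)}


def join_blob(row_prefix: str, rows: Dict[str, bytes]) -> bytes:
    """Reassemble the blob by reading PREFIX_0, PREFIX_1, ... until a key is missing."""
    parts = []
    idx = 0
    while f"{row_prefix}_{idx}" in rows:
        parts.append(rows[f"{row_prefix}_{idx}"])
        idx += 1
    return b"".join(parts)


rows = split_blob("20161009_1001", b"abcdefgh", chunk_size=3)
restored = join_blob("20161009_1001", rows)
```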
![Page 22: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/22.jpg)
Online A/B Testing
![Page 23: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/23.jpg)
Trending Now
Videos that are Trending and Personalized for you
![Page 24: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/24.jpg)
Trending Now
It’s 7 PM on a Monday
![Page 25: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/25.jpg)
Trending Now
It’s 10 PM on a Saturday
![Page 26: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/26.jpg)
Trending Now
Pokémon
![Page 27: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/27.jpg)
Fast Feedback Loop
UI
Data Systems
Streaming Apps
Rec Systems
![Page 28: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/28.jpg)
Trending Now - Data Infrastructure
Impression Service
Viewing History Service
UI
Online Services
Trends Store
Compute Trends
Model Training
Captures videos shown in view port
Captures videos played by members
Publish Models
Viewing History Service
Ratings ...
![Page 29: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/29.jpg)
State Management in Cassandra
| Video | Number of Plays |
| --- | --- |
| Stranger Things | 100 |
| Narcos | 200 |
| Orange Is the New Black | 300 |
![Page 30: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/30.jpg)
State Management in Cassandra
Trends Store
State Present?
Compute Trends
Yes
No
Init State from Cassandra
Load State
Update State
Read Events
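The flow on this slide (check whether state is present, initialize it from Cassandra if not, then read events and update state) can be sketched as below. This is an illustration under stated assumptions: a plain dictionary stands in for the Cassandra-backed Trends Store, the state is a per-video play count as in the table on the previous slide, and `TrendsState` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


class TrendsState:
    """Keeps running play counts in memory and recovers them from a store."""

    def __init__(self, store: Dict[str, int]) -> None:
        self._store = store                     # stand-in for the Trends Store
        self._state: Optional[Counter] = None   # absent after a (re)start

    def process(self, play_events: Iterable[str]) -> Counter:
        if self._state is None:
            # "State present?" -> No: initialize from the persisted counts.
            self._state = Counter(self._store)
        # Read events and update the in-memory state.
        self._state.update(play_events)
        # Persist the new counts so a restarted job can pick them up.
        self._store.update(self._state)
        return self._state


store = {"Stranger Things": 100, "Narcos": 200}
job = TrendsState(store)
job.process(["Narcos", "Narcos", "Stranger Things"])

# A new instance simulates a restart: it has no in-memory state yet.
restarted = TrendsState(store)
counts = restarted.process(["Narcos"])
```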
![Page 31: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/31.jpg)
Data Model - Requirements
• Trending data is for a specific interval of time
• Optimize for Batch Writes and Batch Reads
![Page 32: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/32.jpg)
Data Model
COLUMN FAMILY: Interval 1, Interval 2 … Interval N

| Row key (VIDEOID_METADATA) | Column | Column |
| --- | --- | --- |
| 101_METADATA | Plays (BLOB) | Impressions (BLOB) |
| 102_METADATA | Plays (BLOB) | Impressions (BLOB) |
| 103_METADATA | Plays (BLOB) | Impressions (BLOB) |
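Since trending data belongs to a specific interval and each interval gets its own column family, a writer needs to map an event time to the interval it falls in. A minimal sketch of that mapping follows; the `trends_` naming scheme and the 15-minute interval are assumptions chosen for illustration, and only the `VIDEOID_METADATA` row key format comes from the slide.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def interval_start(event_time: datetime, interval: timedelta) -> datetime:
    """Round a timezone-aware event time down to the start of its interval."""
    buckets = (event_time - EPOCH) // interval
    return EPOCH + buckets * interval


def interval_column_family(event_time: datetime, interval: timedelta) -> str:
    """Hypothetical naming scheme: one column family per interval."""
    return f"trends_{interval_start(event_time, interval):%Y%m%d_%H%M}"


def video_row_key(video_id: int) -> str:
    """Build a VIDEOID_METADATA row key such as 101_METADATA."""
    return f"{video_id}_METADATA"


event_time = datetime(2016, 10, 9, 19, 17, tzinfo=timezone.utc)
family = interval_column_family(event_time, timedelta(minutes=15))
row_key = video_row_key(101)
```

Keeping each interval in its own column family means a whole interval can be written and read in bulk, which matches the batch-write, batch-read requirement above.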
![Page 33: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/33.jpg)
Roopa Tangirala
Engineering Manager @ Netflix
Twitter - @roopatangirala
![Page 34: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/34.jpg)
FORKLIFTER
![Page 35: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/35.jpg)
ARCHITECTURE
SOURCE TARGET
![Page 36: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/36.jpg)
USE CASES
![Page 37: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/37.jpg)
![Page 38: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/38.jpg)
APACHE THRIFT CQL
![Page 39: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/39.jpg)
![Page 40: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/40.jpg)
DEMO
![Page 41: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/41.jpg)
![Page 42: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/42.jpg)
WHY NOT DSE SPARK?
![Page 43: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/43.jpg)
![Page 44: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/44.jpg)
SCALABILITY
![Page 45: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/45.jpg)
COST EFFECTIVENESS
![Page 46: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/46.jpg)
LESSONS LEARNT
![Page 47: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/47.jpg)
TTL HANDLING
• TTL Reading and Writing is Asymmetric - CASSANDRA-12216
• Thrift Column TTL vs CQL Row TTL
![Page 48: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/48.jpg)
PARTITION DIFFERENCES
[Figure: two partition layouts of the same range, one labeled 1 to 6 with boundaries every 100000 up to 600000, the other with boundaries every 25k up to 600k]
![Page 49: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/49.jpg)
TUNING
• spark.cassandra.connection.keep_alive_ms
• spark.cassandra.connection.timeout_ms
• spark.driver.maxResultSize
![Page 50: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/50.jpg)
OOM EXCEPTIONS
spark.executor.memory
spark.cassandra.input.split.size_in_mb
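The two settings interact: `spark.cassandra.input.split.size_in_mb` controls roughly how much Cassandra data lands in each Spark partition, so a smaller split yields more, lighter partitions that are easier to fit in executor memory. A back-of-the-envelope sketch of that arithmetic follows; the table size and split sizes are made-up numbers, and the connector's real splitting works from size estimates, so actual partition counts will differ.

```python
import math


def estimated_spark_partitions(table_size_mb: float, split_size_mb: float) -> int:
    """Rough estimate: data volume divided by the target split size."""
    if split_size_mb <= 0:
        raise ValueError("split_size_mb must be positive")
    return max(1, math.ceil(table_size_mb / split_size_mb))


# Illustrative numbers only: a 10 GB table read with two different split sizes.
partitions_at_64 = estimated_spark_partitions(10 * 1024, 64)
partitions_at_32 = estimated_spark_partitions(10 * 1024, 32)
```

Halving the split size doubles the estimated partition count, which trades per-task memory pressure for more scheduling overhead.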
![Page 51: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/51.jpg)
WRITES SPEED SPARK
• cassandra.output.batch.size.bytes
• cassandra.output.batch.size.rows
• cassandra.output.concurrent.writes
• cassandra.output.throughput_mb_per_sec
![Page 52: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/52.jpg)
Write Timeouts
cassandra.output.throughput_mb_per_sec
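The properties named on the tuning slides are ordinary Spark configuration keys, so they can be passed to a job as `--conf key=value` pairs on `spark-submit`. The sketch below assembles such arguments. Every value is a placeholder chosen for illustration, not a recommendation from the talk, and the output properties are written with a `spark.` prefix on the assumption that the slides abbreviate the connector's full key names.

```python
from typing import Dict, List

# Placeholder values for illustration only; tune against your own cluster.
CONNECTOR_SETTINGS: Dict[str, str] = {
    "spark.cassandra.connection.keep_alive_ms": "30000",
    "spark.cassandra.connection.timeout_ms": "10000",
    "spark.driver.maxResultSize": "2g",
    "spark.executor.memory": "8g",
    "spark.cassandra.input.split.size_in_mb": "32",
    "spark.cassandra.output.batch.size.bytes": "1024",
    "spark.cassandra.output.concurrent.writes": "5",
    # Capping write throughput is the knob the slide pairs with write timeouts.
    "spark.cassandra.output.throughput_mb_per_sec": "5",
}


def to_submit_args(settings: Dict[str, str]) -> List[str]:
    """Turn a settings mapping into spark-submit '--conf key=value' arguments."""
    args: List[str] = []
    for key in sorted(settings):
        args += ["--conf", f"{key}={settings[key]}"]
    return args


submit_args = to_submit_args(CONNECTOR_SETTINGS)
```

Keeping the settings in one mapping makes it easy to change a single knob, rerun the job, and compare behavior.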
![Page 53: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/53.jpg)
![Page 54: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.net/reader034/viewer/2022050614/586f75c71a28ab10258b6181/html5/thumbnails/54.jpg)
QUESTIONS?