Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
DESCRIPTION
A few years ago, Netflix had a fairly classic business intelligence tech stack. Now, things have changed. Netflix is a heavy user of AWS for much of its ongoing operations, and Data Science & Engineering (DSE) is no exception. In this talk, we dive into the Netflix DSE architecture: what and why. Key topics include their use of Big Data technologies (Cassandra, Hadoop, Pig + Python, and Hive); their Amazon S3 central data hub; their multiple persistent Amazon EMR clusters; how they benefit from AWS elasticity; their data-science-as-a-service approach; how they made a hybrid AWS/data center setup work well; their open-source Hadoop-related software; and more.
TRANSCRIPT
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Data Science at Netflix with Amazon EMR
Kurt Brown, Director of Data Platform, Netflix
November 13, 2013
Data Platform
[Slides: Data Platform architecture diagram, built up step by step with S3, Suro, Aegisthus, and Sting]
S3
99.999999999% durability
High SLA
Query
HDFS?
Eventual Consistency
S3mper
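S3 listings were eventually consistent, so a job could list a prefix and silently miss files that another job had just written. As Netflix has publicly described S3mper, it records written files in a strongly consistent store (DynamoDB) and checks each S3 listing against that record, waiting and retrying when files are missing. The sketch below illustrates that idea under stated assumptions: `ConsistentIndex` is a hypothetical in-memory stand-in for the consistent store, `consistent_listing` and its retry parameters are hypothetical, and no real S3 or DynamoDB API is called. It is not S3mper's actual code.

```python
"""Hypothetical sketch of an S3mper-style listing consistency check.

Assumption: writers record each file they create in a strongly consistent
metadata store; readers compare that record with the eventually consistent
listing before trusting it. All names here are illustrative.
"""
import time
from typing import Callable, Dict, Set


class ConsistentIndex:
    """In-memory stand-in for a strongly consistent metadata store (hypothetical)."""

    def __init__(self) -> None:
        self._keys: Dict[str, Set[str]] = {}

    def record(self, prefix: str, key: str) -> None:
        # Called by the writer right after it writes `key` under `prefix`.
        self._keys.setdefault(prefix, set()).add(key)

    def expected(self, prefix: str) -> Set[str]:
        # Every key the writers say should be visible under `prefix`.
        return set(self._keys.get(prefix, set()))


def consistent_listing(
    prefix: str,
    index: ConsistentIndex,
    list_keys: Callable[[str], Set[str]],
    retries: int = 3,
    delay_s: float = 0.0,
) -> Set[str]:
    """Return the listing once it contains every expected key.

    Re-lists up to `retries` extra times (retries >= 0), sleeping `delay_s`
    seconds between attempts; raises RuntimeError if keys are still missing.
    """
    expected = index.expected(prefix)
    missing = expected
    for attempt in range(retries + 1):
        listed = list_keys(prefix)
        missing = expected - listed
        if not missing:
            return listed
        if attempt < retries:
            time.sleep(delay_s)
    raise RuntimeError(f"listing of {prefix!r} still missing {sorted(missing)}")


if __name__ == "__main__":
    idx = ConsistentIndex()
    idx.record("logs/2013-11-13/", "part-0000")
    idx.record("logs/2013-11-13/", "part-0001")

    calls = {"n": 0}

    def stale_then_fresh(prefix: str) -> Set[str]:
        # Simulated eventual consistency: the first listing misses part-0001.
        calls["n"] += 1
        return {"part-0000"} if calls["n"] == 1 else {"part-0000", "part-0001"}

    print(sorted(consistent_listing("logs/2013-11-13/", idx, stale_then_fresh)))
    # ['part-0000', 'part-0001']
```

Failing loudly after a bounded number of retries, rather than proceeding with a partial listing, is the design choice that matters here: a job that silently skips late-appearing files produces wrong results without any error.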
“Data as a Service”
• Execution Service
• Event Service
• Metadata Service
[Slides: cluster architecture diagram, built up step by step: S3 shared by a High SLA cluster running jobs, a Query cluster serving queries, and a Bonus cluster running jobs]
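Because all of the persistent EMR clusters read and write the same data in S3, a cluster holds no data that others depend on, and work can be steered to whichever cluster fits it. The sketch below shows one possible routing policy under explicit assumptions that are not from the talk: the `Workload` categories, the `capacity` threshold, and the rule that the Bonus cluster absorbs scheduled jobs only when the High SLA cluster is saturated are all hypothetical.

```python
"""Hypothetical sketch: routing work across persistent clusters that share S3.

Assumptions (not from the talk): the workload categories, the capacity
numbers, and the rule that the Bonus cluster takes scheduled jobs only
when the High SLA cluster is saturated.
"""
from dataclasses import dataclass
from enum import Enum


class Workload(Enum):
    SCHEDULED_JOB = "job"
    AD_HOC_QUERY = "query"


@dataclass
class Cluster:
    name: str
    running: int   # jobs currently running (hypothetical metric)
    capacity: int  # jobs the cluster can run at once (hypothetical metric)

    @property
    def saturated(self) -> bool:
        return self.running >= self.capacity


def route(workload: Workload, high_sla: Cluster, query: Cluster, bonus: Cluster) -> Cluster:
    """Pick a cluster; any choice works because every cluster uses the same S3 data."""
    if workload is Workload.AD_HOC_QUERY:
        # Interactive queries stay off the cluster running SLA-bound jobs.
        return query
    # Scheduled jobs go to the High SLA cluster unless it is full.
    return bonus if high_sla.saturated else high_sla


if __name__ == "__main__":
    h = Cluster("high-sla", running=10, capacity=10)
    q = Cluster("query", running=2, capacity=5)
    b = Cluster("bonus", running=0, capacity=5)
    print(route(Workload.SCHEDULED_JOB, h, q, b).name)  # bonus
```

Separating ad hoc queries from scheduled jobs keeps an expensive exploratory query from delaying work with a deadline; the shared S3 layer is what makes that separation cheap.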
Tez
Suro
Aegisthus
Questions?
http://jobs.netflix.com