Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Data Science at Netflix with Amazon EMR
Kurt Brown, Director of Data Platform, Netflix
November 13, 2013

Upload: amazon-web-services

Posted on 30-Oct-2014

Category: Technology


DESCRIPTION

A few years ago, Netflix had a fairly classic business intelligence tech stack. Now, things have changed. Netflix is a heavy user of AWS for much of its ongoing operations, and Data Science & Engineering (DSE) is no exception. In this talk, we dive into the Netflix DSE architecture: what it is and why. Key topics include their use of Big Data technologies (Cassandra, Hadoop, Pig + Python, and Hive); their Amazon S3 central data hub; their multiple persistent Amazon EMR clusters; how they benefit from AWS elasticity; their data science-as-a-service approach; how they made a hybrid AWS/data center setup work well; their open-source Hadoop-related software; and more.

TRANSCRIPT

Page 1: Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Data Science at Netflix with Amazon EMR

Kurt Brown, Director of Data Platform, Netflix

November 13, 2013

Page 2:
Page 3:

Data Platform

Page 4:
Page 5:
Page 6:

S3

Suro

Data Platform

Page 7:
Page 8:

Aegisthus

Page 9:
Page 10:

S3

Suro

Aegisthus

Data Platform

Page 11:

Sting

Page 12:
Page 13:
Page 14:

S3

Suro

Aegisthus

Sting

Data Platform

Page 15:
Page 16:
Page 17:

S3

Suro

Aegisthus

Sting

Data Platform

Page 18:

S3

Suro

Aegisthus

Sting

Data Platform

Page 19:

S3

Page 20:

S3

Page 21:

99.999999999%
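
The slide gives only the number. Assuming it refers to the eleven-nines annual durability that Amazon S3 is designed for (the slide itself does not say so), the arithmetic below sketches what that figure implies for a given object count. The function names are illustrative and not part of any AWS API.

```python
from fractions import Fraction

# Assumption: 99.999999999% is the designed annual durability per object,
# so the annual loss probability per object is 1 in 10^11.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at eleven nines."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_single_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


# Ten million stored objects: on average one lost object every 10,000 years.
print(years_per_single_loss(10_000_000))  # 10000
```

Exact fractions are used instead of floats because subtracting 0.99999999999 from 1.0 in floating point loses several significant digits.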

Page 22:
Page 23:

S3

Page 24:

S3

High SLA

Query

Page 25:
Page 26:

HDFS?

Page 27:

Eventual Consistency

Page 28:

S3mper
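
S3mper is Netflix's open-source library that adds a consistency check on top of S3 listings by comparing them against a separate, consistent index. The sketch below illustrates that general idea only; it is not S3mper's code or API. The names `LaggingStore`, `consistent_listing`, and `InconsistentListingError` are hypothetical, and the object store is simulated in memory.

```python
import time
from typing import Callable, Iterable, List


class InconsistentListingError(Exception):
    """Raised when the listing never catches up with the index."""


class LaggingStore:
    """Hypothetical stand-in for an eventually consistent listing call.

    Each call returns the next snapshot; the last snapshot repeats forever.
    """

    def __init__(self, snapshots: List[List[str]]):
        self._snapshots = snapshots
        self.calls = 0

    def __call__(self) -> List[str]:
        index = min(self.calls, len(self._snapshots) - 1)
        self.calls += 1
        return self._snapshots[index]


def consistent_listing(
    list_store: Callable[[], Iterable[str]],
    expected_keys: Iterable[str],
    retries: int = 3,
    delay_seconds: float = 0.0,
) -> List[str]:
    """Return the listing once it contains every key in the index.

    expected_keys plays the role of the consistent secondary index:
    the keys that writers recorded as committed.
    """
    expected = set(expected_keys)
    for attempt in range(retries + 1):
        listed = set(list_store())
        missing = expected - listed
        if not missing:
            return sorted(listed)
        if attempt < retries:
            time.sleep(delay_seconds)  # wait for the listing to converge
    raise InconsistentListingError(
        f"still missing after {retries} retries: {sorted(missing)}"
    )


# The first listing lags behind the index; the second one has caught up.
store = LaggingStore([["part-0"], ["part-0", "part-1"]])
print(consistent_listing(store, {"part-0", "part-1"}))  # ['part-0', 'part-1']
```

Failing loudly when the listing never converges is the point of the design: a job that silently reads a partial listing produces wrong results without any error.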

Page 29:

“Data as a Service”
• Execution Service
• Event Service
• Metadata Service

Page 30:
Page 31:
Page 32:
Page 33:
Page 34:
Page 35:
Page 36:
Page 37:
Page 38:

[Diagram labels: S3, High SLA, Query, Cluster, Job]

Page 39:

[Diagram labels: S3, High SLA, Query, Cluster, Job]

Page 40:

[Diagram labels: S3, High SLA, Query, Cluster, Job]

Page 41:

[Diagram labels: S3, High SLA, Query, Bonus, Cluster, Job]

Page 42:

[Diagram labels: S3, High SLA, Query, Bonus, Cluster, Job]

Page 43:
Page 44:
Page 45:
Page 46:
Page 47:

Tez

Page 48:

Suro

Page 49:

Aegisthus

Page 50:
Page 51:
Page 52:

Questions?

http://jobs.netflix.com

Page 53:

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

BDT306