The Evolution of the Big Data Platform @ Netflix (OSCON 2015)

The Evolution of Big Data Platform @ Netflix
Eva Tse
July 22, 2015




Our biggest challenge is scale


Netflix Key Business Metrics

• 65+ million members
• 50 countries
• 1000+ devices supported
• 10 billion hours / quarter


Global Expansion

200 countries by end of 2016


Big Data Size

• Total ~20 PB DW on S3
• Read ~10% of DW daily
• Write ~10% of read data daily
• ~500 billion events daily
• ~350 active users
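
To make the ratios on this slide concrete, the short calculation below turns them into rough daily volumes. It is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 1000 TB), that "write" means 10% of the daily read volume, and that events are spread evenly over the day. The variable names are illustrative.

```python
# Rough daily volumes implied by the slide's figures (all approximate).
TOTAL_DW_PB = 20          # ~20 PB warehouse on S3
READ_FRACTION = 0.10      # read ~10% of the warehouse daily
WRITE_FRACTION = 0.10     # write ~10% of the data read daily
EVENTS_PER_DAY = 500e9    # ~500 billion events daily
SECONDS_PER_DAY = 24 * 60 * 60

read_pb_per_day = TOTAL_DW_PB * READ_FRACTION
# Assumption: decimal units, 1 PB = 1000 TB.
write_tb_per_day = read_pb_per_day * WRITE_FRACTION * 1000
# Assumption: events arrive at a uniform rate, which real traffic does not.
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

print(f"read  ~{read_pb_per_day:.0f} PB/day")
print(f"write ~{write_tb_per_day:.0f} TB/day")
print(f"~{events_per_second / 1e6:.1f} million events/second on average")
```

Under these assumptions the platform reads about 2 PB and writes about 200 TB per day, and ingests on the order of 5.8 million events per second on average.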


Our traditional BI stack is our competition


How do we meet the functionality bar and yet make it scale?

How do we make big data bite-size again?


Our North Star

• Infrastructure
– No undifferentiated heavy lifting
• Architecture
– Scalable and sustainable
• Self-serve
– Ecosystem of tools


Data Pipelines

[Diagram: two pipelines into AWS S3]
• Event Data: Cloud apps → Suro/Kafka → Ursula → AWS S3 (15 min)
• Dimension Data: Cassandra → SS Tables → Aegisthus → AWS S3 (daily)


[Diagram: platform architecture]
• Storage: AWS S3, Parquet FF
• Compute: (Hadoop clusters)
• Service: Metacat (Federated metadata service), (Federated execution service)
• Tools: Pig workflow visualization, Data movement, Data visualization, Job/Cluster perf visualization, Data lineage, Data quality


Evolving Big Data Processing Needs

• Analytics
• ETL
• Interactive data exploration
• Interactive slice & dice
• RT analytics & iterative/ML algo


Evolving Services/Tools Ecosystem

[Diagram: services/tools ecosystem]
• Service: Metacat (Federated metadata service), (Federated execution service)
• Tools: Pig workflow visualization, Data movement, Data visualization, Job/Cluster perf visualization, Data lineage, Data quality
• Big Data Portal, API Portal, Big Data API


AWS S3 as our DW Storage

• S3 as single source of truth (not HDFS)
• 11 9’s durability and 4 9’s availability
• Separate compute and storage
• Key enablement to
– multiple clusters
– easy upgrade via r/b deployment
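
The "nines" shorthand on this slide can be unpacked with a little arithmetic. The sketch below assumes both figures are annual percentages (99.999999999% durability, 99.99% availability) and uses a hypothetical object count purely for illustration; it is not a statement about how S3 itself measures either number.

```python
def failure_fraction(nines: int) -> float:
    """Convert a count of nines (e.g. 4 means 99.99%) to the complementary fraction."""
    return 10.0 ** (-nines)

MINUTES_PER_YEAR = 365 * 24 * 60

# 4 nines of availability: expected unavailable minutes per year.
downtime_minutes = failure_fraction(4) * MINUTES_PER_YEAR

# 11 nines of durability: expected objects lost per year.
# Assumption: a hypothetical store of 10 billion objects.
HYPOTHETICAL_OBJECTS = 10_000_000_000
expected_losses = failure_fraction(11) * HYPOTHETICAL_OBJECTS

print(f"~{downtime_minutes:.1f} minutes of unavailability per year")
print(f"~{expected_losses:.1f} objects lost per year out of {HYPOTHETICAL_OBJECTS:,}")
```

Read this way, four nines allows for roughly 53 minutes of unavailability a year, while eleven nines implies about one lost object per decade across ten billion stored objects.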


Evolution of Big Data Processing Systems


• Analytics
• Hive-QL is close to ANSI SQL syntax
• Hive metastore serves as single source of truth for metadata for big data


• ETL
• Better language construct for ETL
• Contributions since 0.11
• Customization
– Integration with Metacat to Hive Metastore
– Integration with S3


• Interactive data exploration and experimentation
• Why we like Presto?
– Integration with Hive metastore
– Easy integration with S3
– Works at petabyte scale
– ANSI SQL for usability
– Fast


• Our contributions
– S3 file system
– Query optimizations
– Complex types support
– Parquet file format integration
– Working on predicate pushdown
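
Predicate pushdown, the last item above, can be illustrated with a toy example: if each chunk of a file carries min/max statistics for a column, a reader can skip chunks that cannot match a filter. The sketch below is a simplified illustration with hypothetical names (`RowGroup`, `scan_greater_than`); it does not reflect Presto's or Parquet's actual code.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RowGroup:
    """A chunk of column values with min/max statistics (hypothetical model).

    The statistics are computed on the fly here; a real file format would
    store them in metadata so they can be checked without reading the data.
    """
    values: List[int]

    @property
    def min_value(self) -> int:
        return min(self.values)

    @property
    def max_value(self) -> int:
        return max(self.values)

def scan_greater_than(groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # Pushdown: skip the whole group when its statistics rule out a match.
        if group.max_value <= threshold:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read

groups = [RowGroup([1, 5, 9]), RowGroup([10, 12, 18]), RowGroup([20, 25, 31])]
matches, groups_read = scan_greater_than(groups, threshold=15)
print(matches, groups_read)
```

In this example the first group is skipped entirely because its maximum is below the threshold, so only two of the three groups are read.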


Parquet

• Columnar file format
• Supported across Hive, Pig, Presto, Spark
• Performance benefits across different processing engines
• Working on vectorized read, lazy load and lazy materialization
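
Why a columnar layout helps analytic queries can be shown with a toy example: a query that touches one column only has to read that column's values. The sketch below uses plain Python lists and hypothetical field names; it illustrates the layout idea only and says nothing about Parquet's on-disk encoding.

```python
from typing import Any, Dict, List

# Row-oriented layout: each record stores all of its fields together.
# The field names and values are made up for illustration.
rows: List[Dict[str, Any]] = [
    {"member_id": 1, "country": "US", "hours": 2.5},
    {"member_id": 2, "country": "BR", "hours": 1.0},
    {"member_id": 3, "country": "US", "hours": 4.0},
]

def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into one list per column."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# Summing one column touches only that column's values in the columnar
# layout, versus every field of every record in the row layout.
total_hours = sum(columns["hours"])
values_touched_columnar = len(columns["hours"])
values_touched_row = sum(len(record) for record in rows)
print(total_hours, values_touched_columnar, values_touched_row)
```

Here the aggregate touches 3 values in the columnar layout against 9 in the row layout; the gap widens as tables gain more columns.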


• Interactive dashboard for slicing and dicing
• Column-based in-memory data store for time series data
• Serves a specific use case very well


• ETL, RT analytics, ML algorithms
• Why we like Spark?
– Cohesive environment – batch and ‘stream’ processing
– Multiple language support – Scala, Python
– Performance benefits
– Run on top of YARN for multi-tenancy
– Community momentum


Evolution of Services/Tools Ecosystem

[Diagram: services/tools ecosystem]
• Service: Metacat (Federated metadata service), (Federated execution service)
• Tools: Pig workflow visualization, Data movement, Data visualization, Job/Cluster perf visualization, Data lineage, Data quality
• Big Data Portal, API Portal, Big Data API


• Federated execution engine
• Expose [your fave big data engine] as a service
• Flexible data model to support future job types
• Cluster configuration management
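
One way to picture "expose an engine as a service" with a flexible job model is a small dispatcher that accepts a generic job request and routes it to a registered cluster by tags. The sketch below is a minimal illustration only: `JobRequest`, `Cluster`, `ExecutionService`, the cluster names and the tag-matching rule are all hypothetical and are not the API of the service described on this slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class JobRequest:
    """Generic job description; a new job type is just a new string value."""
    job_type: str                      # e.g. "hive", "pig", "presto", "spark"
    command: str                       # query or script to run
    cluster_tags: Set[str] = field(default_factory=set)
    params: Dict[str, str] = field(default_factory=dict)

@dataclass
class Cluster:
    """Registered cluster configuration (hypothetical fields)."""
    name: str
    tags: Set[str]
    supported_types: Set[str]

class ExecutionService:
    """Routes jobs to clusters so clients never address a cluster directly."""

    def __init__(self) -> None:
        self._clusters: List[Cluster] = []

    def register(self, cluster: Cluster) -> None:
        self._clusters.append(cluster)

    def route(self, job: JobRequest) -> Cluster:
        # Pick the first cluster that supports the job type and carries
        # every tag the job asked for.
        for cluster in self._clusters:
            if job.job_type in cluster.supported_types and job.cluster_tags <= cluster.tags:
                return cluster
        raise LookupError(f"no cluster available for job type {job.job_type!r}")

service = ExecutionService()
service.register(Cluster("adhoc-1", {"adhoc"}, {"presto"}))
service.register(Cluster("prod-1", {"prod", "sla"}, {"hive", "pig", "spark"}))

job = JobRequest(job_type="pig", command="etl.pig", cluster_tags={"prod"})
print(service.route(job).name)
```

Because clients name tags rather than clusters in this sketch, a cluster could be replaced during an upgrade by registering a new one with the same tags, without changing any client code.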


Metacat

• Federated metadata catalog for the whole data platform
– Proxy service to different metadata sources
• Data metrics, data usage, ownership, categorization and retention policy …
• Common interface for tools to interact with metadata
• To be open sourced in 2015 on Netflix OSS
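
The "proxy service" and "common interface" ideas can be sketched as a catalog that forwards each lookup to whichever metadata source is registered under a name. Everything in the example below is hypothetical (`MetadataSource`, `InMemorySource`, `FederatedCatalog`, the `catalog/database/table` naming scheme and the sample tables); it illustrates the pattern and is not Metacat's actual interface.

```python
from abc import ABC, abstractmethod
from typing import Dict

class MetadataSource(ABC):
    """Common interface each backing metadata store implements in this sketch."""

    @abstractmethod
    def get_table(self, database: str, table: str) -> Dict[str, str]:
        ...

class InMemorySource(MetadataSource):
    """Stand-in for a real source such as a Hive metastore (hypothetical)."""

    def __init__(self, tables: Dict[str, Dict[str, str]]) -> None:
        self._tables = tables

    def get_table(self, database: str, table: str) -> Dict[str, str]:
        return self._tables[f"{database}.{table}"]

class FederatedCatalog:
    """Proxy that forwards lookups to the source registered for a catalog name."""

    def __init__(self) -> None:
        self._sources: Dict[str, MetadataSource] = {}

    def register(self, catalog: str, source: MetadataSource) -> None:
        self._sources[catalog] = source

    def get_table(self, qualified_name: str) -> Dict[str, str]:
        # Assumption: names take the form catalog/database/table.
        catalog, database, table = qualified_name.split("/")
        metadata = dict(self._sources[catalog].get_table(database, table))
        metadata["catalog"] = catalog
        return metadata

catalog = FederatedCatalog()
catalog.register("hive", InMemorySource({"dw.plays": {"owner": "etl-team", "retention": "2y"}}))
catalog.register("other", InMemorySource({"ops.jobs": {"owner": "platform", "retention": "90d"}}))

print(catalog.get_table("hive/dw/plays"))
```

A tool written against `FederatedCatalog` in this sketch asks for ownership or retention the same way regardless of which store holds the table.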




Big Data API

• Integration layer for our ecosystem of tools and services
• Python library (called Kragle)
• Building block for our ETL workflow
• Building block for Big Data Portal


Big Data Portal

• One stop shop for all big data related tools and services
• Built on top of Big Data API


Open source is an integral part of our strategy to achieve scale


Big Data Processing Systems

Services/Tools Ecosystem


Why use Open Source?

• Collaborate with other internet scale tech companies
• Uncharted area/scale, lock-in is not desirable
• Need the flexibility to achieve scalability

BUT…

• Lots of choices
• White box approach


Why contribute back?

• Non IP or trade secret
• Help shape direction of projects
• Don’t want to fork and diverge
• Attract top talent


Why contribute our own tool?

• Share our goodness
• Set industry standard
• Community can help evolve the tool


Is open source right for you?


Measuring big data - understanding data by usage

By Charles Smith, Netflix
Tomorrow @ 1:40-2:20pm