automating netflix ml qcon sf 2017 | eugen cepoi, davis ...automating netflix ml pipelines with...

36
QCon SF 2017 | Eugen Cepoi, Davis Shepherd Automating Netflix ML Pipelines With Meson

Upload: others

Post on 13-Mar-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

QCon SF 2017 | Eugen Cepoi, Davis Shepherd

Automating Netflix ML Pipelines With Meson

Page 2: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

?

Page 3: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and
Page 4: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Goal

Create a personalized experience to help members find content to watch and enjoy

Page 5: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Recommendation ContextOnline caches

Member streaming data

Page 6: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Recommendation Context

Member streaming data

Training pipeline

Online caches

Models

Page 7: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Recommendation Context

Member streaming data

Training pipeline

Online caches

Models

Page 8: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Recommendation Context

Member streaming data

Training pipeline

Online caches

Models

Cache

Precompute System

Page 9: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Recommendation Context

Member streaming data

Online caches

AB Test Allocation

Training pipeline

Models

Training pipeline

Models

Training pipeline

Models

Cache

Precompute System

Page 10: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Innovation is driven by experimentation

Page 11: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Training Pipelines

Data Preparation Spark/Hive/Kafka Stratified Sampling

Model Selection HyperParameter tuning

Model Publish S3 Online Caches

Feature Generation Label Generation Feature Encoders Model Training

Proprietary Algos Spark/ Tensorflow

Scoring / InferencePre-computeLive-computeSpark and Online Caches

Page 12: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

● A collection of operators● Little to no orchestration● Often limited to single machine

MinionRunner.py \ --config params.minion=HiveToLocal \ --config params.hive_table=prodhive.ae.data_feed \ --config params.hive_partition='region == "US"' \ --config params.hive_fields=show_title,score \ --config params.output_filename=/tmp/us.sims.csv \ --config params.output_delimiter=,...

Before Meson

Page 13: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

● Support Heterogeneous systems

● Highly flexible generic orchestration

● Handle failures

● Provide Reproducibility

● Support Multi-tenancy

● Support External Triggers

Desired Properties

Page 14: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Why didn’t an out of the box solution work?● Spark and Scala support was paramount● Options available didn’t have the flexibility and scalability that we needed

Page 15: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Meson overview

Page 16: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Meson Overview

General purpose workflow orchestration engine

Delegates execution to Mesos

Initially built for Machine Learning pipelines for personalization

Supports complex workflow patterns (branching, loops, foreach)

Page 17: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Concepts

Workflow Directed Graph of steps, global parameters, triggers...

Step Describes a job and its configuration

Page 18: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Defining workflows

Scala DSL

Python DSL

UI

REST API

Page 19: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Parameters

Used to configure steps, job arguments, and step transitions

MVEL expression to derive parameter values at runtime

Predefined macros

Page 20: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Workflow patterns

Branches (OR, XOR, AND)

Loops with XOR

Foreach

Using parameters and MVEL

Page 21: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Data artifacts

Data artifact defined by a name and a set of partitions (parameters)

Cross workflow dependencies

External triggers

Page 22: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Workflow versions

Workflows have immutable versions

Enables:

- Better collaboration- Rollbacks- Reproducibility

Page 23: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Deploying workflows with the Gradle plugin

DSL

Job code

Meson Gradle plugin as a CLI

jar/zipLink and version workflow with job code

Add metadata (SCM)

Validate

Deploy

Run

Page 24: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Jenkins

Automated releases with canary workflows

Git

PR merged

Deploy & Run Canary workflows

Deploy production workflows

2

3

1

Page 25: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Monitor & Debug

Page 26: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

ZookeeperUI

Architecture

Per

sist

ence

Scheduler

Orchestration/core meson

REST API

Page 27: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Scheduling

Mesos Agent

Mesos Master

Meson executor

Mesos Agent

Meson executor

Mesos Framework

Scheduler

Meson as a Mesos Framework

Mesos offers resources and runs the steps

Fenzo (Netflix OSS) makes scheduling decisions

Fenzo

Page 28: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Execution

Mesos Agent

Mesos Master

Meson executor

Mesos Agent

Meson executorDocker

container Service Spark driver

Mesos Agent

Spark Executors

Custom executor code for different runtime systems (spark, bash...)

Publish runtime debug infos (logs, url to monitoring UIs...)

Meson executors survive Meson scheduler failure

Page 29: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Looking Forward

Page 30: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Interact with Meson from the running job

Output parameters to leverage loops and foreach

Expose debugging information through Artifacts

Progress Milestones, Links, Counters, Images, etc.

Closing the loop

Page 31: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

A day in the life of a workflow...

Backfills, work prioritization and parallelism

Avoid re-doing work after fixing a bug and re-deploying a workflow

Explicit (data) lineage

Page 32: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Looking back

Page 33: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Adoption

2+ years in production

10+ managed and self-service deployed clusters

1000+ daily Production and A/B Test ML pipelines

2000+ EC2 instances in Spark/Mesos compute pool

20000+ of steps run per day

Page 34: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

One Abstraction doesn’t fit all

Evidenced by the many names:● Workflow● ProcessFlow● Pipeline● DAG● DataFlow

Over specialization will inevitably weaken other use cases.

Copyright

Page 35: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

With the REST API Meson provides “workflows as a service”.Enables many domain specialized abstractions:

● A/B test orchestration● ML orchestration● ETL pipelines● Notebook Automation● And more..

Meson

ETL DSL ML DSL Automation DSL

One Abstraction doesn’t fit all

REST API

Page 36: Automating Netflix ML QCon SF 2017 | Eugen Cepoi, Davis ...Automating Netflix ML Pipelines With Meson? Goal Create a personalized experience to help members find content to watch and

Questions?