Workday: Building Large Scale Machine Learning Pipelines

Building Large Scale Machine Learning Pipelines, by Vlad Giverts, Sr Director of Engineering

Upload: datastax-academy

Post on 15-Apr-2017

TRANSCRIPT

Page 1: Workday: Building Large Scale Machine Learning Pipelines

Building Large Scale Machine Learning Pipelines

Vlad Giverts

Sr Director of Engineering

Page 2: Workday: Building Large Scale Machine Learning Pipelines

Workday Confidential

Background

Page 3: Workday: Building Large Scale Machine Learning Pipelines

Retention Risk

Page 4: Workday: Building Large Scale Machine Learning Pipelines

Architecture

Retention Risk

ML Pipeline

Spark

YARN HDFS

Page 5: Workday: Building Large Scale Machine Learning Pipelines

Architecture

Retention Risk

ML Pipeline

Spark Streaming

Kafka

Cassandra

Spark Streaming

Page 6: Workday: Building Large Scale Machine Learning Pipelines

Data Pipeline

Raw Data

Feature Engineering

Model Training

Model Validation

Page 7: Workday: Building Large Scale Machine Learning Pipelines

Data and Features

Num Promotions

Pay Range Penetration

Time in Current Job

Manager Attrition Rate

Time Between Promotions

Tenure
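The feature names above can be made concrete with a small sketch. It assumes a hypothetical per-employee list of `(date, kind)` job events and a hypothetical pay band (`pay_min`, `pay_max`); none of these names come from the talk. Pay range penetration is computed with the usual definition, (salary - min) / (max - min). Manager Attrition Rate is left out because it needs team-level data.

```python
from datetime import date

def employee_features(events, salary, pay_min, pay_max, as_of):
    """Derive a few of the listed features for one employee.

    `events` is a hypothetical list of (date, kind) tuples where kind is
    "hired", "promoted", or "job_change". Events after `as_of` are ignored
    so that the features only use information available at that date.
    """
    events = sorted(e for e in events if e[0] <= as_of)
    hired = next(d for d, kind in events if kind == "hired")
    promotions = [d for d, kind in events if kind == "promoted"]
    # The current job starts at the most recent job-changing event.
    job_start = events[-1][0]
    gaps = [(b - a).days for a, b in zip(promotions, promotions[1:])]
    return {
        "tenure_days": (as_of - hired).days,
        "num_promotions": len(promotions),
        "time_in_current_job_days": (as_of - job_start).days,
        "avg_days_between_promotions": sum(gaps) / len(gaps) if gaps else None,
        "pay_range_penetration": (salary - pay_min) / (pay_max - pay_min),
    }

history = [
    (date(2012, 1, 1), "hired"),
    (date(2013, 1, 1), "promoted"),
    (date(2014, 1, 1), "promoted"),
    (date(2016, 1, 1), "promoted"),  # after as_of, so it is ignored
]
features = employee_features(history, 75000, 50000, 100000, date(2015, 1, 1))
```

Filtering on `as_of` is a design choice of this sketch: it keeps later events from leaking into features computed for an earlier point in time.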

Page 8: Workday: Building Large Scale Machine Learning Pipelines

Data Pipeline

Raw Data

Feature Engineering

Model Training

Model Validation

Page 9: Workday: Building Large Scale Machine Learning Pipelines

Cross Validation
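The slide names cross validation without further detail. As a rough sketch, assuming plain k-fold with random shuffling (the deck does not say which variant was used), the row indices can be split like this. The function name and parameters are illustrative only.

```python
import random

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train, validation) index lists for k randomly shuffled folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

splits = list(k_fold_indices(10, 5))
```

Note that the shuffle ignores when each row was observed, so a fold can train on rows that are later in time than the rows it validates on.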

Page 10: Workday: Building Large Scale Machine Learning Pipelines

Data Pipeline

Raw Data

Feature Engineering

Model Training

Model Validation

Page 11: Workday: Building Large Scale Machine Learning Pipelines

Data Pipeline

Raw Data

Feature Engineering

Model Training

Model Validation

Partition Data

Page 12: Workday: Building Large Scale Machine Learning Pipelines

What are we predicting?

Timeline, 2014 to 2016:

Barry: Raise ($1,000)

Ravi: Transferred

John: Left :(

Jin: Promoted!

Yury: Hired

Tejas: Left :(

Roger: Left :(

Page 13: Workday: Building Large Scale Machine Learning Pipelines

So what happens?

Page 14: Workday: Building Large Scale Machine Learning Pipelines

Results?

95 / 95 (precision / recall)

Page 15: Workday: Building Large Scale Machine Learning Pipelines

What REALLY happens…

Page 16: Workday: Building Large Scale Machine Learning Pipelines

Results?

15 / 10 (precision / recall)

Page 17: Workday: Building Large Scale Machine Learning Pipelines

What do we want?

Page 18: Workday: Building Large Scale Machine Learning Pipelines

Temporal Validation

Timeline, 2014 to 2016:

Barry: Raise ($1,000)

Tejas: Transferred

John: Left :(

Jin: Promoted!

Yury: Hired

Tejas: Left :(

Page 19: Workday: Building Large Scale Machine Learning Pipelines

Training with Validation

[Figure: timeline from Early 2014 to Mid 2015, divided into 3 mo windows and split into a TRAINING period followed by a VALIDATION period]
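The time-based split on this slide can be sketched as follows. This is an illustration under assumptions, not the pipeline from the talk: "Early 2014" and "Mid 2015" are mapped to 2014-01-01 and 2015-07-01, rows are hypothetical dicts with an `as_of` date, and training data is assumed to precede validation data.

```python
from datetime import date

def add_months(d, months):
    """Shift a first-of-month date by a whole number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

def three_month_windows(start, end):
    """Return consecutive (window_start, window_end) pairs of 3 months each."""
    windows = []
    current = start
    while add_months(current, 3) <= end:
        nxt = add_months(current, 3)
        windows.append((current, nxt))
        current = nxt
    return windows

def temporal_split(rows, cutoff, validation_end):
    """Train on rows before `cutoff`; validate on [cutoff, validation_end)."""
    train = [r for r in rows if r["as_of"] < cutoff]
    validation = [r for r in rows if cutoff <= r["as_of"] < validation_end]
    return train, validation

windows = three_month_windows(date(2014, 1, 1), date(2015, 7, 1))
rows = [{"id": i, "as_of": start} for i, (start, _) in enumerate(windows)]
# Hold out the last 3-month window for validation (an assumed choice).
cutoff, validation_end = windows[-1]
train, validation = temporal_split(rows, cutoff, validation_end)
```

Unlike a random shuffle, every training row here is strictly earlier than every validation row, which mirrors how the model is used: predicting the future from the past.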

Page 20: Workday: Building Large Scale Machine Learning Pipelines

Data Pipeline

Raw Data

Feature Engineering

Model Training

Model Validation

Partition Data

Page 21: Workday: Building Large Scale Machine Learning Pipelines

Data Pipeline

Raw Data

Feature Engineering

Model Training

Model Validation

Partition Data

Page 22: Workday: Building Large Scale Machine Learning Pipelines

Results?

2x Precision

3x Recall

15 / 10 → 30 / 30 (precision / recall)
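For reference, precision and recall can be computed as below, and the multipliers on the slide can be checked against the figures if each pair is read as precision / recall (15 to 30 is 2x, 10 to 30 is 3x). The slide does not state units, and the employee ids are made up for illustration.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall from two sets of employee ids.

    predicted: ids the model flagged as likely to leave.
    actual: ids of employees who did leave.
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical example: 2 of 4 flagged employees left, out of 4 who left.
precision, recall = precision_recall({"a", "b", "c", "d"}, {"c", "d", "e", "f"})

# Consistency check of the slide's multipliers against its figures.
precision_gain = 30 / 15
recall_gain = 30 / 10
```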

Page 23: Workday: Building Large Scale Machine Learning Pipelines

Thank You