Posted on 15-Apr-2017


Building Large Scale Machine Learning Pipelines

Vlad Giverts

Sr Director of Engineering

Workday Confidential

Background


Retention Risk

Architecture

Retention Risk

ML Pipeline

Spark

YARN, HDFS

Architecture

Retention Risk

ML Pipeline

Spark Streaming

Kafka

Cassandra

Spark Streaming


Data Pipeline

Raw Data → Feature Engineering → Model Training → Model Validation


Data and Features

Num Promotions

Pay Range Penetration

Time in Current Job

Manager Attrition Rate

Time Between Promotions

Tenure
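The features above can be derived from an employee's dated HR events. The sketch below is a minimal illustration, not Workday's pipeline: the `Event` schema, the event kinds, and the function names are hypothetical; pay range penetration uses the common HR definition (where the salary sits between the range minimum and maximum), assumed here rather than confirmed by the slides; treating a transfer as the start of a new job is also an assumption; and Manager Attrition Rate is omitted because it needs team-level data.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Event:
    """One HR event for an employee (hypothetical schema)."""
    kind: str  # hypothetical values: "hire", "promotion", "transfer"
    on: date


def pay_range_penetration(salary, range_min, range_max):
    """Position of salary within its pay range: 0.0 at the minimum, 1.0 at the maximum."""
    return (salary - range_min) / (range_max - range_min)


def build_features(events, salary, range_min, range_max, as_of):
    """Compute per-employee features from events dated on or before `as_of`.

    Ignoring later events keeps the features free of information from the future.
    Assumes the history contains a "hire" event.
    """
    history = sorted((e for e in events if e.on <= as_of), key=lambda e: e.on)
    hired_on = next(e.on for e in history if e.kind == "hire")
    promotions = [e.on for e in history if e.kind == "promotion"]
    # Assumption: hires, promotions, and transfers each start a new job.
    job_changes = [e.on for e in history if e.kind in ("hire", "promotion", "transfer")]
    gaps = [(b - a).days for a, b in zip(promotions, promotions[1:])]
    return {
        "num_promotions": len(promotions),
        "tenure_days": (as_of - hired_on).days,
        "time_in_current_job_days": (as_of - job_changes[-1]).days,
        "mean_days_between_promotions": mean(gaps) if gaps else None,
        "pay_range_penetration": pay_range_penetration(salary, range_min, range_max),
    }
```

Filtering on `as_of` matters as much as the formulas: a feature computed with events after the prediction date leaks the answer into training.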


Cross Validation
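Classic k-fold cross-validation assigns rows to folds at random, so the date of a row plays no part in the split. A minimal sketch, assuming rows are indexed oldest first (the function name is hypothetical):

```python
import random


def random_k_fold(n_rows, k, seed=0):
    """Classic k-fold cross-validation: shuffle row indices, deal them into k folds.

    Yields (train_indices, test_indices) pairs. Row order, and therefore date,
    plays no part in the assignment.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)
```

If index order is date order, nearly every split trains on rows newer than some of its test rows, so the model is scored on a past it has already "seen" from the future side.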


Data Pipeline

Raw Data → Feature Engineering → Model Training → Model Validation

Partition Data


What are we predicting?

Timeline, 2014 to 2016:

Barry: Raise, $1,000

Ravi: Transferred

John: Left :(

Jin: Promoted!

Yury: Hired

Tejas: Left :(

Roger: Left :(
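One way to turn a timeline like this into a supervised target is a binary label: did the employee leave within a fixed horizon after a cutoff date? A minimal sketch; the `(date, kind)` tuple format, the `"left"` kind string, and the 90-day horizon are assumptions for illustration:

```python
from datetime import date, timedelta


def left_within(events, cutoff, horizon=timedelta(days=90)):
    """Binary target: 1 if a "left" event falls in (cutoff, cutoff + horizon], else 0.

    `events` is a list of (date, kind) tuples; the kind strings and the 90-day
    horizon are assumptions for illustration.
    """
    end = cutoff + horizon
    return int(any(kind == "left" and cutoff < on <= end for on, kind in events))
```

Raises, transfers, and promotions then become inputs (features) rather than outcomes.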

So what happens?


Results?

Precision / Recall: 95 / 95

What REALLY happens…


Results?

Precision / Recall: 15 / 10
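Reading these pairs as precision / recall is an inference from the closing slide, where "2x Precision" and "3x Recall" match 30/15 and 30/10. For a retention model, precision is the share of flagged employees who actually left, and recall is the share of leavers who were flagged. A minimal sketch with hypothetical names:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = left, 0 = stayed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```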

What do we want?


Temporal Validation

Timeline, 2014 to 2016:

Barry: Raise, $1,000

Tejas: Transferred

John: Left :(

Jin: Promoted!

Yury: Hired

Tejas: Left :(


3 mo 3 mo
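Temporal validation splits by date instead of at random. A minimal sketch, assuming each row is `(as_of, features, label)` and its label covers the 3 months after `as_of`; the function name and the 90-day horizon are assumptions. A training row is kept only if its whole label window closes by the cutoff, so no training label peeks into the validation period:

```python
from datetime import date, timedelta


def temporal_split(rows, cutoff, horizon=timedelta(days=90)):
    """Split rows of (as_of, features, label) at a cutoff date.

    Training rows: label window (as_of, as_of + horizon] ends by the cutoff.
    Validation rows: as_of on or after the cutoff.
    Rows whose label window straddles the cutoff are dropped from both.
    """
    train = [r for r in rows if r[0] + horizon <= cutoff]
    valid = [r for r in rows if r[0] >= cutoff]
    return train, valid
```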

Training with Validation

Six consecutive 3-month windows (early 2014 to mid 2015), split into TRAINING and VALIDATION.
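Six 3-month windows from early 2014 span 18 months and end in mid 2015. A minimal sketch of generating those windows and holding out the most recent ones for validation, as temporal validation requires; the exact start date (2014-01-01), the 4/2 training/validation split, and the function names are assumptions:

```python
from datetime import date


def add_months(d, months):
    """Shift a date by whole months; windows are assumed to start on the 1st."""
    y, m = divmod(d.month - 1 + months, 12)
    return date(d.year + y, m + 1, 1)


def windows(start, n_windows, months=3):
    """Consecutive [start, end) windows of `months` months each."""
    return [(add_months(start, i * months), add_months(start, (i + 1) * months))
            for i in range(n_windows)]


def train_validation_windows(start, n_windows, n_validation):
    """Earlier windows train the model; the last `n_validation` windows validate it."""
    ws = windows(start, n_windows)
    return ws[:-n_validation], ws[-n_validation:]
```

For example, `train_validation_windows(date(2014, 1, 1), 6, 2)` trains on 2014-01-01 to 2015-01-01 and validates on 2015-01-01 to 2015-07-01.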


Results?

2x Precision

3x Recall

Precision / Recall: 30 / 30 (previously 15 / 10)
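Assuming the pairs are precision / recall, the headline multipliers follow directly from the production result (15 / 10) and the temporally validated one (30 / 30):

```latex
\frac{\text{precision}_{\text{temporal}}}{\text{precision}_{\text{before}}} = \frac{30}{15} = 2,
\qquad
\frac{\text{recall}_{\text{temporal}}}{\text{recall}_{\text{before}}} = \frac{30}{10} = 3
```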


Thank You
