BDX 2016 - Kevin Lyons & Yakir Buskilla @ eXelate

Post on 15-Apr-2017


Online Learning: The Future of Audience Segmentation is Here

Kevin Lyons + Yakir Buskilla

Models that build profitable marketing audiences at scale...

Finding more of your best customers: High-income business professional

The Modeling Process, simplified

2012:

30 - 40 models leveraging billions of events

Creating 100 million+ scores

2015:

Over 1,000 models ‘leveraging’ trillions of events

Creating 150 billion+ scores / day

The Challenge

In other words, we simply need…

A system that creates as many models as we want, when we want them, and that dynamically adapts in real time to changing conditions

○ Automatically creates, validates, ships, and monitors models, with a capacity that scales to tens of thousands of models

The Opportunity

What we really need:

Online models evolve & adapt over time, in reaction to a changing environment, with each and every event

Given a complete data set, a batch model is created in its entirety, all at once

Introducing Online Learning

Batch: Creation

● large-scale data storage
● large-scale data schlepping
● painful data aggregation
● lots of manual everything
● Harder to build models, but easier to evaluate

Online Learning: Evolution

● limited data storage, mostly for monitoring
● event-level data streams
● light data aggregation
● lots of automatic everything
● Easier to build, but harder to evaluate (& support)

Batch Models (Offline) vs. Online Learning

Online Learning vs. Batch Models (Offline)

● Outperformed both L2 and Elastic Net

● Leverages small (‘micro’) batches

● Validates and monitors models in real time

● Alerts team when models are not behaving

Some Techno Mumbo Jumbo

Stochastic gradient descent with L1 regularization
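The slides name the technique only as stochastic gradient descent with L1 regularization, applied to small ('micro') batches. As a minimal sketch, assuming a logistic-regression model and a soft-thresholding step for the L1 penalty (both are assumptions, as are the class name, parameter names, and default values), one micro-batch update could look like this:

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class OnlineL1Logistic:
    """Hypothetical logistic regression trained by micro-batch SGD with an L1 penalty."""

    def __init__(self, n_features, learning_rate=0.1, l1=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate
        self.l1 = l1

    def predict_proba(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def partial_fit(self, batch):
        """Update the weights from one micro-batch of (features, label) pairs."""
        n = len(batch)
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in batch:
            err = self.predict_proba(x) - y  # gradient of the log loss w.r.t. the logit
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        self.b -= self.lr * grad_b / n
        shrink = self.lr * self.l1
        for j in range(len(self.w)):
            wj = self.w[j] - self.lr * grad_w[j] / n
            # Soft-thresholding: shrink toward zero, clipping small weights to exactly zero.
            if wj > shrink:
                self.w[j] = wj - shrink
            elif wj < -shrink:
                self.w[j] = wj + shrink
            else:
                self.w[j] = 0.0
```

Calling `partial_fit` once per micro-batch lets the model evolve with the event stream instead of being rebuilt from a complete data set. The L1 step drives weights of uninformative features to exactly zero, which keeps each model sparse.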

eXelate.com @eXelate

Technical Solutions

How do we do it?

eXpresso Serving Cluster

10B events/day

260 nodes across 4 data centers

eXtream Modeling Cluster

160B models/day

85 nodes across 4 data centers

JGroups

Distributed Messaging

Serving Layer

Online Learning vs. Batch Models (Offline)

Batch

Predefined ratio

Predefined feature selection

One-time validation

Streaming

Downsampling

Automated feature selection

Ongoing data cleaning

Ongoing validation

The Online Learning Challenge

● All necessary data already exists in eXtream

● The cluster’s processing resources can be better utilized

● eXtream addresses most performance / scalability requirements

● Scoring mechanism already exists

eXtream as a Framework for Online Learning

Why it works...

Online Learning Flow

● Labeling mechanism: customer-defined target audience

Events Classification

● Downsampling mechanism
● Burst tolerance
● Duplicate entries

Dataset Preparation
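The dataset preparation stage is described only by its three bullet points. A minimal sketch of two of them, assuming duplicates are detected by an event id and that downsampling means keeping every converted event while randomly thinning non-converted ones, might look like the following. The class, its parameters, and the keep rate are hypothetical, and burst tolerance is not modeled here.

```python
import random
from collections import OrderedDict


class DatasetPreparer:
    """Hypothetical filter: drops duplicate events and downsamples negatives."""

    def __init__(self, negative_keep_rate=0.1, max_remembered_ids=100_000, seed=None):
        self.negative_keep_rate = negative_keep_rate
        self.max_remembered_ids = max_remembered_ids
        self._seen = OrderedDict()  # recently seen event ids, oldest first
        self._rng = random.Random(seed)

    def accept(self, event_id, label):
        """Return True if the event should be passed on to training."""
        if event_id in self._seen:
            self._seen.move_to_end(event_id)
            return False  # duplicate entry
        self._seen[event_id] = None
        if len(self._seen) > self.max_remembered_ids:
            self._seen.popitem(last=False)  # forget the oldest id to bound memory
        if label == 1:
            return True  # keep every positive (converted) event
        # Keep only a random fraction of negative (not-converted) events.
        return self._rng.random() < self.negative_keep_rate
```

Bounding the set of remembered ids keeps memory constant on an unbounded stream, at the cost of missing duplicates that arrive far apart.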

● Blacklist
● Whitelist
● Automatic tuning

Features Selection

● Sliding window of recent events
● 60/40 not-converted/converted ratio
● Various accuracy metrics (lift, precision, recall, confusion matrix)
● Decide if the model is ready for making predictions

Model Validation
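The validation stage can be sketched as follows. This is one possible reading of the bullets, not the eXtream implementation: the 60/40 ratio is enforced by keeping separate fixed-size windows for not-converted and converted events, lift is taken as precision divided by the base conversion rate in the window, and the readiness thresholds are invented for illustration. All names are hypothetical.

```python
from collections import deque


class SlidingWindowValidator:
    """Hypothetical validator over a sliding window of recent scored events."""

    def __init__(self, window_size=1000, threshold=0.5, min_precision=0.6, min_recall=0.5):
        negatives_size = (window_size * 6) // 10
        # Separate windows give a 60/40 not-converted/converted mix once both are full.
        self.negatives = deque(maxlen=negatives_size)
        self.positives = deque(maxlen=window_size - negatives_size)
        self.threshold = threshold
        self.min_precision = min_precision
        self.min_recall = min_recall

    def observe(self, score, label):
        """Record the model's score for an event whose true label is now known."""
        (self.positives if label == 1 else self.negatives).append(score)

    def confusion_matrix(self):
        tp = sum(1 for s in self.positives if s >= self.threshold)
        fn = len(self.positives) - tp
        fp = sum(1 for s in self.negatives if s >= self.threshold)
        tn = len(self.negatives) - fp
        return tp, fp, fn, tn

    def metrics(self):
        tp, fp, fn, tn = self.confusion_matrix()
        total = tp + fp + fn + tn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        base_rate = (tp + fn) / total if total else 0.0
        lift = precision / base_rate if base_rate else 0.0
        return {"precision": precision, "recall": recall, "lift": lift}

    def is_ready(self):
        """Ready only when both windows are full and the thresholds are met."""
        full = (len(self.positives) == self.positives.maxlen
                and len(self.negatives) == self.negatives.maxlen)
        m = self.metrics()
        return (full and m["precision"] >= self.min_precision
                and m["recall"] >= self.min_recall)
```

Because the windows only hold recent events, a model that was once ready can fall below the thresholds again as conditions change, which matches the idea of ongoing rather than one-time validation.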

● Two phases (Scoring, Re-code)
● Scale vs. accuracy tradeoff

Predictions Mechanism

Scalability / Performance

● Thousands of Concurrent Models: training, validation, scoring
● High Throughput: billions of training events per day

Why do we need it?

● Store the models in one common place
● Persistence
● Built-in replication
● Aerospike has a built-in object size limit of 1 MB
○ Developed a sharding mechanism for storing models on Aerospike
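The slides do not show how the sharding mechanism works. A minimal sketch of the general idea, assuming a serialized model is split into chunks below the size limit and reassembled from a small manifest record, is shown below. A plain dict stands in for the key-value store, so no Aerospike client calls appear; the key layout, function names, and the byte margin are all hypothetical.

```python
import pickle

# Assumed safety margin under the 1 MB object size limit mentioned in the slides.
MAX_SHARD_BYTES = 1_000_000


def save_model(store, model_id, model, max_shard_bytes=MAX_SHARD_BYTES):
    """Serialize a model and write it to the store as size-bounded shards."""
    blob = pickle.dumps(model)
    shards = [blob[i:i + max_shard_bytes] for i in range(0, len(blob), max_shard_bytes)]
    for index, shard in enumerate(shards):
        store[f"{model_id}:shard:{index}"] = shard
    # The manifest records how many shards to read back.
    store[f"{model_id}:manifest"] = len(shards)
    return len(shards)


def load_model(store, model_id):
    """Read all shards listed in the manifest and rebuild the model."""
    count = store[f"{model_id}:manifest"]
    blob = b"".join(store[f"{model_id}:shard:{i}"] for i in range(count))
    return pickle.loads(blob)
```

For example, `save_model(store, "m1", model)` followed by `load_model(store, "m1")` returns an equal object, with no single stored value larger than `max_shard_bytes`.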

Scalability / Performance

Why do we need it?

Large object issue on Aerospike

The solution is Aerospike's fast built-in replication

Cross Data Center Learning

● Low Volume Models

● Traffic Redirection

Monitoring - Why do we need it?

thousands of models

automatically created by users

some models won’t converge

Monitoring - Real Time

Monitoring - Aggregation

Monitoring - DS Bot


Case study

Working in action

● The ideal candidate for digital media expands and even subtly shifts in real time
● Real-time modeling tracks and reacts to these changes as they happen, with a 2x CPA improvement over a batch model

The Times, They Are A-Changin’

Market: Downgrading a country’s credit ratings

● Holiday shopping is very different from the rest of the year, particularly Cyber Monday
● AM changes in the Eastern US are applied to the Pacific coast before the madness begins

Audiences: Cyber Monday frenzies

● … after the campaign starts, affecting the ideal audience
● No need to panic; modeled audiences automagically adjust

Product: A product offering is revised

Scores of self-maintaining models that constantly adapt to our ever-changing conditions

Happiness Renewed...
