nicolas kruchten @ datacratic

RTB Optimizer: Behind the scenes with a Predictive API

Nicolas Kruchten

PAPIs.io – November 18, 2014

REAL TIME MACHINE LEARNING DECISIONS AS A SERVICE

About Datacratic

• Software company specializing in high performance systems and machine learning

• 30 employees, founded in 2009, based in Montréal, Québec, Canada with an office in New York

• 3 Predictive APIs in market today

• Building a Machine Learning Database to help others build Predictive APIs and Apps

Real-Time Bidding for online advertising

[Diagram: Web Browser sends "GET ad" to the Real-Time Exchange, which sends bid requests to several Bidders]


[Diagram: Bidders send bids to the Real-Time Exchange, which runs an auction and returns the ad to the Web Browser]


This happens millions of times per second

Bidders must respond within 100 milliseconds


RTB Optimizer enables bidders to achieve campaign goals

Campaign goals

• Advertising campaigns are typically outcome-oriented

– Clicks

– Video views

– Conversions: app installs, purchases, sign-ups

• e.g. Ad network has sold someone 1,000 outcomes for $1,000

• e.g. Advertiser has $1,000 to get as many outcomes as possible

• Essentially maximize profit or minimize cost-per-outcome

Datacratic’s RTB Optimizer

• Client bidder relays bid-requests to API, API tells it how to bid

• Handles 100,000 queries per second, for hundreds of campaigns

• API says which campaign should bid and how much

• API also needs outcomes in real-time and campaign goals

[Diagram: RTB Optimizer, made up of a Bids API and an Outcomes API]

A Predictive API that learns

• Datacratic has no proprietary data set

• API can learn from scratch from the bid-request stream what works for each campaign:

– Contextual features: website, time of day, banner size and placement

– User features: geo-location, browser, language, # of impressions shown

– Customer-provided data: about the user, about the website

• Provides insights into what features are driving performance

• Can re-use learnings from previous campaigns

Second price auctions

• First Price Auctions

– You bid $1, I bid $2: I win, and I pay $2

• RTB uses Second Price Auctions

– You bid $1, I bid $2: I win, and I pay $1

• Optimal bid = E[ value ]

– Say it’s worth $2 to me

– I will never bid more than $2

– If I bid $1.50 and you bid $1.75: I’ve lost an opportunity for $0.25 surplus!

– I should always bid $2
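The argument above can be checked with a small simulation. This is only a sketch under stated assumptions: there is a single competing bidder, its bids are drawn uniformly between $0 and $4 (an arbitrary choice), and the function names are hypothetical. It compares the average surplus of several bids when the impression is worth $2.

```python
import random

def second_price_surplus(my_bid, my_value, competing_bid):
    """Surplus from one second-price auction: the winner pays the losing bid."""
    if my_bid > competing_bid:
        return my_value - competing_bid
    return 0.0

def average_surplus(my_bid, my_value, competing_bids):
    """Average surplus of a fixed bid over many simulated auctions."""
    total = sum(second_price_surplus(my_bid, my_value, c) for c in competing_bids)
    return total / len(competing_bids)

# Assumption: competing bids are uniform on [0, 4]; any distribution would do.
rng = random.Random(0)
competing_bids = [rng.uniform(0.0, 4.0) for _ in range(100_000)]

my_value = 2.0
results = {
    bid: average_surplus(bid, my_value, competing_bids)
    for bid in (1.0, 1.5, 2.0, 2.5, 3.0)
}
best_bid = max(results, key=results.get)

for bid, surplus in sorted(results.items()):
    print(f"bid ${bid:.2f}: average surplus ${surplus:.4f}")
print("best bid:", best_bid)
```

Bidding below the value gives up auctions that would have been profitable, and bidding above it wins auctions at a loss, so the bid equal to the value comes out on top.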

Don’t buy lottery tickets!

E[ value ] = payout * P( getting the payout )

What’s it to you?

• If client gets paid $10,000 for 1,000 conversions then payout = $10

E[ value | bid-request ] = $10 * P( conversion | bid-request )

• What was an economics problem is now a prediction problem

• We need to calibrate to predict true probabilities
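As a worked example of the formula above, the sketch below turns a predicted conversion probability into a bid. The function names and the 0.2% probability are hypothetical; only the $10,000 for 1,000 conversions figure comes from the slide.

```python
def payout_per_outcome(revenue, outcomes_sold):
    """Payout for a single outcome, e.g. $10,000 / 1,000 conversions = $10."""
    return revenue / outcomes_sold

def expected_value_bid(payout, p_outcome):
    """E[ value | bid-request ] = payout * P( outcome | bid-request )."""
    if not 0.0 <= p_outcome <= 1.0:
        raise ValueError("p_outcome must be a probability between 0 and 1")
    return payout * p_outcome

payout = payout_per_outcome(10_000, 1_000)

# Hypothetical model output: a 0.2% chance of conversion for this bid request.
bid = expected_value_bid(payout, 0.002)

# A model that over-predicts by 2x would also bid 2x too much,
# which is why the predicted probabilities need to be calibrated.
overconfident_bid = expected_value_bid(payout, 0.004)

print(f"payout ${payout:.2f}, bid ${bid:.4f}, overconfident bid ${overconfident_bid:.4f}")
```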

[Diagram: RTB Optimizer, where the Bids API returns E[ value ] and the Outcomes API feeds P( outcome )]

Collecting the data

• To compute P( X | Y ) we need examples of Y’s with an X label

• RTB Optimizer uses mix of strategies to meet campaign goals

• Probe strategy bids randomly to collect data

• Optimized strategy bids with E[ value ]

• Automatic training/retraining when the API has seen enough examples
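The mix of strategies described above could look roughly like the following. This is a minimal sketch, not the actual RTB Optimizer logic: the function names, the probe rate, the maximum probe bid and the retraining threshold are all hypothetical.

```python
import random

def choose_bid(p_outcome, payout, probe_rate, max_probe_bid, rng):
    """Return (strategy, bid) for one bid request.

    p_outcome is the model's prediction, or None if no model is trained yet.
    """
    if p_outcome is None or rng.random() < probe_rate:
        # Probe strategy: bid randomly to collect unbiased training examples.
        return "probe", rng.uniform(0.0, max_probe_bid)
    # Optimized strategy: bid the expected value.
    return "optimized", payout * p_outcome

def should_retrain(new_labeled_examples, threshold):
    """Trigger (re)training once enough new labeled examples have arrived."""
    return new_labeled_examples >= threshold

# Assumed values, for illustration only.
rng = random.Random(42)
decisions = [
    choose_bid(p_outcome=0.001, payout=10.0, probe_rate=0.1,
               max_probe_bid=0.05, rng=rng)
    for _ in range(10_000)
]
probe_share = sum(1 for strategy, _ in decisions if strategy == "probe") / len(decisions)
print(f"share of requests routed to probe: {probe_share:.3f}")
```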

[Diagram: RTB Optimizer with Probe, Bids API (E[ value ]), Training, Outcomes API and P( outcome )]

Bias control

• Never stop the probe strategy

• Always need control group for evaluation, retraining

• Risk of filter bubbles: future models trained on previous output

• Bid requests are randomly routed to probe, less often over time

• Models automatically back-tested before deployment
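One way to route requests to the probe "less often over time" without ever stopping it is a decaying rate with a floor. The slides do not say which schedule is used; the exponential decay, the half-life and the 5% floor below are assumptions chosen for illustration.

```python
def probe_rate(examples_seen, initial_rate=1.0, floor=0.05, half_life=100_000):
    """Fraction of bid requests to route to the probe strategy.

    The rate halves every `half_life` examples but never drops below `floor`,
    so a control group always remains for evaluation and retraining.
    All parameter values are hypothetical.
    """
    decayed = initial_rate * 0.5 ** (examples_seen / half_life)
    return max(floor, decayed)

for seen in (0, 100_000, 500_000, 10_000_000):
    print(f"{seen:>10} examples seen -> probe rate {probe_rate(seen):.3f}")
```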

How to learn in real-time

• Classify using bagged generalized linear models

• Generate non-linear features with statistics tables

• Periodically retrain classifier

• Continuously update stats tables
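To make "bagged generalized linear models" concrete, here is a toy sketch in plain Python: several logistic regressions (a GLM with a logit link) are each trained on a bootstrap resample and their predictions are averaged. It is not Datacratic's implementation (their classifier lives in the C++ JML library); the training loop, hyperparameters and synthetic data are all assumptions. The single input feature stands in for a numeric feature such as a rate looked up in a statistics table.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_one(weights, x):
    """Predicted probability from one model; the last weight is the bias."""
    return sigmoid(weights[-1] + sum(w * v for w, v in zip(weights, x)))

def train_logistic(rows, labels, epochs=20, lr=0.1):
    """Fit a logistic regression with plain stochastic gradient ascent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = y - predict_one(weights, x)
            for i, v in enumerate(x):
                weights[i] += lr * err * v
            weights[-1] += lr * err
    return weights

def train_bagged(rows, labels, n_models=10, seed=0):
    """Train each model on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_logistic([rows[i] for i in idx],
                                     [labels[i] for i in idx]))
    return models

def predict_bagged(models, x):
    """Average the predicted probabilities of all models in the bag."""
    return sum(predict_one(m, x) for m in models) / len(models)

# Synthetic data: the outcome probability rises with the single feature.
rng = random.Random(1)
rows = [[rng.random()] for _ in range(400)]
labels = [1 if rng.random() < sigmoid(4.0 * x[0] - 2.0) else 0 for x in rows]

models = train_bagged(rows, labels)
p_low = predict_bagged(models, [0.1])
p_high = predict_bagged(models, [0.9])
print(f"P(outcome | feature=0.1) = {p_low:.3f}, P(outcome | feature=0.9) = {p_high:.3f}")
```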

Statistics Table by example

Table    Bucket   Impressions  Outcomes  Outcomes/Impressions  95% Confidence Lower Bound on Outcomes/Impressions
Browser  Chrome   5M           3k        0.060%                0.058%
Browser  Firefox  3M           1k        0.033%                0.031%
Website  abc.com  4M           2k        0.050%                0.048%
Website  xyz.com  1k           10        1.000%                0.481%
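A statistics table like the one above can be sketched as a pair of counters per bucket, updated as impressions and outcomes arrive. The class and method names are hypothetical. The slides do not say how the 95% lower bound is computed; the Wilson score bound is used here as an assumption, so values for small buckets such as xyz.com may differ from the table.

```python
import math

class StatsTable:
    """Running impression and outcome counts per bucket (e.g. per browser)."""

    def __init__(self):
        self.impressions = {}
        self.outcomes = {}

    def record_impression(self, bucket, count=1):
        self.impressions[bucket] = self.impressions.get(bucket, 0) + count

    def record_outcome(self, bucket, count=1):
        self.outcomes[bucket] = self.outcomes.get(bucket, 0) + count

    def rate(self, bucket):
        """Observed Outcomes/Impressions for the bucket."""
        n = self.impressions.get(bucket, 0)
        return self.outcomes.get(bucket, 0) / n if n else 0.0

    def lower_bound(self, bucket, z=1.96):
        """Wilson score lower bound on the rate (assumed method, z=1.96 for 95%)."""
        n = self.impressions.get(bucket, 0)
        if n == 0:
            return 0.0
        p = self.outcomes.get(bucket, 0) / n
        center = p + z * z / (2 * n)
        margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return max(0.0, (center - margin) / (1 + z * z / n))

# Counts taken from the table above.
browser = StatsTable()
browser.record_impression("Chrome", 5_000_000)
browser.record_outcome("Chrome", 3_000)

website = StatsTable()
website.record_impression("xyz.com", 1_000)
website.record_outcome("xyz.com", 10)

# Features for one hypothetical bid request, ready to feed a classifier.
request = {"browser": "Chrome", "website": "xyz.com"}
features = [browser.lower_bound(request["browser"]),
            website.lower_bound(request["website"])]
print(features)
```

The lower bound discounts buckets with little data: xyz.com has a much higher raw rate than Chrome, but its bound sits far further below that rate in relative terms because it rests on only 1,000 impressions.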

[Diagram: RTB Optimizer architecture with Probe, Bids API (E[ value ]), Training, Outcomes API, GLZ Classifier and Stats Tables, split into Real-Time and Batch parts]

Implementation details (are everything)

• 100k requests per second, 10 millisecond latency, running 24/7, 1 trillion predictions to date

• Distributed system, written in C++11

• AWS: data in S3, training runs on Amazon EC2 spot market

• http://opensource.datacratic.com/

– RTBkit

– JML

– StarCluster

Does it work?

Classification success? ROC or calibration curves…

Optimization success? 80% reductions in cost-per-outcome…

Customer success! 25% monthly growth

Thanks!

[email protected]
