SmartNews TechNight vol.5: AD Data Engineering in Practice: SmartNews...

Posted on 07-Jan-2017


Data Engineering in Practice: The DMP System Behind SmartNews Ads

Lan

Who am I

• Lan

• Veteran hacker, but new to the AD world

• someone who can make a computer do what he wants—whether the computer wants to or not. (http://paulgraham.com/gba.html)

• ex-{Rakuten, GREE}

• Distributed Systems, Information Retrieval, ML

Today’s Talk

• DMP in SmartNews Ads

• #1. Prediction

• #2. Targeting

• Future Work & Summary

DMP = Data Management Platform

DMP in SmartNews Ads

• Private DMP (90%+ first-party data)

• Data collection, cleaning, aggregation

• ID Mapping

• User Profiling

• User Clustering

• CTR / CVR Prediction

• Lookalike

• Custom Audience

DMP Clusters

[Figure: components include the AD delivery cluster, Video AD delivery cluster, AD tracker, Kinesis, AD logs in S3, SmartNews logs, DMP streaming, audience data in DynamoDB / RDB, Hadoop, ML, analytics, and models & targeting.]

Small company, but not small data

• Article Meta > 200K/day

• Article x {read, share, read_related …}

• Channel x {subscribe, preview, view, …}

• Push, Live, Weather, Setting, …

• Survey results

• Audience Data > 14M (~5M MAU)

• AD Meta

• AD History

• AD Conversions

• AD Opt-outs

• Managed/Compressed Data > 130TB

• Lookalike seeds

• ~1TB of data for training the CTR prediction model

• > 1M unique features

• User Demographics

• Device

• Locations

• …

#1 Prediction

Pick an AD to place in the feed here

Similar to Recommendation

but DIFFERENT:

• the optimization goal

• the accuracy of the predicted probability (not just the ranking)

More than Ranking

• When we run an AD auction:

• eCPM (effective Cost Per Mille) = CTR (Click-Through Rate) × CPC (Cost Per Click) × 1000, i.e. the expected revenue per 1,000 impressions

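• Suppose the true CTRs are

• CTR_ad1 = 0.05 > CTR_ad2 = 0.04 > CTR_ad3 = 0.03

• and the bids are CPC_ad1 = 10 JPY, CPC_ad2 = 13 JPY, CPC_ad3 = 20 JPY, so ad3 should win (expected revenue per impression, CTR × CPC: 0.5, 0.52 and 0.6 JPY)

• but if the predicted CTRs are pCTR_ad1 = 0.2 > pCTR_ad2 = 0.1 > pCTR_ad3 = 0.03, ad1 wins instead (2.0 vs 1.3 vs 0.6 JPY)

• then we lose 0.6 - 0.5 = 0.1 JPY of potential income per impression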
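The auction arithmetic above can be checked with a short Python sketch. It uses the slide's numbers and treats CTR × CPC as the expected revenue of one impression (eCPM divided by 1000, which does not change the ranking); the function names are hypothetical.

```python
# Numbers from the slide: true CTRs, predicted CTRs (pCTR) and CPC bids in JPY.
true_ctr = {"ad1": 0.05, "ad2": 0.04, "ad3": 0.03}
pred_ctr = {"ad1": 0.2, "ad2": 0.1, "ad3": 0.03}
cpc = {"ad1": 10, "ad2": 13, "ad3": 20}


def value_per_impression(ctr: dict[str, float], ad: str) -> float:
    """Expected revenue of one impression in JPY: CTR x CPC (= eCPM / 1000)."""
    return ctr[ad] * cpc[ad]


def auction_winner(ctr: dict[str, float]) -> str:
    """The AD with the highest eCPM wins the auction."""
    return max(cpc, key=lambda ad: value_per_impression(ctr, ad) * 1000)


ideal = auction_winner(true_ctr)    # ad3, worth 0.6 JPY per impression
actual = auction_winner(pred_ctr)   # ad1, because its CTR is over-predicted
loss = value_per_impression(true_ctr, ideal) - value_per_impression(true_ctr, actual)
print(ideal, actual, round(loss, 4))
```

The loss is measured with the true CTRs: the auction ran on the predictions, but revenue follows real clicks.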

The CTR (CVR) Prediction Problem

μ(a, u, c) = p(click | a, u, c), where a is the AD, u the user and c the context

CTR Prediction v1

• Train and score daily

• One GBDT (Gradient Boosting Decision Tree) model per AD campaign

• using ~1 month of data

• Hundreds of small batches inside Hadoop YARN

• Quick and Simple

• developed in 1 month

• picks the best features for every campaign

• minutes to 1 hour for model training

• explainable Tree models

• no need for AD features

• Same approach for CVR prediction (CPA (Cost Per Acquisition) = CPC / CVR)
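As a worked equation, dividing cost per click by conversions per click gives cost per conversion. The CPC and CVR values in the example are illustrative, not figures from the talk.

```latex
\text{CPA} = \frac{\text{cost}}{\text{conversions}}
           = \frac{\text{cost} / \text{clicks}}{\text{conversions} / \text{clicks}}
           = \frac{\text{CPC}}{\text{CVR}},
\qquad \text{e.g.}\quad \frac{20\ \text{JPY}}{0.05} = 400\ \text{JPY}
```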

[Figure: CTR prediction v1 pipeline. Delivery results and user features are used to generate samples; inside YARN, many per-campaign sample → model → scoring batches produce predictions for users.]

Metrics

• NE (Normalized Cross-Entropy)

• the average log loss per impression when using the predicted CTR, divided by the average log loss per impression when always predicting the background (average) CTR; lower is better

• He et al., "Practical Lessons from Predicting Clicks on Ads at Facebook", ADKDD 2014: https://facebook.com//download/321355358042503/adkdd_2014_camera_ready_junfeng.pdf

• AUC (Area under the ROC curve, AUROC)

• measures ranking quality

• Others: precision/recall, ECS (effective catalog size), CTR / CVR / sales, etc.
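A minimal Python sketch of the two main metrics. It assumes 0/1 click labels and predicted CTRs in (0, 1), and it takes the background CTR as the mean CTR of the same data (the Facebook paper uses the training-set average). AUC is computed by brute-force pair counting, which is fine for illustration but quadratic in the number of impressions.

```python
import math


def log_loss(labels, preds, eps=1e-15):
    """Average log loss per impression; labels are 0/1 clicks, preds are predicted CTRs."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def normalized_entropy(labels, preds):
    """NE: model log loss divided by the log loss of always predicting the background CTR."""
    background_ctr = sum(labels) / len(labels)  # assumption: mean CTR of the same data
    return log_loss(labels, preds) / log_loss(labels, [background_ctr] * len(labels))


def auc(labels, preds):
    """AUC: chance that a random clicked impression is scored above a random non-clicked one."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 0, 1, 0]
preds = [0.9, 0.2, 0.3, 0.6, 0.4]
print(round(normalized_entropy(labels, preds), 3), auc(labels, preds))
```

NE = 1 means the model is no better than predicting the average CTR for everyone. AUC ignores calibration, which is why the slide pairs it with NE.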

Review of CTR Prediction v1

• Marked improvement, moderate AUC & NE

• But:

• hard to tune the system as a whole

• hard to predict online (feature sets differ per campaign)

• delay before new campaigns get a model

• relatively poor performance on new campaigns (cold start)

• loses the connections between campaigns, even for the same advertiser

• …

CTR Prediction v2

• A single model for all campaigns

• AD features added

• Dynamic feature extraction

• All computation distributed

• GBDT + LogisticRegression

• Train once per day, score twice per day
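The talk does not show how GBDT and logistic regression are combined. One well-known recipe, described in the Facebook paper cited under Metrics, feeds each sample through the boosted trees and uses the leaf it lands in, per tree, as a sparse binary feature for logistic regression. The Python sketch below illustrates only that recipe. The two hand-written trees, the raw feature names (`age`, `is_ios`, `hour`) and the toy data are hypothetical stand-ins for a trained GBDT and real logs, and the LR is plain SGD rather than a distributed trainer.

```python
import math
import random


# Hypothetical stand-ins for two trained GBDT trees: each returns the index
# of the leaf a raw sample falls into.
def tree_0(x: dict) -> int:
    if x["age"] < 30:
        return 0 if x["is_ios"] else 1
    return 2


def tree_1(x: dict) -> int:
    return 0 if x["hour"] < 12 else 1


TREES = [tree_0, tree_1]


def leaf_features(x: dict) -> list[str]:
    """One active binary feature per tree: the (tree, leaf) the sample reaches."""
    return [f"t{i}_leaf{tree(x)}" for i, tree in enumerate(TREES)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(samples, epochs=200, lr=0.1, seed=0):
    """Logistic regression on sparse binary features, trained with plain SGD."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, y in data:
            p = sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))
            grad = p - y  # derivative of log loss w.r.t. the logit
            bias -= lr * grad
            for f in feats:
                weights[f] = weights.get(f, 0.0) - lr * grad
    return weights, bias


def predict(model, feats) -> float:
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))


# Toy logs: young iOS users click 3 times out of 4, older users 1 time out of 4.
young = {"age": 25, "is_ios": True, "hour": 9}
old = {"age": 45, "is_ios": False, "hour": 9}
raw = [(young, 1)] * 3 + [(young, 0)] + [(old, 0)] * 3 + [(old, 1)]
model = train_lr([(leaf_features(x), y) for x, y in raw])
young_ios = predict(model, leaf_features(young))
older = predict(model, leaf_features(old))
print(round(young_ios, 2), round(older, 2))
```

The appeal of this split is that the trees learn feature crossings and thresholds, while the LR stays cheap to retrain and gives calibrated probabilities.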

About the Features

• >1M unique features, sparse

• GBDT works well as an automatic feature transformer

• (Sometimes) feature engineering comes down to intuition and trial and error

• demographic, device, location, reading interests…

• AD history is helpful

• Feature Hashing, Binarization & Discretization, …
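Feature hashing together with binarization and discretization can be sketched in Python as follows. The field names, the age buckets and the 2^20 bucket count are assumptions for illustration. `hashlib.md5` is used instead of the built-in `hash()` because Python salts string hashes per process, which would make feature indices differ between training and scoring.

```python
import hashlib

NUM_BUCKETS = 2 ** 20  # assumption: about 1M hashed dimensions


def bucket(feature: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Stable hash of a string feature into [0, num_buckets)."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets


def discretize_age(age: int) -> str:
    """Discretization: replace a numeric value by the bucket it falls into."""
    for upper in (20, 30, 40, 50, 60):
        if age < upper:
            return f"age<{upper}"
    return "age>=60"


def hash_features(user: dict) -> dict[int, float]:
    """Binarize a raw user record into categorical tokens, then hash them into a sparse vector."""
    tokens = [
        f"gender={user['gender']}",
        f"device={user['device']}",
        f"pref={user['prefecture']}",
        discretize_age(user["age"]),
    ] + [f"channel={c}" for c in user["channels"]]
    vec: dict[int, float] = {}
    for token in tokens:
        idx = bucket(token)
        vec[idx] = vec.get(idx, 0.0) + 1.0  # on a collision, the values simply add up
    return vec


user = {"gender": "female", "device": "ios", "prefecture": "Osaka",
        "age": 25, "channels": ["travel", "gourmet"]}
print(hash_features(user))
```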

Performance improvement

#2 Targeting

[Figure: team members (Watabe, TamTam, Komiya, Takei, Ikeishi, Nagase, Lan) shown with interest labels (Niku (meat), Game, Beer, Snack, Costume, Gourmet, Princess).]

It’s difficult comparing to

Profiling Users with Statistics and ML

• Gender Prediction (precision: 0.90+), Age Prediction, …

• News Channel / Source Preference

• AD Slot Preference

• …

Standard Targeting

• e.g. female users in Kansai who subscribe to the Travel channel
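In its simplest form, standard targeting is a boolean rule over user-profile attributes. A minimal Python sketch of the example above, assuming a hypothetical `UserProfile` record and taking Kansai as the six prefectures Osaka, Kyoto, Hyogo, Nara, Shiga and Wakayama (one common definition):

```python
from dataclasses import dataclass, field

# One common definition of Kansai (an assumption for this sketch).
KANSAI = {"Osaka", "Kyoto", "Hyogo", "Nara", "Shiga", "Wakayama"}


@dataclass
class UserProfile:
    user_id: str
    gender: str          # e.g. the output of gender prediction
    prefecture: str
    channels: set[str] = field(default_factory=set)


def matches_standard_targeting(u: UserProfile) -> bool:
    """Female users in Kansai who subscribe to the Travel channel."""
    return u.gender == "female" and u.prefecture in KANSAI and "Travel" in u.channels


users = [
    UserProfile("u1", "female", "Osaka", {"Travel", "Gourmet"}),
    UserProfile("u2", "female", "Tokyo", {"Travel"}),
    UserProfile("u3", "male", "Kyoto", {"Travel"}),
]
print([u.user_id for u in users if matches_standard_targeting(u)])  # prints ['u1']
```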

Lookalike Targeting

• Our solution:

• Solve it as a classification problem

• Seed users as positive samples

• All targeting candidates as negative samples (with random sampling)

• based on Spark MLlib Logistic Regression

• 30% to 50% higher CVR compared to standard targeting
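The lookalike recipe on this slide can be sketched in plain Python. This is a stand-in, not the Spark MLlib pipeline: the LR is trained with simple SGD, each user's features are a hypothetical set of string tokens, and seed users are excluded from the negative pool (an assumption; the slide says only that negatives are randomly sampled from the targeting candidates). Non-seed candidates are then ranked by their predicted probability of looking like a seed.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def build_samples(seeds, candidates, features, neg_ratio=1.0, seed=0):
    """Seed users become positives; negatives are a random sample of the other candidates."""
    rng = random.Random(seed)
    pool = sorted(set(candidates) - set(seeds))
    negatives = rng.sample(pool, min(len(pool), int(len(seeds) * neg_ratio)))
    return [(features[u], 1) for u in seeds] + [(features[u], 0) for u in negatives]


def train_lr(samples, epochs=100, lr=0.1, seed=0):
    """Logistic regression over sparse binary features, trained with plain SGD."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, y in data:
            grad = sigmoid(bias + sum(weights.get(f, 0.0) for f in feats)) - y
            bias -= lr * grad
            for f in feats:
                weights[f] = weights.get(f, 0.0) - lr * grad
    return weights, bias


def lookalike(seeds, candidates, features, top_k, neg_ratio=1.0):
    """Rank non-seed candidates by their predicted probability of being seed-like."""
    weights, bias = train_lr(build_samples(seeds, candidates, features, neg_ratio))
    seed_set = set(seeds)
    scored = [
        (sigmoid(bias + sum(weights.get(f, 0.0) for f in features[u])), u)
        for u in candidates if u not in seed_set
    ]
    scored.sort(reverse=True)
    return [u for _, u in scored[:top_k]]


features = {
    "s1": {"ch=travel", "dev=ios"}, "s2": {"ch=travel", "dev=ios"}, "s3": {"ch=travel", "dev=ios"},
    "u_like": {"ch=travel", "dev=ios"},
    "g1": {"ch=game"}, "g2": {"ch=game"}, "g3": {"ch=game"}, "g4": {"ch=game"},
}
seeds = ["s1", "s2", "s3"]
print(lookalike(seeds, list(features), features, top_k=1, neg_ratio=10.0))
```

Even when a seed-like user lands in the negative sample, it still scores high because most samples sharing its features are positives; this is why random negatives from the candidate pool work reasonably well.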

Article Keyword Targeting

Real-time calculation of reach (UU, unique users)

Only users whose read time exceeds a certain threshold are included
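One way to keep the reach UU up to date in real time is to accumulate read time per (keyword, user) as read events arrive, and to count a user once they cross the threshold. The Python sketch below assumes the threshold applies to a user's total read time on articles tagged with the keyword, and takes the article-to-keyword mapping as a hypothetical input; the talk specifies neither detail.

```python
from collections import defaultdict


class KeywordReach:
    """Incrementally maintained reach UU per keyword (a hypothetical helper)."""

    def __init__(self, article_keywords: dict[str, set[str]], min_read_sec: float):
        self.article_keywords = article_keywords
        self.min_read_sec = min_read_sec
        self.read_sec = defaultdict(lambda: defaultdict(float))  # keyword -> user -> seconds
        self.reached = defaultdict(set)                           # keyword -> users over threshold

    def add_read(self, user_id: str, article_id: str, seconds: float) -> None:
        """Consume one read event as it streams in."""
        for kw in self.article_keywords.get(article_id, ()):
            self.read_sec[kw][user_id] += seconds
            if self.read_sec[kw][user_id] > self.min_read_sec:
                self.reached[kw].add(user_id)

    def reach_uu(self, keyword: str) -> int:
        return len(self.reached.get(keyword, ()))


reach = KeywordReach({"a1": {"beer", "travel"}, "a2": {"beer"}}, min_read_sec=30)
reach.add_read("u1", "a1", 20)
reach.add_read("u1", "a2", 15)   # u1 now has 35 s on "beer"
reach.add_read("u2", "a1", 10)
print(reach.reach_uu("beer"), reach.reach_uu("travel"))  # prints 1 0
```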

Custom Audience

[Figure: your service / app / site sends any custom event (S2S request, web beacon, etc.) to the SmartNews AD tracker; event audiences are kept as Bloom filter objects, updated every several minutes; the SmartNews AD delivery cluster uses them for AD targeting, delete targeting and lookalike targeting.]
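A Bloom filter fits the custom-audience use case because it stores a large user set compactly and can be rebuilt and pushed to the delivery cluster every few minutes; the trade-off is a small false-positive rate with no false negatives. The Python sketch below uses the standard sizing formulas and double hashing over one SHA-256 digest. The class, the `(user_id, event_name)` event format and the default sizes are assumptions, not the SmartNews implementation.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items: int, fp_rate: float):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.m = max(1, math.ceil(-expected_items * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        d = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_event_audience(events, event_name, expected_items=10_000, fp_rate=0.01):
    """Collect every user who sent `event_name` into a fresh filter (rebuilt periodically)."""
    audience = BloomFilter(expected_items, fp_rate)
    for user_id, name in events:
        if name == event_name:
            audience.add(user_id)
    return audience


audience = build_event_audience(
    [("u1", "purchase"), ("u2", "view"), ("u3", "purchase")], "purchase")
print("u1" in audience, "u3" in audience)
```

At delivery time, `user_id in audience` would select users for targeting and `user_id not in audience` would exclude them. With exclusion, a false positive means a small fraction of users are wrongly excluded, which is usually an acceptable cost for the memory savings.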

Future Work

Targeting Audience by Interests

Collect Negative Signals to Optimize UX

Summary of My First Year at SmartNews

• A place for challenges. We're a startup, so we can move fast and break things.

• Learn from the industry leaders. Keep trial-and-error.

• Numbers don't lie. Don't trust your intuition over the numbers.

• But if you really doubt a number, look closely: there may be a bug hiding.
