Reverse-Engineering Black-Box Ad Tech with Machine Learning (short ver.)

A certain web advertising agency

坂井 尚行

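What you will take away today

A rough idea of what kind of machine learning is used in production ad tech

A rough idea of where machine learning could be applied

References you can use for self-study afterwards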

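The DSP/RTB era

[Diagram: when ad inventory becomes available on a publisher's site, the SSP sends an auction notification to the DSPs; each DSP, bidding on behalf of advertisers, either buys the impression or does not.]

Competition among DSPs over ad performance is intensifying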

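So what is "performance"? (´・ω・`)

The ideal: deliver ads cheaply and increase sales

Outcome KPIs

Sales: the amount of money spent on products, e.g. on an e-commerce site

CV (Conversion): an outcome defined on the web (e.g. number of purchases, requests for information)

Efficiency KPIs

CTR (Click Through Rate): clicks / ad impressions

CPA (Cost Per Acquisition): ad cost / number of CVs

ROAS (Return On Ad Spend): sales / cost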
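The efficiency KPIs are plain ratios, so they can be sketched in a few lines of Python. The function names and the campaign figures below are hypothetical and chosen only for illustration.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks / ad impressions."""
    return clicks / impressions


def cpa(cost: float, conversions: int) -> float:
    """Cost per acquisition: ad cost / number of conversions."""
    return cost / conversions


def roas(revenue: float, cost: float) -> float:
    """Return on ad spend: sales / cost."""
    return revenue / cost


# Hypothetical campaign figures (not taken from the slides).
impressions, clicks, conversions = 1_000_000, 5_000, 50
cost, revenue = 100_000.0, 400_000.0

print(f"CTR  = {ctr(clicks, impressions):.2%}")
print(f"CPA  = {cpa(cost, conversions):.0f}")
print(f"ROAS = {roas(revenue, cost):.1f}")
```

A lower CPA and a higher ROAS both mean the same budget bought more outcome, which is why they are grouped as efficiency KPIs.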

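What follows is my personal view of DSPs and machine learning

DSP differentiator ①

Exclusive inventory:

Everyone connects to the same SSPs, so there is no difference there

GDN, YDN (to which Criteo has a preferential connection), and MicroAd seem to hold exclusive inventory?

This is an advantage, since no ad network or SSP takes a price margin or skims off the high-value users

Lately social media is the target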

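DSP differentiator ②

Algorithms tuned for performance:

Serve ads in near real time after the user leaves the advertiser's site

Select promising users with machine learning

Choose which auctions to join and adjust the bid price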

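DSP differentiator ③

A wide range of banner formats:

Offer many sizes so that ads can run in a variety of slots

For recommendation ads, prepare multiple variations of copy and image layout as well

Add animation too (though rumor has it this has little effect on performance…)

Product recommendation with machine learning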

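DSPs from a machine learning perspective

Classification:

Sort users who look likely to perform well into categories

Category-based price adjustment:

Hold the price down for some users and raise it for others

Collaborative filtering:

For each user, recommend products in order of how likely they are to convert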
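Category-based price adjustment can be pictured as a lookup from a predicted user category to a bid multiplier. This is only a minimal sketch: the category names, the multipliers, and the base bid are all hypothetical, and a real DSP would derive them from measured performance.

```python
BASE_BID = 100.0  # hypothetical base bid price

# Hypothetical multipliers per predicted user category.
BID_MULTIPLIER = {
    "likely_converter": 1.5,    # bid higher for promising users
    "average": 1.0,
    "unlikely_converter": 0.5,  # hold the price down for weak users
}


def adjusted_bid(category: str, base_bid: float = BASE_BID) -> float:
    """Return the bid for a user category; unknown categories get the base bid."""
    return base_bid * BID_MULTIPLIER.get(category, 1.0)


for category in ("likely_converter", "average", "unlikely_converter"):
    print(category, adjusted_bid(category))
```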

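So what is classification? (´・ω・`)

Splitting things into multiple categories

Classifying faces

Classifying anime characters' faces

Classifying cat breeds

Classifying ads into clicked and not clicked

There are two steps

Build a model from training data

Use the model to infer the category of production data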

[('Abyssinian', 0.621), ('Bengal', 0.144), ('Sphynx', 0.087)]
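The two steps can be sketched with a deliberately tiny classifier. The nearest-centroid method used here is an assumption made for brevity, not the technique behind the image examples; the feature vectors and labels are hypothetical.

```python
from collections import defaultdict


def fit_centroids(samples):
    """Step 1: build a model (one mean vector per label) from labeled data."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def classify(model, features):
    """Step 2: assign unseen data to the label whose centroid is nearest."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical labeled data: two numeric features per impression.
labeled = [
    ([1.0, 1.0], "click"),
    ([1.2, 0.8], "click"),
    ([5.0, 5.0], "no_click"),
    ([4.8, 5.2], "no_click"),
]
model = fit_centroids(labeled)
print(classify(model, [1.1, 0.9]))
print(classify(model, [5.1, 4.9]))
```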

※※http://rest-term.com/archives/3172/

※ http://christina.hatenablog.com/entry/2015/01/23/212541


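Principles of user classification

Similar users behave in similar ways

Classify users by performance × behavior (log data)

Users likely to have a high click rate

Users likely to have a low CPA

Users likely to have a good ROI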

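A simple multivariate linear regression does not work well

Suppose we assume the following figures

CTR: 0.5%

CVR: 1.0%

If ads are served 1,000,000 times:

Clicks: 5,000

CVs: 50

The data that matters is a sliver of the whole, almost at the level of noise, so the calculation has to be split into well-chosen cases.

[Figure: conceptual illustration with axes x1 and x2]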
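The arithmetic behind these figures, written out as a short Python sketch. It assumes CVR is measured per click, which is what makes 5,000 clicks yield 50 conversions.

```python
impressions = 1_000_000
ctr = 0.005  # 0.5% of impressions are clicked
cvr = 0.01   # 1.0% of clicks convert (assumed to be per click)

clicks = round(impressions * ctr)
conversions = round(clicks * cvr)

# Share of all impressions that end in a conversion.
conversion_share = conversions / impressions

print(clicks, conversions)
print(f"{conversion_share:.4%} of impressions convert")
```

Positive examples are therefore a tiny fraction of the log, which is why an ordinary least-squares fit over all impressions tends to be dominated by the non-events.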

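Simple CTR prediction that performs well

Use many {0,1} variables and classify click / no click {-1,1} with the logistic function

Probability of +1 (a click):

$$P(y = +1 \mid x) = \frac{1}{1 + \exp(-w^{T} x)}$$

Publisher: publisher network, publisher, URL

Advertiser: advertiser network, advertiser, ad

User: age/gender (if available), behavior history on the main site

Time: day of week, time of day

Example: a visit to msn is encoded as (0, 0, 1, 0, 0…), with one position per site such as www.msn.com and www.yahoo.co.jp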
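A minimal sketch of this setup: categorical values become {0,1} indicator variables, and the logistic function turns the weighted sum into a click probability. The vocabularies, the constant bias term, and the weights are hypothetical; in practice the weights would be learned from logs.

```python
import math

# Hypothetical vocabularies; real ones would hold every site, advertiser, etc.
SITES = ["www.msn.com", "www.yahoo.co.jp"]
DAYS = ["weekday", "weekend"]


def one_hot(value, vocabulary):
    """Encode one categorical value as a list of {0,1} indicators."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def encode(site, day):
    """Feature vector x: a constant bias term followed by the indicators."""
    return [1.0] + one_hot(site, SITES) + one_hot(day, DAYS)


def click_probability(w, x):
    """P(y = +1 | x) = 1 / (1 + exp(-w^T x))."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights: bias, msn, yahoo, weekday, weekend.
w = [-5.3, 0.4, -0.2, 0.0, 0.1]
x = encode("www.msn.com", "weekend")
print(x)
print(click_probability(w, x))
```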

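Sampling and higher-order correction

[Diagram: hierarchy of publisher network, publisher, ad slot, ad, advertiser, advertiser network]

CTR goes up for the combination of a securities company and Yahoo Finance

Turn combinations of variables into {0,1} variables

Data in the upper right of the figure is scarce, so

take a larger sample of it

and correct afterwards so that the result fits the formula

$$P(y = +1 \mid x) = \frac{1}{1 + \exp(-w^{T} x)}$$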
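Both ideas can be sketched briefly. The slide does not say which correction is used, so the one below is an assumption: it treats over-sampling the scarce clicks as equivalent to keeping only a fraction of the non-clicks, and then undoes the resulting shift in odds. The function names and the example values are hypothetical.

```python
def cross_feature(advertiser_category: str, publisher: str) -> str:
    """Name of the {0,1} variable that is 1 only when both values co-occur."""
    return f"{advertiser_category}&{publisher}"


def recalibrate(p_sampled: float, negative_keep_rate: float) -> float:
    """Map a probability learned on sampled data back to the full population.

    Keeping only a fraction r of the non-clicks multiplies the odds of a
    click by 1 / r, so the original probability is p / (p + (1 - p) / r).
    """
    return p_sampled / (p_sampled + (1.0 - p_sampled) / negative_keep_rate)


print(cross_feature("securities", "finance.yahoo.co.jp"))

# Hypothetical check: a true CTR of 0.5% with 1% of the non-clicks kept.
true_ctr, keep = 0.005, 0.01
ctr_in_sample = true_ctr / (true_ctr + (1 - true_ctr) * keep)
print(ctr_in_sample, recalibrate(ctr_in_sample, keep))
```

The cross feature gets its own weight in the linear model, which lets the model express an interaction that the individual advertiser and publisher indicators cannot.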

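[Aside] Layering and the self-reinforcing growth cycle

① Add more high-quality publishers

② Variables and data volume increase

③ Advertisers' efficiency improves

④ Outcomes (CVs, sales) increase

⑤ Bid prices rise

⑥ More advertisers join

⑦ Publisher/SSP revenue increases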

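CTR prediction contest by Criteo & Kaggle

Kaggle: a platform for predictive modeling and analytics

Problems are crowdsourced with prize money attached

This is because countless strategies are possible and it is hard to predict in advance which one will work best

Criteo CTR Prediction Contest

Even Criteo, which has delivered overwhelming results, ran a contest

Are the contest results fed back into feature development?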

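Recent algorithms for CTR prediction

Key-factor analysis

Select the important data from the input data

Build decision trees

Build trees 3 to 7 levels deep from the important data

The leaves of the trees become features

Logistic regression

Take a linear combination of the features

Feed it into the logistic function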
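The pipeline can be sketched in pure Python. The two trees below are hand-written placeholders standing in for learned decision trees, and the feature names, weights, and bias are hypothetical; only the structure (leaf index, one-hot encoding, linear combination, logistic function) is the point.

```python
import math


# Two hypothetical, hand-written trees. Each maps a raw input to a leaf index.
def tree_1(x):
    if x["hour"] < 12:
        return 0 if x["site_ctr"] < 0.01 else 1
    return 2


def tree_2(x):
    return 0 if x["past_visits"] == 0 else 1


TREES = [(tree_1, 3), (tree_2, 2)]  # (tree, number of leaves)


def leaf_features(x):
    """One-hot encode the leaf each tree assigns to x and concatenate."""
    features = []
    for tree, n_leaves in TREES:
        leaf = tree(x)
        features.extend(1.0 if i == leaf else 0.0 for i in range(n_leaves))
    return features


def predict(weights, bias, x):
    """Linear combination of the leaf features passed through the logistic function."""
    score = bias + sum(w * f for w, f in zip(weights, leaf_features(x)))
    return 1.0 / (1.0 + math.exp(-score))


impression = {"hour": 9, "site_ctr": 0.02, "past_visits": 3}
weights = [-0.5, 0.8, 0.1, -0.3, 0.6]  # hypothetical, one per leaf
print(leaf_features(impression))
print(predict(weights, -5.0, impression))
```

Each tree contributes exactly one active indicator, so the linear model ends up learning a weight per leaf, that is, per combination of conditions along a path.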

classifiers and diverse online learning algorithms. In the context of linear classification we go on to evaluate the impact of feature transforms and data freshness. Inspired by the practical lessons learned, particularly around data freshness and online learning, we present a model architecture that incorporates an online learning layer, whilst producing fairly compact models. Section 4 describes a key component required for the online learning layer, the online joiner, an experimental piece of infrastructure that can generate a live stream of real-time training data.

Lastly we present ways to trade accuracy for memory and compute time and to cope with massive amounts of training data. In Section 5 we describe practical ways to keep memory and latency contained for massive scale applications and in Section 6 we delve into the tradeoff between training data volume and accuracy.

2. EXPERIMENTAL SETUP

In order to achieve rigorous and controlled experiments, we prepared offline training data by selecting an arbitrary week of the 4th quarter of 2013. In order to maintain the same training and testing data under different conditions, we prepared offline training data which is similar to that observed online. We partition the stored offline data into training and testing and use them to simulate the streaming data for online training and prediction. The same training/testing data are used as testbed for all the experiments in the paper.

Evaluation metrics: Since we are most concerned with the impact of the factors to the machine learning model, we use the accuracy of prediction instead of metrics directly related to profit and revenue. In this work, we use Normalized Entropy (NE) and calibration as our major evaluation metric.

Normalized Entropy or more accurately, Normalized Cross-Entropy is equivalent to the average log loss per impression divided by what the average log loss per impression would be if a model predicted the background click through rate (CTR) for every impression. In other words, it is the predictive log loss normalized by the entropy of the background CTR. The background CTR is the average empirical CTR of the training data set. It would be perhaps more descriptive to refer to the metric as the Normalized Logarithmic Loss. The lower the value is, the better is the prediction made by the model. The reason for this normalization is that the closer the background CTR is to either 0 or 1, the easier it is to achieve a better log loss. Dividing by the entropy of the background CTR makes the NE insensitive to the background CTR. Assume a given training data set has N examples with labels $y_i \in \{-1, +1\}$ and estimated probability of click $p_i$ where $i = 1, 2, \ldots, N$. The average empirical CTR as $p$

$$NE = \frac{-\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1+y_i}{2}\log(p_i) + \frac{1-y_i}{2}\log(1-p_i)\right)}{-\left(p \log(p) + (1-p)\log(1-p)\right)} \qquad (1)$$

NE is essentially a component in calculating Relative Information Gain (RIG) and RIG = 1 − NE
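Equation (1) translates almost directly into Python. One simplification in this sketch is an assumption: the background CTR p is computed from the same labels being scored, rather than from a separate training set. It also requires every p_i, and the background CTR, to lie strictly between 0 and 1.

```python
import math


def normalized_entropy(labels, probs):
    """NE as in equation (1); labels are in {-1, +1}, probs are P(click)."""
    n = len(labels)
    # Numerator: average log loss per impression.
    log_loss = -sum(
        (1 + y) / 2 * math.log(p) + (1 - y) / 2 * math.log(1 - p)
        for y, p in zip(labels, probs)
    ) / n
    # Denominator: entropy of the background CTR (here: CTR of these labels).
    ctr = sum(1 for y in labels if y == 1) / n
    background = -(ctr * math.log(ctr) + (1 - ctr) * math.log(1 - ctr))
    return log_loss / background


labels = [1, -1, -1, -1]
baseline = normalized_entropy(labels, [0.25] * 4)            # always predict the CTR
sharper = normalized_entropy(labels, [0.9, 0.1, 0.1, 0.1])   # a more informative model
print(baseline, sharper, 1 - sharper)  # the last value is RIG = 1 - NE
```

Predicting the background CTR for every impression gives an NE of 1, so anything below 1 is an improvement over knowing nothing but the average click rate.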

Figure 1: Hybrid model structure. Input features are transformed by means of boosted decision trees. The output of each individual tree is treated as a categorical input feature to a sparse linear classifier. Boosted decision trees prove to be very powerful feature transforms.

Calibration is the ratio of the average estimated CTR and empirical CTR. In other words, it is the ratio of the number of expected clicks to the number of actually observed clicks. Calibration is a very important metric since accurate and well-calibrated prediction of CTR is essential to the success of online bidding and auction. The less the calibration differs from 1, the better the model is. We only report calibration in the experiments where it is non-trivial.
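Calibration as defined here is a one-line ratio. The sketch below assumes labels in {-1, +1} and at least one observed click; the sample predictions are hypothetical.

```python
def calibration(labels, probs):
    """Expected clicks (sum of predicted probabilities) / observed clicks."""
    expected_clicks = sum(probs)
    observed_clicks = sum(1 for y in labels if y == 1)
    return expected_clicks / observed_clicks


labels = [1, -1, -1, -1]
over_predicting = [0.5, 0.5, 0.5, 0.5]          # expects 2 clicks, 1 observed
rescaled = [0.5 * p for p in over_predicting]   # global multiplier of 0.5

print(calibration(labels, over_predicting))
print(calibration(labels, rescaled))
```

A value above 1 means the model expects more clicks than were observed, which in an auction setting translates into systematically bidding too high.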

Note that, Area-Under-ROC (AUC) is also a pretty good metric for measuring ranking quality without considering calibration. In a realistic environment, we expect the prediction to be accurate instead of merely getting the optimal ranking order to avoid potential under-delivery or over-delivery. NE measures the goodness of predictions and implicitly reflects calibration. For example, if a model over-predicts by 2x and we apply a global multiplier 0.5 to fix the calibration, the corresponding NE will be also improved even though AUC remains the same. See [12] for in-depth study on these metrics.

3. PREDICTION MODEL STRUCTURE

In this section we present a hybrid model structure: the concatenation of boosted decision trees and of a probabilistic sparse linear classifier, illustrated in Figure 1. In Section 3.1 we show that decision trees are very powerful input feature transformations, that significantly increase the accuracy of probabilistic linear classifiers. In Section 3.2 we show how fresher training data leads to more accurate predictions. This motivates the idea to use an online learning method to train the linear classifier. In Section 3.3 we compare a number of online learning variants for two families of probabilistic linear classifiers.

The online learning schemes we evaluate are based on the

※ Excerpted from http://quinonero.net/Publications/predicting-clicks-facebook.pdf


This way of thinking is a death flag

"Yee-haw! I'm going to strike it rich with machine learning!!"

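The cold-start problem

The algorithms so far all need large amounts of data

At the beginning there is no data

Either provide the service to existing mid-size and large sites, or rely on growth hacking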

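Operations

Steady, repeated backtests and A/B tests

Validating the classifier algorithm

Validating which variables to use

Build the architecture from the start on the assumption that validation will be part of it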
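A backtest of this kind can be sketched as scoring every candidate on the same held-out log. Log loss as the metric, the held-out rows, and the two candidate models are all assumptions made for illustration; labels here are 1 for a click and 0 otherwise.

```python
import math


def log_loss(labels, probs):
    """Average negative log-likelihood; lower is better."""
    total = sum(math.log(p) if y == 1 else math.log(1 - p)
                for y, p in zip(labels, probs))
    return -total / len(labels)


def backtest(models, held_out):
    """Score every candidate model on the same held-out log."""
    labels = [y for _, y in held_out]
    return {name: log_loss(labels, [model(x) for x, _ in held_out])
            for name, model in models.items()}


# Hypothetical held-out log: (features, clicked).
held_out = [
    ({"site": "a"}, 1),
    ({"site": "b"}, 0),
    ({"site": "b"}, 0),
    ({"site": "a"}, 0),
]

# Hypothetical candidates: one ignores the site variable, one uses it.
models = {
    "constant": lambda x: 0.25,
    "by_site": lambda x: 0.5 if x["site"] == "a" else 0.05,
}

scores = backtest(models, held_out)
best = min(scores, key=scores.get)
print(scores, best)
```

Swapping an entry in `models` is all it takes to test a different algorithm or variable set, which is the kind of hook an architecture needs if validation is planned in from the start.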

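Outlook

Foundational algorithms?

Classification, clustering, recommendation

Will they be used in fields that rely on image processing, speech processing, and natural language processing?

CRM data entry (keying in application forms)

Customer flow analysis in physical stores

Skype's real-time translation

Algorithms offered as a platform

Crowdsourcing: Kaggle

Microsoft Azure ML

Is competitive advantage gained by quickly turning algorithms into business?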

Appendix: Basic references

DSP/SSP/RTB

The Ad Technology: http://www.amazon.co.jp/dp/4798136557

DSP/RTBオーディエンスターゲティング入門: http://www.amazon.co.jp//dp/4864780013

Introductory books on machine learning

集合知プログラミング (Programming Collective Intelligence): http://www.amazon.co.jp/dp/4873113644/

Mahout In Action: http://www.amazon.co.jp/dp/4873115841/

オンライン機械学習: http://www.amazon.co.jp/dp/406152903X

オンライン機械学習 is worth reading before you start on the papers


Appendix: CTR prediction

Basic concepts:

https://web.stanford.edu/class/msande239/lectures-2011/Lecture%2007%20Targeting%202011.pdf

The source paper behind recent CTR prediction:

http://quinonero.net/Publications/predicting-clicks-facebook.pdf

Kaggle Criteo Challenge:

https://www.kaggle.com/c/criteo-display-ad-challenge

https://www.kaggle.com/c/criteo-display-ad-challenge/forums/t/10555/3-idiots-solution-libffm

https://github.com/guestwalk/kaggle-2014-criteo

Simple prediction with logistic regression

http://olivier.chapelle.cc/pub/ngdstone.pdf

http://www.slideshare.net/OlivierChapelle/wsdm14

Techniques for improving accuracy

Boosting: http://www.slideshare.net/holidayworking/ss-11948523

Feature hashing: http://ja.wikipedia.org/wiki/Feature_Hashing

Bandit algorithms: http://www.slideshare.net/greenmidori83/ss-28443892

