New challenges for scalable machine learning in online advertising

Olivier Koch, Engineering Program Manager, Criteo. ICML Online Advertising Systems Workshop, June 24, 2016.

Page 1: New challenges for scalable machine learning in online advertising

Copyright © 2015 Criteo

New challenges for scalable machine

learning in online advertising

Olivier Koch

Engineering Program Manager, Criteo

ICML Online Advertising Systems Workshop

June 24, 2016

Page 2: New challenges for scalable machine learning in online advertising

What we do

[Diagram labels: Advertiser, Publisher]

Page 3: New challenges for scalable machine learning in online advertising

Machine learning applications at Criteo

• Bidding (2nd price auctions)

• Product recommendation

• Banner look and feel selection
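As a minimal sketch of the second-price rule mentioned in the bidding bullet: the highest bidder wins and pays the second-highest bid. The function name, the `reserve` parameter, and the example bidders are hypothetical, chosen only for illustration; the slides do not describe Criteo's actual auction logic.

```python
from typing import Dict, Optional, Tuple


def second_price_auction(
    bids: Dict[str, float], reserve: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Return (winner, price) for a sealed-bid second-price auction.

    Returns None when no bid reaches the reserve price (an assumed
    parameter, not taken from the slides).
    """
    eligible = {bidder: bid for bidder, bid in bids.items() if bid >= reserve}
    if not eligible:
        return None
    # Sort bidders from highest to lowest bid.
    ranked = sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    # The winner pays the runner-up's bid, or the reserve if it is alone.
    price = ranked[1][1] if len(ranked) > 1 else reserve
    return winner, price


result = second_price_auction({"dsp_a": 2.0, "dsp_b": 1.5, "dsp_c": 0.5})
```

Because the price paid does not depend on the winner's own bid, a bidder's best strategy in this setting is to bid its estimate of the display's value, which is why the value prediction becomes the machine learning problem.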

Page 4: New challenges for scalable machine learning in online advertising

Machine learning at Criteo

• Supervised learning using standard regression methods / optimization algorithms (SGD, L-BFGS)

• Distribution on Hadoop (MapReduce, Spark)

• 3B displays / day

• 40 PB of data -- 15,000 servers

• 7 data centers worldwide
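To make the "standard regression methods with SGD" bullet concrete, here is a small single-machine sketch of logistic regression trained by stochastic gradient descent on sparse features. The choice of logistic regression, the feature encoding, and all names and hyperparameters are assumptions for illustration; the slides do not specify the model, and a production system would distribute this work across a cluster.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Predicted probability for a sparse example x (dict: index -> value)."""
    return sigmoid(sum(w[i] * v for i, v in x.items()))


def sgd_logistic(examples, dim, lr=0.1, epochs=20, l2=0.0, seed=0):
    """Fit logistic regression weights with plain SGD.

    examples: list of (x, y) with x a sparse dict and y in {0, 1}.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # Gradient of the log loss w.r.t. the score is (p - y).
            g = predict(w, x) - y
            for i, v in x.items():
                w[i] -= lr * (g * v + l2 * w[i])
    return w


# Toy data: index 0 is a bias term, feature 1 co-occurs with clicks,
# feature 2 with non-clicks.
toy = [({0: 1.0, 1: 1.0}, 1)] * 20 + [({0: 1.0, 2: 1.0}, 0)] * 20
weights = sgd_logistic(toy, dim=3)
```

Only the weights of features present in an example are updated, which is what keeps each SGD step cheap when the feature space is large and sparse.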

Page 5: New challenges for scalable machine learning in online advertising

The good news

• New generations of algorithms

• NLP (word embeddings), reinforcement learning, policy learning, deep networks

• Releases of ML infrastructures

• Caffe on Spark, TensorFlow, Torch, PhotonML, GPUs inside clusters

→ strong traction in the academic/industrial community

Page 6: New challenges for scalable machine learning in online advertising

The good news (cont'd)

• A lot of data is available

• Interactions with banners: clicks

• Interactions with products/advertisers: sales, baskets, home views, listings, visit history

• New data is coming

• Mobile, cross-device, (offline)

Page 7: New challenges for scalable machine learning in online advertising

Now what?

Page 8: New challenges for scalable machine learning in online advertising

Challenges in online advertising 1/3

• The technical debt of large-scale machine learning systems

• AB tests = snapshots. Are we missing long term effects?

• Some models become hard to improve. Are we overfitting or using the wrong metrics?

• We need to deal with a growing number of models – e.g. automate feature engineering

Page 9: New challenges for scalable machine learning in online advertising

Challenges in online advertising 2/3

• We want to provide a better online advertising experience

• Personalized

• Cross-device

• Long tail (new users, new products)

Page 10: New challenges for scalable machine learning in online advertising

Challenges in online advertising 3/3

• Credit assignment and incrementality

• Several clicks might be needed to generate a sale

• We should probably optimize a series of bids as opposed to single bids

• What is the optimal credit assignment scheme?

• We optimize what clients give us

• Attributed sales may not be the right target

• Global sales increases are noisy
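The credit assignment question can be illustrated by comparing a few common heuristic schemes for splitting one sale across the clicks that preceded it. These three rules (last click, linear, time decay) are widely used conventions, not schemes taken from the slides, and the function names and the `half_life` parameter are hypothetical.

```python
def last_click(n_clicks):
    """All credit goes to the final click before the sale."""
    return [0.0] * (n_clicks - 1) + [1.0]


def linear(n_clicks):
    """Credit is split equally across all clicks."""
    return [1.0 / n_clicks] * n_clicks


def time_decay(click_times, sale_time, half_life):
    """Credit halves for every `half_life` units of time before the sale."""
    raw = [0.5 ** ((sale_time - t) / half_life) for t in click_times]
    total = sum(raw)
    return [r / total for r in raw]


# Three clicks at hours 0, 24 and 48, with the sale at hour 48.
credits = time_decay([0.0, 24.0, 48.0], sale_time=48.0, half_life=24.0)
```

Each scheme returns shares that sum to one, so they differ only in which clicks are rewarded. None of them answers the incrementality question of whether the sale would have happened without the clicks.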

Page 11: New challenges for scalable machine learning in online advertising

Machine learning to the rescue

• Offline metrics – counterfactual analysis

• Optimal bidding strategies under uncertainty -- reinforcement learning

• Classification/prediction of time series

• Long tail (users, products) -- transfer learning, factorization

• Probabilistic match of devices

Page 13: New challenges for scalable machine learning in online advertising

Offline metrics – counterfactual analysis

• Option 1: run a controlled experiment (AB test)

  • How would the system behave if I replaced model M by model M*?

  • Takes time to conclude

  • Costs money if M* is worse than M (often)

  • Does not measure long-term effects

• Option 2: use counterfactual analysis

  • How would the system have performed if, when the data was collected, we had replaced model M by model M*?

  • Requires real-time randomization -- cost/exploration trade-off

  • Works best when M* is close to M

  • Trades time for computation and storage

  • Ignores future users’ and advertisers’ reactions

Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising, Bottou et al.
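One standard way to carry out this kind of counterfactual analysis is an importance-weighted (inverse propensity) estimate over logged, randomized decisions. The sketch below is an illustrative simplification, not the method of the cited paper; the log format, the `new_policy_prob` callback, and the optional `clip` parameter are assumptions.

```python
def ips_estimate(logs, new_policy_prob, clip=None):
    """Estimate the average reward the new policy M* would have obtained.

    logs: list of (context, action, reward, logged_prob), where logged_prob
          is the probability with which the logging policy M chose `action`
          (this is why real-time randomization is required).
    new_policy_prob: function (context, action) -> probability under M*.
    clip: optional cap on the importance weight, trading bias for variance.
    """
    total = 0.0
    for context, action, reward, logged_prob in logs:
        weight = new_policy_prob(context, action) / logged_prob
        if clip is not None:
            weight = min(weight, clip)
        total += reward * weight
    return total / len(logs)


# Logging policy picked banner "a" or "b" uniformly at random.
logs = [
    ("user1", "a", 1.0, 0.5),
    ("user1", "b", 0.0, 0.5),
    ("user2", "a", 1.0, 0.5),
    ("user2", "b", 0.0, 0.5),
]


def always_a(context, action):
    """Hypothetical new policy M*: always show banner "a"."""
    return 1.0 if action == "a" else 0.0


estimate = ips_estimate(logs, always_a)
```

When M* is far from M, the weights become large for the few logged events that M* favors, and the estimate gets noisy, which is one reading of "works best when M* is close to M".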

Page 14: New challenges for scalable machine learning in online advertising

Optimal bidding strategies

• A user is seen more than 20 times a day on average

• Each action we take has an impact on the user, the advertiser and the competition

• Option 1: model the environment and bid accordingly

  • Cannot go beyond the proxy being optimized

• Option 2: no model, randomized experiments

  • Hard problem: very high-dimensional state space and very sparse rewards
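A very reduced illustration of the model-free, randomized approach is an epsilon-greedy bandit over a handful of bid multipliers. This is a deliberate simplification: it ignores the user state entirely, so it does not address the high-dimensional state space or sparse rewards named above. The class, the arm values, and the reward signal are all hypothetical.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy choice among a fixed set of arms (e.g. bid multipliers)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.means = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.means[arm])

    def update(self, arm, reward):
        # Incremental update of the running mean reward for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


policy = EpsilonGreedy(arms=[0.8, 1.0, 1.2], epsilon=0.1)
chosen = policy.select()
policy.update(chosen, reward=0.0)  # e.g. no sale observed after this bid
```

The exploration rate `epsilon` is the cost/exploration trade-off in miniature: every exploratory bid may lose money, but without it the reward estimates for the other arms never improve.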

Page 15: New challenges for scalable machine learning in online advertising

Conclusions

• Machine learning applies well to online advertising at scale

• New algorithms, new infrastructures and more data are coming

• A number of challenges remain unresolved…

• … come help us solve them!

Page 16: New challenges for scalable machine learning in online advertising

Thanks! Questions?

[email protected]

Dataset released: http://bit.ly/criteodata