machine learning on big data for personalized internet advertising

M. RECCE 11/18/2011 © 2011 Quantcast. All Rights Reserved QCon Machine Learning on Big Data for Personalized Advertising

Upload: trieu-nguyen

Post on 12-Jul-2015


TRANSCRIPT

Page 1: Machine learning on big data for personalized Internet advertising

M. RECCE

Machine Learning on Big Data for Personalized Advertising

Page 2: Machine learning on big data for personalized Internet advertising

Advertising has long wanted better algorithms

“Half the money I spend on advertising is wasted; the trouble is I don't know which half.”

John Wanamaker, “The Father of Modern Advertising”

Page 3: Machine learning on big data for personalized Internet advertising

Outline

•  Internet advertising (the business)
•  Internet advertising (the data)
•  Understanding consumers (the models)
•  Organizing for success

Page 4: Machine learning on big data for personalized Internet advertising

The Personalized Media Economy

Media is transitioning from a “one size fits all” broadcast model to dynamic real-time choice

Online Advertising Ecosystem

Page 5: Machine learning on big data for personalized Internet advertising

Money Follows Media Consumption

Globally, hundreds of billions of dollars of ad spend will shift

$30B opportunity

Page 6: Machine learning on big data for personalized Internet advertising

Why the Spending Disparity?

•  Media spend processes are well established
•  New media channels lag until audiences and value can be properly quantified
•  Historically, digital audiences were poorly quantified
   –  Stratified sampling has been the norm in media measurement for decades
   –  Bias and sampling error prevail

Page 7: Machine learning on big data for personalized Internet advertising

Enter Quantcast

•  Launched September 2006 to enable addressable advertising at scale
•  First we had to fix audience measurement
•  Launched a free service based on direct measurement of media consumption
•  Use machine learning to infer audience characteristics

Page 8: Machine learning on big data for personalized Internet advertising

Broad Participation
World’s Favorite Audience Measurement Service

Page 9: Machine learning on big data for personalized Internet advertising

An Advertising Data Explosion

•  Massive expansion in number of decisions
   –  Individuals, not whole audiences
   –  Impressions, not whole sites
   –  Screens/times/locations/…
•  Decision timeframe reduced from weeks to milliseconds
•  This problem can only be solved algorithmically

Page 10: Machine learning on big data for personalized Internet advertising

Data Rich Environment

WHOLE LOT OF DATA!

•  4 Billion Cookies/mo. (observed)
•  400,000+ Events/sec (real-time transactions)
•  600+ Billion Events/mo. (media consumption)
•  1.3 Billion Global Users
•  240 Million U.S. Users (everyone)
•  800x/Person per month (avg. observations)
•  5 Petabytes per day (data processed)
•  100+ Million Destinations (with QC tags)

Page 11: Machine learning on big data for personalized Internet advertising

Rise of Real-Time Audience Targeting

“…let advertisers buy ads in the milliseconds between the time someone enters a site’s Web address and the moment the page appears. The technology, called real-time bidding, allows advertisers to examine site visitors one by one and bid to serve them ads almost instantly…A consumer would barely notice the shift, except that ads might seem more relevant to exactly what they are shopping for.”

- New York Times, March 12

More relevant ads, more effective campaigns, higher inventory utilization & higher CPMs

Page 12: Machine learning on big data for personalized Internet advertising

RTB – A Rapid & Transformational Industry Shift
Quantcast Auction Volume (UK & US)

[Chart: billions of auctions per day]
Feb ’10   300M
Apr ’10   400M
Jul ’10   800M
Oct ’10   1.2B
Jan ’11   2.0B
Apr ’11   3.2B
Jul ’11   5.4B
Sep ’11   7.2B

Page 13: Machine learning on big data for personalized Internet advertising

Media Buying & Execution is Changing

2005 ($200B) → Now ($200B)

Transaction: Buy Whole Sites → Real-Time Bidding
Supply Portfolio: 100 Publishers → 100’s of 1000’s Impressions/Second
Data/Tools: Aggregate Report, Human Analysis → Petascale Computing + Machine Learning

Page 14: Machine learning on big data for personalized Internet advertising

Data Mining Challenges

Audience Estimation: Using reference data from a small number of people and a small number of web sites, infer the demographics/attributes of the audience of all sites.

User Estimation: Using media consumption records and audience estimates, determine the characteristics of an Internet user across arbitrary dimensions.

Lookalike Selection: From the behavior of a small number of buyers of a product, determine the set of people who will buy it next.

Live Traffic Modeling: Compute the value for showing an advertisement to a user as a function of the user, advertising environment, time of day etc.
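The live traffic modeling challenge can be illustrated with a minimal sketch. It assumes a logistic model for conversion probability and values an impression as that probability times the value of a conversion. The feature names, weights, and bias below are hypothetical and do not come from the talk.

```python
import math

# Hypothetical weights; none of these names or numbers come from the talk.
WEIGHTS = {
    "user_score": 2.0,    # lookalike-style score for the user, in [0, 1]
    "premium_site": 0.5,  # 1.0 if the ad slot is on a premium site
    "evening": 0.3,       # 1.0 if the impression occurs in the evening
}
BIAS = -6.0  # keeps the baseline conversion probability small


def conversion_probability(features):
    """Logistic model: sigmoid of a weighted sum of impression features."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def impression_value(features, value_per_conversion):
    """Expected value of one impression: P(conversion) * value of a conversion."""
    return conversion_probability(features) * value_per_conversion


strong = {"user_score": 0.9, "premium_site": 1.0, "evening": 1.0}
weak = {"user_score": 0.1, "premium_site": 0.0, "evening": 0.0}
print(impression_value(strong, 50.0), impression_value(weak, 50.0))
```

In a real-time bidding setting, a value like this could serve as an upper bound on the bid for the impression; how the speaker's system actually sets bids is not stated in the slides.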

Page 15: Machine learning on big data for personalized Internet advertising

Quantcast Lookalikes for Marketers
Revolutionary Ad Targeting for Performance and Brand

1. Understand marketer’s BEST CUSTOMERS with Quantcast Measurement
2. Isolate DISTINCTIVE INTERESTS
3. Find MILLIONS OF LOOKALIKES
4. Reach them ANYWHERE

PERFORMANCE LOOKALIKES
•  Quantcast technology continually optimizes real-time media for advertiser

BRAND LOOKALIKES
•  Buy custom audiences from trusted media partners

[Diagram label: Your Site Traffic]

Page 16: Machine learning on big data for personalized Internet advertising

Lookalike Selection

•  Given an archetype group of users, find the feature set that best separates them from their complement
•  Features can be positive or negative indicators of content relevance
•  Find more that look like them
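One simple way to find features that separate an archetype group from its complement is a smoothed log-odds ratio per feature. The sketch below is an illustration under that assumption, not the method used at Quantcast; the function name and the toy feature strings are hypothetical.

```python
import math
from collections import Counter


def feature_log_odds(archetype, complement, smoothing=1.0):
    """Smoothed log-odds of each feature in archetype vs. complement users.

    Each user is a set of feature strings. A positive score marks a positive
    indicator of membership; a negative score marks a negative indicator.
    """
    pos_counts = Counter(f for user in archetype for f in user)
    neg_counts = Counter(f for user in complement for f in user)
    scores = {}
    for feature in set(pos_counts) | set(neg_counts):
        p = (pos_counts[feature] + smoothing) / (len(archetype) + 2 * smoothing)
        q = (neg_counts[feature] + smoothing) / (len(complement) + 2 * smoothing)
        scores[feature] = math.log(p / (1 - p)) - math.log(q / (1 - q))
    return scores


# Toy data, invented for illustration only.
archetype = [{"running-blog", "news"}, {"running-blog", "shoe-store"}, {"running-blog"}]
complement = [{"news"}, {"news", "gossip"}, {"gossip"}, {"news"}]
scores = feature_log_odds(archetype, complement)
for feature, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{feature:14s} {score:+.3f}")
```

Features with large positive or negative scores are the candidates for the separating feature set; features near zero carry little information about membership.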

Page 17: Machine learning on big data for personalized Internet advertising

Problem Statement

•  Math competition
•  Largest number of “conversions” (purchasers) during contest “wins”
•  Leverage information on prior purchasers to find more
•  Decide how to compete

•  Bring mathematicians
•  More data on each converter
•  Management by metrics
•  Know what the competitors are doing

Page 18: Machine learning on big data for personalized Internet advertising

Lookalike Mass-Production Pipeline

[Diagram labels: Model Configuration; Training; 10,000 Converters; Model; 500 TB; Trained Models; 1000s of Concurrent Models; Scoring; 1.3 Billion Internet Users; Multi PB; 20 TB / Day; 10M Potential Converters]

Page 19: Machine learning on big data for personalized Internet advertising

Lookalikes Identify Consumers that Will Take Action

1. Start with consumers who purchased
2. Select consumers who didn’t purchase
3. Evaluate world’s largest database of human interests
4. Identify Positive & Negative indicators of purchase
5. If a new consumer looks more like a purchaser than a non-purchaser, they’re a Lookalike

[Charts: “Consumers who purchased product” and “Consumers who did not purchase product”; x-axis: days (-80 to 0); y-axis: 0 to 1000; legend: Positive, Negative]
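The five steps on this slide can be sketched as a naive Bayes log-likelihood ratio: a new consumer counts as a lookalike when their features are more probable under the purchaser group than under the non-purchaser group. This is an assumed stand-in for whatever model the speaker's team used; the function names and toy data are hypothetical, and equal priors for the two groups are assumed.

```python
import math


def feature_rates(users, vocabulary, smoothing=1.0):
    """Smoothed fraction of users in a group that exhibit each feature."""
    return {
        f: (sum(f in u for u in users) + smoothing) / (len(users) + 2 * smoothing)
        for f in vocabulary
    }


def lookalike_score(consumer, purchaser_rates, non_purchaser_rates):
    """Naive Bayes log-likelihood ratio; > 0 means 'more like a purchaser'."""
    score = 0.0
    for f, p in purchaser_rates.items():
        q = non_purchaser_rates[f]
        if f in consumer:
            score += math.log(p / q)              # feature present
        else:
            score += math.log((1 - p) / (1 - q))  # feature absent
    return score


# Steps 1 and 2: toy purchasers and non-purchasers (invented data).
purchasers = [{"travel", "luggage"}, {"travel", "hotels"}, {"travel"}]
non_purchasers = [{"sports"}, {"sports", "news"}, {"news"}]

# Steps 3 and 4: estimate how strongly each interest indicates purchase.
vocabulary = set().union(*purchasers, *non_purchasers)
p_rates = feature_rates(purchasers, vocabulary)
q_rates = feature_rates(non_purchasers, vocabulary)

# Step 5: classify new consumers.
for consumer in ({"travel", "hotels"}, {"sports", "news"}):
    score = lookalike_score(consumer, p_rates, q_rates)
    print(sorted(consumer), "lookalike" if score > 0 else "not a lookalike")
```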

Page 20: Machine learning on big data for personalized Internet advertising

Wide Range of Activity
Websites, keywords, geo-location, ads and more

Conversion Event

Page 21: Machine learning on big data for personalized Internet advertising

RTLAL Bidding Architecture

[Diagram labels: Model Definition; Pixel Data; Real Time Ad Exchange; Model Training and Scoring; Bidding Auction Mgmt]

Page 22: Machine learning on big data for personalized Internet advertising

Activity Level Variations

Page 23: Machine learning on big data for personalized Internet advertising

Cookie Deletion Rates

Page 24: Machine learning on big data for personalized Internet advertising

Media consumption is non-stationary

[Chart: ‘Michael Jackson’ Media Consumption, June 25, 2009; y-axis: pages consumed per minute; x-axis: time of day, 13:00 to 19:00]

Page 25: Machine learning on big data for personalized Internet advertising

Choose the Right Objective!

Clicks don’t always lead to conversions
The right metric is critical!

[Chart: Indexed Click vs. Conversion Rates]

Page 26: Machine learning on big data for personalized Internet advertising

Machines
High Performance Platform

•  450,000 / Second: Real-time events
•  5PB / Day: Processing throughput
•  Multiple Global Datacenters: Ultra-high availability with advanced traffic management

Page 27: Machine learning on big data for personalized Internet advertising

Math Team Environment

Collaboration
•  Regular brainstorming
•  Group review meetings
•  Shared wiki environment
•  Team goals

Independence
•  Everyone free to implement their own ideas
•  Improved models
•  Better metrics
•  Visualization methods, etc.

Page 28: Machine learning on big data for personalized Internet advertising

Measuring Lift – ROC
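The chart on this slide did not survive extraction. As a reminder of what an ROC curve measures, here is a minimal sketch that sweeps a threshold over model scores and integrates the resulting curve. The function names and data are hypothetical, and the code assumes distinct scores (no ties).

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, threshold swept high to low.

    Assumes all scores are distinct and both classes are present.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented example: 1 = converter, 0 = non-converter.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))
```

An area of 0.5 corresponds to random ranking and 1.0 to a ranking that places every converter above every non-converter.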

Page 29: Machine learning on big data for personalized Internet advertising

Cumulative Lift
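The chart for this slide is also missing. Cumulative lift is commonly defined as the conversion rate among the top-scored fraction of users divided by the overall conversion rate; the sketch below assumes that definition, with a hypothetical function name and invented data.

```python
def cumulative_lift(scores, labels, fraction):
    """Conversion rate in the top `fraction` of users by score, relative to the base rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Invented example: ten users, three of whom converted (label 1).
scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for fraction in (0.2, 0.5, 1.0):
    print(fraction, cumulative_lift(scores, labels, fraction))
```

By construction the lift at a fraction of 1.0 is 1.0, so the curve shows how much better than untargeted delivery the model does at each depth of the ranked list.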

Page 30: Machine learning on big data for personalized Internet advertising

Learning ∝ experimentation

2 Days   Mins

6 Hours

New model development

New model in production

To process 100TB with first MapReduce job

Hours: Live performance assessment

2 Weeks: To influence billions of real-time decisions every day and millions of dollars of advertising spend

Page 31: Machine learning on big data for personalized Internet advertising

Technology Matters

Leaders will be world-class in every discipline, and will operate all as a fully integrated whole.

•  Machine Learning & Optimization
•  Comprehensive Coherent Data
•  Petascale Big-Data Computing
•  Real-Time Tech Mastery

Page 32: Machine learning on big data for personalized Internet advertising

If you have all that then....

Having more Data really matters.

Page 33: Machine learning on big data for personalized Internet advertising

Numerous Open Challenges

•  Dealing with sparsity
•  Feature selection
•  Real-time scoring and bidding
•  ‘True’ performance & attribution modeling
•  Lift, lift and more lift!
•  Handling 100,000’s of concurrent models

Page 34: Machine learning on big data for personalized Internet advertising

Summary

•  Digital advertising is a vast analytical environment
   –  Enormous data volumes
   –  Rich behaviors
   –  Objective performance metrics
•  Marketing will be transformed by computational approaches
•  Hundreds of billions of dollars of spend are at stake

Page 35: Machine learning on big data for personalized Internet advertising

Quantcast  


Page 36: Machine learning on big data for personalized Internet advertising

Contact:  [email protected]  
