Doing Awesome Things in Online Advertising Using Hadoop

DOING AWESOME THINGS IN ONLINE DISPLAY ADVERTISING USING HADOOP
Success Stories, Lessons Learned, and a Wish List
Dr. Jaimie Kwon, Tech Director, Data Mining

Upload: jaimie-kwon

Post on 03-Dec-2014

DESCRIPTION

We at Advertising.com (a division of AOL Networks) use a dedicated Hadoop cluster to process petabytes of online display advertising data. The data powers customer / audience understanding, look-alike audience prediction, ad effectiveness measurement, and ad-hoc research. In this talk, we cover a few use cases, success stories, and lessons learned.

TRANSCRIPT

Page 1: Doing Awesome Things in Online Advertising Using Hadoop

DOING AWESOME THINGS IN ONLINE DISPLAY ADVERTISING USING HADOOP

Success Stories, Lessons Learned, and a Wish List

Dr. Jaimie Kwon. Tech Director, Data Mining

Page 2: Doing Awesome Things in Online Advertising Using Hadoop

Massive cross-screen network reaching 600M+ consumers worldwide

Premium programmatic demand side platform

Leading premium video network with 67M+ uniques

Premium programmatic video platform

Branded and content entertainment platform

Premium programmatic supply side platform

Page 3: Doing Awesome Things in Online Advertising Using Hadoop

5Vs IN BIG DATA

VELOCITY
• Doesn’t always work well with “volume”… leading to silos. Technical challenge.

VOLUME
• Petabytes are the norm. Thanks, Hadoop! Bottlenecks and hotspots occur in unexpected places.

VERACITY
• “Where shall clean metadata be found?” Organizational challenge (culture and process).

VARIETY
• Diverse data sources… leading to silos. Engineering resource / architectural challenge.

VALUE
• Not to be forgotten. “Why we fight?”

Page 4: Doing Awesome Things in Online Advertising Using Hadoop

IT’S BEEN A GREAT 10 YEARS

(Taken from http://www.slideshare.net/larsgeorge/hadoop-is-dead-lars-george-bi-data2013 and http://techblog.baghel.com/index.php?itemid=132 )

Page 5: Doing Awesome Things in Online Advertising Using Hadoop

AOL NETWORKS DATA IN HADOOP

USE CASES
Aggregates: Easy via Hive
Ad hoc queries: Harder via Pig/Hive
User level analysis: Hardest

1. Customer / audience understanding,
2. Predicting look-alike audiences,
3. Measuring ad effectiveness,
4. User time-series analysis,
5. Stream analysis,
6. Ad-hoc research,
7. ...

SCALE
• > 1 billion events / day
• > 100 million web users
• Hundreds of advertisers
• Thousands of ad campaigns
• Thousands of pixels
• Petabytes of data

Page 6: Doing Awesome Things in Online Advertising Using Hadoop

CHALLENGES

VARIETY
• Acquisitions happen
• New, diverse data sources
• Speed of ingestion is the key

NEED FOR USER LEVEL ANALYSIS
Answering such questions as:
• “What are the prominent behavioral segments of those who purchased product A?”
• “What do users do in the 2 weeks prior to purchasing product B?”
• “What is the likelihood of a user purchasing product C over the next week?”
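A question like the second one can be sketched over a per-user event log. This is only an illustration: the event tuples, the `purchase:`/`view:`/`click:` action names, and the function `actions_before_purchase` are all hypothetical and not taken from the talk.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical event log: (user_id, day, action). All values are made up.
EVENTS = [
    ("u1", date(2014, 11, 1), "view:sports"),
    ("u1", date(2014, 11, 10), "click:ad_B"),
    ("u1", date(2014, 11, 12), "purchase:B"),
    ("u2", date(2014, 10, 1), "view:news"),
    ("u2", date(2014, 11, 20), "view:sports"),
    ("u2", date(2014, 11, 25), "purchase:B"),
    ("u3", date(2014, 11, 5), "view:news"),
]


def actions_before_purchase(events, product, window_days=14):
    """Count actions in the window_days before each user's first purchase."""
    target = "purchase:" + product

    # Pass 1: find each user's first purchase date for the product.
    first_purchase = {}
    for user, day, action in events:
        if action == target:
            if user not in first_purchase or day < first_purchase[user]:
                first_purchase[user] = day

    # Pass 2: count non-purchase actions that fall inside the window.
    counts = Counter()
    for user, day, action in events:
        bought = first_purchase.get(user)
        if bought is None or action == target:
            continue
        if bought - timedelta(days=window_days) <= day < bought:
            counts[action] += 1
    return counts


PRIOR = actions_before_purchase(EVENTS, "B")
```

The point of the sketch is that the answer needs every event of a user grouped together and ordered in time, which is what makes user-level analysis harder than a plain aggregate.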

UNSTRUCTURED DATA

Page 7: Doing Awesome Things in Online Advertising Using Hadoop

MAD, MAD, MAD

Magnetic: “attracting all the data sources that crop up within an organization regardless of data quality niceties.”

Agile: “allow analysts to easily ingest, digest, produce and adapt data at a rapid pace.”

Deep: “... increasingly sophisticated statistical methods ... beyond the rollups and drilldowns of traditional BI. ... need to see both the forest and the trees in running these algorithms - they want to study enormous datasets without resorting to samples and extracts. The modern data warehouse should serve both as a deep data repository and as a sophisticated algorithmic runtime engine.”

MAD Skills: New Analysis Practices for Big Data (2009, Cohen et al.)

Page 8: Doing Awesome Things in Online Advertising Using Hadoop

USER PROFILE

• A daily user profile is built for all anonymous cookie ids seen on a given day.
• Multiple days’ worth of user profiles are assembled via map-side join.
• The processing framework is built so that the map-side join and other machinery are hidden from researchers and (most) developers.
• Supports almost all advanced use cases.
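The idea behind assembling several days of profiles with a map-side join can be sketched in a few lines. This is not the framework described in the talk. It assumes each day's partition is already sorted by cookie id, so the days can be merged in a single streaming pass with no shuffle. The record layout and the name `merge_daily_profiles` are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_daily_profiles(*daily_partitions):
    """Merge per-day partitions, each pre-sorted by cookie id, in one pass.

    Each partition is an iterable of (cookie_id, day, profile) records.
    Yields (cookie_id, {day: profile}) in cookie-id order.
    """
    # heapq.merge streams the sorted inputs without loading them in memory.
    merged = heapq.merge(*daily_partitions, key=itemgetter(0))
    for cookie_id, records in groupby(merged, key=itemgetter(0)):
        yield cookie_id, {day: profile for _, day, profile in records}


# Made-up daily profiles for illustration only.
DAY1 = [("a", "2014-12-01", {"clicks": 1}), ("c", "2014-12-01", {"clicks": 0})]
DAY2 = [("a", "2014-12-02", {"clicks": 2}), ("b", "2014-12-02", {"clicks": 5})]

PROFILES = list(merge_daily_profiles(DAY1, DAY2))
```

Because every input is sorted on the join key, each cookie id's records arrive together and the join needs only constant memory per key. Sorting is the precondition; on inputs that are not sorted by cookie id, the merge would group records incorrectly.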

CHOICES WE (ALMOST) HAD:

• Flat file on HDFS,
• Pig,
• Hive,
• HBase,
• Custom “user profile”
• Ended up with the user profile approach and never looked back..
• .. so far.

Page 9: Doing Awesome Things in Online Advertising Using Hadoop

USE CASE #1: CUSTOMER UNDERSTANDING

User profile supports AOL Networks’ audience analytics system, which answers such questions as:

• “Are very young and old customers better clickers?”
  o “Yes, but young adults are better purchasers.”
• “Are people who saw display advertising more likely to come to the online store?”
  o “Yes, about twice as likely.”
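A "twice as likely" answer is a ratio of two visit rates. The sketch below shows that arithmetic only. The counts are invented for illustration and are not figures from the talk, and the function names are hypothetical.

```python
def visit_rate(visitors, users):
    """Fraction of users in a group who came to the online store."""
    return visitors / users


def lift(exposed_visitors, exposed_users, control_visitors, control_users):
    """Ratio of store-visit rates: users who saw ads vs. users who did not."""
    exposed = visit_rate(exposed_visitors, exposed_users)
    control = visit_rate(control_visitors, control_users)
    return exposed / control


# Made-up counts: 4% of exposed users visit vs. 2% of unexposed users.
LIFT = lift(
    exposed_visitors=400,
    exposed_users=10_000,
    control_visitors=200,
    control_users=10_000,
)
```

A raw ratio like this compares two groups that may differ in other ways, so on its own it describes an association rather than the causal effect of the ads.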

Page 10: Doing Awesome Things in Online Advertising Using Hadoop

USE CASE #2: LOOKALIKE AUDIENCE MODEL

User profile supports AOL Networks’ Lookalike audience offering, which lets you reach new people who are likely to be interested in an advertiser’s offering because of their similarity to existing customers.

Predictive Analytics and Optimization

• Logistic Regression
• Neural Networks
• Random Forest
• Gradient Boosting Machine
• …
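As a rough illustration of the first model family listed, the sketch below fits a tiny logistic regression on existing customers versus non-customers and then ranks new users by predicted similarity. It is a toy under stated assumptions, not the production model: the two binary features, the training rows, and the hyperparameters are all invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Probability that a user with features x looks like a customer."""
    return sigmoid(weights[0] + sum(w * xj for w, xj in zip(weights[1:], x)))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical features per user: [visited_sports_pages, clicked_similar_ad]
SEED = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
IS_CUSTOMER = [1, 1, 1, 0, 0, 0]

WEIGHTS = train_logistic(SEED, IS_CUSTOMER)

# Score users who are not yet customers and rank them, best first.
PROSPECTS = {"p1": [1, 1], "p2": [0, 0]}
SCORES = {uid: predict(WEIGHTS, x) for uid, x in PROSPECTS.items()}
RANKED = sorted(SCORES, key=SCORES.get, reverse=True)
```

The top of `RANKED` would form the lookalike audience. In practice the features would come from the user profile, and any of the listed model families could replace the logistic regression.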

VALUE UNSTRUCTURED DATA

Page 11: Doing Awesome Things in Online Advertising Using Hadoop

MORE CHALLENGES...

Cluster Ops

Tuning of Cluster / Jobs

Velocity / real-time: Want more real-time updates of the user profile. Hard.

Veracity: Organizational challenge. High-quality metadata.

Good “Data Scientists” specializing in “Big Data” are hard to find.

Page 12: Doing Awesome Things in Online Advertising Using Hadoop

LOOKING FORWARD TO MORE EXCITING DEVELOPMENT

(Taken from http://www.slideshare.net/larsgeorge/hadoop-is-dead-lars-george-bi-data2013 and http://techblog.baghel.com/index.php?itemid=132 )

(Timeline graphic; year labels: 2015, 2023)

Page 13: Doing Awesome Things in Online Advertising Using Hadoop

?