How did you know this ad will be relevant for me?!
DESCRIPTION
Predicting the most relevant ad at any point in time for every individual is how Rocket Fuel optimizes ROI for an advertiser. One of the factors influencing this prediction is a consumer's online interactions and behavioral profile. With more than 45 billion interactions processed daily, this data runs into several petabytes in our Hadoop warehouse. Running machine-learning algorithms and artificial intelligence at this scale requires many practical issues to be addressed. First, behavioral patterns are short-lived, so to accurately reflect the tendencies of a consumer, we need to curate and refresh his or her profile as quickly as possible while avoiding multiple scans over the raw data and dealing with issues like transient system outages. Second, we must address the difficulty of building models that use behavioral profiles without overwhelming our Hadoop cluster. At this scale, frequent refreshes of several models can place an undue burden on even a thousand-node cluster. In this talk, we will dive into (a) the practical challenges involved in designing a highly scalable and efficient solution to build behavioral profiles using the Hadoop framework and (b) techniques for ensuring reliability and availability of mission-critical machine learning pipelines.
TRANSCRIPT
![Page 1: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/1.jpg)
Proprietary & Confidential. Copyright © 2014.
Behavioral Targeting @ Scale - How did we know that this Ad was relevant for you?
Savin Goyal, Sivasankaran Chandrasekar
![Page 2: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/2.jpg)
ADVERTISER
ROCKET FUEL
200+ RTB advertising supply partners
50+ Mn websites
50+ Bn daily impressions
3B WW CONSUMERS
100,000+ DEVICES
![Page 3: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/3.jpg)
Exchanges
AdExchange
Rocket Fuel Platform
Auto Optimization
Real-Time Bidding
Agencies
Data Partners
Display Advertising Ecosystem
![Page 4: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/4.jpg)
Programmatic Buying
(Diagram: ad request flow, steps numbered 1 to 10)
Flow labels: Page Request (Web Browser), Ad Request, Bid Request, User Data, Qualify Campaign, Calculate Propensity Score, Bid on Ad, Rocket Fuel Winning Ad, Ad Served to User, User Engages with Ad, User Engagement Recorded, Refresh learning
Rocket Fuel Platform: Smart Ad Servers, Response Prediction Models, Campaign & Audience Data
Other parties: Publishers, Data Partners, Exchange Partners
![Page 5: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/5.jpg)
Real Time Auction
(Figure: stream of per-impression bid prices)
Signals: Site/Page, Geo/Weather, Time of Day, Brand Affinity, User
![Page 6: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/6.jpg)
Real Time Auction
Goals: Leads & sales; Coupon downloads; Brand awareness
Signals: Site/Page, Geo/Weather, Time of Day, Brand Affinity, Demo
Impression Scorecard columns: Demo, Brand Affinity, Time of Day, Geo/Weather, Site/Page, Ad Position, In-Market Behavior, Response
Scorecard values: +100 +40 -20 +20 +15 +10 +40 +35; +9.7%
Scorecard values: +40 -70 -20 +10 +15 -25 -40 -18; +0.7%
Scorecard values: +10 -10 -20 +20 +10 -35 -25 +10; +1.4%
Marks: X, X, ✓
![Page 7: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/7.jpg)
Scalable Predictive Models
Features: Age/Gender, Occupation, Income, Ethnicity, Purchase Intent, Online Purchases, Offline Purchases, Browsing Behavior, Site Actions, Zip Code, City/DMA, Search Sites, Search Categories, Recency, Search Keywords, Web Site/Page, Referral URL, Site Category, Bizographics, Social, Interests, Lifestyle
Legend: Positive Lift, Marginal Impact, Negative Lift
(Figure: grid of per-feature lift values from -23 to +28, with X marks)
Automated Feature Selection
▪ Infinite number of models
▪ Determine perfect model size
▪ Balance past data fit and future generalization
Learn-Test-Refine
▪ Automatically learn from each response
▪ Cross-validate - A/B testing infrastructure
▪ Training pipeline
![Page 8: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/8.jpg)
Throughput
![Page 9: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/9.jpg)
Rocket Fuel Scale
▪ 34,474 CPU processor cores
▪ 2655 servers
▪ 187.4 teraflops of computing
▪ 188 terabytes of memory
  ▪ 13X the memory of Jeopardy-winning IBM Watson
▪ 42 petabytes of storage
  ▪ 106X the data volume of the entire Library of Congress
![Page 10: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/10.jpg)
Data Warehouse Growth
In 1 year: 200 servers to 1400 servers
In 1 year: 5 PB to 41 PB (8x)
![Page 11: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/11.jpg)
Behavioral Targeting
![Page 12: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/12.jpg)
Behavioral Targeting
▪ Leverage online activities on the web to learn about a user’s
  ▪ Long Term Interests
    ▪ User is interested in luxury cars
  ▪ Short Term Interests
    ▪ User is looking for a pizza right now
▪ Expand user set beyond retargeting
  ▪ Explore vs. Exploit
  ▪ Identify relevant users even if they have never been targeted previously
![Page 13: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/13.jpg)
Behavioral Targeting @ Rocket Fuel
Label Data
Train Model
Back Test
Calibrate
Training Events
Pixel Stream
Ad Logs
BT Features (HBase)
Feature Generation
Score Profiles
Profile Generation
Scoring
Model
Ad Serving Data Centers
![Page 14: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/14.jpg)
Hadoop/HBase @ Rocket Fuel
▪ Cluster Highlights
  ▪ 650+ Slaves (64 GB + 12 × 3 TB)
  ▪ 20 PB Storage
  ▪ HA Name Node Set Up
  ▪ 9k Map Slots + 5.5k Reduce Slots
  ▪ Co-located to run HBase for offline processing
▪ HBase 0.94.15
  ▪ 5 Node ZooKeeper quorum
  ▪ Monitoring with OpenTSDB
  ▪ Dual Master Setup
![Page 15: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/15.jpg)
Behavioral Targeting @ Rocket Fuel
Read events of last N days
Timeline (30 minutes): 11:23, 11:26, 11:27
bmw.com 11:23, Cars 11:23
pizzahut.com 11:26, Food 11:26
honda.com 11:27, Cars 11:27
Behavioral Targeting Profile: Recency, Frequency, Others..
honda.com: 11:27
  Recent 6 hours: 5
  Between 6 and 12 hours: 3
  Between 12 hours and …
Food: 11:26
  Recent 6 hours: 2
  Between 6 and 12 hours: 7
  Between 12 hours and …
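The recency and frequency buckets on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the bucket names, the open-ended last bucket, the `build_profile` and `bucket_for` helpers, and the event tuples (including the date) are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Bucket upper bounds in hours. The slide shows "recent 6 hours" and
# "between 6 and 12 hours"; the open-ended last bucket is an assumption.
BUCKET_EDGES_HOURS = (6, 12)
BUCKET_NAMES = ("0-6h", "6-12h", "12h+")


def bucket_for(age: timedelta) -> str:
    """Return the recency bucket name for an event of the given age."""
    hours = age.total_seconds() / 3600
    for edge, name in zip(BUCKET_EDGES_HOURS, BUCKET_NAMES):
        if hours < edge:
            return name
    return BUCKET_NAMES[-1]


def build_profile(events, now):
    """Aggregate (feature, timestamp) events into recency and frequency.

    Returns {feature: {"last_seen": datetime, "counts": {bucket: n}}}.
    """
    profile = defaultdict(lambda: {"last_seen": None, "counts": defaultdict(int)})
    for feature, ts in events:
        entry = profile[feature]
        if entry["last_seen"] is None or ts > entry["last_seen"]:
            entry["last_seen"] = ts
        entry["counts"][bucket_for(now - ts)] += 1
    return profile


# Hypothetical events modeled on the slide's example.
now = datetime(2014, 6, 4, 11, 30)
events = [
    ("site:bmw.com", datetime(2014, 6, 4, 11, 23)),
    ("category:cars", datetime(2014, 6, 4, 11, 23)),
    ("site:pizzahut.com", datetime(2014, 6, 4, 11, 26)),
    ("category:food", datetime(2014, 6, 4, 11, 26)),
    ("site:honda.com", datetime(2014, 6, 4, 11, 27)),
    ("category:cars", datetime(2014, 6, 4, 11, 27)),
    ("category:food", datetime(2014, 6, 4, 3, 0)),  # older event, made up
]
profile = build_profile(events, now)
```

Each feature keeps its most recent timestamp (recency) and one counter per time bucket (frequency), which is the shape the slide shows for honda.com and Food.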
![Page 16: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/16.jpg)
HBase Data Model
row_key: user_id (example: ABCD06EFG)
Single Column Family “u”
Column Qualifier: <date><hour>:<type>:<value>
Cell Value: [Protobuf] Most recent timestamp, Event details relative to timestamp
Example columns:
2014060416:site:bmw.com: 11:23, event details relative to 11:23
2014060416:category:food: 11:26, event details relative to 11:26
• Efficient look up for a given user
• Access range of events by event date, hour and type
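A small sketch of the qualifier layout above, using a plain dict in place of an HBase row. The helper names `column_qualifier` and `hour_prefix` are hypothetical, and the cell values are placeholders rather than real protobuf payloads. Against a real table the same prefix could be handed to something like HBase's `ColumnPrefixFilter`, but that wiring is not shown here.

```python
from datetime import datetime


def column_qualifier(ts: datetime, event_type: str, value: str) -> bytes:
    """Build a qualifier of the form <date><hour>:<type>:<value>."""
    return f"{ts:%Y%m%d%H}:{event_type}:{value}".encode("utf-8")


def hour_prefix(ts: datetime, event_type: str = "") -> bytes:
    """Prefix matching every qualifier in one hour, optionally one type."""
    prefix = f"{ts:%Y%m%d%H}:"
    if event_type:
        prefix += f"{event_type}:"
    return prefix.encode("utf-8")


# One user's row modeled as {qualifier: cell_value}; values are placeholders.
row = {
    column_qualifier(datetime(2014, 6, 4, 16, 23), "site", "bmw.com"): b"<protobuf>",
    column_qualifier(datetime(2014, 6, 4, 16, 26), "category", "food"): b"<protobuf>",
}

# Select only the "site" events recorded in hour 2014060416.
wanted = hour_prefix(datetime(2014, 6, 4, 16), "site")
site_cells = [q for q in sorted(row) if q.startswith(wanted)]
```

Because the date and hour lead the qualifier, qualifiers sort chronologically within a row, which is what makes the range access by date, hour and type cheap.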
![Page 17: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/17.jpg)
![Page 18: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/18.jpg)
Key Challenges
▪ User Profile Freshness
▪ Scaling Issues
▪ Pipeline Failures
![Page 19: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/19.jpg)
User Profile Freshness
▪ Strict latency requirements
▪ Recent activity is a much better predictor
Solutions:
▪ Staggered Pipelines
▪ Real Time Behavioral Targeting
![Page 20: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/20.jpg)
Staggered Pipelines
Source Data
(Diagram: five staggered pipeline runs, each with the stages Extract, Score, Filter, Upload)
![Page 21: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/21.jpg)
Real Time Behavioral Targeting
![Page 22: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/22.jpg)
Real Time Behavioral Targeting
Logs
Offline BT Pipeline
BT Profile (Batched Profile): refreshed every N hours
Online Profile: record events for users in real time
Blackbird: HBase instance tuned for 2 ms latencies
Ad Servers: merge profiles
Request / Response
![Page 23: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/23.jpg)
Batched Updates vs. Real Time Updates
| | Batched Profile | Real Time Profile |
|---|---|---|
| Event Granularity | Aggregated over several hours/days | Raw recorded events appended for recent N hours |
| Processing Load | Requires minimal CPU processing | Needs aggregation on-the-fly |
| Disk Footprint | Compact representation captures several days | Strict limits to ensure read times are acceptable |
| Coverage | All interactions | Only interactions at a data center |

▪ Real Time Profile updated in milliseconds
▪ Batched Profile refreshed every N hours
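One way an ad server might merge the two profiles is sketched below. The talk does not describe the merge logic, so everything here is an assumption: the function name `merge_profiles`, the count-based profile shape, and the rule that only online events newer than the batch snapshot are added.

```python
from collections import Counter


def merge_profiles(batched_counts, batched_as_of, online_events):
    """Combine aggregated batch counts with raw real-time events.

    batched_counts: {feature: count}, aggregated up to `batched_as_of`
        (epoch seconds).
    online_events: iterable of (feature, epoch_seconds) raw events.

    Only online events newer than the batch snapshot are added, so an
    event already folded into the batch is not counted twice.
    """
    merged = Counter(batched_counts)
    for feature, ts in online_events:
        if ts > batched_as_of:
            merged[feature] += 1
    return dict(merged)


# Hypothetical data: the batch snapshot was taken at t=1000.
batched = {"category:cars": 5, "category:food": 2}
online = [
    ("category:cars", 900),    # older than the snapshot, skipped
    ("category:cars", 1100),   # newer, counted
    ("site:honda.com", 1200),  # feature not yet in the batch profile
]
merged = merge_profiles(batched, 1000, online)
```

The aggregation happens at read time, which matches the "needs aggregation on-the-fly" cost listed for the real-time profile.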
![Page 24: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/24.jpg)
Scaling Issues
▪ 3X growth in events processed/year
  ▪ First Party Data
  ▪ App Interactions
  ▪ Geo-location Data
  ▪ …
▪ Case Studies
  ▪ HBase Region Hot-spotting
  ▪ Network Bandwidth Troubles
![Page 25: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/25.jpg)
HBase Region Hot Spotting
![Page 26: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/26.jpg)
HBase Region Hot-spotting
(Diagram: three HBase regions, one under high write load)
High Write Load
Region Split (painful!)
Still problematic
Non-uniform distribution!
▪ Some users are more active than others
▪ No control over the user IDs generated
![Page 27: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/27.jpg)
HBase Region Hot-spotting
▪ Uneven write-load distribution
  ▪ Non-uniform row key distribution
▪ Salt row keys to ensure uniform distribution
  ▪ Fixed-length hashed prefix
  ▪ Row key layout: [Murmur hash based prefix][Original User ID]
▪ Uniform pre-splits
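The salting and pre-splitting idea can be sketched as follows. This is not Rocket Fuel's implementation: the 4-byte prefix width and the helper names are assumptions, and `hashlib.blake2b` stands in for the Murmur hash named on the slide only because Murmur is not in the Python standard library.

```python
import hashlib

SALT_BYTES = 4  # fixed-length prefix; the width is an assumption


def salted_row_key(user_id: str) -> bytes:
    """Prefix the original user ID with a fixed-length hash of itself."""
    raw = user_id.encode("utf-8")
    # Stand-in hash: the slide names Murmur; blake2b is used here only
    # because it ships with the standard library.
    salt = hashlib.blake2b(raw, digest_size=SALT_BYTES).digest()
    return salt + raw


def uniform_split_points(num_regions: int) -> list:
    """Evenly spaced split keys over the salt space, for pre-splitting."""
    space = 2 ** (8 * SALT_BYTES)
    return [
        (i * space // num_regions).to_bytes(SALT_BYTES, "big")
        for i in range(1, num_regions)
    ]


key = salted_row_key("1234557")
splits = uniform_split_points(4)
```

Because the prefix is derived from the user ID itself, a reader can recompute it and still do a direct lookup for a given user, while writes spread evenly across the pre-split regions.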
![Page 28: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/28.jpg)
HBase Region Hot-spotting
▪ Don’t stop at salting
  ▪ Map input splits configured for region boundaries
Input (‘k’ splits): 1234557, 1234568, 1234579, 1234583, 1234594, .. , ZZAHT654, ZZZGT934, ZZZZNGA2, ZZZZKLO1
Key Partitioner
Output (‘m’ splits, ‘m’ regions):
Split 1: \x01\x85\x1E\xB811ZKL1, \x01\x86\x1E\xB8129542, .. , \x03\x85\x1E\xB8ZZZKL1
Split 2: \x05\x35\x9E\x18087KL1, \x06\x86\x1E\xB8AHV24, .. , \x07\x5C\xF5\xC16534Z
Split m: \xEB\x27\x92\x1508RKL1, \xFE\x86\x1E\xB8AHV24, .. , \xFF\xAE\x14\x126534Z
Region 1: \x03\x85\x1E\xB8ZZZZZZ
Region 2: \x07\x5C\xF5\xC2928ZZ
Region m: \xFF\xAE\x14\xE1Z28ZZ
![Page 29: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/29.jpg)
HBase Key Partitioner
▪ As many splits as regions to maximize parallelism
▪ Key Partitioner (MR)
  ▪ Reads region boundaries of HBase table
  ▪ Salts and sorts row key accordingly
  ▪ Multiple Output Format to optimize reduce phase
  ▪ Each generated split file corresponds to a single region
▪ Drastically reduces read latencies
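The grouping step of such a partitioner can be illustrated in memory. The real job is described as a MapReduce program; this sketch only shows how already-salted keys map to regions, with a hypothetical function name and made-up keys and boundaries.

```python
from bisect import bisect_right
from collections import defaultdict


def partition_by_region(salted_keys, region_start_keys):
    """Group row keys by the region that owns them.

    region_start_keys: sorted start keys of regions 1..m-1 (region 0
    starts at the empty key). A key equal to a start key belongs to the
    region that begins there. Returns {region_index: sorted keys}.
    """
    buckets = defaultdict(list)
    for key in salted_keys:
        buckets[bisect_right(region_start_keys, key)].append(key)
    return {region: sorted(keys) for region, keys in sorted(buckets.items())}


# Hypothetical boundaries for four regions and a few salted keys.
starts = [b"\x40", b"\x80", b"\xc0"]
keys = [b"\x01a", b"\x85b", b"\x40c", b"\x03d", b"\xffe"]
splits = partition_by_region(keys, starts)
```

Each output group corresponds to exactly one region and is sorted, so a downstream task reading one group only ever talks to one region.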
![Page 30: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/30.jpg)
Network Bandwidth Troubles
![Page 31: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/31.jpg)
Data Center Expansion
![Page 32: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/32.jpg)
Network Bandwidth Constraints
▪ Consistently overshot bandwidth limit during uploads
  ▪ All sorts of delays (Redis, MySQL, Blackbird…)
  ▪ Bidding hampered
![Page 33: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/33.jpg)
Solutions
▪ Intelligent storage – protobufs everywhere
▪ Throttle writes
▪ Geo-splitting
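The slides do not say how writes were throttled. One common approach is a token bucket, sketched here purely as an assumption; the class name, parameters, and injectable clock are hypothetical.

```python
import time


class TokenBucket:
    """Allow at most `rate` bytes per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the logic can be tested
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def try_consume(self, nbytes):
        """Return True and deduct tokens if `nbytes` may be sent now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

An uploader would call `try_consume(len(payload))` before each write and back off when it returns False, keeping the upload under the link's bandwidth limit.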
![Page 34: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/34.jpg)
Geo Splitting
![Page 35: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/35.jpg)
Geo-splitting
▪ Tag user’s location history & predict future data center visits
  ▪ f(dc, geo_history, bt_profile)
▪ A separate workflow periodically generates geo-split rules:
  ▪ Clusters users & analyzes migration patterns
  ▪ Ensures maximal look-up coverage of profiles
  ▪ Minimizes total number of profiles stored
▪ Ensures efficient use of resources, with minimal impact on perf
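A toy version of the coverage-versus-storage trade-off: pick the fewest data centers that cover most of a user's past visits. This is a greedy illustration under assumptions, not the rule generator from the talk. It ignores `bt_profile` and user clustering, and the function name, threshold, and data center names are hypothetical.

```python
from collections import Counter


def data_centers_for_user(dc_history, min_coverage=0.85):
    """Pick the fewest data centers covering `min_coverage` of past visits.

    dc_history: list of data center names, one entry per past visit.
    """
    counts = Counter(dc_history)
    total = sum(counts.values())
    chosen, covered = [], 0
    # Greedy: take the most visited data centers first.
    for dc, visits in counts.most_common():
        if covered >= min_coverage * total:
            break
        chosen.append(dc)
        covered += visits
    return chosen


# Hypothetical visit history for one user.
history = ["us-west"] * 7 + ["us-east"] * 2 + ["eu"] * 1
targets = data_centers_for_user(history)
```

Uploading the profile only to `targets` keeps look-up coverage high while skipping data centers the user rarely visits, which is the stated aim of the geo-split rules.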
![Page 36: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/36.jpg)
Geo-splitting
Label Data
Train Model
Back Test
Calibrate
Training Events
Pixel Stream
Ad Logs
BT Features (HBase)
Feature Generation
Score Profiles
Profile Generation
Scoring
Model
Ad Serving Data Centers
Cluster Users
Analyze Patterns
Generate Rules
Geo-split
![Page 37: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/37.jpg)
![Page 38: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/38.jpg)
Quick Recovery From Failures
▪ Break pipeline into short payloads
▪ Fail fast, recover fast!
▪ Actionable alerts, cut down noise
![Page 39: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/39.jpg)
Quick Recovery From Failures
▪ Materialize data as frequently as possible
▪ Cross system fault tolerance
▪ Idempotency
▪ Backfill at EOD to plug holes if needed
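Idempotency and end-of-day backfill can be illustrated with a small sketch. The store is a plain dict, the batch IDs are hypothetical hour stamps, and the function names are assumptions rather than anything shown in the talk.

```python
def upload_batch(store, done, batch_id, profiles):
    """Write one batch of profiles; re-running the same batch is a no-op.

    store: dict standing in for the serving store, keyed by user ID.
    done: set of batch IDs that have already completed.
    Writes are keyed by user ID, so a retry overwrites instead of
    duplicating.
    """
    if batch_id in done:
        return False
    for user_id, profile in profiles.items():
        store[user_id] = profile
    done.add(batch_id)
    return True


def backfill(store, done, expected_batch_ids, load_batch):
    """At end of day, replay every expected batch that never completed."""
    missing = [b for b in expected_batch_ids if b not in done]
    for batch_id in missing:
        upload_batch(store, done, batch_id, load_batch(batch_id))
    return missing
```

Because a repeated upload changes nothing, a failed or partially completed run can simply be retried, and the backfill only touches the hours that are actually missing.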
![Page 40: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/40.jpg)
Shout-outs!
![Page 41: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/41.jpg)
Shout-outs!
![Page 42: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/42.jpg)
Shout-outs!
![Page 43: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/43.jpg)
Shout-outs!
![Page 44: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/44.jpg)
We Are Hiring!
![Page 45: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/45.jpg)
![Page 46: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/46.jpg)
Questions?
Thank You!
Sivasankaran [email protected]
Savin [email protected]
![Page 47: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/47.jpg)
We are hiring! (as always)
http://rocketfuel.com/careers
[email protected]@rocketfuel.com
![Page 48: How did you know this Ad will be relevant for me?!](https://reader036.vdocuments.net/reader036/viewer/2022062513/554d10f8b4c905d4568b50fb/html5/thumbnails/48.jpg)