TARGETED, NOT TRACKED: CLIENT-SIDE SOLUTIONS FOR PRIVACY-FRIENDLY BEHAVIORAL ADVERTISING
Janice Tsai
Misha Bilenko
Matt Richardson
Anonymous User Sees This Ad
Known User Sees A Different Ad
Personalized Advertising Today
• User is tracked: history of activity is stored
  • On ad platform’s server and/or cookie
• History is processed into profile
  • Reduced representation for quick lookup
  • Can also be communicated or sold across parties
• Profile is used for ad targeting
• Total targeting revenue expected $2.6B by 2014 (eMarketer 2011)
• Supported by all major ad platforms
Talk Outline
• Client-side vs. server-side profiles
• Client-only Profiles (CoP): balancing privacy and personalization
• Experiments: client- vs. server-side revenue difference
Personalized Advertising is Ubiquitous
• Driven by economics
  • Publishers, platforms: CPM rates 2.7x higher [Beales ’10]
  • Advertisers: 6x gain in CTR [Yao et al. ’08]
• What about users?
  • “It’s a little creepy, especially if you don’t know what’s going on” [NYT ’11]
  • What’s going on is complex and misunderstood [McDonald ’10-11]
  • Ad industry: self-regulation, users can opt out via
  • Browsers: Do Not Track (FF, IE, Safari), KeepMyOptOuts (Chrome)
  • Privacy advocates: self-regulation is insufficient
  • W3C Tracking Protection Working Group
  • Legislation: multiple bills/hearings in US; European e-Privacy directive
Personalized Advertising Mechanics
• User information drives market efficiency
• Users have no knowledge/control of their information
• First vs. third-party distinction is increasingly non-trivial
[Diagram: User, Publisher, Ad platforms, Aggregator, Advertiser]
Server-side User Profiles in Advertising
[Diagram: Server containing a log store (history Ht), a profile update step (p(t) to p(t+1)), a profile store, and a personalized response computation. Input from the user: context (query or url). Output to the user: response (ad).]
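The server-side flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not any ad platform's implementation: the class name `ServerSideTracker`, the keyword-count profile, and the ad-matching rule are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class ServerSideTracker:
    """Hypothetical server that keeps every user's history and profile."""

    def __init__(self, ads_by_keyword, profile_size=3):
        self.ads_by_keyword = ads_by_keyword   # keyword -> ad text (made-up data)
        self.profile_size = profile_size
        self.log_store = defaultdict(list)     # user id -> full history H_t
        self.profile_store = {}                # user id -> profile p(t)

    def handle_request(self, user_id, context):
        """`context` is the query or URL the user just issued."""
        # Personalized response computation uses the stored profile p(t).
        profile = self.profile_store.get(user_id, [])
        ad = next(
            (self.ads_by_keyword[k] for k in profile if k in self.ads_by_keyword),
            "generic ad",
        )

        # The server logs the event, then rebuilds p(t+1) from the full history.
        self.log_store[user_id].append(context)
        counts = Counter(
            word for event in self.log_store[user_id] for word in event.lower().split()
        )
        self.profile_store[user_id] = [
            keyword for keyword, _ in counts.most_common(self.profile_size)
        ]
        return ad
```

Note that both `log_store` and `profile_store` live on the server and grow with every request, which is exactly the data the user cannot see or edit.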
Problem: No User Control over Data
• Users do not know what is stored, where and why
  • Use, retention, sharing
• Users cannot edit or delete their behavioral data
  • Deleting cookies insufficient: re-identification, LSOs, local storage
  • Opting out ≠ having your data purged
• Most users find tracking invasive when asked [McDonald-Cranor ’10]
  • But don’t do much about it: Do Not Track adoption in Firefox: 4-6%
Current “Do Not Track” Proposals
• Provide a mechanism for users to prevent being tracked
• Existing browser implementations
  • HTTP headers, opt-out cookies
    • Browser contacts server but notifies it that user does not want to be tracked
    • User must trust service providers
  • Domain blocking / TPL lists
    • Browser doesn’t send request to certain domains
• Tracking vs. targeting: collection vs. usage
  • “All or nothing” approach: privacy = no targeting
  • Undesirable extremes: inefficiency vs. loss of revenue
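To make the header-based mechanism concrete, here is a minimal sketch of a server that honors the signal. It assumes the `DNT: 1` request header that browsers of the period sent; the function names and the in-memory `log` list are hypothetical and stand in for whatever logging a real ad server does.

```python
def should_track(headers):
    """Return False when the request carries a Do Not Track signal.

    `headers` is a plain dict of HTTP request headers. Header names are
    case-insensitive, so they are normalized before the lookup.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


def handle_request(headers, url, log):
    """Hypothetical handler: record the visit only if the user allows tracking."""
    if should_track(headers):
        log.append(url)
    # The ad is served either way; only the logging decision changes.
    return "ad"
```

Nothing in the protocol enforces the `if` branch: the browser still contacts the server, and compliance rests entirely on the service provider, which is the trust assumption noted above.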
Client-Side Tracking
• Tracking is performed solely on client machine
• User retains control, targeting is still possible
  • User can delete or edit profile
  • Services don’t retain user history
• No back-end sharing of user data between companies
  • Avoid issues around retention policies, deleting all copies, etc.
• Studies indicate users care more about being tracked than about being targeted
Existing Plugin-Based Approaches
• Privad, Adnostic, RePRIV
• User installs client plugin which collects user data and communicates with ad network
• Difficulties
  • Requires user to install plugin
  • Requires significant changes to existing ad serving infrastructure
  • Hard to manage click fraud, ad budgets
  • Bandwidth (e.g., 10x ads sent to client)
  • Targeting algorithms baked into plugin may slow innovation
  • Targeting on client = less information than targeting on server
Alternative: Client-Only Profiles (CoP)
• Profile stored in cookie on client machine
• Browser sends profile to server upon page request
• Server returns page and updated profile in cookie
• Server does not log user activity
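The request cycle in the four bullets above can be sketched as a stateless handler. This is an illustrative assumption, not the authors' implementation: the JSON cookie encoding, the keyword-count profile, and the names `update_profile` and `handle_request` are invented for the example.

```python
import json
from collections import Counter


def update_profile(profile, context, max_keywords=20):
    """Fold the new context into the profile incrementally (no history needed)."""
    counts = Counter(profile)
    counts.update(context.lower().split())
    return dict(counts.most_common(max_keywords))


def handle_request(cookie_value, context, ads_by_keyword):
    """Stateless handler: all it knows about the user arrives in the cookie.

    Returns (ad, new_cookie_value). Nothing is written to a log or a
    server-side profile store.
    """
    profile = json.loads(cookie_value) if cookie_value else {}
    # Targeting: use the highest-count profile keyword that has a matching ad.
    ranked = sorted(profile, key=profile.get, reverse=True)
    ad = next((ads_by_keyword[k] for k in ranked if k in ads_by_keyword), "generic ad")
    new_profile = update_profile(profile, context)
    return ad, json.dumps(new_profile)
```

Because the handler keeps no state between calls, a user who deletes or edits the cookie has deleted or edited the only copy of the profile, provided the server really does not log.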
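Client-only Profiles
[Diagram: Client keeps profile p(t) in local storage and sends it with context c(t) to the Server. The Server runs the profile update and the personalized response computation, then returns the response and the updated profile p(t+1) to the Client.]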
Client-only Profiles
+ No plugins (Adnostic, RePRIV, Privad: users install plugins)
+ No major changes to serving infrastructure
+ Targeting server-side (advanced features/algorithms)
+ Profile update server-side (advanced features/algorithms)
- Must trust ad platform to comply with policy and not retain data
  • Debatable proposition for security community…
  • …but Do Not Track already makes the same assumption
What will it cost compared to server-side tracking?
Comparison of Tracking Approaches
Incremental Profile Updates: Task
• How much does incremental update hurt?
  • Compare to profiles constructed on server from full history
  • May depend on the task (personalizing ads, content, search results)
• Representative task: predicting future ad clicks
  • Discriminates long-term user interests
  • Can be used for ad selection, ranking, CTR prediction, auction
• Bid Increments
  • Advertiser specifies an increment to their bid if the user has the keyword in their profile
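The bid-increment rule is simple enough to show directly. The sketch below is an assumption about how such a rule could be applied; the function names, the tuple layout, and ranking by bid alone are invented for illustration.

```python
def effective_bid(base_bid, increment, keyword, profile):
    """Advertiser's bid for this impression.

    The increment is added only when the user's profile contains the
    advertiser's keyword; otherwise the base bid applies.
    """
    return (base_bid + increment) if keyword in profile else base_bid


def rank_ads(bids, profile):
    """Order advertisers by effective bid, highest first.

    `bids` is a list of (advertiser, keyword, base_bid, increment) tuples.
    """
    scored = [
        (effective_bid(base, inc, keyword, profile), advertiser)
        for advertiser, keyword, base, inc in bids
    ]
    return [advertiser for _, advertiser in sorted(scored, reverse=True)]
```

With made-up bids `[("A", "camera", 1.00, 0.50), ("B", "shoes", 1.20, 0.30)]`, advertiser B ranks first for a user with an empty profile, while A ranks first once "camera" is in the profile.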
Incremental Profile Updates: Method [Bilenko and Richardson KDD-2011]
• Algorithm based on machine learning
  • Features based on behavior frequency/recency, context, etc.
  • ML function predicts p(click|keyword) using these features
• Select top-k keywords for profile
  • Keyword value is incremental utility of ads not covered in profile so far
  • Leads to a submodular optimization problem
  • Solved by efficient, accurate approximate algorithm
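The slides do not give the selection algorithm itself, so the sketch below swaps in the standard greedy heuristic for a weighted-coverage objective, which is one common way to approximate a monotone submodular problem under a size limit. The inputs `keyword_to_ads` and `ad_value` are hypothetical stand-ins for the learned predictions.

```python
def select_profile_keywords(keyword_to_ads, ad_value, k):
    """Greedy top-k keyword selection for a weighted coverage objective.

    keyword_to_ads: keyword -> set of ads that keyword can target (assumed input)
    ad_value:       ad -> predicted utility, e.g. expected clicks (assumed input)
    """
    covered = set()
    profile = []
    candidates = set(keyword_to_ads)

    def gain(keyword):
        # Incremental utility: only ads not already covered by the profile count.
        return sum(ad_value[ad] for ad in keyword_to_ads[keyword] - covered)

    for _ in range(min(k, len(candidates))):
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=gain)
        if gain(best) <= 0:
            break  # no remaining keyword adds any value
        profile.append(best)
        covered |= keyword_to_ads[best]
        candidates.remove(best)
    return profile
```

The point of the marginal-gain rule is that a keyword whose ads are already reachable through earlier picks contributes nothing, so near-duplicate keywords do not crowd the small profile.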
Incremental Updates: Study
• Two months of activity on Bing search engine
• 2.4 million users (randomly sampled from total population)
• Train predictor using first 6 weeks
• Cookie contains
  • Profile: Top-k keywords by predicted value
  • Cache: LRU policy
• Metric
  • Fraction of future clicks in profile (proportional to revenue gain)
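The two cookie components and the metric can be sketched as follows. The slides do not spell out what the cache holds; this example assumes it holds recently seen keywords with counts, and the names `KeywordCache` and `click_coverage` are invented for illustration.

```python
from collections import OrderedDict


class KeywordCache:
    """LRU cache of recently seen keywords with simple counts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # keyword -> times seen, oldest first

    def observe(self, keyword):
        self.entries[keyword] = self.entries.get(keyword, 0) + 1
        self.entries.move_to_end(keyword)      # mark as most recently used
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used


def click_coverage(profile, future_click_keywords):
    """Fraction of future clicks whose keyword is in the profile."""
    if not future_click_keywords:
        return 0.0
    hits = sum(1 for keyword in future_click_keywords if keyword in profile)
    return hits / len(future_click_keywords)
```

A bounded cache matters here because the whole structure travels in a cookie on every request, so its size is a direct bandwidth cost.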
Incremental Updates: Results
• Retains 97-99% of gain vs. server-side tracking
• Requires only 20-50 keywords in profile (50 in cache)
[Figure: % of server-side utility (y-axis, 50% to 100%) vs. cache size (x-axis, 10 to 50), one curve per profile size (10, 20, 30, 40, 50).]
Conclusions
• Client-side tracking balances privacy and market efficiency
• Possible approach: CoP, which
  • Ensures user control over tracking
  • Requires insignificant change to existing infrastructure
  • Retains 97+% of revenue gains by ad targeting
• Should Do Not Track distinguish client- and server-side tracking?
  • 1st vs. 3rd party are increasingly difficult to differentiate
THANKS!
Backup Slide
• If I have to trust the server anyway, why not trust it to store my profile as well?
  • Trusting not to store is a lower bar than trusting to properly handle profile
• Storing profile on server = Trusting any team with access to your profile to:
  • Know the policies
  • Correctly implement things like opt-out, retention, publication
  • Either never copy your history, or ensure your edits/deletions are propagated through to all copies
  • Not share it with any other team that might not know these things
• Storing profile on client = Trusting just the team that receives the profile to use it and throw it away
1st vs. 3rd party
• Distinction is getting increasingly muddled
  • 1st party data collection is becoming pervasive
  • 3rd party collection can be tightly controlled by advertiser
Regulatory Interest in Behavioral Advertising
• United States
  • Federal Trade Commission has proposed a regulatory framework calling for Do Not Track solutions
  • Legislation calls for Do Not Track solutions
    • US Senate, US House of Representatives, California Legislature
• Europe
  • Notice and Consent prior to depositing cookies
Do Not Track Solutions
HTTP Header
- Requests are sent with an “X-Do-Not-Track” header
Blocking Traffic
- URI Blacklist
- Blocks traffic = Blocks Ads
Opt-Out Cookie
- Cookie signals Opt-Out intent
- Blocks traffic = Blocks Ads
Do Not Track Solutions
[Table: rows are DNT solutions (Blocking Traffic, Opt-Out Cookie, HTTP Header); columns are browsers (Apple Safari, Google Chrome, Microsoft IE, Mozilla Firefox).]
Do Not Track solutions are built into each browser with the exception of Google Chrome where the Opt-Out cookies are a part of a browser extension.