Profiling and Behavioural Advertising Security Seminar 2014: Privacy
Maarten Derks & Nick Heijmink
Contents
• What is behavioural advertising
• Customer profiling
• Privacy concerns
• What are the rules/laws
• Do Not Track
• The EU Cookie Law
Privacy enhancing technologies:
• EU Cookie Law issues/improvements
• Do Not Track issues/improvements
• Client-side solutions
• Server/infrastructure-side solutions
Behavioural advertising
• Tracking customers
• Find user interest
• Interest used for targeting purposes
• Tracking
• The collection and aggregation of behavioural data of a user
• Targeting
• The use of this data during ad selection
• Advertisements are shown only to interested persons
• Fewer ad impressions are wasted
Example behavioural advertising
Customer profiling
• What is Customer profiling?
• According to Nancy J. King, Pernille Wegener Jessen
“An automatic data processing technique that consists of applying a
‘profile’ to an individual in order to take decisions concerning him or
her; or for analysing or predicting personal preferences, behaviours
and attitudes.”
Customer profiling technical view
In a technical sense, profiling is:
• A computerized method
• Involving data mining from data warehouses
• Which makes (or should make) it possible to
• Place individuals in a particular category
• With a certain degree of probability, and with a certain induced error rate
• In order to take individual decisions relating to them
How do they get this data
• First-party trackers - Analytic
• Third-party trackers - Profiling
• Assigns the machine a unique identification
number (something like "4c812db2922...") stored
inside a cookie associated with the web browser
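A minimal sketch of how a tracker could assign such an identifier, assuming a cookie named "uid" and a one-year lifetime (both are illustrative choices, not taken from the slides). A returning browser sends the cookie back, so the same identifier is seen again on every site that embeds the tracker.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "uid"              # hypothetical cookie name
ONE_YEAR = 60 * 60 * 24 * 365    # assumed lifetime in seconds


def get_or_assign_id(cookie_header: str) -> Tuple[str, Optional[str]]:
    """Return (tracking_id, Set-Cookie value or None if the browser already has an id)."""
    jar = SimpleCookie()
    jar.load(cookie_header)      # parse the request's Cookie header
    if COOKIE_NAME in jar:
        # Returning browser: reuse the identifier it sent us.
        return jar[COOKIE_NAME].value, None

    # New browser: generate a random identifier and ask the browser to store it.
    new_id = secrets.token_hex(16)
    out = SimpleCookie()
    out[COOKIE_NAME] = new_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = ONE_YEAR
    return new_id, out[COOKIE_NAME].OutputString()


first_id, set_cookie = get_or_assign_id("")               # first visit
again_id, nothing = get_or_assign_id("uid=" + first_id)   # later visit
```

Deleting the cookie breaks the link: the next request is treated as a new browser and receives a fresh identifier.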
First party cookies can become third party cookies
Using multiple websites with same tracking network
Removing cookies?
• 17 percent of Internet users delete cookies on a weekly basis
• 12 percent do so on a monthly basis
• 10 percent make it a daily habit
Mobile tracking
• Generally an individual communication device
• Games/mobile apps, Web browsing, Contact data, Device information
• Geographic location at a particular time (location based advertisement)
What do they store?
Age, Age range, Date of birth, Education, Exact date of birth, Gender, Marital status, Home ownership,
Own or rent, Estimated income, Exact income, Ethnicity, Presence of children, Number of children, Age
range of children, Age of children, Gender of children, Language preference, Religion, Veteran in
household, Voter party, Professional certificates (teacher, etc.), Education level, Full name, Email
address, City, State, ZIP, ZIP + 4, Home Address, Land-line phone, Social IDs / social media handles
and aliases, Mobile phone number, Carrier, Device type, Email address, Vehicle make, model and
year, VIN, Estimated vehicle value, Vehicle lifestyle indicator, Model and brand affinity, Used vehicles,
Antiques, Apparel (women, men & child), Art, Average direct mail purchase amounts, Museums, Audio
books, Auto parts, auto accessories, Beauty and cosmetics, Bible purchaser, Bird owner, Books,
Estimated income, Estimated household income, Home value, Length of residence, Purchase date,
Purchase price, Purchase amount, Most recent interest rate type, Most recent loan type code, Sales
transaction code, Most recent lender code, Purchase lender code, Most recent lender name, Purchase
lender name, Fuel source, Loan to value, Purchase interest rate type, Most recent interest rate,
Purchase interest rate, Pool or spa, Home-year built, Air conditioning, Boat ownership, Plane
ownership, Motorcycle ownership, Bankruptcy, Beacon score, Credit score-actual, Certificates of
deposit/ money market funds, Estimated household income ranges, Income producing assets indicator,
Estimated net worth ranges, IRAs, Life insurance, Low-end credit scores, Mutual funds/annuities,
Summarized credit score or modelled credit score by neighborhood, Payday loan purchaser, Number of
credit lines, Tax liens, Card data, Card holder, Frequent credit card user, New retail card holders,
Underbanked or “thin file”, Stocks or bonds, Average online purchase, Average offline purchase, etc
Applications of customer profiling
?
Applications of customer profiling
• Targeted marketing
• Finding new locations for stores
• Analysis of risks and fraud
• Updating price based on interest
How do they create a profile?
• Creating a profile based on raw real world data
• Summarize available, relevant information
• Reduce the information into a set of categories
• Algorithm of the filter differs, most of them secret
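The reduction step can be sketched as follows. Since the real filters are secret, this is only an illustration under assumed names: the site-to-category table and the choice to keep the top two categories are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from visited site to interest category.
SITE_CATEGORIES = {
    "cars.example": "automotive",
    "bikes.example": "cycling",
    "recipes.example": "cooking",
}


def build_profile(visited_sites, top_n=2):
    """Summarize raw visits into the top_n categories with their share of visits."""
    counts = Counter(
        SITE_CATEGORIES[site] for site in visited_sites if site in SITE_CATEGORIES
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing relevant was observed
    return {category: n / total for category, n in counts.most_common(top_n)}


profile = build_profile(
    ["cars.example", "cars.example", "recipes.example", "unknown.example"]
)
```

Here the raw visit log is discarded after summarizing; only the category weights remain in the profile.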
Targeting
• Get cookie from user
• Get profile from user using the cookie
• Search in database for correlating advertisement
• Show advertisement
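The four steps above can be sketched as a single lookup. The in-memory tables, the "uid" cookie name and the fallback ad are assumptions made for illustration; a real ad network would use its own storage and selection logic.

```python
# Hypothetical server-side data: profiles keyed by cookie id, and an ad inventory.
PROFILES = {"4c812db2922": {"automotive": 0.7, "cooking": 0.3}}
ADS = [
    {"id": "ad-1", "category": "automotive"},
    {"id": "ad-2", "category": "cooking"},
    {"id": "ad-3", "category": "travel"},
]
DEFAULT_AD = {"id": "ad-default", "category": None}


def select_ad(cookies):
    """Cookie -> profile -> best matching advertisement."""
    uid = cookies.get("uid")              # 1. get cookie from user
    profile = PROFILES.get(uid, {})       # 2. get profile using the cookie
    best = max(ADS, key=lambda ad: profile.get(ad["category"], 0.0))  # 3. search
    if profile.get(best["category"], 0.0) == 0.0:
        return DEFAULT_AD                 # no interest data: untargeted ad
    return best                           # 4. show advertisement


targeted = select_ad({"uid": "4c812db2922"})
untargeted = select_ad({})
```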
Customer targeting
• Do you think that consumer profiling and targeting should be allowed?
• Should advertisers be able to use profiling to predict that a consumer will
take advantage of a coupon for online gambling when the profile includes
consumers who are likely to be/get addicted to gambling?
• What if weight-loss aids are promoted to consumers in a profile who have a
high probability of having eating disorders, for whom weight-loss aids may
create substantial health risks?
Privacy concerns
Interference with personal data
• What will they do with my data?
• Which data do they collect?
• How long will they store my data?
• Which profile is linked to me?
• Who will see/use my data?
Privacy concerns
Regulation of behavioural advertising
• All individuals in the EU have privacy and data protection rights under the Data
Protection Directive 95/46/EC
• Article 5:
• Member States shall, within the limits of the provisions of this Chapter,
determine more precisely the conditions under which the processing of
personal data is lawful.
• Personal data should be processed on legitimate processing grounds.
• Collected for specified, explicit and legitimate purposes and not further
processed in a way incompatible with those purposes.
• Processing of data for historical, statistical or scientific purposes shall not
be considered as incompatible.
Personal data may be processed if
• The data subject has unambiguously given his consent.
• Processing is necessary for the performance of a contract to which the data
subject is party or in order to take steps at the request of the data subject prior
to entering into a contract.
• Processing is necessary for compliance with a legal obligation to which the
controller is subject.
• Processing is necessary in order to protect the vital interests of the data subject.
• Processing is necessary for the performance of a task carried out in the public
interest or in the exercise of official authority vested in the controller or in a third
party to whom the data are disclosed.
• Processing is necessary for the purposes of the legitimate interests pursued by
the controller or by the third party or parties to whom the data are disclosed.
Do Not Track
• Web browsers
• Plug-ins
• Three types of Do Not Track:
- Domain blocking
- Opt-out cookies
- HTTP headers
Do Not Track
• Domain blocking - Blocks connections to user-specified domains.
• Opt-out cookies - Inform the domains, by means of a cookie, that the user does not want to be tracked.
• HTTP headers - Inform the domains that the user does not want to be tracked, using the DNT
header field introduced by the W3C.
Disadvantage:
The last two types require the user to trust that the target domain
complies.
• Blocking 1st party cookies: it is very hard to login anywhere
• Blocking 3rd party cookies: no adverse effects to surfing
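A short sketch contrasting two of the mechanisms above: domain blocking is enforced on the client, while the DNT header only expresses a preference that the server may or may not honour. The blocklist entry and function names are hypothetical.

```python
from urllib.parse import urlsplit

BLOCKED_DOMAINS = {"tracker.example"}  # hypothetical user-specified blocklist


def is_blocked(url):
    """Client side: refuse requests to a blocked domain or any of its subdomains."""
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def may_track(request_headers):
    """Server side: a compliant server does not track when the request carries 'DNT: 1'."""
    headers = {name.lower(): value.strip() for name, value in request_headers.items()}
    return headers.get("dnt") != "1"


blocked = is_blocked("https://cdn.tracker.example/pixel.gif")
allowed = may_track({"DNT": "1"})
```

Nothing forces a server to call a check like `may_track`, which is exactly the trust problem noted under "Disadvantage".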
EU Cookie Law
Privacy legislation that requires websites to get consent from visitors to store
or retrieve any information on a computer, smartphone or tablet
Users should be provided with clear and precise information about the
purposes of cookies.
Set the type of cookie
BREAK
Privacy Enhancing Technologies (PET)
• EU Cookie Law issues/improvements
• Do Not Track issues/improvements
• Client-side solutions
- Plugin-based client side profiling
- Native client-side profiling
• Server/infrastructure-side solutions
EU Cookie Law issues
• Cookies are a low-level mechanism that cannot be easily explained
• Unclear which cookies could be given a pass and which needed to be
explicitly given permission
• No standardized mechanism for seeking permission
• Improvements:
- Use the term ‘tracking’ rather than talking about cookies
- Use a strictly controlled syntax for summarizing tracking
habits
Example grammar
⟨tracking⟩ ::= ⟨necessary statement⟩* ⟨tracking statement⟩* ⟨excuses⟩
⟨necessary statement⟩ ::=
We record your ⟨what⟩ using ⟨methods⟩+
so you can ⟨benefit⟩.
⟨what⟩ ::= ( login | shopping cart | … )+
⟨tracking statement⟩ ::= ⟨who tracks you⟩ ⟨anonymously or not⟩ using ⟨method⟩+ .
⟨who tracks you⟩ ::= We track you
| ( Our advertisers | ⟨company name⟩ ) may? track you ( on our behalf )?
⟨anonymously or not⟩ ::= anonymously | personally
⟨method⟩ ::= server logs | the cookie ⟨name⟩ | cookies (unless you disable third-party cookies)? | invisible images | …
⟨excuses⟩ ::= …
Source: http://alleged.org.uk/pdc/2012/07/07.html
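As an illustration, a site could generate its ⟨tracking statement⟩ from structured data rather than free text. This sketch covers only that one rule; the function name, its parameters and the choice to join multiple methods with commas are assumptions, since the grammar does not say how repeated ⟨method⟩ items are separated.

```python
def tracking_statement(who, anonymous, methods, may=False, on_behalf=False):
    """Render one <tracking statement> of the example grammar as a sentence."""
    if not methods:
        raise ValueError("the grammar requires at least one method")
    if who == "We":
        subject = "We track you"
    else:
        # ( Our advertisers | <company name> ) may? track you ( on our behalf )?
        subject = who + (" may" if may else "") + " track you"
        if on_behalf:
            subject += " on our behalf"
    mode = "anonymously" if anonymous else "personally"
    return f"{subject} {mode} using {', '.join(methods)}."


own = tracking_statement("We", True, ["server logs"])
third = tracking_statement(
    "Our advertisers", False, ["cookies", "invisible images"], may=True, on_behalf=True
)
```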
Do Not Track issues
• Incorporation of privacy-protecting features in web browsers
• Two categories require the user to trust that the target domain will comply
• None of the categories meet requirements from regulatory framework by
the Federal Trade Commission (FTC)!
• Different balance between ease-of-use, universality and enforceability
• Failed to win the endorsement of advertising industry
• Focus on behavioural advertising, neglects non-advertising tracking
• Improvements?
Client-side solutions
• Goal: reduce or prevent user tracking, while allowing advertising network to
retain all or most of the revenue gains achieved from targeting
• How? By client-side aggregation of personal data
• Major concerns over behavioural advertising include the user’s lack of
control over the data collection and retention
• Allows the user to be targeted while leaving the user in possession of their data
• Strong alternative to binary solutions like Do Not Track
• Two types:
- Plugin-based client-side profiling
- Native client-side profiling
Plugin-based client-side profiling
• Makes use of a browser plugin installed on the user’s machine
• Plugin maintains a collection of the user’s browsing and behavioural data
and uses it to facilitate targeting during ad selection
We will discuss three examples:
• Privad
• Adnostic
• RePriv
Plugin-based client-side profiling: Privad
• Developed by the Max Planck Institute for Software Systems (MPI-SWS)
• Goal: complete user privacy
• User behaviour monitored by plugin that stores profile on client machine
• Ad server sends large set of potential advertisements to plugin
• Plugin selects ad to achieve targeting, based on local profile
• Ad impressions and clicks are sent encrypted through a third-party
dealer, which anonymizes the source
http://adresearch.mpi-sws.org/
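The client-side selection step can be sketched roughly as below. This is not Privad's actual implementation: the keyword-weight profile, the scoring rule and the report format are assumptions, and the encryption and dealer relay are left out entirely.

```python
def select_ad_locally(candidates, local_profile):
    """Pick the candidate ad that best matches the profile kept on the client."""
    def score(ad):
        return sum(local_profile.get(keyword, 0.0) for keyword in ad["keywords"])
    return max(candidates, key=score)


def impression_report(ad):
    """Report which ad was shown, without any user identifier.

    In Privad this report would additionally be encrypted and relayed
    through the dealer; both steps are omitted in this sketch.
    """
    return {"ad_id": ad["id"], "event": "impression"}


# The profile never leaves the client; the ad server only sends a broad set of ads.
local_profile = {"cycling": 0.8, "cooking": 0.2}
candidates = [
    {"id": "ad-1", "keywords": ["automotive"]},
    {"id": "ad-2", "keywords": ["cycling", "outdoor"]},
    {"id": "ad-3", "keywords": ["cooking"]},
]
chosen = select_ad_locally(candidates, local_profile)
report = impression_report(chosen)
```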
Plugin-based client-side profiling: Privad
Plugin-based client-side profiling: Adnostic
• Developed by Stanford University and New York University
• Browser plugin selects ad based on locally constructed profile
• In contrast to Privad: ad impressions kept hidden, but ad clicks not
• Less vulnerable to click fraud, but reveals targeting attributes of user
• Plugin makes selection out of 10-20 ads sent by ad network
• Information about selection is encrypted and aggregated, occasionally
decrypted by a trusted third party
https://crypto.stanford.edu/adnostic/
Plugin-based client-side profiling: Adnostic
Plugin-based client-side profiling: Downsides
• Users have to install a plugin
• Requires ad platform to comply
• Both plugins make fraud detection difficult
• Increases network traffic and load times
• Advertiser budget constraints:
- Estimating when an ad’s budget will expire
- Could result in ads being shown too many/few times
• Both approaches take control over tracking and targeting out of the hands
of the advertising network
RePriv
• Developed by Microsoft Research
• Constructs user profiles from raw browsing data on the client machine
• Sends them to the ad network to facilitate targeting server-side
• Allows the ad network to view user data and perform personalization
• User has option to review sent data, and either approve or disapprove
• Solves difficulties regarding fraud, budgets and innovation
• Downside: reveals user attributes, requires trust
https://research.microsoft.com/en-us/projects/repriv/
Native client-side profiling: Client-only profiles
• Stores behavioural information on the client, but no browser plugin required
• Gives users control over data while allowing platforms to target
advertisements without making significant structural changes to the current
delivery mechanisms
• User behaviour maintained in aggregated form, along with cache of raw
recent behaviour in the browser cookie associated with the ad network
• Only record of user behaviour is maintained on the client in the cookie,
leaving the user with the option of deleting their profile any time
Downside:
• Relies on policy compliance by ad networks, not enforceable
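A minimal sketch of a profile that lives only in the ad network's cookie: aggregated category counts plus a short cache of recent behaviour. The JSON layout, the cache size of five and the function name are assumptions made for illustration.

```python
import json
from urllib.parse import quote, unquote

RECENT_LIMIT = 5  # assumed size of the raw recent-behaviour cache


def update_profile_cookie(cookie_value, category):
    """Return the new cookie value after observing one event in `category`."""
    if cookie_value:
        profile = json.loads(unquote(cookie_value))
    else:
        profile = {"counts": {}, "recent": []}   # empty profile for a new browser

    # Aggregated form: one counter per category.
    profile["counts"][category] = profile["counts"].get(category, 0) + 1
    # Raw form: only the most recent events are kept.
    profile["recent"] = (profile["recent"] + [category])[-RECENT_LIMIT:]

    # URL-encode so the JSON is safe to place in a cookie value.
    return quote(json.dumps(profile, separators=(",", ":")))


value = ""
for event in ["cars", "cars", "cooking"]:
    value = update_profile_cookie(value, event)
stored = json.loads(unquote(value))
```

The server reads the cookie on each request, updates it and sends it back without keeping a copy, which is the part that depends on policy compliance. Deleting the cookie deletes the profile.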
Client-only profiles
• Integration with bidding systems
• Use of machine learning for predicting a user’s future interests
• Revenue impact analysis:
Targeting performance comparable to
that of server-side profiling, but
without the need to track user
behavior server-side
Server/infrastructure side solutions
• Goal: minimizing data retention needs while maintaining efficacy of
algorithmic predictions for product placement
• How? By segregation of data into:
1. Identifying Information Component
2. Tracking Information Component
3. Optimization Information Component
• This facilitates anonymous tracking
• Use of inferencing algorithms to select relevant advertisements: Markov
model, generalized regression, user similarity, ...
• Downside: re-identification of anonymous data is possible under certain
circumstances
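The segregation idea can be illustrated with three separate stores that share no key. What exactly each component holds is an assumption here (the slides only name the components), as are the class and method names.

```python
import secrets
from collections import Counter


class SegregatedStore:
    """Three stores with no shared key between identity and behaviour."""

    def __init__(self):
        self.identifying = {}           # account id -> personal details
        self.tracking = {}              # random pseudonym -> observed events
        self.optimization = Counter()   # aggregate counts, no per-user key

    def register(self, name, email):
        account_id = secrets.token_hex(8)
        self.identifying[account_id] = {"name": name, "email": email}
        return account_id

    def new_pseudonym(self):
        # Generated independently of any account id, so it cannot be derived from one.
        pseudonym = secrets.token_hex(8)
        self.tracking[pseudonym] = []
        return pseudonym

    def record(self, pseudonym, category):
        self.tracking[pseudonym].append(category)
        self.optimization[category] += 1


store = SegregatedStore()
account = store.register("Alice Example", "alice@example.org")
pseudonym = store.new_pseudonym()
store.record(pseudonym, "cars")
store.record(pseudonym, "cars")
```

The inference algorithms would read only the tracking and optimization stores. As the downside above notes, a sufficiently distinctive event history can still allow re-identification.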
Conclusion
• We know what behavioural advertising is and how it works
• There are laws created for data processing
• Multiple regulations in place, but they are not very effective
• Enforcing and checking policies is difficult
• Client-side solutions available that require minimal changes to ad network
infrastructure (but do require the user to take action)
• Server-side solutions require adoption by ad networks
Questions
?
Literature
• http://www.youronlinechoices.com
• http://www.clickz.com/clickz/news/1691871/study-consumers-delete-cookies-surprising-rate
• Bilenko, M., Richardson, M., & Tsai, J. Y. (2011, July). Targeted, not tracked: Client-side
solutions for privacy-friendly behavioral advertising. In The 11th Privacy Enhancing
Technologies Symposium (PETS 2011).
• Toubiana, V., Narayanan, A., Boneh, D., Nissenbaum, H., & Barocas, S. (2010, February).
Adnostic: Privacy Preserving Targeted Advertising. In NDSS.
• Thaw, D., Gupta, N., & Agrawala, A. “Privacy-Friendly” Design for Online Behavioral
Advertising Systems.
• Dixon, P., & Gellman, R. (2014, April). The Scoring of America: How Secret Consumer Scores
Threaten Your Privacy and Your Future.
http://www.worldprivacyforum.org/wp-content/uploads/2014/04/WPF_Scoring_of_America_April2014_fs.pdf