
I Always Feel Like Somebody's Watching Me: Measuring Online Behavioural Advertising

Juan Miguel Carrascosa
Univ. Carlos III de Madrid
[email protected]

Jakub Mikians
Universitat Polytechnic Catalunya
[email protected]

Ruben Cuevas
Univ. Carlos III de Madrid
[email protected]

Vijay Erramilli
Guavus
[email protected]

Nikolaos Laoutaris
Telefonica Research
[email protected]

ABSTRACT

Online Behavioural targeted Advertising (OBA) has risen in prominence as a method to increase the effectiveness of online advertising. OBA operates by associating tags or labels with users based on their online activity and then using these labels to target them. This rise has been accompanied by privacy concerns from researchers, regulators and the press. In this paper, we present a novel methodology for measuring and understanding OBA in the online advertising market. We rely on training artificial online personas representing behavioural traits like 'cooking', 'movies', 'motor sports', etc. and build a measurement system that is automated, scalable and supports testing of multiple configurations. We observe that OBA is a frequent practice and notice that categories valued more by advertisers are more intensely targeted. In addition, we provide evidence showing that the advertising market targets sensitive topics (e.g., religion or health) despite the existence of regulation that bans such practices. We also compare the volume of OBA advertising for our personas in two different geographical locations (US and Spain) and see little geographic bias in terms of intensity of OBA targeting. Finally, we check for targeting with do-not-track (DNT) enabled and discover that DNT is not yet enforced on the web.

1. INTRODUCTION

Business models around personal information, which include monetizing personal information via Internet advertising and e-commerce [30], are behind most free Web services. Information about consumers browsing for products and services is collected, e.g., using tracking cookies, for the purpose of developing tailored advertising and e-marketing offerings (coupons, promotions, recommendations, etc.). While this can be beneficial for driving web innovation, companies, and consumers alike, it also raises several concerns around its privacy implications. There is a fine line between what consumers value and would like to use, and what they consider to be overly intrusive. Crossing this line can induce users to employ blocking software for cookies and advertisements [1, 4, 8, 10], or lead to strict regulatory interventions. Indeed, the economics of personal information has all the characteristics of a "Tragedy of the Commons" (see Hardin [19]), in which consumer privacy and trust towards the web and its business models is a shared commons that can be over-harvested to the point of destruction.

The discussion about privacy red-lines has just started¹ and is not expected to conclude any time soon. Still, certain tactics have already gained a taboo status among consumers and regulators, e.g., price discrimination in e-commerce [30, 31, 18, 32]. Online Behavioural targeted Advertising (OBA, Sec. 2) on sensitive categories like sexual orientation, health, political beliefs, etc. [34], or tricks to evade privacy protection mechanisms, like Do-Not-Track signals, are additional tactics that test the tolerance of most users and regulators. The objective of this work is to build a reliable methodology for detecting, quantifying and characterizing OBA in display advertising on the Web and then use it to check for controversial practices.

Challenges in detecting Online Behavioural Advertising: The technologies and the ecosystem for delivering targeted advertising are truly mind-boggling, involving different types of entities, including Aggregators, Data Brokers, Ad Exchanges, Ad Networks, etc., that might conduct a series of complex online auctions to select the advertisement that a user gets to see upon landing on a webpage (see [37] for a tutorial and survey of relevant technologies). Furthermore, targeting can be driven by other aspects, e.g., location, gender, age group, that have nothing to do with specific behavioural traits that users deem as sensitive in terms of privacy, or it

¹ The FTC released in May 2014 a report entitled "Data Brokers – A Call for Transparency and Accountability", whereas the same year the US Senate passed the "Data Broker Accountability and Transparency Act (DATA Act)".

arXiv:1411.5281v2 [cs.CY] 12 Jun 2015

can be due to "re-targeting" [14, 20, 24] from previously visited sites. Distinguishing between the different types of advertising is a major challenge towards developing a robust detection technique for OBA. On a yet deeper level, behaviours, interests/types, and relevant metrics have no obvious or unique definition that can be used for practical detection. It is non-trivial to unearth the relative importance of different interests or characteristics that can be used for targeting purposes, or even to define them. Last, even if definitional issues were resolved, how would one obtain the necessary datasets and automate the process of detecting OBA at scale?

Our contribution: The main contribution of our work is the development of an extensive methodology for detecting and characterizing OBA at scale, which allows us to answer essential questions: (i) How frequently is OBA used in online advertising? (ii) Does OBA target users differently based on their profiles? (iii) Is OBA applied to sensitive topics? (iv) Is OBA more pronounced in certain geographic regions compared to others? (v) Do privacy configurations, such as Do-Not-Track, have any impact on OBA?

Our methodology addresses all the above challenges by 1) employing various filters to distinguish interest-based targeting from other forms of advertising, 2) examining several alternative metrics to quantify the extent of OBA advertising, 3) relying on multiple independent sources to draw keywords and tags for the purpose of defining different interest types and searching for OBA around them, and 4) allowing different geographical and privacy configurations. Our work combines all of the above to present a much more complete methodology for OBA detection compared to the very limited existing work in the area, which has focused on particular special cases within the spectrum of alternatives that we consider (see Sec. 6 for related work).

A second contribution of our work is the implementation and experimental application of our methodology. We have conducted extensive experiments for 72 interest-based personas (e.g., 'motorcycles', 'cooking', 'movies', 'poetry' or 'dating') including typical privacy-sensitive profiles (e.g., 'AIDS & HIV', 'left-wing politics' or 'hinduism'), involving 3 tagging sources and 3 different filters. For each experiment we run 310 requests (on average) to 5 different context-free "test" websites to gather more than 3.5M ads. Having conducted more than 2.9K experiments combining alternative interest definitions, geographical locations, privacy configurations, metrics, filters and sources of keywords to characterize OBA, we observe the following:

(1) OBA is a common practice: 88% of the analyzed personas get targeted ads associated with all the keywords that define their behavioural trait. Moreover, half of the analyzed personas receive between 26% and 62% of ads associated with OBA.

(2) The level of OBA attracted by different personas shows a strong correlation (0.4) with the value of those personas in the online advertising market (estimated by the suggested CPC bid for each persona).
(3) We provide strong evidence that the online advertising market targets behavioural traits associated with sensitive topics related to health, politics or sexual orientation. Such tracking is illegal in several countries [7]. Specifically, 10 to 40% of the ads shown to half of the 21 personas configured with a sensitive behavioural trait correspond to OBA ads.
(4) We repeat our experiments in both the US and Spain and do not observe any significant geographical bias in the utilization of OBA. Indeed, the median difference in the fraction of observed OBA ads by the considered personas in the US and Spain is 2.5%.
(5) We repeat our experiments after first setting the Do-Not-Track (DNT) flag on our browser and do not observe any remarkable difference in the amount of OBA received with and without DNT enabled. This leads us to conclude that support for DNT has not yet been implemented by most ad networks and sites.

Our intention with this work is to pave the way for developing a robust and scalable methodology and supporting toolsets for detecting interest-based targeting. By doing so we hope to improve the transparency around this important issue and protect the advertising ecosystem from the aforementioned Tragedy of the Commons.

The rest of the paper is organized as follows: In Sec. 2 we describe what Online Behavioural Advertising is. Sec. 3 describes the proposed methodology to unveil and measure the representativeness of OBA. Using this methodology we have implemented a real system that is described and evaluated in Sec. 4. Sec. 5 presents the results of the conducted experiments. Finally, Sec. 6 analyzes the related work and Sec. 7 concludes the paper.

2. ONLINE BEHAVIOURAL ADVERTISING

Online Behavioural targeted Advertising (OBA) is the practice in online advertising wherein information about the interests of web users is incorporated in tailoring ads. This information is usually collected over time by aggregators or ad-networks while users browse the web. It can include the publishers/webpages a user browses as well as information on activity on each page (time spent, clicks, interactions, etc.). Based on the overall activity of the users, profiles can be built and these profiles can be used to increase the effectiveness of ads, leading to higher click-through rates and, in turn, higher revenues for the publisher, the aggregator and eventually the advertiser by making a sale. We note that such targeting is referred to as network-based targeting in the advertising literature.

In Figure 1, we provide a very high-level overview of how OBA can happen, and how information gleaned by


Figure 1: High level description of how OBA can happen: the user browses multiple webpages over time; each page has ads and aggregators present on them. When the user is on the current page (1), the aggregators present on that page can either be new or aggregators that were present on previous pages (2). Hence Aggregator 2 can leverage past information to show a tailored ad, while Aggregator 1 can either show a run-of-network (RoN) ad or get information from Aggregator 2 (3) to show a tailored ad.

browsing can be used. Assume user Alice has no privacy protection mechanisms enabled in her browser. As she visits multiple publishers (e.g., websites), her activity is being tracked by multiple aggregators that are present on each publisher, using any of the available methods for tracking users [15, 22]. When Alice visits a publisher, aggregators (Aggregator 2) present on that publisher could have already tracked her across the web and, based on what information they have about her, they can target her accordingly. Another scenario is when an aggregator (Aggregator 1) is present on the current publisher where Alice is but was not present on previous publishers. In this case, the aggregator can either show a run-of-network ad (un-tailored) or obtain information about Alice from other aggregators and/or data sellers to show tailored ads. Indeed, the full ecosystem consisting of aggregators, data sellers, ad-optimizers, ad-agencies, etc. is notoriously complex² [37]; however, for the purposes of this work, we represent all entities handling data other than the user and the publishers, either collecting or selling data, as aggregators.

Other types of (less privacy intrusive) targeted advertising techniques include: (i) Geographical Targeted Ads are shown to a user based on their geographical location; (ii) Demographic Targeted Ads are shown to users based on their demographic profile (age, sex, etc.), which is estimated by aggregators in different manners, for instance, through the user's browsing history [11]; (iii)

² http://www.displayadtech.com/the_display_advertising_technology_landscape#/the-display-landscape

Re-targeting Ads present to the user recently visited websites, e.g., a user who has checked a hotel on website A receives an ad for that hotel when visiting website B a few hours later. Finally, a user can be exposed to Contextual Ads when visiting a website. These ads are related to the theme of the visited website rather than the user's profile and thus we consider them as non-targeted ads.

3. METHODOLOGY TO MEASURE OBA

In this section we describe our methodology to unveil the presence (or absence) of OBA advertising as well as to estimate its frequency and intensity compared to more traditional forms of online advertising.

3.1 Rationale and Challenges

Our goal is to uncover causal links between users exhibiting a certain behavioural trait and the display ads shown to them. Notice that we do not claim or attempt to reverse engineer the complex series of online auctions taking place in real time. We merely try to detect whether there is any correlation between the advertisements displayed to a user and his past browsing behaviour.

We create artificial personas that present a very narrow web browsing behaviour corresponding to a very specific interest (or theme), e.g., 'motor sports' or 'cooking & recipes'. We train each persona by visiting carefully selected websites that match its interest, and by doing so invite data aggregators and trackers to classify our persona accordingly. We refer to the visited websites as training webpages. For instance, the training set for the 'motor sports' persona would be formed by specific motor sports webpages. Therefore, the first two challenges for our methodology are which personas to examine and how to select training webpages for them that lead to minimal profile contamination [11]. By contamination, we are referring to the association of tags and labels not related to the main theme of the persona.

Once the personas and the training webpages have been properly selected, we need to retrieve the ads that these personas obtain upon visiting carefully selected control webpages that meet the following criteria: (i) they include a sufficient number of display ads, and (ii) they have a neutral or very well defined context that makes it easy to detect context-based advertisements and filter them out, keeping only those that could be due to OBA. We use weather related webpages for this purpose.

The ads shown to a persona on the control pages lead to websites that we refer to as landing webpages. Therefore, if the theme of the landing webpages for a persona has a large overlap with the theme of its training websites, we can conclude that this persona frequently receives OBA ads. To automate and scale the estimation of the topical overlap between training and landing


pages, we rely on online tagging services (e.g., Google AdWords, Cyren, etc.) that categorize webpages based on keywords. We use them to tag each training and landing webpage associated with a persona and compute the existing overlap. Note that we decided to use several online tagging services, or sources, to remove the dependency on a single advertising platform (a limitation of previous works such as [11, 28]).

As indicated before, OBA can co-exist with several other types of advertisement on the same page, including re-targeting ads, contextual ads and geographically targeted ads. Our goal is to define a flexible methodology able to detect and measure OBA in the presence of such ads. Thus, the fourth challenge that our methodology faces is to define filters for detecting and removing these other types of ads.

The final challenge for our methodology is to define metrics that are meaningful, simple, and easy to understand and measure in order to quantify OBA using keywords from the training and landing pages of a persona.

3.2 Details of the Methodology

- Selection of Personas & Training Pages: The selection of personas with a very specific behavioural trait, and thus a reduced profile contamination, is a key aspect of our methodology. To achieve this in an automated and systematic manner we leverage the Google AdWords hierarchical category system, which includes more than 2000 categories that correspond to specific personas (i.e., behavioural traits) used by Google and its partners to offer OBA ads. For each one of these personas, AdWords provides a list of related websites (between 150 and 330 websites). We use these websites as training pages for the corresponding persona. Specifically, we consider the 240 personas of level 2 from the Google AdWords category system and apply the following three-step filtering process to collect keywords for each persona while taking care to avoid profile contamination:
- Step 1: For a given persona p, we collect the keywords assigned by AdWords to every one of its related websites and keep in our dataset only those websites that have p's category among their keywords. For instance, for p = 'motor sports', we only keep those related websites categorized by AdWords with the keyword 'motor sports'. After applying this step 202 personas remain in our dataset.
- Step 2: For each persona p, we visit each website selected during Step 1 using a browser with a clean configuration, i.e., without cookies, previous web-browsing history, or ads preferences profile. Then, we check the categories from the Google AdWords system added to the ads preferences profile³ of our browser


after visiting those websites. We only keep those related websites which add to the ads preferences profile exclusively p's category, or p's category plus a second related one. For instance, for p = 'motor sports', we only keep a related website in our dataset if it adds to the ads preferences profile the category 'motor sports' alone, or the category 'motor sports' plus a second one such as 'cars' or 'motorbikes'. After applying this step 104 personas remain in our dataset.
- Step 3: Our final dataset consists of 51 personas that are left with at least 10 training pages each after Steps 1 and 2. Note that by having at least 10 training pages we intend to capture enough diversity in the visited sites and expose our personas to a number of trackers that would approximate what a real user would find. Indeed, the considered personas are exposed to 15-40 trackers, whereas the examination of the browsers of 5 volunteers revealed that they were exposed to 18-27 trackers⁴.

In addition to the above systematically collected personas, we have also manually selected 21 sensitive personas related to topics that, for instance, privacy regulation in Europe does not allow to be tracked or processed (e.g., health, ethnicity, sexuality, religion or politics) [7]. Interestingly, the categories of our sensitive personas do not appear in the public Google AdWords hierarchical category system; however, when querying for them in AdWords we obtain a similar output as for any other persona. We apply the same steps as above with a single difference in Step 2, where we keep only websites that do not add any category to the ads preferences profile of our browser. By doing so, we ensure that our sensitive personas are not being associated with any additional behavioural trait.

The final list of 51 regular and 21 sensitive personas can be found in Figure 3 and Figure 4, respectively.

- Selection of Control Pages: As indicated in the methodology's rationale, we need a set of pages that are popular, have ads shown on them and yet have a low number of easily identifiable tags associated with them, and thus do not contaminate the profile of our personas. We used five popular weather pages⁵ as control pages since they fulfil all previous requirements.
- Visiting Training and Control Pages to Obtain Ads: Once we have selected the set of training and control pages for a persona, we visit them with the following strategy (see Sec. 4.1 for details). We start with a fresh install, and randomly select a page from the pool of training+control pages to visit, with the interval

³ Google's ads preferences profile represents the behavioural trait inferred by Google for a browser, based on the previously visited sites. It includes one or more categories from the AdWords category system.

⁴ We have used the tracker detection tool provided by the EDAA [3] to obtain these results.

⁵ http://www.accuweather.com, http://www.localconditions.com, http://www.wunderground.com, http://www.myforecast.com, http://www.weatherbase.com


between different page visits drawn from an exponential distribution with mean 3 minutes⁶. By doing so, on the one hand, we regularly visit the training pages so that we allow trackers and aggregators present on those pages to classify our persona with a very specific interest according to our deliberately narrow browsing behaviour. On the other hand, the regular visits to control pages allow us to collect the ads shown to our persona to later study whether they are driven by OBA. An alternative strategy would be to first visit the training pages several times to get our persona profiled by aggregators and then visit only control pages. We avoided this strategy because visiting multiple weather sites consecutively fooled the data aggregators into believing that our browser was a 'weather' persona.
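To make the visiting strategy concrete, the following minimal sketch interleaves training and control visits with exponentially distributed gaps. It is an illustration, not the authors' code: the URL lists are placeholders and `fetch_page` stands for whatever crawler (e.g., the PhantomJS-based one of Sec. 4.1) actually loads the page and records the ads.

```python
import random
import time

# Placeholder pools for one persona; the real training pages come from AdWords
# and the control pages are the five weather sites listed in footnote 5.
TRAINING_PAGES = ["http://example-motorsports-a.com", "http://example-motorsports-b.com"]
CONTROL_PAGES = ["http://www.accuweather.com", "http://www.wunderground.com"]

MEAN_GAP_SECONDS = 3 * 60  # mean inter-visit interval of 3 minutes

def run_persona(total_visits, fetch_page, sleep=time.sleep):
    """Visit random pages from the training+control pool, waiting an
    exponentially distributed interval between consecutive visits."""
    pool = TRAINING_PAGES + CONTROL_PAGES
    for _ in range(total_visits):
        url = random.choice(pool)      # uniform pick over the combined pool
        fetch_page(url)                # crawl the page; ads are collected on control pages
        sleep(random.expovariate(1.0 / MEAN_GAP_SECONDS))  # human-like gap
```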

- Tagging Training and Landing Pages: In order to be able to systematically detect correlations between training and landing pages we first need to identify the keywords that best describe each webpage in our dataset. For this purpose, we use 3 different sources: Cyren [2], Google AdWords [5] and McAfee [9]. Each source has its own labeling system: Google AdWords labels webpages using a hierarchical category system with up to 10 levels and tags pages with 1 to 8 keywords. Cyren and McAfee provide a flat tagging system consisting of 60-100 categories and label webpages with at most 3 keywords. Note that by utilising multiple sources we try to increase the robustness of our methodology and limit as much as possible its dependence on the idiosyncrasies of particular labeling systems. Finally, it is worth noting that the coverage of the considered tagging services is very high for our set of training and landing pages. In particular, Google, McAfee and Cyren were able to tag 100%, 99.0% and 95.5% of the training pages and 100%, 97.2% and 93.3% of the landing pages, respectively.

- Training Set Keywords: To achieve the aforementioned robustness against the particularities of individual classification systems, we filter the keywords assigned to a page by keeping only those that are assigned to the page by more than one of our 3 sources. The idea is to quantify OBA based on keywords that several of our sources agree upon for a specific page relevant to the trained persona. Assume we have a training webpage W tagged with the keyword sets K1, K2 and K3, one for each of the 3 sources above. Our goal is to select a keyword k within Ki (i ∈ [1, 3]) only if it accurately defines W for our purpose. To do this, we leverage the Leacock-Chodorow similarity [25] (S(k, l)) to capture how similar two word senses (keywords) k ∈ Ki and l ∈ Kj (j ∈ [1, 3], j ≠ i) are. Note that two keywords are considered similar if their Leacock-Chodorow

⁶ This distribution is selected to emulate human-generated inter-arrival times between visits, according to recent measurement studies [23].

similarity is higher than a given configurable threshold T, which ranges between 0 (any two keywords would be considered similar) and 3.62 (only two identical keywords, i.e., an exact match, would be considered similar). We compute the similarity of k, belonging to a given source, with all the training keywords belonging to the other sources, and consider k an accurate keyword only if it presents S(k, l) > T with keywords of at least N other sources. Note that N is also a configurable parameter that allows us to define a more or less strict condition to consider a given training keyword in a given source.
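The keyword-agreement filter can be sketched as follows, using WordNet's Leacock-Chodorow similarity as implemented in NLTK. This is only an approximation of the described procedure under stated assumptions: the paper does not specify how multi-word category names are mapped to WordNet senses, so the sketch joins words with underscores, restricts itself to noun senses and takes the maximum similarity over all sense pairs.

```python
from itertools import product
from nltk.corpus import wordnet as wn  # requires the 'wordnet' corpus to be downloaded

def lch(word_a, word_b):
    """Max Leacock-Chodorow similarity over noun senses of two keywords
    (0.0 if either keyword has no WordNet noun sense)."""
    syns_a = wn.synsets(word_a.replace(" ", "_"), pos=wn.NOUN)
    syns_b = wn.synsets(word_b.replace(" ", "_"), pos=wn.NOUN)
    scores = [a.lch_similarity(b) for a, b in product(syns_a, syns_b)]
    return max(scores) if scores else 0.0

def filter_training_keywords(keywords_by_source, T=2.5, N=2):
    """Keep a keyword of one source only if it is similar (S > T) to some
    keyword of at least N other sources."""
    kept = {source: set() for source in keywords_by_source}
    for source, keywords in keywords_by_source.items():
        others = {s: k for s, k in keywords_by_source.items() if s != source}
        for kw in keywords:
            agreeing_sources = sum(
                1 for other_keywords in others.values()
                if any(lch(kw, other) > T for other in other_keywords)
            )
            if agreeing_sources >= N:
                kept[source].add(kw)
    return kept
```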

- Filtering Different Types of Ads: To complete our methodology we describe next the filters used in order to identify and progressively remove landing pages associated with non-OBA ads:
- Retargeting Ads Filter (Fr): In our experiment a retargeting ad on a control page should point to either a training or a control page previously visited by the persona. Since we keep a record of the previous webpages visited by a persona, identifying and removing retargeting ads from our landing set is trivial.
- Static and Contextual Ads Filter (Fs&c): We have created a profile that, after visiting each webpage, removes all cookies and potential tracking information, such that each visit to a website emulates the visit of a user with an empty past browsing history. We refer to this persona as the clean profile. By definition, when visiting a control webpage the clean profile cannot receive any type of targeted behavioural ad and thus all ads shown to this profile correspond to either static ads (ads pushed by an advertiser onto the website) or contextual ads (ads related to the theme of the webpage). Hence, to eliminate the majority of the landing pages derived from static and contextual ads for a persona, we remove all the landing pages this persona has in common with the clean profile.
- Demographic and Geographical Targeted Ads Filter (Fd&g): We launch the experiments for all our personas from the same /24 IP prefix and therefore it is likely that several of them receive the same ad when geographical targeting is used. Moreover, we have computed the Leacock-Chodorow similarity between the categories of each pair of personas in our dataset to determine how close their interests are. To filter demographic and geographical ads we proceed as follows: for a persona p that has received an ad A, we select the set of other personas receiving this same ad (O(A) = [p1,A, p2,A, ...]) and compute the Leacock-Chodorow similarity between p and each pi,A ∈ O(A). If the similarity between p and at least one of these personas is lower than a given threshold T', we consider that the ad has been shown to personas with a significantly different behavioural trait and thus it cannot be the result of OBA. Instead, it is likely due to geographical or demographic targeting practices.
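The three filters reduce, in sequence, to set operations over the landing pages collected for a persona. The sketch below illustrates that pipeline; the argument names and the precomputed persona-to-persona similarity table are assumptions for illustration, not the authors' data structures.

```python
def apply_filters(persona, landing, visited, clean_profile_landing,
                  landing_by_persona, persona_similarity, T_prime=2.5):
    """Successively remove landing pages attributable to non-OBA ads.

    landing: set of landing-page URLs collected for `persona`
    visited: training + control pages already visited by the persona (for Fr)
    clean_profile_landing: landing pages observed by the history-less clean profile (for Fs&c)
    landing_by_persona: dict mapping every other persona to its landing pages (for Fd&g)
    persona_similarity: dict mapping (persona_a, persona_b) to their category similarity
    """
    # Fr: re-targeting ads point back to pages the persona already visited.
    landing = landing - visited

    # Fs&c: static/contextual ads are also shown to the clean profile.
    landing = landing - clean_profile_landing

    # Fd&g: an ad also shown to a persona with a clearly different trait is
    # attributed to geographic/demographic rather than behavioural targeting.
    def shown_to_dissimilar_persona(page):
        return any(
            page in pages and persona_similarity[(persona, other)] < T_prime
            for other, pages in landing_by_persona.items()
            if other != persona
        )
    return {page for page in landing if not shown_to_dissimilar_persona(page)}
```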


- Measuring the Presence and Representativeness of OBA: We measure the volume of OBA for a given persona p by computing the overlap between the keywords of the training and landing pages for p. Note that we consider that a training keyword and a landing keyword overlap if they are an exact match. In particular, we use two complementary metrics that measure different aspects of the overlap between the keywords of training and landing pages. However, let us first introduce some definitions used in our metrics: (i) we define the set of unique keywords associated with the training pages of a persona p on source s as $K^T_{p,s}$; (ii) we define the set of unique keywords associated with the landing pages of ads shown to a persona p on source s on the control pages as $K^L_{p,s}$; (iii) finally, we define the set of unique keywords associated with a single webpage W on source s as $K^W_s$. Note that the set of keywords associated with a webpage remains constant for a given source regardless of the persona. Using these definitions we define our metrics as follows:

Targeted Training Keywords (TTK): This metric computes the fraction of keywords from the training pages that have been targeted and thus appear in the set of landing pages for a persona p and a source s. It is formally expressed as follows:

$$TTK(p, s) = \frac{|K^T_{p,s} \cap K^L_{p,s}|}{|K^T_{p,s}|} \in [0, 1] \quad (1)$$

In essence, TTK measures whether p is exposed to OBA or not. In particular, a high value of TTK indicates that most of the keywords defining the behavioural trait of p (i.e., training keywords) have been targeted during the experiment.

Behavioural Advertising in Landing Pages (BAiLP): This metric captures the fraction of ads whose landing pages are tagged with at least one keyword from the set of training pages for a persona p and a source s. In other words, it represents the fraction of ads received by p that are likely associated with OBA. BAiLP is formally expressed as follows:

$$BAiLP(p, s) = \frac{\sum_{i=1}^{L_{p,s}} f(K^{W_i}_{s}) \cdot n_{times}}{L_{p,s}} \in [0, 1] \quad (2)$$

$$\text{where } f(K^{W_i}_{s}) = \begin{cases} 1 & \text{if } |K^T_{p,s} \cap K^{W_i}_{s}| \geq 1 \\ 0 & \text{if } |K^T_{p,s} \cap K^{W_i}_{s}| = 0 \end{cases}$$

Note that $n_{times}$ represents the number of times an ad has been shown to p, and $L_{p,s}$ is defined as the set of landing pages for a persona p and source s.

In summary, TTK measures if OBA is happening and how intensely (what percentage of the training keywords are targeted), whereas BAiLP captures what percentage of the overall advertising a persona receives is due to OBA (under different filters).
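A compact sketch of the two metrics over keyword sets is given below. It is an interpretation rather than the authors' code: the exact-match overlap is taken as a plain set intersection, and BAiLP is normalised by the total number of ad impressions (the sum of the per-ad display counts) so that the weighted fraction stays in [0, 1]; `page_keywords` and `ad_counts` are hypothetical containers for the tagged landing pages and their display counts.

```python
def ttk(training_keywords, landing_keywords):
    """Fraction of training keywords that also appear among the landing keywords."""
    if not training_keywords:
        return 0.0
    return len(training_keywords & landing_keywords) / len(training_keywords)

def bailp(training_keywords, page_keywords, ad_counts):
    """Impression-weighted fraction of ads whose landing page shares at least
    one keyword with the training set.

    page_keywords: dict landing page -> set of keywords assigned by one source
    ad_counts:     dict landing page -> number of times its ad was shown
    """
    total_impressions = sum(ad_counts.values())
    if total_impressions == 0:
        return 0.0
    matched = sum(
        count for page, count in ad_counts.items()
        if training_keywords & page_keywords.get(page, set())
    )
    return matched / total_impressions
```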

4. AUTOMATED SYSTEM TO MEASURE OBA

In this section we describe the development and evaluation of the system we built to implement the previously described methodology for measuring OBA.

4.1 System implementation and setup

A primary design objective of our measurement system was to be fully automated, without a need for a human in the loop at any of its steps. The reason for this is that we wanted to be able to check arbitrary numbers of personas and websites, instead of just a handful. Towards this end, we used the lightweight, headless browser PhantomJS ver. 1.9 (http://phantomjs.org/) as our base, as it lets us automate the collection and handling of content as well as configure different user-agents. We wrote a wrapper around PhantomJS that we call PhantomCurl, which handles the logic related to the collection and pre-processing of data. Our control server was set up in Madrid, Spain. The experiments were run from Spain and the United States. In the case of the US we used a transparent proxy with sufficient bandwidth capacity to forward all our requests. We used a user-agent⁷ corresponding to Chrome ver. 26 on Windows 7. Our default setup has no privacy protections enabled for personas, but for the clean profile we enable privacy protection and delete cookies after visiting each website. A second configuration enables Do Not Track⁸ [6] for all our personas. For each persona configuration (no-DNT and DNT) and location (ES and US) we run our system, in parallel, 4 times in slots of 8 hours within a window of 3 days, so that all personas are exposed to the same status of the advertising market. These time slots generate 310 visits per persona (on average) to the control pages, which, based on the results in [11], suffices to obtain the majority of distinct ads received by a persona in the considered period of time. To process the data associated with each persona, configuration and geographical location we use 3 sources to tag the training and landing pages (Google, McAfee and Cyren), 3 different combinations of filters (Fr; Fr and Fs&c; Fr, Fs&c and Fd&g) and 2 metrics (TTK and BAiLP). Furthermore, we use different values of T and N (for the selection of training keywords) and T' (for filtering out demographic and geographic targeted ads). Overall, our analysis covers more than 2.9K points in the spectrum of interest definitions, metrics, sources, filters, geographical locations, privacy configurations, etc.
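The size of the explored configuration space can be illustrated by enumerating the experimental knobs as a Cartesian product. This sketch is purely illustrative: the extra threshold values listed (beyond the 2.5 used in Sec. 5) are placeholders, since the paper does not enumerate them.

```python
from itertools import product

SOURCES = ["Google", "McAfee", "Cyren"]
FILTER_SETS = [("Fr",), ("Fr", "Fs&c"), ("Fr", "Fs&c", "Fd&g")]
METRICS = ["TTK", "BAiLP"]
LOCATIONS = ["ES", "US"]
DNT_SETTINGS = [False, True]
THRESHOLDS = [2.0, 2.5, 3.0]  # placeholder values of T (and T')

def analysis_points(personas):
    """Yield one analysis configuration per combination of experimental knobs."""
    keys = ["persona", "source", "filters", "metric", "location", "dnt", "T"]
    for combo in product(personas, SOURCES, FILTER_SETS, METRICS,
                         LOCATIONS, DNT_SETTINGS, THRESHOLDS):
        yield dict(zip(keys, combo))
```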

⁷ We have repeated some experiments using different user-agents without noticing major differences in the obtained results.

⁸ Do Not Track is a technology and policy proposal that enables users to opt out of tracking by websites they do not visit (e.g., analytics services, ad networks, etc.).


           Recall          Accuracy        FPR             FNR
McAfee     99.1% / 95.6%   99.0% / 94.2%   25.5% / 3.8%    4.4% / 0.1%
Google     99.1% / 95.7%   99.0% / 94.3%   25.7% / 3.8%    4.3% / 0.1%
Cyren      98.7% / 95.5%   98.6% / 94.1%   25.4% / 4.27%   4.5% / 1.3%

Table 1: Max and min values of Recall, Accuracy, FPR and FNR of our automated methodology to identify OBA ads for our three sources (McAfee, Google and Cyren) across the analyzed personas. Max and min values correspond to the 'Bicycles & Accessories' and 'Yard & Patio' personas, respectively.

Before discussing the obtained results (Sec. 5), in the next subsection we evaluate the performance of our system in identifying OBA ads using standard metrics such as accuracy, false positive ratio, false negative ratio, etc. It is worth mentioning that, to the best of our knowledge, previous measurement works on the detection of OBA [11, 28] do not perform a similar evaluation of their proposed methodologies.

4.2 System Performance to identify OBA ads

To validate our system we need to generate a ground truth dataset to compare against. We used humans for a subjective validation of the correlation between training and landing pages, as done also by previous works [12, 26, 27, 38]. To that end, two independent panelists subjectively classified each one of the landing pages associated with a few⁹ randomly selected personas as OBA or non-OBA. Note that the classifications of these two panelists differed in 6-12% of the cases for the different personas. For these few cases a third person performed the subjective classification to break the tie.

For each ad, we compare the classification done by our tool as OBA vs. non-OBA with the ground truth and compute widely adopted metrics used to evaluate the performance of detection systems: Recall (or Hit Ratio), Accuracy, False Positive Rate (FPR) and False Negative Rate (FNR). Table 1 shows the max and min values of these metrics across the analyzed personas for our three sources (McAfee, Google and Cyren). We observe that, in general, our system is able to reliably identify OBA ads for all sources. Indeed, it shows Accuracy and Recall values over 94% as well as an FNR smaller than 4.5%. Finally, the FPR stays lower than 10% for all the analyzed personas except for the 'Yard & Patio' persona, where the FPR increases up to 25%.
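For reference, the evaluation against the panelists' labels reduces to the standard confusion-matrix metrics; the sketch below uses textbook definitions and hypothetical per-ad label dictionaries, not the authors' evaluation script.

```python
def detection_metrics(predicted, ground_truth):
    """Compare per-ad OBA/non-OBA decisions against the manually labelled ground truth.

    predicted, ground_truth: dict landing page -> True (OBA) or False (non-OBA)
    """
    tp = sum(1 for page in predicted if predicted[page] and ground_truth[page])
    fp = sum(1 for page in predicted if predicted[page] and not ground_truth[page])
    tn = sum(1 for page in predicted if not predicted[page] and not ground_truth[page])
    fn = sum(1 for page in predicted if not predicted[page] and ground_truth[page])

    return {
        "Recall":   tp / (tp + fn) if tp + fn else 0.0,   # hit ratio
        "Accuracy": (tp + tn) / len(predicted) if predicted else 0.0,
        "FPR":      fp / (fp + tn) if fp + tn else 0.0,
        "FNR":      fn / (fn + tp) if fn + tp else 0.0,
    }
```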

5. MEASURING OBA

In this section we present the results obtained with our measurement system for the purpose of answering the following essential questions regarding OBA: (i) How

⁹ Note that the manual classification process required our panelists to carefully evaluate around 300-400 landing pages per persona. Thus, it was infeasible to perform it for every persona.

Training Pages

http://poolpricer.com          http://levelgroundpool.com
http://whirlpool-zu-hause.de   http://poolforum.se
http://eauplaisir.com          http://photopiscine.net
http://a-pool.czm              http://allas.fi
http://seaglasspools.com       http://piscineinfoservice.com

Table 2: Sample of the training webpages for the 'Swimming Pools & Spas' persona

Training Keywords                 Filtered Training Keywords
Gems & Jewellery                  —
Gyms & Health Clubs               —
Outdoor Toys & Play Equipment     Outdoor Toys & Play Equipment
Security Products & Services      Security Products & Services
Surf & Swim                       Surf & Swim
Swimming Pools & Spas             Swimming Pools & Spas

Table 3: Keywords associated with the training websites for 'Swimming Pools & Spas' using Google as source, before and after applying the semantic overlap filtering with N = 2 and T = 2.5

frequently is OBA used in online advertising? (ii) Does OBA target users differently based on their profiles? (iii) Is OBA applied to sensitive topics? (iv) Is OBA more pronounced in certain geographic regions compared with others? (v) Does Do-Not-Track have any impact on OBA?

We will start by analysing a concrete example to help the reader follow the different steps of our methodology. After that we will present holistic results from a large set of experiments.

5.1 Specific case: Swimming Pools & Spas andGoogle

Let us consider a persona, ‘Swimming Pools & Spas’,and a source, ‘Google’, to present the results obtained ineach step of our methodology for this specific case. Ta-ble 2 shows the set of training webpages for the ‘Swim-ming Pools & Spas’ persona. One can observe by thename of the webpages their direct relation to the ‘Swim-ming Pools & Spas’ persona.

We train this persona as described in Section 3.2. Then, in the post-processing phase we tag the training and landing webpages using our 3 sources. To describe the process in this subsection we refer to the results obtained for 'Google'. Table 3 shows, in its first column, the 6 keywords that Google assigns to the training websites. However, this initial set of keywords may present some contamination, including keywords unrelated to the 'Swimming Pools & Spas' persona. Hence, we compute the semantic similarity between these keywords and the keywords assigned by the other sources to the training webpages, with N = 2 and T = 2.5. This technique eliminates 2 keywords and leaves a final set of 4 training keywords, shown in the second column of Table 3.

Let us now focus on the landing webpages. Our experiments provide a total of 381 unique landing


Figure 2: TTK (a) and BAiLP (b) for 10 personas and all sources (Cyren, Google, McAfee) for N = 2, T = T' = 2.5 and all filters activated (Fr, Fs&c, Fd&g)

        TTK   BAiLP
Fr      1     0.17
Fs&c    1     0.75
Fd&g    1     0.97

Table 4: TTK and BAiLP values after applying each filter for the 'Swimming Pools & Spas' persona and 'Google' source.

Landing Webpages           Num. ads
www.abrisud.co.uk          1195
www.endlesspools.com       106
www.samsclub.com           16
www.paradisepoolsms.com    8
www.habitissimo.es         8
www.abrisud.es             6
ww.atrium-kobylisy.cz      6
www.piscines-caron.com     5
athomerecreation.net       4
www.saunahouse.cz          4

Table 5: Top 10 list of landing webpages and the number of times their associated ads were shown to our 'Swimming Pools & Spas' persona.

webpages after filter Fr. Then, we pass each of these webpages through the Fs&c and Fd&g filters sequentially. These filters eliminate 226 and 128 of the initial landing pages, respectively. This indicates that contextual ads (eliminated by Fs&c) are the most frequent type of ads. After applying each filter we compute the value of the two defined metrics (TTK and BAiLP) using the resulting set of landing pages and its associated keywords, and show them in Table 4. The results suggest a high presence of OBA ads. Indeed, the obtained TTK values indicate that 100% of the training keywords are targeted and thus appear among the landing keywords. Moreover, BAiLP shows that, depending on the specific applied filter, between 17% and 97% of the ads received by our 'Swimming Pools & Spas' persona are associated with landing pages tagged with keywords from the training set and thus are likely to be associated with OBA advertising. Note that in the extreme case where no filters are applied, BAiLP represents the percentage of all ads shown that are suspected to be targeted (17% for the 'Swimming Pools & Spas' persona). In the other extreme, when all filters are applied, BAiLP shows the same percentage after having removed all advertisements that can be attributed to one of the known categories described in Sec. 2 (97% for the 'Swimming Pools & Spas' persona).

Finally, Table 5 shows the Top 10 landing pages associated with the largest number of ads shown during our experiment. We observe that the three most frequent landing pages, which account for most of the ads shown to our persona, are related to swimming pools, pointing clearly to OBA.

5.2 How frequent is OBA?

Let us start by analyzing the results obtained with our methodology for each independent source. For this purpose, Figure 2 presents the values of TTK and BAiLP for 10 selected personas in a radar chart. In particular, these results correspond to experiments run from Spain, with DNT disabled and all filters (Fr, Fs&c, Fd&g) activated. First, TTK shows its maximum value (i.e., 1) in 9 of the studied personas for Google and Cyren and in 8 personas for McAfee. Moreover, TTK is not lower than 0.5 in any case. This result indicates that, regardless of the source used to tag websites, typically all the training keywords of a persona are targeted and then appear in its set of landing keywords. Second, we observe a much higher heterogeneity for BAiLP across the different sources. In particular, Cyren seems to consistently offer a high value of BAiLP in comparison with the other sources, whereas McAfee offers the


Figure 3: Average and standard deviation of TTK and BAiLP for each regular persona in our dataset, sorted from higher to lower average BAiLP.

highest (lowest) BAiLP for 5 (3) of the considered personas and shows a remarkable agreement (BAiLP difference < 0.05) with Cyren in half of the considered personas. Google is the most restrictive source, offering the lowest BAiLP for 6 personas. In addition, it only shows close agreement with Cyren and McAfee for two personas ('Air Travel' and 'Banking'). These results are due to the higher granularity offered by Google compared to Cyren and McAfee, which makes it more difficult to find matches between training and landing keywords for that source. If we now compare the BAiLP across the selected personas, we observe that for 27 of the 30 considered cases BAiLP ranges between 0.10 and 0.94 regardless of the source. This indicates that 10-94% of the ads received by these personas are associated with landing pages tagged with training keywords, and thus they are likely to be the result of OBA.

These preliminary results suggest an important presence of OBA in online advertising. In order to confirm this observation and understand how representative OBA is, we have computed the values of our two metrics, TTK and BAiLP, for every combination of persona, source and set of active filters in our dataset, setting N = 2 and T = T' = 2.5. Again these experiments were run from Spain and with DNT disabled. In total, 4 runs of 459 independent experiments were conducted. Figure 3 shows the average and standard deviation values of TTK and BAiLP for the 51 considered personas, sorted from higher to lower average BAiLP value. Our results confirm a high presence of OBA ads. The obtained average TTK values indicate that for 88% of our personas all training keywords are targeted and appear among the landing keywords (for the other 12%, at least 66% of training keywords match their corresponding landing keywords). This high overlap unequivocally shows the existence of OBA. However, to more accurately quantify its representativeness we rely on our BAiLP metric, which demonstrates that half of our personas are exposed (on average) to 26-63% of ads linked to landing pages tagged with keywords from the persona training set. Since the overlap is consistently high, independently of the source, filters used, etc., we conclude that these ads are likely the result of OBA.

5.3 Are some personas more targeted than others?

Figure 3 shows a clear variability in the representativeness of OBA for different personas. Indeed, the distribution of the average BAiLP values across our personas presents a median value equal to 0.23, with an interquartile range of 0.25 and a max/min value of 0.63/0.02. This observation invites the following question: "Why are some personas targeted more intensely than others?". Our hypothesis is that the level of OBA received by a persona depends on its economic value for the online advertising market. To validate this hypothesis we leverage the AdWords keyword planner tool¹⁰, which enables us to obtain the suggested Cost per Click (CPC) bids for each of our personas.¹¹ The bid value is a good indication of the relative economic value of each persona. Then, we compute the Spearman and Pearson correlations between the BAiLP and the suggested CPC for each persona in our dataset. Note that to properly compute the correlation, we eliminate outlier samples based on the suggested CPC.¹² The obtained

¹⁰ https://adwords.google.com/KeywordPlanner
¹¹ Specifically, we use the keyword defining the interest of each persona to obtain its suggested CPC bid.
¹² We use a standard outlier detection mechanism, which considers a sample an outlier if it is higher (smaller) than Q3+1.5*IQR (Q1-1.5*IQR), Q1, Q3 and IQR being the first quartile, the third quartile and the interquartile range, respectively.


Figure 4: Average and standard deviation of TTK and BAiLP for each sensitive persona in our dataset, sorted from higher to lower average BAiLP.

Spearman and Pearson correlations are 0.44 and 0.40 (with p-values of 0.004 and 0.007), respectively. These results validate our hypothesis, since we can observe a marked correlation between the level of received OBA (BAiLP) and the value of the persona for the online advertising market (suggested CPC bid).
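The correlation computation, including the IQR-based outlier rule from footnote 12, can be sketched with scipy; the per-persona input vectors are assumed to be aligned lists of average BAiLP values and suggested CPC bids.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def cpc_bailp_correlation(cpc, bailp):
    """Correlate per-persona BAiLP with suggested CPC bids after dropping
    personas whose CPC falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    cpc, bailp = np.asarray(cpc, float), np.asarray(bailp, float)
    q1, q3 = np.percentile(cpc, [25, 75])
    iqr = q3 - q1
    keep = (cpc >= q1 - 1.5 * iqr) & (cpc <= q3 + 1.5 * iqr)
    rho, rho_p = spearmanr(cpc[keep], bailp[keep])
    r, r_p = pearsonr(cpc[keep], bailp[keep])
    return {"spearman": (rho, rho_p), "pearson": (r, r_p)}
```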

5.4 Is OBA applied to sensitive topics?

The sensitive personas in our dataset present behavioural traits associated with sensitive topics including health, religion, and politics. Tracking these topics is illegal (at least) in Europe. To check whether this is being respected by the online advertising market, we repeat the experiment described in the previous subsection for all our 21 sensitive personas, setting the geographical location to Spain. In this case, we run 4 repetitions of 189 independent experiments.

Figure 4 shows the average and standard deviation values of TTK and BAiLP for each sensitive persona, sorted again from higher to lower average BAiLP value. One would expect to find values of TTK and BAiLP close to zero, indicating that sensitive personas are not subjected to OBA. Instead, our results reveal that, despite the lower values compared to the personas of Figure 3, the median value of average TTK is 0.47, indicating that for half of the sensitive personas at least 47% of the keywords defining their behavioural trait remain targeted. Moreover, the BAiLP results show that 10-40% of the ads received by half of our sensitive personas are associated with OBA. In summary, we have provided solid evidence that sensitive topics are tracked and used


for online behavioural targeting despite the existence of regulation against such practices.

5.5 Geographical bias of OBA

In order to search for a possible geographical bias of OBA, we have run the 459 independent experiments described in Subsection 5.2 using a transparent proxy configured in the US, so that visited websites see our measurement traffic coming from a US IP address. For each persona, we have computed the average BAiLP across all combinations of sources and filters for the experiments run in Spain vs. the US and calculated the BAiLP difference. Figure 5 shows the distribution of the average BAiLP difference for the 51 considered personas in the form of a boxplot. Note that a positive (negative) difference indicates a greater presence of OBA ads in Spain (the US). The BAiLP differences are restricted to less than 10 percentage points across all cases, with an insignificant bias (median BAiLP difference = 2.5%) towards a greater presence of OBA ads in Spain than in the US. Hence, we conclude that there is no remarkable geographical bias in the application of OBA. Note that we have repeated the experiment with our other metric, TTK, obtaining similar conclusions.

5.6 Impact of Do-Not-Track on OBA

Following the same procedure as in the previous subsection, we have computed the average BAiLP difference between the DNT-enabled and DNT-disabled configurations for each one of the 51 regular personas in our dataset, fixing in both cases the geographical location to Spain. Figure 5 depicts the distribution of the average BAiLP difference, where a positive (negative) difference indicates a greater presence of OBA ads with DNT activated (deactivated). The median of the distribution is


Figure 5: Distribution of the average BAiLP difference for the 51 regular personas in our dataset for the cases: Spain vs. US (left) and DNT vs. non-DNT (right).

The median of the distribution is ∼0, indicating that roughly half of the personas attract more OBA ads with DNT activated and the other half with DNT deactivated. Moreover, the IQR reveals that half of the personas present a relatively small BAiLP difference (≤ 8 percentage points). Therefore, the results provide strong evidence that DNT is barely enforced on the Internet and thus its impact on OBA is negligible. Again, we have repeated this experiment with TTK, obtaining similar conclusions.
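As a rough illustration (hypothetical numbers, not our measurements), the DNT comparison reduces to a paired per-persona difference and a check of how many personas stay within a small margin; a minimal NumPy sketch:

```python
# Sketch (assumed inputs): per-persona average BAiLP with DNT on vs. off;
# report the median difference and the share of personas within 8 pp.
import numpy as np

bailp_dnt_on  = np.array([0.30, 0.22, 0.45, 0.10, 0.38])  # hypothetical per-persona averages
bailp_dnt_off = np.array([0.28, 0.25, 0.41, 0.12, 0.35])

diff = bailp_dnt_on - bailp_dnt_off   # positive => more OBA ads with DNT on
small = np.abs(diff) <= 0.08          # within 8 percentage points
print(f"median difference = {np.median(diff):+.2f}")
print(f"{small.mean():.0%} of personas differ by at most 8 pp")
```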

6. RELATED WORK

Our work is related to recent literature in the area of measurement-driven studies of targeting/personalization in online services such as search [17, 29] and e-commerce [30, 31, 18]. More specifically, in the context of advertising, the seminal work by Guha et al. [16] presents the challenges of measuring targeted advertising, including high levels of noise due to ad-churn, network effects like load-balancing, and timing effects. Our methodology considers these challenges. Another early work, by Korolova [21], presents results using microtargeting to expose privacy leakage on Facebook.

Other recent studies have focused on the economics of display advertising [15], characterizing mobile advertising [35] and helping users gain control of their personal data and traffic in mobile networks [33], designing large-scale targeting platforms [13], or investigating the effectiveness of behavioural targeting [36]. Moreover, Lecuyer et al. [26] developed a service-agnostic tool to establish correlation between input data (e.g., user actions) and the resulting personalized output (e.g., ads). The solution applies the differential correlation principle to the input and output of several shadow accounts that generate a differentially distinct set of inputs.

Our work differs in focus from this previous literature, since we are primarily concerned with OBA in display advertising, with the intention of understanding the collection and use of sensitive personal information at a large scale. To the best of the authors' knowledge, only a couple of previous works analyze the presence of OBA advertising using a measurement-driven methodology. Liu et al. [28] study behavioural advertising using complex end-user profiles with hundreds of interests (instead of personas with specific interests) generated from an AOL dataset including users' online search history. The profiles extracted from passive measurements are rather complex (capturing multiple interests and types) and are thus rather inappropriate for establishing causality between specific end-user interests and the observed ads. Our approach is active rather than passive, and thus allows us to derive an exact profile of the interest that we want to capture. Furthermore, the authors collapse all types of targeted advertising (demographic, geographic and OBA), except re-targeting, whereas we focus on OBA due to its higher sensitivity from a privacy perspective. Barford et al. [11] present a large-scale characterisation study of the advertisement landscape. As part of this study the authors look at different aspects such as the arrival rate of new ads, the popularity of advertisers, the importance of websites, and the distribution of the number of ads and advertisers per website. The authors examine OBA only very briefly. They trained personas but, as they acknowledge, their created profiles present significant contamination, including interests unrelated to the persona. Our methodology carefully addresses this issue. Moreover, these previous works check only a small point of the entire spectrum of definitions, metrics, sources, filters, etc. For instance, they rely on Google ad services to build their methodologies, which reduces the generality of their results. Our work takes a much broader look at OBA, including the methodology, the results, and the derived conclusions. Finally, to the best of the authors' knowledge, ours is the first work reporting results on the performance of the methodology used, the extent to which OBA is applied in different geographical regions, and the utilization of DNT across the web.

7. CONCLUSIONS

This paper presents a methodology to identify and quantify the presence of OBA in online advertising. We have implemented the methodology in a scalable system and run experiments covering a large part of the entire spectrum of definitions, metrics, sources, filters, etc., which allows us to derive conclusions whose generality is guaranteed. In particular, our results reveal that OBA is a technique commonly used in online advertising. Moreover, our analysis using more than 50 trained personas suggests that the volume of OBA ads received by a user varies depending on the economic value associated with the user's behaviour/interests. More importantly, our experiments reveal that the online advertising market targets behavioural traits associated with sensitive topics (health, politics or sexuality) despite the existing legislation against this, for instance, in Europe. Finally, our analysis indicates that there is no significant geographical bias in the application of OBA and that do-not-track does not seem to be enforced by publishers and aggregators, and thus does not affect OBA. These essential findings pave solid ground to continue research in this area and improve our still vague knowledge of the intrinsic aspects of the online advertising ecosystem.

8. REFERENCES

[1] Adblock Plus. https://adblockplus.org/.
[2] Cyren URL Category Check. http://www.cyren.com/url-category-check.html.
[3] Digital Advertising Alliance Consumer Choice Page. http://www.aboutads.info/choices/.
[4] Disconnect — Online Privacy & Security. https://disconnect.me/.
[5] Display Planner - Google AdWords. https://adwords.google.com/da/DisplayPlanner/Home.
[6] Do Not Track. http://donottrack.us/.
[7] European Union Directive 95/46/EC. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31995L0046:en:HTML.
[8] Ghostery. https://www.ghostery.com/.
[9] McAfee. https://www.trustedsource.org/?p=mcafee.
[10] Privacy Choice. http://www.privacychoice.org/.
[11] Paul Barford, Igor Canadi, Darja Krushevskaja, Qiang Ma, and S. Muthukrishnan. Adscape: Harvesting and analyzing online display ads. In Proc. WWW, 2014.
[12] J. Carrascosa, R. Gonzalez, R. Cuevas, and A. Azcorra. Are Trending Topics Useful for Marketing? Visibility of Trending Topics vs Traditional Advertisement. In Proc. COSN, 2013.
[13] Ye Chen, Dmitry Pavlov, and John F. Canny. Large-scale behavioral targeting. In Proc. ACM SIGKDD, 2009.
[14] Ayman Farahat and Tarun Bhatia. Retargeting related techniques and offerings, April 25, 2011. US Patent App. 13/093,498.
[15] Phillipa Gill, Vijay Erramilli, Augustin Chaintreau, Balachander Krishnamurthy, Konstantina Papagiannaki, and Pablo Rodriguez. Follow the money: Understanding economics of online aggregation and advertising. In Proc. ACM IMC, 2013.
[16] Saikat Guha, Bin Cheng, and Paul Francis. Challenges in measuring online advertising systems. In Proc. ACM IMC, 2010.
[17] Aniko Hannak, Piotr Sapiezynski, Arash Molavi Kakhki, Balachander Krishnamurthy, David Lazer, Alan Mislove, and Christo Wilson. Measuring Personalization of Web Search. In Proc. WWW, 2013.
[18] Aniko Hannak, Gary Soeller, David Lazer, Alan Mislove, and Christo Wilson. Measuring Price Discrimination and Steering on E-commerce Web Sites. In Proc. ACM/USENIX IMC, 2014.
[19] G. Hardin. The tragedy of the commons. Science, 1968.
[20] Miguel Helft and Tanzina Vega. Retargeting ads follow surfers to other sites. The New York Times, 2010.
[21] Aleksandra Korolova. Privacy violations using microtargeted ads: A case study. In Proc. IEEE ICDMW, 2010.
[22] Balachander Krishnamurthy and Craig E. Wills. Privacy diffusion on the web: A longitudinal perspective. In Proc. WWW, 2009.
[23] Ravi Kumar and Andrew Tomkins. A Characterization of Online Browsing Behavior. In Proc. WWW, 2010.
[24] Anja Lambrecht and Catherine Tucker. When does retargeting work? Information specificity in online advertising. Journal of Marketing Research, 2013.
[25] Claudia Leacock, George A. Miller, and Martin Chodorow. Using corpus statistics and WordNet relations for sense identification. Computational Linguistics, 1998.
[26] Mathias Lecuyer, Guillaume Ducoffe, Francis Lan, Andrei Papancea, Theofilos Petsios, Riley Spahn, Augustin Chaintreau, and Roxana Geambasu. XRay: Enhancing the Web's Transparency with Differential Correlation. In Proc. USENIX Security, 2014.
[27] Kathy Lee, Diana Palsetia, Ramanathan Narayanan, Md Mostofa Ali Patwary, Ankit Agrawal, and Alok Choudhary. Twitter trending topic classification. In Proc. ICDMW, 2011.
[28] Bin Liu, Anmol Sheth, Udi Weinsberg, Jaideep Chandrashekar, and Ramesh Govindan. AdReveal: Improving transparency into online targeted advertising. In Proc. ACM HotNets, 2013.
[29] Anirban Majumder and Nisheeth Shrivastava. Know Your Personalization: Learning Topic Level Personalization in Online Services. In Proc. WWW, 2013.
[30] J. Mikians, L. Gyarmati, V. Erramilli, and N. Laoutaris. Detecting price and search discrimination on the Internet. In Proc. ACM HotNets, 2012.
[31] J. Mikians, L. Gyarmati, V. Erramilli, and N. Laoutaris. Crowd-assisted search for price discrimination in e-commerce: First results. In Proc. ACM CoNEXT, 2013.
[32] A. Odlyzko. The end of privacy and the seeds of capitalism's destruction. In Privacy Law Scholars' Conference.
[33] Ashwin Rao, Justine Sherry, Arnaud Legout, Arvind Krishnamurthy, Walid Dabbous, and David Choffnes. Meddle: Middleboxes for increased transparency and control of mobile traffic. In Proc. ACM CoNEXT Student Workshop, 2012.
[34] Latanya Sweeney. Discrimination in online ad delivery. Queue, 2013.
[35] Narseo Vallina-Rodriguez, Jay Shah, Alessandro Finamore, Yan Grunenberger, Konstantina Papagiannaki, Hamed Haddadi, and Jon Crowcroft. Breaking for commercials: Characterizing mobile advertising. In Proc. ACM IMC, 2012.
[36] Jun Yan, Ning Liu, Gang Wang, Wen Zhang, Yun Jiang, and Zheng Chen. How much can behavioral targeting help online advertising? In Proc. WWW, 2009.
[37] Shuai Yuan, Ahmad Zainal Abidin, Marc Sloan, and Jun Wang. Internet Advertising: An Interplay among Advertisers, Online Publishers, Ad Exchanges and Web Users. CoRR, 2012.
[38] Arkaitz Zubiaga, Damiano Spina, Víctor Fresno, and Raquel Martínez. Classifying trending topics: A typology of conversation triggers on Twitter. In Proc. CIKM, 2011.
