automated bid adjustments in search engine advertising1119130/fulltext01.pdf · keywords: digital...

INOM EXAMENSARBETE DATALOGI OCH DATATEKNIK,AVANCERAD NIVÅ, 30 HP

, STOCKHOLM SVERIGE 2017

Automated Bid Adjustments in Search Engine Advertising

MAZEN ALY

KTHSKOLAN FÖR INFORMATIONS- OCH KOMMUNIKATIONSTEKNIK

Automated Bid Adjustments in

Search Engine Advertising

Student name: Mazen Aly

Master Degree ProjectRoyal Institute of Technology (KTH)

Sweden2017

Acknowledgements

I would like to acknowledge the help and support that I got from my examiner prof. MagnusBoman and my academic supervisor prof. Sarunas Girdzijauskas. Many thanks to my supervisorat Precis digital, Carl Regardh, for his guidance and support throughout this project. Thankyou Precis Digital and especially the data science team (Joao Coelho, Marie Ericsson, PatrikBerggren and Pierre Rudolfsson) for the continuous support and for our discussions that helpedshape this thesis. Special thanks go to my wife, Rewan, who is always there to support me.This would not be possible without her.

2

Abstract

In digital advertising, major search engines allow advertisers to set bid adjustments on their adcampaigns in order to capture the valuation differences that are a function of query dimensions.In this thesis, a model that uses bid adjustments is developed in order to increase the numberof conversions and decrease the cost per conversion. A statistical model is used to select cam-paigns and dimensions that need bid adjustments along with several techniques to determinetheir values since they can be between -90% and 900%. In addition, an evaluation procedureis developed that uses campaign historical data in order to evaluate the calculation methodsas well as to validate different approaches. We study the problem of interactions between dif-ferent adjustments and a solution is formulated. Real-time experiments showed that our bidadjustments model improved the performance of online advertising campaigns with statisticalsignificance. It increased the number of conversions by 9%, and decreased the cost per conver-sion by 10%.

Keywords: Digital Advertising; Bid Adjustments; Optimization; Statistical Analysis; A/BTesting.

Sammanfattning

I digital marknadsforing tillater de dominerande sokmotorerna en annonsor att andra sina budmed hjalp av sa kallade budjusteringar baserat pa olika dimensioner i sokforfragan, i syfteatt kompensera for olika varden de dimensionerna medfor. I det har arbetet tas en modellfram for att satta budjusteringar i syfte att oka mangden konverteringar och samtidigt min-ska kostnaden per konvertering. En statistisk modell anvands for att valja kampanjer ochdimensioner som behover justeringar och flera olika tekniker for att bestamma justeringensstorlek, som kan spanna fran -90% till 900%, undersoks. Utover detta tas en evalueringsme-tod fram som anvander en kampanjs historiska data for att utvardera de olika metoderna ochvalidera olika tillvagagangssatt. Vi studerar interaktionsproblemet mellan olika dimensionersbudjusteringar och en losning formuleras. Realtidsexperiment visar att var modell for bud-justeringar forbattrade prestandan i marknadsforingskampanjerna med statistisk signifikans.Konverteringarna okade med 9% och kostnaden per konvertering minskade med 10%.

Nyckelord: Digital Marknadsforing; Budjusteringar; Optimering; Statistisk analys; A/B-testning.

3

Contents

1 Introduction 71.1 Research Problem and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 71.2 Aim and Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.4 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.5 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.5.1 Data Collection Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.5.2 Data Analysis Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.5.3 Quality Assurance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.6 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2 Background and Literature Review 122.0.1 Early Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.0.2 Advertisers Metric of Success . . . . . . . . . . . . . . . . . . . . . . . . . 122.0.3 Performance-Based Advertising . . . . . . . . . . . . . . . . . . . . . . . . 13

2.1 Search Engine Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.1.1 Google Adwords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.1.2 Search Advertising Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.1.3 Ad Auctions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.2 Advertiser Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.3 Bid Adjustments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3 Campaigns and Dimensions Selection Model 183.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183.2 Metric of Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183.3 Diminishing Returns Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193.4 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203.5 Analysis of Variance (ANOVA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.5.1 Assumptions Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.5.2 Post-hoc test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.6 Chi-Squared Test for Independence . . . . . . . . . . . . . . . . . . . . . . . . . . 233.6.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233.6.2 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233.6.3 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Adjustments Calculations and Evaluation 254.1 Marginal ICPA Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.2 Constrained Linear Regression Method . . . . . . . . . . . . . . . . . . . . . . . 264.3 Average of Slopes Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.3.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.4 Total ICPA Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.5 Cost Weighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294.6 Evaluation Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.6.1 Traditional Evaluation Methods . . . . . . . . . . . . . . . . . . . . . . . 294.6.2 Proposed Evaluation Method . . . . . . . . . . . . . . . . . . . . . . . . . 30

4

4.7 Adjustments Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314.7.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314.7.2 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324.7.3 Minimization Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5 Real-time Experiments 355.1 A/B testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355.2 Adwords Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355.3 Experiment Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5.3.1 Binomial Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365.3.2 A Priori Power Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365.3.3 Experiments Campaigns . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6 Results 386.1 Aggregate Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386.2 In-depth Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

7 Discussion 427.1 Statistical Significance Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 427.2 Adjustments Calculations Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 447.3 Risks and Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

7.3.1 Cross-device Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . 457.3.2 Broad vs Exact Keywords . . . . . . . . . . . . . . . . . . . . . . . . . . . 457.3.3 Conversions Time Lag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

8 Conclusions 478.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

References 49

Appendices 52

5

Acronyms

CPA Cost Per Acquisition.CPC Cost Per Click.CPM Cost Per Mille.CR Conversion Rate.CTR Click-Through Rate.GSP Generalized Second-Price auction.ICPA Inverse of Cost Per Acquisition.SEM Search Engine Marketing.SEO Search Engine Optimization.SERP Search Engine Results Page.ROAS Return On Ad Spend.

Glossary

Campaign:In our context, a campaign is just an organization of ads and keywords, as opposed to being awhole marketing campaign that lasts for a short duration of time.

Conversion:A conversion is the action that the advertisers want the user to make after clicking on an ad.Normally, it is a purchase of a product or service, but it can also be other actions like sign ups,phone calls, downloading a brochure, application installation, etc. Some sources use the wordacquisition to mean the same thing. In this thesis, we use the two words interchangeably.

Impression:Impressions are defined as the number of times an ad is shown. An impression is counted whenan ad is displayed on the search engine results of a user.

Keyword:Keywords are words and phrases that are chosen by the advertisers to describe a product orservice. They help determine when and where the ads appear in the search engine results.

Organic Search Results:Organic search results are shown as a result of just being relevant to a query entered in a searchengine, and they do not include any paid ads.

6

Chapter 1

Introduction

Search Engine Marketing (SEM) allows advertisers to take advantage of millions of searchesconducted on search engines each day by driving interested people to their websites. Advertiserscreate campaigns and ads of their businesses, and they pay to search engines in order for theiradvertisements to be shown beside the organic search results. Search Engine Optimization(SEO) is different from SEM in the sense that website owners do not pay to make their websitesappear in the search engine results page (SERP). It is mainly the process of changing a webpage and follow best practices in order to be trusted by search engines and improve its visibilityin organic results. This chapter intends to provide the reader with an overview of the fieldof search advertising as well as our research problem, motivation, methodology, outline andlimitations of this thesis project.

1.1 Research Problem and Motivation

In search advertising, advertisers bid on certain keywords in order for their clickable ads toappear in the search engine results. One of the most important challenges that advertisersface in SEM is determining the bid values for each keyword in an advertising campaign. Thisproblem is interesting in the advertising industry as it affects the advertisers marketing costsand profits. Bids calculations affect the marketing costs as a bid is the maximum cost that theadvertiser is willing to pay when an ad get clicked. Bidding on the right keywords with theright values can get a profitable click as the user can make a conversion.

If we bid on a keyword that has a high probability of conversion with a too small bid, thenour advertisement can be shown on a low position or not be shown at all, and thus we maylose a conversion and decrease our profits. On the other hand, high bids on keywords withpoor performance can exhaust our marketing budgets without achieving our marketing goals.As shown in chapter 2, calculating the bids values is a challenging problem and researcherstackle it from different angles using various sources of information as there are many aspectsthat affect these calculations, like market changes, competitors behavior, seasonality, keywordrelevance to the search query, keyword-match type, etc.

Our work builds on previous work in the sense that it uses more dimensions of the searchquery like user device and day of the week not to calculate the the bids, but rather to adjustthe already calculated bids. These dimensions can have a significant effect on the value of thead to the bidder, as well as the market price of the ad placement.

In 2013, major search engines started to allow an advertiser to set bid adjustments or modifierson their ad campaign in order to account for differences in valuation that are a function of thesetypes of dimensions [30]. The transition to this mode of bidding has been characterized as oneof the most important recent changes to search engines [31]. Advertisers are allowed to submitadjustments along with their bids, and the adjustments can be made on features of the searchquery including time of day, day of the week, location, device type. They are made to allow us

7

show the ads more or less frequently based on where, when, and how people search and that iswhy we believe that using bid adjustments on the already calculated bids can yield better results.

1.2 Aim and Scope

When working on digital advertising, many goals can be set, like creating brand awareness,increasing the revenues or increasing the online traffic to the advertiser. The aim of this thesisis to provide a scalable method for selecting adwords campaigns that need bid adjustments inorder to increase conversions and decrease the cost per conversion.

There are several dimensions that can be used in bid adjustments like device, location, dayof the week and time in the day. Although the techniques that are developed in this thesiscan be used for all dimensions, the focus of the thesis is on the device and day of the weekdimensions, as if we incorporate all dimensions, it will be problematic to test all dimensions inreal-time experiments. In other words, if the experiment does not yield the intended results,it would be difficult to know what is the cause of the results. It is recommend in A/B testingand online experiments to make only one change at a time to understand the results [6]. Inaddition, the experiment duration takes more than one month, and for this project, there is notenough time for multiple experiments. At the same time, our model addresses the challengesof interactions between multiple dimensions. As a result, we have to work with more than onedimension to test and validate the interactions part.

1.3 Contributions

Several contributions are presented in this thesis. First we provide a statistical model to selectcampaigns and dimensions groups that need bid adjustments. Second, we propose severaltechniques for determining the values of bid adjustments and we compare between them. Third,we develop an evaluation framework that uses historical data of the campaigns in order toevaluate different technique as well as validate the adjustments calculations. Fourth, we discussthe problem of multiplicative bid adjustments and we present a solution for it. Finally, wedesign real-time experiments to evaluate our bid adjustments and we discuss the results.

1.4 Environment

This master degree project is carried out during an internship within the data science team atPrecis Digital which is a data-driven digital marketing company in Stockholm. Precis Digitalwas founded in 2012 and it takes a data-driven approach to maximize the outcome of its clients’digital marketing investments. Precis Digital is the winner of the Best Large PPC Agency awardin the European Search Awards 2017 [7].

In this project, Google AdWords (owned by Google, the giant search engine.) is the onlineadvertising platform that is used, because it is the main search advertising platform at PrecisDigital, and thus, it is the source of the used data is this project. However our work can beused for bid adjustments in other major search engines like Bing.

1.5 Methodology

Although there are lots of good and rigorous literature about methods and methodologies inacademic research [2, 3, 4], we follow the framework of research methods and methodologiespresented in Fig 1.1. This framework helps in selecting and applying the best suited methodsthat belong together as well as avoiding picking methods that do not match. It contains the

8

methodologies that are commonly used in information technology [18].

Figure 1.1: The portal of research methods and methodologies. Adapted from [18].

When selecting methods for our research, every layer in the portal, starting from the top, isinvestigated before entering the next layer and towards the bottom. As recommended in [18],we select and apply at least one method on this project, before moving to the next layer. Thebasic categories of research methods are either quantitative methods and qualitative methods.These two methods are considered to be polar opposites [9], and they are applied on projectsthat are either numerical or non-numerical [18]. This project is of a quantitative research natureas modeling, experiments and testing are done by measuring metrics to verify or falsify theoriesand hypotheses which are measurable and quantifiable.

We base our work on large data sets and statistics is used to test hypotheses and evaluatethe results. Although the project is mainly of quantitative nature, we use a method calledtriangulation and we borrow some methods that are actually qualitative like exploratory dataanalysis in order to get a complete view of the research area, and to ensure correctness, credi-bility and validity of the results [18].

In this project we follow the positivism paradigm [10] which assumes that the reality is ob-jectively given and independent of the observer and instruments, to be concrete, the models arebased on real-world historical data of search advertising campaigns. This assumption works inprojects that are of experimental and testing character. It dismisses or proves a phenomenonby drawing inferences from the sample to the population, quantifying measures of metrics, andtesting hypotheses. The positivist assumption works well for testing performances within in-formation technology.

In this project, we use a hybrid of two methods namely descriptive research & applied re-search. Descriptive research method, also called statistical research, studies phenomenon anddescribes its characteristics but not its causes. It can use either quantitative or qualitativemethods [2]. We focus on finding facts in already existing data, of the effect of several di-mensions like user device on advertising campaigns performances. It can be used for all kindsof research or investigations in computer science genre that aim to describe phenomenon orcharacteristics [18].

The second method is applied research that involves answering specific questions or solvingknown and practical problems. The method examines a set of circumstances and it often buildson existing research. In addition, applied research uses data directly from the real work and ap-

9

plies it to solve problems and develop practical applications and that is the goal of this project.Applied research is used for all kinds of research or investigations, which is often based on basicresearch and with a particular application in mind.

In this project, deductive approach [5, 11] is used to verify or falsify hypotheses, and it isalmost always, used with quantitative methods with large data sets. Hypotheses are expressedin measurable terms, explaining what and how the metrics are to be measured. The outcomeis a generalization that is based on the collected data, along with explanations of the results.

The research strategies and designs are the guidelines for carrying out the research [18]. We usethe ex post facto research which is similar to experimental research [18] but does not control orchange the independent variable since it is carried out after the data is already collected whichis the case in this project. Ex post facto = after the fact, means that it searches back in timeto find plausible causal factors. The method also verifies or falsifies hypotheses and providescause-and-effect relationships between variables [12].

1.5.1 Data Collection Methods

In this project, data collection is straightforward, as we use the historical data of many Adwordscampaigns during the internship at Precis Digital. The task is automated using Adwords API[20] which allows us to interact directly with the platform, vastly increasing the efficiency ofanalysing large AdWords accounts and campaigns. In addition, we use real time data forrunning and evaluating online experiments.

1.5.2 Data Analysis Methods

The data analysis methods are used to analyze the collected material. It is the process ofinspecting, cleaning, transforming and modelling data. It supports decision-making and drawingconclusions [18]. In this project, the following methods are used for data analysis:

Statistics: Both descriptive and inferential statistics to analyze the collected data, infer in-formation and evaluate the significance of the results.

Mathematics: Used for calculating numerical methods, modelling and optimizations.

Visualizations: For deep and better understanding of the characteristics of the collecteddata.

1.5.3 Quality Assurance

Quality assurance is the validation and verification of the research material. Since we have aquantitative research, with a deductive approach, we must apply and discuss validity, reliabilityand ethics [13].

Validity: In quantitative research, we must make sure that the test instruments actually aremeasuring what is expected to be measured [3, 13]. In this project, we assume that the datacollected from Adwords are measured correctly.

Reliability: It refers to stability of the measurements [3] and the consistency of the resultsfor every testing. and to ensure that, we use statistical tests significance.

Ethics: Throughout the work on this project, we maintain the privacy of the clients of Precisdigital. The collected data is treated with confidentiality and presented in the thesis afteranonymization.

10

1.6 Thesis Outline

In chapter 2, we discuss the background of digital advertising, literature review and the contextof this project as well as how it relates to previous work. Chapter 3 presents the metric thatwe optimize for, as well as the model for selecting the campaigns and dimensions groups for bidadjustments. In chapter 4, we propose several techniques for adjustments calculations and wediscuss the problem of adjustments interactions. In addition we discuss an evaluation procedurethat is used in selecting, validating and evaluating different methods. Chapter 5 presents thedesign of online experiments to test our model on real data, and in chapter 6, we analyze theresults of the experiments. Chapter 7 is for the project discussion, and in chapter 8, we presentthe conclusions and the future work.

11

Chapter 2

Background and LiteratureReview

Online search is now ubiquitous and internet search engines such as Google and Bing let com-panies and individuals advertise based on search queries posed by users [29]. In this chapter wediscuss the context of our project by a literature study and by presenting an overview of therelated work and knowledge needed in order to build upon them. This chapter rests heavily on[26] and [35].

2.0.1 Early Models

Internet advertising started almost simultaneously with the inception of the Internet. The ad-vertisements that we used to see were only banner ads and these are graphical units that wesee on web pages. Popular websites charged a certain amount of money for every thousandimpressions of the ad, and this is called the CPM rate.1 This model of paying per impressionswas inspired from TV and magazine advertising which are priced based on the circulation of amagazine or the number of viewers of a TV show. That was a good model to start with, butit did not make use of many features that we can have on the web and are not available in TVand magazine advertising. For concreteness, these advertisements are untargeted, so the sameadvertisement is shown to everyone who comes to a website, and thus, they can be good inbranding or to create awareness but they perform poorly in targeting specific users who needthe advertiser’s product or service.

This model was shifted to demographics targeting that makes use of demographic informa-tion about the types of users who are likely to see a given web page. Although the model thatuses demographic data is better than the initial one, in general these advertisements still donot perform well because they are broadly targeted.

2.0.2 Advertisers Metric of Success

One way the advertisers measure the performance of their advertising efforts is by looking athow many users who viewed the ad, actually clicked on it, in other words, they look at theclick-through rate (CTR) which is the ratio between the number of clicks the ad receives andthe number of impressions of that ad. It is important to point out that the impressions arewhat advertisers pay for, and the clicks are what they want. As a results, they measure thereturn on investment by looking at the ratio of clicks to impressions, and the untargeted earlybanner ads had very low click-through rates and very low return on investment for advertisers.

1M in CPM stands for the Roman numeral for a thousand so CPM rate is the cost per thousand impressions.

12

2.0.3 Performance-Based Advertising

The model of online advertising changed with the development of a new form of advertisingcalled performance-based advertising which was introduced by a company called Overturewhich was a search engine that got acquired by Yahoo!. Overture innovation was allowingadvertisers to bid on search keywords and when the user searches for that keyword then the adof the highest bidder would be shown followed by the actual search results. Another importantinnovation that Overture introduced was charging advertisers only if the ad was clicked. Inother words, advertisers do not pay for the impression but they pay only for the click and sothis is called performance-based advertising or Cost per Click advertising to distinguish it fromthe impression-based or CPM advertising that preceded it.

There are many challenges and research around performance-based advertising that can fallinto two categories; the first category addresses the challenges from the perspective of thesearch engine, and the second set of challenges come from the advertiser perspective [29].

2.1 Search Engine Perspective

The research presented in this thesis primarily addresses advertising from the perspective ofadvertisers. However, it is also important to consider the search engine point of view since thework of advertisers stems from search engines. For this reason, the challenges faced in adver-tising are presented from a search engine point of view in this section.

Advertisers favour Overture’s model of paying only for clicks compared to paying for impres-sions. Google which is another search engine that was just getting started on roughly the sametime, adopted a very similar model to Overture’s around 2002 and they introduced their adver-tising platform Adwords. Google introduced some important changes to the Overture modelin terms of how advertisers bid and what ads get shown. Adwords receives a set of keywordsfrom each advertiser with their respective bids. It also receives a stream of search queries fromthe users, and the challenge is to select and show a few ads from many possible ads eligible tobe shown for the same search query. It is worth mentioning that the goal of a search engineis to maximize its revenues by showing the appropriate set of ads and it needs an online al-gorithm. That means, the search engine can only see one query at a time and it must makean irrevocable decision of deciding which ad to show. It cannot go back and change the ad-vertisements it showed in the past nor does it know what queries are going to come in the future.

Overture used a naive heuristic of sorting advertisers by bid and the ad of highest bid willbe in position one. It turns out that this is not the best way, because the ads behave verydifferently in terms of how often they get clicked. Therefore, placing the ad of the highestbidder in position one is not the optimal algorithm of maximizing the search engine revenues.The contribution of Google Adwords was introducing the use of the average CTR of each adin computing the expected revenue for each advertiser which is calculated by multiplying thebid and the ad’s CTR. In other words, it sorted advertisers by expected revenue rather thansorting them bids.

If the CTR of each ad is known and if advertisers have unlimited budgets then the simplealgorithm of sorting advertisers by expected revenue is actually optimal, however in practiceadvertisers don’t have unlimited budgets and the CTR of an ad is unknown. The balancealgorithm deals with the fact that advertisers have limited budgets, and for estimating theclick-through rate of an ad, one can think of a very simple solution which is showing the ad alarge number of times and calculate the CTR historically. Although this is the right way todo it but there are two challenges in this approach, the first is that the click-through rate isactually position-dependent as search engines may show more than one ad for a given query, soan ad that is shown in the first position generally gets more clicks than an ad that shown in thesecond position. Therefore, we have to measure the click-through rate of an ad for each position

13

and not just for one position. The second problem is addressing the explore vs exploit trade-offwhich is the dilemma of whether to keep showing ads that we already know their CTRs orshowing new ads that we still need to know their CTRs. In other words, should we just exploitthe known information of an ad or should we explore new ads that can have a better or worseCTR. This problem important and heavily studied [27].

It is worth noting that the current version of Adwords takes more parameters into consid-eration while sorting the advertisers like the relevance of keywords to the search query and thelanding page experience as a search engine has to show relevant and useful ads in order for theusers to keep using its services.

2.1.1 Google Adwords

Since we use Google Adwords in this project, it is very important to know how ad auctionswork in this online advertising platform, since there are many factors that affect the visibilityof each ad that participate in the auction as well as the paid cost per each click on the ad.Google Adwords allows the advertisers to show their ads on Google search results, the searchterms which are several combinations of words types by users on Google page when they searchfor products or services.

All search ads on Google have the main structure of a header, two lines of description, theword ad in green to make the users aware that this is an ad, and a link to a web page as shownin figure 2.1. There are additional and optional components that the advertiser may put likesite links, phone number, mobile application install link, etc.

Figure 2.1: An example of an ad. source: http://www.google.se, Search terms : precis digital,Date: May 30, 2017.

2.1.2 Search Advertising Terms

It is important throughout this thesis to know the most used definitions in order to understandthe topics discussed in this project [36].

• Cost Per Click (CPC): The cost that the advertiser pays when an ad is clicked.

• Maximum CPC: The maximum cost the advertiser is willing to pay when the ad is clicked.

• Average CPC: The average cost that advertiser pay per click. It is the ratio of the totalcosts to the total number of clicks.

• Cost: The total amount of money the advertiser has spent on clicks.

• Conversion Rate (CR): The ratio of conversion to the ad clicks.

• Cost Per Acquisition (CPA): The average cost that advertiser pay per conversion. It isthe ratio of the total costs to the total number of conversions.

• Inverse of Cost Per Acquisition (ICPA): It is the ratio of the total number of conversionsto the total costs.

• Return on Ad Spend (ROAS): The ratio of the total revenues to the total costs.

14

Working on increasing conversions entails optimizing for impressions and also for clicks. In theconversions funnel which consists of three phases, impressions, clicks and conversions. Eachphase of the funnel is narrower than the one before it. Consequently, the process of increasingthe number of conversions entails increasing the number of relevant impressions which increasethe number of relevant clicks, and that leads to more conversions.

2.1.3 Ad Auctions

In order to place an ad on a SERP, advertisers enter an auction that is carried out amongall advertisers who bid for ads with keywords that matches the user’s query. There are manytypes of auctions but the standard one which is heavily used in literature is the generalizedsecond-price auction (GSP) [25]. In GSP auctions, each advertiser pays a price equal to the bidvalue of the advertiser below them in the ranking.

Since we work with Adwords platform in this project, it is important to describe how Googleauction (which is a variant for GSP) works. The motivation of the auctions is to reconcile theinterests of three parties, namely the advertiser, the user and the search engine namely Google.Each party has a concern or motivation for participating in an auction. The advertiser wantsto show relevant ads of their products or services so that users click on them and possibly makea conversion. The users do not want to be bothered with spam or other irrelevant ads, andGoogle wants to generate revenues and make a good experience for both the advertisers andthe users so that they come back and use its services again in the future.

Each time a user makes a query on Google, the ads which are relevant to the search query(in terms of keywords similarity) participate in an ad auction. The auction determines whetherthe ad will be shown or not and in what position in the SERP will it be shown. The first stepin the auction is that Google ignores the ineligible ads, like the ones that target a differentlocation for the user or are disapproved. Then, Google calculates the ad rank for all the eligibleads, and the rank is calculated based on a combination of advertiser’s bid and ad quality score.Only those with a sufficiently high ad rank are shown in the SERP. The ad quality score iscalculated based on the relevance of the keywords to the query text, the quality of the landingpage, as well as the historic CTR of the ad. It is worth mentioning that an advertiser can getthe top position even if they bid less than their competitors by using highly relevant keywordsand quality ads. To improve the ad position, one can increase the bids for the ads and improvethe quality of the ads as well as the landing page experience [32]. However, the advertiser canhave fluctuations in the position that they get in different auctions for the same keyword, asthe competition scene can vary from auction to another.

Now we know how Google runs an ad auction and sorts the advertisers, but what is the actualprice that the advertiser pays when the ad is clicked? the answer is just enough to beat thecompetition. In other words, each advertiser can bid with the maximum amount that theyare willing to pay but the actual cost per click is determined by dividing the ad rank of thecompetitor below them by their own ad quality score plus a small amount as $.01 just to beatthe competition [33].

2.2 Advertiser Perspective

The challenge from an advertiser’s point of view is to understand and interact with the auctionmechanism. The advertiser determines a set of keywords of their interest and then create ads,set bids for each keyword, and provide a total daily budget.

One of the challenges that the advertiser faces is the choice of keywords and this problemis related to the domain-knowledge of the advertiser, user behavior and different strategic con-siderations. Search engines provide the advertisers with information of the query traffic whichcan be useful for for optimizing the keyword choices. The choice of keywords is addressed inother papers [24]. Another major challenge is determining the bids for each keyword. This

15

problem in heavily studied in papers [29] in which the authors propose uniform bidding as ameans for bid optimization in the presence of budget constraints in online ad auctions. Howeverto provide a context for our work, we present a method proposed in [34].

In order to increase the conversions, the relative impressions and clicks have to increase. Indeed,it is possible to increase the clicks and even get less conversions if for example the campaigns,ad groups and keywords are not structured correctly. However, the model assumes that theaccount is already optimized and getting more clicks would necessarily results on more conver-sions as the clicks are relevant. Assuming the ad quality score is constant, changing the bidvalues would affect the position of the ads and that affects the number of clicks received whichin turn affects the number of conversions. The model is formulated as a constrained integerprogramming problem. The constraints are the maximum CPC as well as the overall budget Bthe advertiser is willing to afford.

There are variables that need to be estimated before using the model which are the aver-age CTR and average CPC for every keyword (or keywords combinations) at every position.The model suggests to invest some money to experiments with the keywords in order to getthese values or using keywords performance prediction from Google Adwords.

Let us define a set K that contains m keywords, and a set P that contains n available po-sitions. Then ∀i, i = 1, ...,m and ∀j, j = 1, ..., n

Clicksij = CTRij ∗ Impressioni

Costij = Clicksij ∗ CPCij

We define xij as a decision variable which either has the value 0 or 1. It represents whether akeyword (or its combinations) will be assigned to a certain position or not.

Maximize:∑i∈K

∑j∈P

xij ∗ clicksij

Subjected to:∑i∈K

∑j∈P

xij ∗ costij ≤ B

Constraint :

∑i∈K

∑j∈P xij ∗ CPCij∑

i∈K∑

j∈P xij ∗ clicksij≤ maximum CPC

Constraint : ∀i, i = 1, ...,m∑j∈P

xij ≤ 1

xij ∈ {0, 1}

It is important to note that this algorithm does not bid uniformly on keywords as it uses thespecified maximum CPC specified by the advertiser, as a limit for bids on each keyword orcombination of keywords. In addition, the existing algorithms that bid on keywords do notmake use of different features about the search query that can help in determining the bidvalues in order to increase the conversions. In other words, these algorithms assume that aclick that comes from a Tablets on Tuesdays has the same value as the click that comes fromMobiles on Fridays. This is the limitation that we address in this thesis using bid adjustments.

16

2.3 Bid Adjustments

The focus of this thesis is adjusting the already calculated bids for the selected keywords usingthe unique set of dimensions of each individual search query like geographic location, time ofday, device, etc. Bid adjustments allow the advertisers to show their ads more or less frequentlybased on where, when, and how people search. For instance, a click can sometimes be worthmore if it comes from a mobile, at a certain time of day, and from a specific location. Theadvertiser can adjust their bids with percentages anywhere between -90% and 900% for each ofthese dimensions based on how the click is valued. The bidding dimensions supported by themajor search engines include week day, time of day, geographic location, device and keywordtargeting. In this project, we define the term “groups” as the possible values that a given“dimension” can take. For example the groups for the device dimension are desktops, mobilesand tablets, and the groups for the weekday dimension are the seven days of the week.

There are two main limitations in bid adjustments that we need to point out. First, bidadjustment are not set on the keywords level although the bid values are set on that level. Tobe concrete, we can not increase the bids of a specific keyword 10% for a certain location. In-stead, we can just increase the bids for the whole campaign that includes this keyword. Hence,in our work, we analyze the performance of the a campaign in order to calculate, evaluate andtest the bid adjustments. In addition, bid adjustments do not allow specifying valuation oncombinations of features. For example, if an advertiser found that mobile searches were 30%more valuable than desktop searches in Gothenburg, but only 15% more valuable in Stockholm,then this would not be expressible in the language of bid adjustments. Such limitations areinevitable, as the space of possible combinations of these features is very large. This problemof multiplicative bidding is addressed in the paper [28] and the authors formulated the problemas a Knapsack problem. In chapter 4 we discuss another formulation of the problem that hasa different setting. It is worth noting that the source of these bid adjustments limitations isGoogle Adwords.

17

Chapter 3

Campaigns and DimensionsSelection Model

In this chapter, we discuss our model for choosing the campaigns and dimensions groups forbid adjustments. The model relies on the law of diminishing returns and statistical modeling.

3.1 Motivation

There are two main motivations of selecting the campaigns and dimensions groups for bid ad-justments. The first motivation is that for many Adwords campaigns, we can see a differencebetween dimension groups performances that actually might be due to chance and not statis-tically significant. In other words, the bid adjustments would be calculated based on noisethat worsen the current performance, and may result in a loss in advertising investments. Toavoid that, we compare between the groups, and if there is a significant difference betweenthem we modify or adjust the already existing bids to account for these value differences. It isnot a surprise to see the recommendation in Adwords guides of using bid adjustments if onedimension is performing significantly better or worse than another dimensions [37]. The secondmotivation is that in many cases, there is a statistical difference between the dimensions groups.However if the groups that perform well are actually at their best performance, then biddingmore would not be helpful and actually can worsen the overall campaign performance as wewill see in section 3.3.

3.2 Metric of Optimization

Before getting into the details of how to select the campaigns and also calculate the bid ad-justments, we have to select a metric in order to compare between the dimensions groupsperformances. A metric can be, for example, CTR, CR, ROAS or CPA. In the setting of thisthesis, and as discussed in section 1.2, we want to significantly increase the conversions giventhe same costs so the metric of comparison should be a ratio between the value and the cost,and reason is we might get many conversions for instance on a specific day, however the costfor these conversions can also be high (for example due to competition). In this case, we mightdecrease the bids for this day, and increase them in another day which has cheap conversions.Although we want to increase conversions, it is even more desirable to increase the total con-version value given the same costs. Consequently, the two metrics of interest are either CPA orROAS. For some businesses, it does not differ to use CPA or ROAS since the conversions valueshave low variance. However, for other businesses, ROAS is trickier to use since the conversionvalue can vary a lot for each conversion which makes it more vulnerable to outliers. In thisthesis, we use CPA as it is more robust to outliers than ROAS.

One problem that can arise during mathematical calculations in case of having a campaign

18

that does not have any conversion is division by zero. To avoid that, we use the metric ICPAwhich is the number conversions divided by cost which we want to maximize it. Note that thenumber of conversions reported from Adwords is assumed to be clean and final because theactual number of conversions can be different from reported one because of several potentialsituations including but not limited to order returns, order changes via phone and cancellations.

3.3 Diminishing Returns Law

One of the fundamental principles in economics is the law of diminishing returns. In our settingdepicted in figure 3.1, It is the decrease in the marginal or incremental number of conversions asthe advertising costs are incrementally increased while other factors that affect the conversionsstay constant. There can be many reasons for the diminishing returns that depend on theindustry. In the case of search advertising, one reason can be the average position of our adsin the search engine results page is already 1 (top position), also we may get an impression forevery auction that we participate in (we can know that form a metric called impression share).Therefore, increasing the costs would not be useful, as there is nothing that we can gain withincreasing the costs.

Figure 3.1: Example of theoretical diminishing return curves for three devices. We can see thatthe incremental conversions decrease as we incrementally increase the cost.

In Google Adwords, we can know the average position of our ads in a specific campaign as wellas the average impression share lost due to rank. If the average position is less than 1.5 and thelost impression share is more than 20%, we assume that there is a room of improvement andbid adjustments would be useful. On the other hand, if the average position of a campaign isalready in top positions (less than 1.5) and the lost impression share due to rank is small (lessthan 20%) we avoid adjusting for these campaigns as there would be diminishing return.There is just a caveat here, which is the metric of average position and lost impression sharereturned from Adwords are not reliable or in other words, they are not calculated in the way wewant. For concreteness, let us say we have a campaigns that include three keywords of averagepositions 1, 1 and 3. In Adwords, the average position calculated for the campaign will be justthe average (2.5), regardless of the traffic of each keyword. As a results, if 99% of a campaigntraffic and conversions is captured by just one keyword of average position 1, in that case, thecampaign average position is still 2.5, although it should be almost 1.

To avoid that, we may calculate the actual average position of a campaign by weighting eachkeyword average position with the number of impression that a keyword has. However, we

19

believe it is better to weight the average position (and also the impression share lost due torank) using the number of conversions of each keyword as in equation 3.1 in order to capturethe case of having two keywords, one with more impressions with less conversions and anotherkeyword with less impressions and more conversions.

campaign avgposition =

∑k∈keywords positionk ∗ convk∑

k∈keywords convk(3.1)

Although in this project we are weighting the metrics by the keywords number of conversions,it is important to point out that if a campaign has small number of conversions, then weightingwith the conversions can be easily affected by outliers. In that case it is better to weight averageposition using a more robust metric like the number of clicks for each keyword.

3.4 Hypothesis Testing

Hypothesis testing is often used to determine whether the observed data are true in general,i.e., whether the data obtained from a sample can be generalized to the whole population. Letus say that for the devices dimension, we found that the ICPA of mobiles is better than theICPA of desktops and tablets. Hypothesis testing determines if this is true in general andthus we can adjust the bids based on this fact. In hypothesis testing, null hypothesis is theassumption that there is no significant difference between the groups of interest i.e, no differencebetween the ICPA of the three devices. We also have the alternative hypothesis which is havingsignificant difference. Rejecting or disproving the null hypothesis is done using statistical teststhat quantify the sense in which the null hypothesis can be proven false, given the data thatare used in the test.

Null hypothesis Accepted RejectedTrue Correct decision Type I errorFalse Type II error Correct decision

Table 3.1: Statistical errors related to the null hypothesis.

In Hypothesis testing, there are two types of errors that we can get, Type I errors which are nullhypothesis is falsely rejected giving a false positive and Type II errors which are null hypothesisfails to be rejected given the fact of actual difference between populations and thus giving afalse negative. All statistical hypothesis tests have a probability of making type I and type IIerrors. A test’s probability of making a type I error is denoted by α. A test’s probability ofmaking a type II error is denoted by β. These error types are always traded off against eachother, as the effort to reduce one type of error generally results in increasing the other type oferror. We have a number of choices related to the null hypothesis as in table 3.1. Obviously,the null hypothesis can be either true or false. Additionally, we can choose to accept or rejectthe null hypothesis. This results in four potential decisions, two of which are correct and twoof which are incorrect.

Condition Greek symbol Meaning Controlled usingType I error α False positive Significance levelType II error β False negative Statistical power

Table 3.2: Interpretation and control of statistical errors.

Indeed, we prefer not to commit either type I or type II errors, but it is important to pointout that the p value is directly related only to type I error, For concreteness, the p value issimply the probability (denoted α ) of committing type I error in a given test. As a result,when we state that the results are significant (p < 0.05), we are saying that we are potentiallycommitting type I error less than 5% of the time. However, in order to decrease the probabilityof committing type II, we must design our test with sufficient statistical power as in table 3.2.

20

3.5 Analysis of Variance (ANOVA)

For devices, we have the ICPA of three groups, which are Desktops, Mobiles, Tablets as shownin fig 3.2. Our goal is determine if the average daily ICPA of any group is significantly differentfrom another group, so that we can adjust the bids for these groups i.e increase the bids morefor the group with significantly higher ICPA and decrease them otherwise.

Figure 3.2: The daily device ICPA for one Adwords campaign.

One-way ANOVA is a technique that is useful in this case, as it is used to compare the meansof three or more groups. It can be used only for numerical data. It tests the null hypothesisthat states that the average ICPAs for each group are equal.

H0 : ICPAdesktop−avg = ICPAtablet−avg = ICPAmobile−avg

To do this, We calculate a test statistic called F-statistic which is the ratio between mean squareerror between the groups and the mean square error within the groups.

F =V ariance between groups

V ariance within groups

If the group means are drawn from populations with the same mean values, the variance be-tween the group means should be lower than the variance of the samples [38]. A higher ratiotherefore implies that the samples were drawn from populations with different mean values. Theresults of a one-way ANOVA can be considered reliable as long as the following assumptionsare met [39]:

• Independence of observations

• Normality of the residuals

• Homogeneity of variance

3.5.1 Assumptions Validation

Independence of observations: It is an assumption that simplifies the statistical analysis.In probability theory, two random variables are statistically independent if the occurrence of onedoes not affect the probability of occurrence of other. In our analysis, each observation is theICPA of a certain group in a certain day, and we assume that each observation is independentfrom the other.

21

Normality assumption: The second assumption is the normality of the residuals. The firstway to check for normality is using the Q−Q plots (”Q” stands for quantile). It is a graphicalmethod for comparing two probability distributions by plotting their quantiles against eachother. For example, if we have n data points and we want to determine if they can be as-sumed to be sampled from a certain distribution, we sort these points and plot them againstthe appropriate quantiles from the distribution of interest (Normal distribution in this case)[14]. If the the plotted points are along a line then we can assume that they sampled from thisdistribution as in figure 3.3

Figure 3.3: Q−Q plot of the normally distributed residuals. This visualization is one methodthat can be used in order to verify the normality of residuals. However, it is difficult to beautomated and it relies on the opinion of the viewer, i.e. it is a subjective method.

Although Q−Q plot is a powerful method to check for normality, it has two disadvantages, thefirst one is that it is hard to automate, meaning it is difficult to write an algorithm that examinethe plot and decide whether the data can be assumed to be normal. Moreover, the decision issubjective, since two analysts can examine the same plot and one decides that the data can beassumed to be normal and the other does not. To overcome these problems, one can rely onShapiro-Wilk test which is a test of normality [15]. Its null-hypothesis is that the population isnormally distributed. Thus, if the p-value is less than the chosen significance level, then the nullhypothesis is rejected and there is evidence that the data are not from a normally distributedpopulation. On the contrary, if the p-value is greater than α, then it is not rejected.

Homogeneity of variance assumption: The assumption states that all comparison groupshave the same residual variance. To test for homogeneity of variance, we can rely on Bartlett’stest, however, it is sensitive to normality departures [19].

Levene’s test is an alternative to Bartlett test that is less sensitive to non-normality. It as-sesses the null hypothesis of homogeneity of variance. If the resulting p-value of the test is lessthan some significance level, the obtained differences in sample variances are unlikely to haveoccurred based on random sampling from a population with equal variances. Thus, the nullhypothesis of equal variances is rejected and it is concluded that there is a difference betweenthe variances in the population.

22

3.5.2 Post-hoc test

ANOVA determines if there is a difference between the groups, but it does not detect the ex-act groups that are different from each others. That is why it is important to follow ANOVAwith a post-hoc test that does this task. There are two main ways to detect the groups withsignificance difference. The first way is multiple t-tests between the group combinations, andin this case our type I error will increase as every additional test has its own type I error whichwill aggregate on multiple testing. One solution for this, is to use Bonferroni correction, whichis using a significance level of original alpha divided by the number of tests [22].

Another way is to use the Tukey method which is a one step multiple comparison method.It finds means that are significantly different from each other. It compares all possible pairs ofmeans, and is based on a studentized range distribution [23]. In this project,we use the Tukeymethod because it is just one step and thus it faster in execution, especially in the case ofthe weekdays dimensions for which the Bonferroni correction method would need 21 individualtests since we have 21 weekdays combinations

(72

).

3.6 Chi-Squared Test for Independence

3.6.1 Motivation

Using ANOVA was to test the null hypothesis of having equal means of the groups daily ICPAvalues. However, working on the calculations of bid adjustments (discussed in the next chap-ter), we found that it is better to work using the total ICPA of each group to represent itsperformance instead of using the daily ICPA. As a result, ANOVA is not relevant anymore,because we do not have groups averages that we want to test if they are equal. Instead, we havetotal costs and total conversions for every group, and we want to test if the total ICPA valuesare dependent on the dimensions groups, and that is the reason for the shift to the Chi-squaredtest of independence. This section rests on [40].

There are two types of Chi square tests, namely the goodness-of-fit test, and the test of in-dependence. They are both closely related, but we are interested in the Chi-Squared test ofindependence, because we want to determine if the there are dimensions groups that affectthe ICPA ratio between costs and conversions. The null hypothesis of the Chi-squared test ofindependence in our case is that the dimension of interest and the ICPA are independent. Thealternative hypothesis is being not independent, in other words, there is an effect of a specificdimension on costs and conversions.

3.6.2 Assumptions

There are three main assumptions about the data that we have to meet in order to have re-liable results from the Chi-squared test. First, the sampling of the data should be random.This assumption is already met, since we are not sampling the data or in other words, we areusing all the conversions and costs during the time period of interest. Second, the variablesdata that we are testing should be counts or frequencies that are mutually exclusive and havea total probability equals to one. In our case, any conversion and its cost is only attributed toone dimension group. For example, we can not have a conversion that is counted for Mondaysand Thursdays at the same time. Third, the expected frequency for each variable is at least 5,and in this project we make sure that our data also meets this assumption.

3.6.3 Formulation

The first step to use the test is to build the contingency tables for both observed and expectedvalues, and we calculate the deviations between the expected and the observed values. Thesedeviations are scaled based on the expected values. The chi-square statistic which is calculated

23

as below is one measure of these deviations.

χ2 =∑

i∈cells

(observedi − expectedi)2

expectedi

The chi-square statistic is based on the Chi-squared distribution which is a non-negative andasymmetric distribution. It is skewed to the right, and has a family of distributions based onthe number of degrees of freedom as shown in figure 3.4. The statistic helps us answer thequestion of whether what we are observing is random or is it unlikely to be random. The Chi-squared table is then used to calculate the p-value using the statistic and degrees of freedomwhich is (in our case) the number of dimensions groups minus one. The p-value determines theprobability that this deviation is due to chance. If that probability is below the significancelevel, we deduce that it can’t be due to chance, and there must be an effect.

Figure 3.4: The theoretical Chi-squared distribution for different degrees of freedom. Figure isadapted from [40].

If we get a statistical significance of dependence, then we need to identify which groups con-tributed the most to this significance. To do that, we identify the cells with the largest residuals[17]. A residual is the difference between the observed and expected values for a cell. The largerthe residual, the greater the contribution of the cell to the magnitude of the resulting Chi-squareobtained value. As stated in [16], “a cell-by-cell comparison of observed and estimated expectedfrequencies helps us to better understand the nature of the evidence” and the cells with largeresiduals “show a greater discrepancy than we would expect if the variables were truly inde-pendent” (p. 38).

24

Chapter 4

Adjustments Calculations andEvaluation

This chapter intends to present the core model of calculating the adjustments. Different meth-ods are investigated and the reasoning behind each method is discussed. In addition, we proposean evaluation procedure to validate these methods and select the best one for a real-time ex-periment. We present our cost weighting technique for controlling the costs after setting theadjustments. Finally we discuss and propose a solution for the adjustments interactions prob-lem. It is worth mentioning that, for the sake of simplicity, the examples and visualizations arefor the devices. However, the concepts are applied to different dimensions.

4.1 Marginal ICPA Method

The marginal ICPA method relies on the law of the diminishing return phenomenon discussedin section 3.3. It increases the bid for a dimension group that has a marginal ICPA largerthan the dimension average. Conversely, it decreases a bid in the case of having a marginallower than the average. These adjustments intend to equalize the marginal ICPA for all thegroups. Mathematically, the marginal is the slope of the diminishing return curve at a certaincost. However, if we plot the daily conversions for a certain campaign against the daily costsas in figure 4.1, we hardly see the smooth theoretically diminishing return curve introducedbefore. To overcome this problem, we use linear regression to fit a line for each device costs andconversions daily observations, and we assume that the slope of this line can represent marginalICPA. The calculation of the bid adjustment of each group is then a straightforward task, as itis the ratio between marginal of a specific group and the average marginal of all groups.

bid adjustmentgroup =marginal ICPAgroup

marginal ICPAaverage

Note that the average here does not mean calculating the marginal of each device and then get-ting the average, but rather means that we fit a line using linear regression for the observationsof all devices and the slope of this line is the average ICPA.

25

Figure 4.1: Marginal ICPAs for the three devices. For the duration of four weeks, each pointin this graph represents the conversions and costs for a specific device in one day.

Using our evaluation procedure, we get poor results using the marginal ICPA method. Thereason for this, is the highly dynamic market of search advertising due to several factors likecompetition changes, daily traffic changes and seasonality, for example. It is important topoint out that the campaign performance is the aggregate performance of all the keywords inthe campaign, consequently, these factors have different effects on every keyword and thus weget unstable values of the marginal. That means, these marginal ICPA values are highly depen-dent on the time interval used for the calculations. For instance the bid adjustment calculatedfor desktops in the first two weeks of a month can yield a total different adjustment calculatedusing the last two weeks of the same month.

4.2 Constrained Linear Regression Method

Constrained linear regression method is based on the fact that with zero costs, the numberof conversions is zero. In the previous method we calculate the adjustments based on thedaily observations of costs and conversion. We discussed why these observations are not robustenough to base our adjustments on. As every observation can vary for the same cost and thesame device, for instance, we can pay 2000 SEK in one day and receives x amount of conversionsand in another day we can receive 2x (due to market changes). On contrast, for zero costs, weknow that the total conversion value is zero. As a result, we base our adjustments on fitting aline taking this fact into consideration and a constrained least-squares regression is performedin order to get a line that minimizes the error and passes through the origin as in figure 4.2.

26

Figure 4.2: Constrained regression through the origin for the three devices. For the durationof four weeks, each point in this graph represents the conversions and costs for a specific devicein one day.

The fitted line is not representing the marginal ICPA anymore, but rather it represents anaverage slope for the diminishing curve. In this method, the adjustments are calculated in away similar to the previous one, but using the slope of the constrained fitted line instead of themarginal ICPA. If N is the number of observations, and β is the slope of the fitted line, then:

β =

∑i∈N ∗costi ∗ convi∑

i∈N cost2i

The model is: convi = β ∗ costiWe want to minimize S =

∑i∈N (convi − β ∗ costi)2

∂S

∂β= −

∑i∈N

2costi ∗ (convi − β ∗ costi)

To minimize the error, we equalize the derivative by zero to obtain the optimum β, and thus:

−∑i∈N

2costi(convi − βcosti) = 0

∑i∈N

costi ∗ convi = β ∗∑i∈N

cost2i

β =

∑i∈N costi ∗ convi∑

i∈N cost2i

Thus, the bid adjustment for every group is:

bid adjustmentgroup =βgroupβall

The calculated adjustments using this method improved dramatically compared to the marginalICPA method, as the adjustments values are more robust to market changes due to the con-

27

strain of passing through the origin as shown in table 4.1.

4.3 Average of Slopes Method

In the constrained linear regression method, there is a possibility of having the observationswith large costs dominating the regression procedure, since these observations could have largererror values and thus affect the calculations of the coefficients that minimizes squared error.One solution to this problem is to scale the error or the penalty so that it varies with thex-values (cost) in order to have roughly a constant relative error. To achieve that, weightedleast squares could be used instead of the ordinary unweighted least squares that is used in 4.2.There are many ways we can scale the penalty, and one way to have a constant relative erroris to set our weighting factor wi for every observation to be 1/cost2i . Doing so, will make thesolution reduced to using the average of all the slopes through the origin. In other words theslope of the fitted line will be average ICPA of the daily observations.

4.3.1 Formulation

If N is the number of daily observations, and βweighted is the slope of the fitted line, with aweighted penalty wi then the model is:

convi = βweighted ∗ costi

We want to minimize S =∑

i∈N wi(convi − βweighted ∗ costi)2

∂S

∂βweighted= −

∑i∈N

2costi ∗ wi(convi − βweighted ∗ costi)

To minimize the error, we equalize the derivative by zero to obtain the optimum βweighted, andthus:

−∑i∈N

2costi ∗ wi(convi − βweightedcosti) = 0

∑i∈N

wi ∗ costi ∗ convi = βweighted ∗∑i∈N

wi ∗ cost2i

βweighted =

∑i∈N wi ∗ costi ∗ convi∑

i∈N wi ∗ cost2iIf we set our weighting factor to be 1/cost2i , that yields the average of slopes solution.

βweighted =

∑i∈N

convi

costi

N

Consequently, the bid adjustment for every group is:

bid adjustmentgroup =βweighted (group)

βweighted (all)

4.4 Total ICPA Method

The total ICPA method takes a different approach, as it works using the total or the aggregateICPA of each dimension group in the training interval unlike the previous methods that usesthe daily ICPA observations.

total ICPAgroup =total conversionsgroup

total costgroup(4.1)

28

and the bid adjustments are calculated using total ICPA which is using all conversions andcosts

bid adjustmentgroup =total ICPAgroup

total ICPAall groups(4.2)

Using the evaluation procedure described in 4.6, we find that the total ICPA method yields abetter results compared to the first three methods as shown in table 4.1. There are two reasonsfor that, the first being that the first three methods calculates the bid adjustment based onan estimation of the daily ICPA, in other words, they predict the daily ICPA. However we areinterested in using the total ICPA of a certain dimension group for the whole time interval ofsetting the adjustments. Second reason, is that this method is more robust against outliers,since it works on the aggregate performance during the training interval, unlike other meth-ods that are affected if there is an abnormal behaviour in one or more days in the training period.

4.5 Cost Weighting

The goal of the project is to increase the conversions given the same costs, After calculatingthe adjustments using any of the previous techniques, there is a risk of overspending which isincreasing the costs when we bid more for a certain group. This risk can be mitigated usingthe cost weighing technique. In this technique we modify the calculated bid adjustment for acertain group using the costs that we paid for this group in the training period. It is based onthe assumption that the cost that we will pay for every dimension group changes linearly basedon the adjustment of this group as in 4.6

new costgroup = adjustmentgroup ∗ training period costgroup (4.3)

In that sense, we calculate a weighing factor αw based on the calculated adjustments as well asthe training period cost of each group.

αw ∗adjgroup1 ∗ costgroup1 +αw ∗adjgroup2 ∗ costgroup2 + .... = costgroup1 + costgroup2 + ... (4.4)

αw =costgroup1 + costgroup2 + ...

adjgroup1 ∗ costgroup1 + adjgroup2 ∗ costgroup2 + ....(4.5)

The weighted adjustment for any group is just the calculated adjustment multiplied by αw

weighted adjustmentgroup = αw ∗ adjustmentgroup (4.6)

4.6 Evaluation Procedure

The best way to evaluate our bid adjustments is to run a real time experiment, to make surethat the adjustments actually give better results. Before doing so, it is important to find away to evaluate the adjustments in order to try out different techniques and methods withoutreal-time experiments as those can be very costly.

4.6.1 Traditional Evaluation Methods

In the data science community, there are several ways to evaluate the developed models usinghistoric data. In this section we discuss the main two evaluation techniques and then we developour evaluation procedure that fits our research problem.

29

The first technique for evaluation is bootstrapping, and this technique helps in many situa-tions like the validation of a predictive model performance which is our goal1. It works bysampling with replacement from the original data, and we take the “not chosen” data points astest cases. We do this several times and calculate the average score as estimation of our modelperformance. Another well-known evaluation technique is cross-validation, which is a techniquefor validating and evaluating the model performance. It is done by splitting the training dataset into k parts. We take k− 1 parts as our training set and use the “held out” part as our testset. We repeat that k times differently (we hold out different part every time). Finally we takethe average of the k scores as our performance estimation.

4.6.2 Proposed Evaluation Method

Our evaluation procedure entails getting historic data of the campaign performance for everydimension of interest like device and week day in the training interval. We split the collecteddata into training and testing sets. The training data is used to calculate the adjustmentsusing any method of the presented ones, and the testing period is then used to evaluate howgood our estimates for the adjustments are. In the testing period, we predict the number ofconversions for every dimension group given the costs that we paid for this group and given thetotal ICPA (in the testing period). The baseline that the results are compared against is thedefault model of not using any bid adjustment for any group. It assumes that every dimensiongroup has the same value and hence it has an ICPA equals to the total ICPA (of all groups).Since conversions = ICPA∗ cost, and the number on conversions for each group in the testingperiod is the ground truth, then the estimation of the conversions using the baseline model is:

estimated conversionsg = ICPAtotal ∗ costg

And the baseline error is calculated as follows:

errorbaseline =∑

g∈groups|costg ∗ ICPAtotal − actual conversionsg| (4.7)

On the other hand, the bid adjustments model assumes that every dimension group has a dif-ferent value which is calculated by multiplying the total ICPA with the value of the adjustmentof every group. Hence, the estimation of the conversions using the adjustments model is:

estimated conversionsg = ICPAtotal ∗ adjustmentsg ∗ costg

And the adjustments error is calculated as follows:

erroradjustments =∑

g∈groups|costg ∗ ICPAtotal ∗ adjustmentg − actual conversionsg| (4.8)

In the traditional supervised machine learning techniques, and more specifically, in regressionmodels that we use to predict a real number or an integer like conversions number, the indepen-dent variables or features are used to predict the dependent variable of each data point afterthe training phase. The quality of the regression model predictions can then be evaluated bydifferent metrics and using the data points in the testing data set. Our evaluation techniqueis different from the traditional ones because it works with aggregate values. In other words,the testing data does not consist of granular data points for which we want to predict a label.Instead, the costs, total ICPA, and the calculated bid adjustments are used to predict the con-versions of each dimension group as in equation 4.8. Moreover, In the general case of machinelearning, more data is more preferable as that makes predictive models better [41]. However

1Bootstrapping has other uses in machine learning as in the ensemble methods. For instance, we may builda predictive model like a decision tree using each bootstrap and aggregate these models in an ensemble likeRandom Forest. The prediction is done by a majority voting for all of the bagged models.

30

in our case, calculating the bid adjustments using long intervals of more than three months ofdata can be problematic, and the reason is that the market can change tremendously in theselong intervals. In other words, the products or services and their ads are changed, thereforethere is a high probability to learn our adjustments based on changed data. At the same time,less data would not give good adjustments values since they are vulnerable to outliers. Also fora dimension like weekday, one month of data means that we have just around four occurrencesfor each weekday to calculate the bid adjustments which are not enough. As we can see in table4.1, three months of training data has the best performance in terms of the mean and medianabsolute error. We can also see that working with aggregate yields the best results as explainedin section 4.4.

Trainingduration

Camps. MethodSum of Abs.Error (SAE)

Mean error Median error

1 month 202

Baseline 100712 498 162Marginal ICPA 542678 2686 419

Const. Reg. 88948 440 137Avg. of Slopes 98726 488 183

Total ICPA 54163 268 98

2 months 251



Total ICPA 68303 272 75

3 months 273



Total ICPA 71564 262 74

Table 4.1: The intermediate results in this table are calculated and analyzed using our evaluationmethod. We filter 1000 random Adwords campaigns that have data for 4 months (from January01 to April 30) to the ones whose devices and weekdays groups affect the ICPA performancewith statistical significance using Chi-squared test. The filtered campaigns are used to evaluatethe baseline and our four proposed techniques. We also evaluate three training periods intervalsof 1 month (March), 2 months (February and March) and 3 months (January, February andMarch). The testing duration is always one month (April).

4.7 Adjustments Interactions

Since there are many dimensions that we can adjust for, and some of them (like the locationdimension) can take a wide range of values, there are many combinations of dimensions groupsthat can occur, which raises several challenges.

4.7.1 Challenges

The first challenge is that major search engines do not provide the possibility to set differentbid adjustment for the same dimension group given an interaction with another dimension. Forinstance, if desktops have high ICPA compared to the average of all devices, but it performsvery poorly on Fridays, then it is not allowed to increase the bids for desktops on all days exceptFridays. The second challenge is that bid adjustments in Google Adwords are multiplicative,that means, one could set a bid adjustment of 10% for search queries originating from Stock-holm, an adjustment of -10% for queries submitted from 6-7pm, and another adjustment of20% for mobile devices. Then, for a base bid of 10 SEK, the final bid on a Stockholm mobilequery between 6-7pm would be 10SEK ∗ 1.1 ∗ 0.9 ∗ 1.2 = 11.88SEK. This situation can beproblematic, because if we have two dimension groups which have large positive adjustments,

31

then during the interaction of these two groups (even if these two groups perform poorly dur-ing interaction), their adjustments will be multiplied together leading to very large bids, andthus we can overspend. On contrast, if two dimensions have negative adjustments, then theseadjustments will be even more negative although the dimensions may perform well due to in-teractions, and thus, we can underspend. This problem is discussed in [28], and the authorswork on the setting of having a predefined budget and two dimensions for bid adjustments, eachdimension having a large number of groups. The problem is formulated as a knapsack problemso the cells that maximizes the value within the budget are captured by the algorithm. In ourcase, we still want to bid on all weekdays and devices and the definition of budget in our settingis not straightforward, as the goal is to increase the conversions given the same costs that wewould be pay without bid adjustments.

4.7.2 Solution

We deal with these challenges by formulating the adjustments calculations as an optimizationproblem. We build on the evaluation technique which is described in section 4.6 in order tooptimize the problem. Let us say that the calculated bid adjustment for devicei equals to adjiand the bid adjustment for dayj is adjj then, our estimated conversions for device i and day jis:

estimated conversionsij = costij ∗ totalICPA ∗ adji ∗ adjjthus the error that we make with these multiplicative adjustments is: actual conversionsij −estimated conversionsij . Indeed, this is the error of only the interaction between one dayand one device. Hence to get the total error, we aggregate all the individual errors for all theinteractions. Our goal in this section is to change our calculated bids taking these dimensionsinteractions into consideration in order to minimize the error. To achieve this goal, we intro-duce an coefficient for each bid adjustment and we run an optimization algorithm like gradientdescent in order to determine the coefficients that minimize the total error as below:

Minimize∑

i∈devices,j∈days

(costij ∗ ICPAtotal ∗adji ∗ coefi ∗adjj ∗ coefj −actual conversionsij)2

(4.9)It is important to point out two things, first, we minimize the sum of squared error of alldimensions interactions instead of the absolute error. The reason is that we want the costfunction to be smooth instead of be being sharp like the absolute function as in figure 4.3, thatleads to the ability of calculating the first derivative in order to minimize the cost functionby reaching the global minimum. Second, if we have just one dimension and we used theoptimization procedure, the resulting bid adjustments will be exactly the same as the totalICPA method explained in section 4.4, as both methods calculates the bid adjustment for agroup g as

convg

ICPAtotalg∗costg .

32

Figure 4.3: Absolute error function has sharp edges compared to the squared error functionwhich is smooth.

4.7.3 Minimization Procedure

In a general minimization algorithm, we have a cost function that we want to minimize, inother words, we want to find the coefficients or model parameters that gives us the globalminimum or at least a local minimum that is close to the global one as in figure 4.4. Thereare four common steps in every iterative minimization algorithm. First, we initialize the modelcoefficients either by values that we calculate, and we think they are good guesses, or if wedo not have an idea about the initial values, then we can set them to random values. Onemay think that we can calculate the bid adjustments, the same way we optimize them for theinteractions, in other words, we start by setting the adjustments to ones for all dimensionsgroups, followed by the optimization step. In that case, the resulting coefficients will be the bidadjustments. However the cost function can have several local minima where the minimizationprocess can get stuck at. The initial bid adjustments would be initial guesses that can help inreaching the global minimum. Indeed, if the cost function has just one minimum then settingthe initial bid adjustments to ones or not would give the same results.

Figure 4.4: An example for a cost function that we want to minimize, the x-axis is the coefficientand the y-axis is the cost function. We want to reach the global minimum or at least a localminimum that performs nearly as well as the global one. Figure adapted from [42].

33

In every iteration, we check if we met a stopping criterion, which determines if we are closeenough to the minimum that we should break the iteration and report the current values ofthe coefficients as the solution. Also, we calculate the derivative and and we move in the de-scent direction, in order to decrease the cost value compared to the current point. Finally, itis important to determine the step size or the length of the step in the descent direction. Notethat this step can be fixed for all the iterations so we do not have to determine it at everyiteration. However, it can be more efficient to have different step sizes based on how far weare from the optimum solution, concretely, if we are far from the optimum solution we a movein bigger steps compared to if we are close to the global minimum. In our model, we use asoftware to accomplish this task after formulating it, namely the module is optimize.minimizein the Python SciPy open source library for scientific and mathematical computing [43].

Setting bid adjustments is not a one time action. It is an ongoing process and the bid ad-justments need to be changed periodically. The goal is to reach a state where the ICPA is thesame for all dimensions groups, as that means we can not acquire cheaper conversions if we bidmore for a certain group than the other. In order to achieve that, we periodically calculate thebid adjustments and we multiply the new bid calculations by the already existing ones till wereach a state of equilibrium.

34

Chapter 5

Real-time Experiments

As shown in chapter 4, historic data is used to calculate the bid adjustments and to evaluate ourmodels, but for more reliable results, real-time controlled experiments are conducted to validateand test our final model. In this chapter, we discuss why and how we design our experiments.

5.1 A/B testing

One of the important concepts in the data science community is that correlation does not implycausation. That means we can set bid adjustments on one campaign and we get positive results.However the reason for these positive results can be other confounding factors and not the bidadjustments. So it is really a bad idea to set bid adjustments and compare the performancebefore and after setting the adjustments as there are other factors like competition situationthat can be different before and after setting the adjustments and thus, affect the results. Toavoid this problem we rely on A/B tests, also called online experiments or split tests [6]. In A/Btests, one makes two versions of what needs to be tested, which is the performance of Adwordscampaigns. One version is the campaign without any changes, called the control or originalversion, and the other is the campaign with only one change which is bid adjustments, calledthe experiment version. Having two versions with only one change is the first step, next wehave to determine what is the metric that need to track and test after the end of the experimentto check if there is a statistical difference in terms of this metric between the two versions. Inour experiment, we test if the conversions increased significantly given the same costs, so themetric that we mainly track here is the number of conversions.

5.2 Adwords Experiments

The logic behind A/B test is easy to understand, however it can be challenging to implementthe splitting of the traffic, assigning each user to a version in addition to tracking and recordingthe users events. Fortunately, in Google Adwords, there is a feature of online experiments whereit is straightforward to run an experiment of two versions of a campaign with changes in oneversion of them. The splitting in Adwords experiments is done based on the eligible ad auctionsthat each version can participate in. That means each campaign version would have 50% ofthe chances to participate in eligible auctions. However, that does not mean that each versionhas 50% of the total number of impressions, since one campaign participate in an auction anddoes not get view since due to having a low rank. Adwords explicitly notes that experimentsplit and impression share may not always be the same. For instance, the experiment can havea higher impression share than the original campaign, despite having a lower experiment split.

35

Figure 5.1: High-level structure of an online experiment. Adapted from [6].

5.3 Experiment Design

Before choosing any campaigns to test our adjustments on, we discuss how do we determinethe size of the experiment(in terms of conversions number) and how are going to evaluate theexperiments results.

5.3.1 Binomial Test

Since the metric that we are testing in the experiment is the number of conversions and we wantto test if the number of conversions increased significantly or not using our bid adjustments, thestatistical test that we use is the binomial test which is an exact test to measure the statisticalsignificance deviation from the binomial distribution of two categories. The two categorieshere are whether any conversion that we get, is from the control or the experiment versions.Each conversion can be considered as a random variable that either comes from the experimentversion (success) or from the control version (failure), thus each conversion has a Bernoullidistribution of probability p (probability of success). It is known that when we have a sequenceof Bernoulli trials which are independent and identically distributed, then their sum wouldhave a binomial distribution, and the total number of conversions of both versions would bethe sample size or in other words the length of Bernoulli trials sequence. In the binomial test,the null hypothesis is that we have equal proportions (50%) of the total number of conversions.

H0 : Pcontrol = Pexperiment

The alternative hypothesis is one-sided and we test if the number of conversions for the experi-ment version (with bid adjustments) is greater than the control version number of conversions.

H1 : Pcontrol < Pexperiment

5.3.2 A Priori Power Analysis

Test power is the likelihood of finding statistically significant differences given that statisticallysignificant differences actually do exist. In other words, power is the likelihood of rejecting thenull hypothesis when it actually should be rejected. The more power in a test, the less chancethere is to identify a non-significant difference when there actually is a significant difference.Statistically, power is expressed by 1− β ( as type II error is expressed as β.) as explained insection 3.4.

36

The power of the test is dependent upon three factors:

• Sample size

• Effect size

• Significance level

A large sample size increases the likelihood of finding statistically significant differences. Thuslarger sample sizes increase statistical power. The significance level also has an impact. When αis at a significance level of 0.1 , as opposed to .05, the critical value is lowered and the likelihoodof finding a statistically significant difference increases. As the likelihood of committing a typeI error is increased, the likelihood of making a type II error is decreased. The greater the effectsize between groups, the fewer participants are needed to identify statistical significance. PowerAnalysis can be a priori or post hoc. The purpose of the a priori power analysis is to identifythe appropriate sample size to conduct the analysis before data is even collected. Conversely,the purpose of the post hoc power analysis is to identify whether power was adequate for thetest [21]. For this test, we perform a priori power analysis, and we assume that bid adjustmentshas and effect size of at least 5%, and we would like to have a power of 95% and significancelevel of 5%. To achieve this power, we need a total number of conversions of 4300 conversions[44].

5.3.3 Experiments Campaigns

There are many campaigns that we can use for experimentation. We set a criteria for choosingthe campaign for this task. Two conditions for the diminishing return problem are alreadydiscussed in chapter 3, namely, having campaigns of average position larger than 1.5 and animpression share lost due to rank larger than 20%. In addition to that, there must be a statis-tical significant differences in devices and weekdays performances.

There are campaigns that can have issues due to what is called cross-device conversions whichis a phenomena that we discuss in section 8.1. so we have to make sure that we select campaignswith limited effect of this phenomena. Moreover, based on our power analysis we had to find acampaign that can acquire more than 4300 conversions in order to have a 95% power for ourtest of proportions. It was important to get these number of conversions during a period of onemonth, since we had limited time to present our work. In addition, it is important to point outthat apart of all the technical conditions, the account managers and the clients have to approverunning experiments with the risk of poor performance.

The campaigns that met most of the conditions for the experiments acquire around 2000 con-versions in one month, which is around half the required number of conversions for the neededpower of the test. Therefore, we decided to run two experiments and work on the aggregatenumber of conversions. Indeed, there are no problems on working on the aggregates, becausegenerally speaking, our model is general and would work on multiple different campaigns, andthe goal of our model is to significantly improve the overall performance of all the campaigns.

Our model analyzed the historic data of the campaigns using and the calculated adjustmentsvalues are described as below: For the first campaign, the adjustments for devices are: +5% forDesktops, -25% for Tablets. For weekdays, we increase the bids for Wednesdays by 11% anddecrease it for Mondays by 12%. For the second campaign, the adjustments for devices are:+6% for Desktops, -30% for Tablets and already existing -14% for Mobiles are kept the same.For weekdays, we increase the bids for Fridays by 6% and decrease it for Saturdays by 14%.

37

Chapter 6

Results

In this chapter we discuss the results of the experiments, and we analyze the results. Theanalysis entails two parts. First we show the aggregate results of the two experiments in orderto see the effect of our model on the overall performance of the campaigns. The reason is,when the advertiser uses our bid adjustments model on their campaigns portfolio. the expectedresult is a significant increase in conversions and significant decrease in the cost per conversion.It is important to note that it is not expected to get a statistically significant improvementin each campaign on its own. Instead, we expect the improvement to be on the aggregateresults, because there are more total number of conversions on the aggregates and thus we havea more powerful test to capture any difference if it exists. It is worth mentioning that we alsogot significant results on each campaign individually and this is due to the big effect size thebid adjustments had on the campaign performances. The second part of the analysis entailsdigging deeper in the results of each campaign individually to get insights about the effect ofbid adjustments on campaigns performance using real data. More data about the experimentscan be found in the appendix. We found a big difference between the control and experimentversions on the first day of the experiment in favor of the control version, the difference is notjust in the aggregate level, but also for the two campaigns. We contacted Google Adwordssupport to ask about this difference in the first day and the response was that this is verynormal, as the system is dynamic and needs some time to learn the change, so there is alwayssome kind of deviation in the first day. Adwords support recommended to take the data of thefirst day away and start tracking from the second day of the experiment in order to increasedata validity and thus have more reliable results. Concretely, the experiments started on April06, 2017 and we started tracking from April 07. The experiments should have ended on May07, however we let them run till May 11 in order to have every weekday included five times.Therefore, the experiments duration are strictly five weeks.

6.1 Aggregate Results

Through out the duration of the two experiments, we collected 4931 conversions. As describedin section 5.3.2, we need 4300 conversions to get 95% power, so we are sure with 95% certainty,that we have enough power to detect a difference if it exists. Under the null hypothesis of thebinomial test described in section 5.3.1, we should get around 2465 conversions (50%) from eachversion. However, we got 2574 conversions from the experiment version (effect size of +9%) and2357 conversions from the control. We run the binomial test, and the null hypothesis is rejectedwith p value 0.001 (less than 0.05 significance level). Consequently we can say the number ofconversions increased with statistical significance.

As shown in table 6.1, we can see that we are paying less on total with bid adjustmentsand getting more conversions. On aggregate, the cost per conversion decreased more than 10%using our bid adjustments model.

38

Costs Conversions CPAControl 53480 2357 22.7Experiment 52956 2574 20.6

Table 6.1: Aggregate results of the two campaigns.

6.2 In-depth Analysis

In the previous section, we use the aggregate data to see the overall effect of bid adjustments.However to analyze why we get these results we analyze every campaign on its own, as everycampaign has its own different adjustments values and adjusted dimensions groups.

For campaign 1 as shown in table 6.2, we can see that the cost per conversion decreases 7.6%and the conversions increased 9.6%. The cost increased slightly (just 365 SEK more or +1.6%)

Version Cost Conv. CPAControl 21707 1288 16.8Experiment 22072 1412 15.6

Table 6.2: Campaign 1: Comparison between the control and experiment versions.

On the other hand, for campaign 2 the performance improvement is even more clear. We arepaying less and significantly getting more conversions. the cost per conversion decreased by11.7%.

Version Cost Conv. CPAControl 31773 1074 29.6Experiment 30884 1164 26.5

Table 6.3: Campaign 2: Comparison between the control and experiment versions.

The question that we will try to answer in this section is why bid adjustments improved ourresults. To do that we have to segment the campaign performances by the dimensions groupsthat we adjusted for. For instance in tables 6.4 and 6.5, we can see the effect of devices bidadjustments on campaign 1. For the control group (without bid adjustments) we can see thatthe CPA for desktops is much lower than that for tablets. The effect of bid adjustments can beseen for the experiment version, as we are paying more for desktops (19254 instead of 17242)and paying less for tablets (2818 instead of 4465). Consequently, our average position for desk-tops got improved which in turn resulted in more impressions and clicks. In addition, sincethe conversion rate for desktops are higher than for tablets. we are getting more conversionsin total and less CPA. There are more insights that can be shown here, which are generallyspeaking when we bid more, our average position would be higher and the CTR gets improvedwith higher positions. Indeed, there are some cases which are different. For example for cam-paigns with more generic and broad-matched keywords, when we bid more, we participate inmore auctions that might be irrelevant and that may lead to lower average position (althoughwe increased bids) but this is not our case.

Device Cost Conv. Avg pos. CPC CPA CTR CRDesktop 17242 1095 1.61 2.89 15.7 0.26 0.18Tablet 4465 193 1.56 2.73 23.13 0.26 0.11

Table 6.4: Campaign 1: Devices performance for the control version.

39


Table 6.5: Campaign 1: Devices performance for the experiment version.

For campaign 2, shown in tables 6.6 and 6.7, we can see a similar effect of bid adjustments, andbetter allocation for our marketing budget in the groups that have lower cost per conversions.It is important to point out, that the numbers in these tables are not just the results of devicesbid adjustments, there is also an effect of weekdays bid adjustments that enhances the overallperformance for the experiment versions.

Device Cost Conv. Avg pos. CPC CPA CTR CRDesktop 11816 403 1.49 2.4 29.3 0.24 0.08Mobile 16503 585 1.39 2.1 28.2 0.26 0.07Tablet 3454 86 1.42 2.3 40.1 0.26 0.05

Table 6.6: Campaign 2: Devices performance for the control version.


Table 6.7: Campaign 2: Devices performance for the experiment version.

One of the reasons why we adjust for weekdays and devices only and not other dimensions isbecause when we analyze the results of the experiments, it would be problematic to analyze theresults having many changes that all affect the outcome. It is preferable to make one changebefore an A/B test for a smooth analysis. However in our model we have two types of bidadjustments, namely analyzing the effect of weekdays is more challenging than devices. Thereason is that devices performance is overlapping with the performance of the weekdays. Forexample, we set a bid adjustment of -10% for Mondays. so we should expect a decrease intraffic and conversions on Mondays for the experiment version compared to the control version.However we got an increase in conversions. It might be to randomness, but we believe thatsince we have device bid adjustments that overlap with Mondays we get more conversions dueto the optimization for devices. Another reason why analyzing weekdays is more challenging isthat we found using our historic data of online search campaigns that five weeks is not enoughto capture the effect of weekdays as we have just five data points of each weekday that can besensitive to outliers. Unlike the devices dimensions which we track its performance everydayand thus we have more data and more robustness to outliers.

Weekday Cost Conv. Avg pos. CPC CPA CTR CRMonday 3326 188 1.54 2.79 17.7 0.26 0.15Wednesday 3363 200 1.53 2.7 16.81 0.27 0.16

Table 6.8: Campaign 1: Weekdays performance for the control version.

Weekday Cost Conv. Avg pos. CPC CPA CTR CRMonday 3098 225 1.61 2.53 13.76 0.26 0.18Wednesday 3935 234 1.53 3.05 16.81 0.27 0.18

Table 6.9: Campaign 1: Weekdays performance for the experiment version.

40

Weekday Cost Conv. Avg pos. CPC CPA CTR CRFriday 5237 177 1.44 2.07 29.58 0.23 0.07Saturday 4181 141 1.5 2.15 29.6 0.26 0.07

Table 6.10: Campaign 2: Weekdays performance for the control version.

Weekday Cost Conv. Avg pos. CPC CPA CTR CRFriday 5148 179 1.48 2.16 28.75 0.23 0.07Saturday 3495 166 1.6 1.87 21.05 0.26 0.08

Table 6.11: Campaign 2: Weekdays performance for the experiment version.

For the two campaigns experiment versions, we expected to see more traffic, costs and con-versions for the days that we bid more, namely Wednesdays for campaign 1 and Fridays forcampaign 2. On the other hand, we expect to see a decrease in traffic and conversions for thedays that we bid less. From the results, we can see that we actually get more conversions forall the days and the reason is that the effect of devices is combined with weekdays and hencewe get an improvement for all the days.

During the course of the two experiments, we noticed a severe drop in the daily budget forthe two campaigns as shown in figure. That led to a drop in traffic and aggregate conver-sions. After investigation, we found that the reason for that drop is that the monthly budgetwas close to exhaustion, consequently, the daily budget decreased. In addition, campaign two,was halted for a few days for some business reasons out of our control. All in all, that madethe analysis of weekdays more difficult as there are days that were included in that drop ofcosts more than the others. However, that did not affect the overall results, since it happenedfor both versions. In other words, the control and experiment versions were under the same“circumstances” throughout the experiment duration.

Figure 6.1: The aggregate daily costs of the two campaigns. We can see that both the controland experiment versions are under the same circumstance throughout the experiments durationof five weeks.

41

Chapter 7

Discussion

7.1 Statistical Significance Discussion

In this section, we discuss the effect and importance of using statistical tests to compare betweenthe performances of different groups. We do that by analyzing and visualizing the time series ofthe ICPA metric for every group. The ICPA time series is calculated daily for a period of threemonths between the first March and the end of May using the data of different campaigns.Figures 7.2 and 7.1 show the daily ICPA performance for two different campaigns. The firstfigure is for a campaign that does not have a statistically significant relationship between devicesand ICPA using the Chi-squared test. As we see in the figure, the daily performance fluctuatesfor the three devices and it is not clear which device is better than the other. Our model will notadjust the bids for the campaigns of this type. The second figure is for another campaign thathas a statistically significant relationship between devices and ICPA using Chi-squared test. Asshown in the figure, desktops have better ICPA performance. The bids of this campaigns willbe adjusted by our model.

Figure 7.1: Devices ICPA performances during a three-month period for a campaign that doesnot have significant results using the Chi-squared test.

42

Figure 7.2: Devices ICPA performances during a three-month period for a campaign that hassignificant results using the Chi-squared test.

Similarly, figures 7.4 and 7.3 show the ICPA times series for the weekdays dimensions. The firstfigure is for a campaign that does not have statistically significant results using the Chi-squaredtest for independence. The figure shows messy fluctuations for weekdays performance, and itis not clear which weekday is better than the other. However, figure 7.4 is for a campaign thathas significance results and we can see that Mondays and Sundays have a better performancethan Saturdays and Fridays. Our model will adjust the bids for this type of campaigns aswe can assume that we are not calculating the adjustments based on noise as in the formercampaign. It is important to point out that these figures are for the analysis of the data andunderstanding the importance and results of the statistical tests. In the automation of ouradjustments calculations, we do not have the luxury to plot these figures for every campaign,in other words, we rely on statistics to choose which campaigns and dimensions that we adjustfor and not on visualizations.

Figure 7.3: Weekdays ICPA performances during a three-month period for a campaign thatdoes not have significant results using the Chi-squared test.

43

Figure 7.4: Weekdays ICPA performances during a three-month period for a campaign thathas significant results using the Chi-squared test.

7.2 Adjustments Calculations Discussion

When a data scientist is challenged with a research question that requires analysis of data andbuilding models the solve a problem, one needs to understand the underlying data and see thebigger picture of the research problem. In our case, our model of automating bid adjustmentsentails the input of historic data of the campaigns segmented by different dimensions and theoutput would be the bid adjustments in terms of percentages. One important step in our modelis the diagnostics of the campaigns in order to determine if the bid adjustments would improvethe performance. This step is followed by calculating the bid adjustments after learning theirvalues using training data. Indeed, our goal is to automate the process of the adjustmentscalculations so we have to make the machine do the work, and thus our journey of finding thesolution started by investigating different techniques in machine learning.

In machine learning, there are two main categories of learning, namely, supervised and un-supervised learning. In the supervised learning, we have training data which is mainly consistsof records or data points. Each data point is consists of features along with a label. The taskof the supervised learning is to predict the label of the data point if it does not exist. If thelabels are categories, then it is a classification problem and we can use techniques like Logis-tic regression, Random Forests, Support vector machines and Naive Bayes. The choice of thealgorithm is of course depends on many factors like the type and distribution of the features.Alternatively, if the labels of the data points are numbers, then it is a regression problem andwe can use an algorithm like linear regression. It is worth noting that some algorithms andtechniques can be used for both classification and regression problems as the case in decisiontrees and neural networks.

On the other hand, if the data is not labeled then it is considered unsupervised learning, as inthe case of clustering algorithms like k-means which groups similar data points into k clusters.Another example of unsupervised learning is dimensionality reduction problems, for which, wecan use techniques like principle component analysis that reduces the number of dimensions inthe data in order to reduce the noise and discover latent features.

44

In our research problem, one step in the solution is to calculate the bid adjustments for everydimension group that would estimate or predict the number of conversions given a budgetwithing an interval of time. As a result, we know that the solution would not use classificationor unsupervised learning techniques. Instead it can be considered as a regression problem. Westarted with different linear models for regression to estimate the number of daily conversionfor every dimension group. However, we found that it is better to work with the aggregatedata during a time interval and not the daily performance. In order to deal with interactions ofdifferent dimensions, we formulated the model of adjustments calculations as an optimizationproblem after designing the cost function that we want to minimize.

7.3 Risks and Recommendations

The calculations and evaluation of bid adjustments as well as running the statistical proceduresthat choose the dimensions groups eligible for adjustments are automated in our model. How-ever, it can be risky to set these bid adjustments on production for every campaign, as thereare cases that may perform worse with bid adjustments. In this section, we discuss these risksand our recommendations.

7.3.1 Cross-device Conversions

The first case, is for the campaigns that have a high effect of the cross-device phenomenon.which means for example, that users research for products on their smart phones, and thenthey converge or make the purchase on their desktops. Our model, will give a positive bidadjustment for desktops and a negative bid adjustment for mobiles, since we find in the historicdata of these campaigns, that there are more conversions with less costs for desktops comparedto mobiles and thus, the CPA is lower for desktops than mobiles. There is a possibility of losingmany potential customers and many desktop-conversions if we bid less for mobiles, because ouradvertisements are shown less to the users in their research phase.

Cross-device issues are challenging and they are out of scope of this project, however, onesolution is to adjust the bids in an incremental way instead of adjusting the bids as one batch.To be concrete, if the calculated bids are +10% for desktops and -15% for mobiles, then we canstart by smaller capped values of bid adjustments and incrementally increase the magnitudeof these adjustments every week until we converge to the calculated adjustments. We monitorthe performance of the dimensions groups during this process in order to halt it in case of anystrange behaviour.

7.3.2 Broad vs Exact Keywords

In Google Adwords, keywords are the words and phrases that are used to match the termspeople are searching for with our advertisements. There are several types of keywords but theones of interest now are exact-match keywords and broad-match keywords. Broad match isthe default match type that all keywords are assigned, and it allows our advertisements to beshown on searches that include misspellings, synonyms, and other relevant variations [45].

On the other hand, exact match allows the advertisements to be shown on search queriesthat match the exact or close variations of the exact keyword. Close variations can also includea reordering of words if it does not change the meaning, and the addition or removal of functionwords like prepositions and conjunctions [45]. We believe that bid adjustments would be morereliable for the campaigns that mainly contain exact-match keywords more than broad-matchkeywords. The reason is if we bid more for generic keywords, there is the risk of participating inauctions that we are not relevant in, and thus we get a lower position in the SERP. Therefore,we may get a smaller volume of relevant clicks and conversions although we increased the bids.On the other hand, if the campaign contains more exact keywords, then if we increase the bids,

45

it is more likely that the effect is the getting higher positions and of course participating inmore of the relevant campaigns. That leads to more relevant clicks and more conversions. Andthat is the purpose of our bid adjustments model.

7.3.3 Conversions Time Lag

This one is not a risk but rather an important recommendation when using our model for bidadjustments. In the digital advertising world, there is a term of conversions time lag which is thenumber of days it takes between the first interaction with our advertisement to the conversion.Note that first interaction here can mean either the first time the user view our advertisementor the first time the user clicks on it.

Care must be taken when analyzing the results of bid adjustments as if a campaign for a specificproduct or service has long time lag, then it may take long to see the effect of bid adjustments.So it is recommended that we wait to evaluate the performance depending on how long theconversions time lag is. In addition, this problem must be taken into consideration while cal-culating the bid adjustments. Concretely, when we collect the data in the training period, andfor reliable results, it is recommended to wait after the end of the training interval, for a periodequals to the conversions time lag of a campaign before we start on calculating the adjustments.

Adjustments calculations date > Training period end date + Conversions time lag

46

Chapter 8

Conclusions

In the academic world, we in many times face good studies and theories about different re-search problems and that is a positive thing, however some academic studies can be difficultto be used and implemented in the real world. One of the goals of this project is to createreal value for online businesses. Fortunately we were able to do so as shown by the real-timeexperiments. Value is created by the effective connection between advertisers and online userswhich significantly leads to more conversions and less costs. As a result, the advertisers growtheir revenues, and the users get the products or services they seek to acquire through theirqueries on search engines. Indeed, an effective online advertising boosts business processes ofsupply and demand in the society, and that is how we create value.

In this project, we have seen how effective bid adjustments can be, as this feature allowsus to re-allocate the online marketing budget using our knowledge about search queries fea-tures. This knowledge is acquired by analyzing historic performances of advertising campaigns,and it was found that clicks that have specific features of time, device, location, etc. can giveadvertisers more value with less costs if they are able to understand and model the performanceof different dimensions groups effectively.

This project showed us that using bid adjustments does not necessarily imply improvements inadvertising campaign performance, and that was formulated by understanding the law of dimin-ishing returns and using the metrics of average position and impression share lost due to rank.In addition, we used different statistical techniques to select the campaigns and dimensionsgroups that can be more profitable if we adjusted for them. We analyzed the data of differentcampaigns, and for some of them, we were able to see a small difference between dimensiongroups performances that actually might be due to chance and not statistically significant. Ourmodel uses statistical tests to avoid calculating the bid adjustments based on noise that mayworsen the campaigns performance and thus, may result in the loss in advertising investments.

In this thesis, we investigated different techniques to calculate the bid adjustments values,and we found that one simple technique worked the best (for one dimension) which is usingwith total values. Indeed, this follows the Occam’s razor principle for problem solving whichstates that if we have different competing solutions for a problem, the simplest solution thathas the least assumptions should be selected provided that they are better or equally good.We selected the final model by designing an evaluation procedure that uses historic data toevaluate our different methods. We showed the intermediate evaluation results of the proposedmethods for calculating the bid adjustments as well as determining the training period interval.These intermediate results helped us develop the final model that was validated using the real-time experiments. We discussed the difference between the traditional evaluation techniqueslike bootstrapping and cross-validation, and our evaluation method as the latter works withaggregate values.

47

Without an evaluation method, we would need to run a separate online experiment on realdata to evaluate every method and idea that we had. An experiment would need a duration ofmore than one month to get robust and reliable results. Indeed, we can run experiments in lessthan one month having reliable and robust results. However that would need very large cam-paigns that generate thousands of conversions in few days to get enough test power (discussedin section 5.3.2). These types of campaigns are not the norm in digital advertising based onour experience. Moreover, online experiments are costly to advertisers since we may need totest poor methods that harm the business.

We defined and addressed the challenges of bid adjustments interactions by formulating itas an optimization problem. Moreover, since the goal of this thesis is to increase the conver-sions given the same costs, we addressed the risk of overspending which is increasing the costswhen we bid more for a certain group, using the cost weighting technique. In addition, wedesigned and ran real-time experiments to evaluate the final model using A/B testing. Theexperiments showed that our bid adjustments model improved the performance of online ad-vertising campaigns with statistical significance. It increased the number of conversions with9%, and decreased the cost per conversion 10%. The experiments validated our cost weightingtechnique as the costs spent in the experiments with our bid adjustments were actually lessthan the costs spent without the adjustments. Last but not least, we discussed the concernsand risks of using bid adjustments.

8.1 Future Work

For future work, there are several things that will be done and we discuss them in this section.Throughout this project we used the ICPA metric to calculate the adjustments and evaluatethe results. However, another metric that can be more useful for some businesses is the ROAS(Return On Ad Spend). This metric is worth the investigation as for a business like jewelrystore for example, it would be better to calculate the bid adjustments based on the total valueof conversions and not the number of conversions. if on Saturdays we have less number of con-versions compared to other weekdays, but the total conversion value is more, then it is betterfor the business if we bid more on Saturdays although our current model will actually bid lessas it uses the ICPA metric. As discussed in section 3.2, the ROAS metric is not stable as theICPA and can be vulnerable to outliers, but it is worth the investigation.

In addition, we are going to build a model for all the campaigns dimensions that we canadjust for using bid adjustments. Running more experiments will be needed in order to testand evaluate adjusting for different dimensions. Also, we will need to run large scale experi-ments for campaigns in different businesses in order to evaluate the effect of bid adjustmentson the overall portfolio of campaigns instead of testing on one type of business.

The challenges of cross-device conversions that were discussed in section is an interesting di-rection of research, as without handling this challenge, we may face the risk of using bidadjustments in a way that leads to losing potential customers.

In hindsight, we found campaigns which have a very limited number of conversions (around1 conversion per day). We can investigate clustering of small campaigns using features abouttheir historic data in order to have bid adjustments calculations that are more robust againstoutliers for that kind of campaigns.

48

References

[1] Brad Geddes. Advanced Google AdWords. ISBN: 111881956X. Sybex, 2014.

[2] Barry J. Fraser, Kenneth G. Tobin and Campbell J., McRobbie Second International Hand-book of Science Education. Springer Science, 2012.

[3] Pervez N. Ghauri, and Kjell Grønhaug. Research Methods in Business Studies (4th Edition).FT Pearson, 2010.

[4] Wayne Goddard and Stuart Melville. Research Methodology: An Introduction, 2nd edition.Juta Academic, 2014.

[5] Neil J. Salkind. Exploring research, 6th edition. Pearson International, 2006.

[6] Ronny Kohavi, Thomas Crook, Roger Longbotham Online Experimentation at Microsoft,2009.

[7] European Search Awards 2017. http://www.europeansearchawards.com/winners.

[8] Gina Wisker. The Postgraduate Research Handbook. 2nd edition. Palgrave Macmillian, 2007.

[9] Isadore Newman and Carolyn R. Benz. Qualitative Quantitative Methodology: Exploring theinteractive Continuum. Southern Illinois University Press, 1st edition, 1998.

[10] Michael Myers. Qualitative research in Business and Management. SAGE Publication Inc.London, UK, 2009.

[11] William Trochim, and James P. Donnelly. The Research Methods Knowledge Base. 3rdEdition, Atomic Dog, 2007.

[12] Alan Bond. Your Masters Thesis How to Plan, Draft, Write and Revise. Studymates Ltd,UK, 2006.

[13] Mark Saunders, Philip Lewis and Adrian Thornhill. Research methods for business stu-dents. 5th edition. Pearson, 2006.

[14] Henry C. Thode. Testing for Normality. CRC Press, ISBN 978-0-8247-9613-6, 2002.

[15] Shapiro, S. S. Wilk. An analysis of variance test for normality (complete samples), 1965.

[16] Agresti, A. An introduction to categorical data analysis. Hoboken, NJ: Wiley, 2007.

[17] Delucchi, K. L. On the use and misuse of chisquare. In G. Keren & C. Lewis (Eds.), Ahandbook for data analysis in the behavioral sciences. pp. 294-319. Hillsdale, NJ: LawrenceErlbaum, 1993.

[18] Anne Hakansson Portal of Research Methods and Methodologies for Research Projects andDegree Projects. The Royal Institute of Technology, KTH, Kista, Sweden, 2013.

[19] NIST/SEMATECH e-Handbook of Statistical Methods.http://www.itl.nist.gov/div898/handbook/eda/section3/eda357.htm.

[20] Adwords API,https://developers.google.com/adwords/api/docs/guides/start.

49

[21] Rick Balkin. Statistical Power in ANOVA. Department of Counseling Texas A&MUniversity-Commerce, 2008.

[22] Miller, Rupert G. Simultaneous Statistical Inference. Springer, 1966.

[23] Tukey, John. Comparing Individual Means in the Analysis of Variance. Biometrics, Vol. 5,No. 2, pp. 99–114. International Biometric Society, 1949.

[24] P. Rusmevichientong and D. Williamson. An adaptive algorithm for selecting profitablekeywords for search-based advertising services. Proc. 7th ACM conference on Electroniccommerce, 2006.

[25] B. Edelman, M. Ostrovsky and M. Schwarz. Internet Advertising and the GeneralizedSecond Price Auction: Selling Billions of Dollars Worth of Keywords. Second workshop onsponsored search auctions, 2006.

[26] Jeffrey D. Ullman. Anand Rajaraman Mining Massive Datasets. Chapter 8, CambridgeUniversity Press, 2011.

[27] Richardson M, Dominowska E, Ragno R. Predicting clicks: estimating the click-throughrate for new ads. Proceedings of the 16th International Conference on World Wide Web,2007.

[28] MohammadHossein Bateni, Jon Feldman, Vahab Mirrokni, Sam Chiu-wai Wong: Multi-plicative Bidding in Online Advertising, 2014.

[29] Jon Feldman, S. Muthukrishnan, Martin Pal, and Clifford Stein. Budget optimization insearch-based advertising auctions. ACM, 2007.

[30] Google Support. Setting bid adjustments.http://support.google.com/adwords/answer/2732132

[31] HubSpot. The most important changes to Google adwords in 2013,http://blog.hubspot.com/marketing/google-adwords-changes-2013-list

[32] Google Support. Ad auctions. https://support.google.com/adwords/answer/142918?hl=en&ref_topic=24937

[33] Google Support. Actual cost-per-click. https://support.google.com/adwords/answer/6297?hl=en&ref_topic=24937

[34] Ahmad Zainal-Abidin and Jun Wang. Maximizing Clicks of Sponsored Search by IntegerProgramming, 2010.

[35] Esther Weusthof. Adwords Optimization. VU University of Amsterdam, 2015.

[36] Howie Jacobson, Kristie McDonald. Google Adwords for Dummies. Publisher: For Dum-mies; 3rd edition, 2011.

[37] Create a system for managing your bids. https://support.google.com/adwords/answer/6167139

[38] Howell, David Statistical Methods for Psychology. Duxbury. pp. 324–325. ISBN 0-534-37770-X, 2002.

[39] Anderson, David R., Sweeney, Dennis J., Williams, Thomas A. Statistics for business andeconomics (6th ed.) Minneapolis/St. Paul: West Pub. Co. pp. 452–453. ISBN 0-314-06378-1,1996.

[40] Chi squared test. Harvard University. Course code: STAT E-50 http://isites.harvard.

edu/fs/docs/icb.topic1240320.files/Chi%20Square%20-%20slides%20-%20S13.pdf

[41] Pedro Domingos A Few Useful Things to Know about Machine Learning University ofWashington. https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf

50

[42] Ian Goodfellow and Yoshua Bengio and Aaron Courville. Deep Learning MIT Press, 2016.

[43] SciPy open source Python library https://www.scipy.org

[44] Power Calculations For Proportion Tests https://www.rdocumentation.org/packages/

pwr/versions/1.2-0/topics/pwr.p.test

[45] Google Adwords Guides - About keyword matching options. https://support.google.com/adwords/answer/2497836?hl=en

All links in this section are verified as of June 10, 2017.

51

Appendices

52

In this appendix, we show the detailed results for devices and weekdays in the two campaignsthat were used in the real-time experiments.

Version Cost Conv. CPC CPA CRControl 21707 1288 2.85 16.8 0.17Experiment 22072 1412 2.88 15.6 0.18

Table 1: Campaign 1: Comparison between the control and experiment versions.


Table 2: Campaign 1: Devices performance for the control version.


Table 3: Campaign 1: Devices performance for the experiment version.

Weekday Cost Conv. Avg pos. CPC CPA CTR CRMonday 3326 188 1.54 2.79 17.7 0.26 0.15Tuesday 3691 248 1.54 2.77 14.88 0.29 0.18Wednesday 3363 200 1.53 2.7 16.81 0.27 0.16Thursday 3629 237 1.56 3.02 15.31 0.27 0.19Friday 2867 150 1.64 2.97 19.11 0.24 0.15Saturday 1944 124 1.68 2.93 15.67 0.24 0.18Sunday 2887 141 1.65 2.89 20.47 0.27 0.14

Table 4: Campaign 1: Weekdays performance for the control version.


Table 5: Campaign 1: Weekdays performance for the experiment version.

Version Cost Conv. CPC CPA CRControl 31773 1074 2.2 29.6 0.07Experiment 30884 1164 2.2 26.5 0.08

Table 6: Campaign 2: Comparison between the control and experiment versions.

53


Table 7: Campaign 2: Devices performance for the control version.


Table 8: Campaign 2: Devices performance for the experiment version.


Table 9: Campaign 2: Weekdays performance for the control version.


Table 10: Campaign 2: Weekdays performance for the experiment version.

54

TRITA -ICT-EX-2017:76

www.kth.se

automated bid adjustments in search engine advertising1119130/fulltext01.pdf · keywords: digital...

Documents