Optimization of Digital Marketing Campaigns
Armando Vieira, Inesting
Abstract
In this work we apply several clustering, visualization and predictive machine learning techniques to analyse data from digital marketing campaigns. For data exploration we used unsupervised techniques such as k-means, Principal Component Analysis (PCA), Multidimensional Scaling (MDS) and Self-Organized Maps (SOM). We identified patterns that help the analyst understand the vast amount of data produced by digital trails and that guide their actions (actionable insights). Support Vector Machines and Random Forest algorithms were used for the supervised prediction of conversions.
Keywords: ad optimization, Adwords, Predictive Analytics, SEO, digital marketing
1 Introduction
Online advertising has evolved into a $50 billion industry and continues to grow by double digits. At the same time, powerful web analytics tools, such as Google Analytics, Facebook Insights or Kissmetrics, make key data easily available to anyone who wants to monitor the performance of their campaigns online. For e-commerce sites, the analyst can track every single action of the visitor along the conversion path and answer the fundamental questions: who, what, why, how and when, from a lead to the purchase.
Our interest lies in monitoring the impact campaigns have on website traffic, engagement and revenue (in the case of e-commerce). A principal form of online advertising is the promotion of products and services through search-based advertising. Today’s most popular search-based advertising platform is Google Adwords, which has the largest share of revenues. Search remains the largest online advertising revenue format, accounting for 46.5% of 2011 advertising revenues, up from 44.8% in 2010. In 2011, Search revenues totalled $14.8 billion, up almost 27% from $11.7 billion in 2010.
This gives unprecedented power to the marketing team, but at a cost: huge amounts of unstructured, disparate and complex data to be processed, and many parameters to be adjusted. The effort required to deal with the number of options and configurations for optimal performance of a company website is simply far beyond human capabilities.
Furthermore, some parameters have non-linear interactions: for instance, the quality of the SEO boosts the position of the ad in Adwords campaigns, thus achieving a better performance for a lower PPC. The budget allocated to the campaign also influences the ad position. There are even subtler influences and nuances when measuring the ROI. For instance, it is known that although display advertising brings very few direct sales, it may boost the performance of search ads, since users were previously exposed to the product or brand.
To optimize this myriad of parameters we need to rely on machine learning algorithms to extract actionable insights and answer some simple questions: How can I improve my return on investment (ROI)? How can I boost customer engagement? Which products generate most interest? What catalyses sales? Which strategy should I opt for? Which channels should I choose? How much should I invest, when and how? These are very important questions with no single clear answer. Most of them depend on the case, and some are too vague to be answered.
Under these circumstances, the safe strategy is to start by carefully designing an ad, selecting adequate keywords, setting the bids, segmenting the campaign properly and testing continuously for fine-tuning. If results are not as expected, then look at the data, learn, make corrections, and repeat the cycle. Most research has focused on the publisher side, trying to devise strategies to maximize the CTR of ads by means of content contextualization and ad personalization, among others [**]. In this work, however, we take the perspective of the advertiser and explore the potential of machine learning tools for prediction and optimization of the marketing strategy. The objective is to maximize the performance and effectiveness of marketing campaigns, namely the Return On Investment (ROI). We propose a system to extract information from Google Analytics and determine which of it is most important for optimization.
The article is organized as follows. In Section 2 we introduce the data and pre-processing. In Section 3 we explore the data and extract relevant features using unsupervised techniques such as k-means, PCA, SOM and MDS. In Section 4 we introduce supervised learning, where we predict conversions, revenues and user engagement. Finally, in Section 5 some conclusions are drawn.
2 Data
2.1 Data Extraction and description
Data was collected from a customer running campaigns on an e-commerce site through Adwords, Facebook and email marketing. The data, collected daily over a period of 6 months, is described in Table 1. Our main data sources are Google Analytics (GA), which aggregates data from Google Adwords, and Facebook Insights. We focused on inputs that may give us access to insights, namely correlations between conversions and site usage or Adwords campaigns.
We used the package RGoogleAnalytics (RGA) to extract data from Google Analytics into R. We collected data from Adwords, Facebook and email campaigns (Table 1). Data was collected over different timeframes and consolidated by date. In some cases, data was decomposed by traffic source in GA and, in the case of Adwords, by group segment, so that each data point corresponds to a specific segment on a specific day. Two data sets were built: Data 1, with Adwords data only, and Data 2, with Analytics, Facebook and email data.
Table 1: Variables used for analysis. The coloured fields are data from campaigns. M = metric, D = dimension.
Group     Variable                   Name  M/D  Comments
General   Traffic source             TO    D    Organic, Email, Adwords, Facebook, Others
General   Visit length               VL    M
General   Number of visits           NV    M
General   Bounce rate                BR    M
General   Page per visit             PV    M
AdWords   Ad/campaign group          CG    D    Group of Ad
AdWords   Cost per Click             CPC   M
AdWords   Position                   P     M
AdWords   Type                       T     D    Search, display
AdWords   Click Through Rate         CTR   M
AdWords   Conversion Rate            CRA   M
Facebook  Impressions                Imp   M
Facebook  Click through rate         CTRf  M
Facebook  Cost per like              CPL   M
Facebook  Conversion Rate Facebook   CRF   M
Emails    Emails Sent                Em    M
Emails    Open Rate                  OR    M
Emails    Click Rate                 CT    M
Emails    Conversion Rate email      CRE   M
-         Total revenue              Re    M    Revenue from sales
2.2 Performance Ratios
For visualization purposes, we consider several aggregated metrics to benchmark the performance of a website and of the digital campaigns. We divide the metrics into two major categories: website usability and financial performance. All indexes are defined to have values between 0 and 1.
A site can be highly engaging…
Website usability metrics
We define engagement as a composite index, following [8]:
$$E = \sum_i \left[ Cd_i + Dd_i + Id_i + (1 - Br_i) \right]$$
where Br is the bounce rate and the other indices are defined below. The sum runs over any aggregation in which we may be interested. The coefficients are obtained from the sessions originating from a particular dimension: visitor id, traffic source, time, etc. This index has the advantage of benchmarking both the quality of the site and the interaction of the user with the content.
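As a rough illustration of how the composite index could be evaluated, the sketch below reads it as the sum over segments i of Cd_i + Dd_i + Id_i + (1 - Br_i). Plain Python is used for illustration, although the analysis in this paper was done in R. The segment values, the dictionary keys and the final normalisation step are assumptions introduced for the example and do not come from the campaign data.

```python
def engagement(segments):
    """Sum Cd_i + Dd_i + Id_i + (1 - Br_i) over the aggregation segments i."""
    return sum(s["cd"] + s["dd"] + s["id"] + (1.0 - s["br"]) for s in segments)


# Hypothetical per-traffic-source indices (illustrative values, not from the paper).
segments = [
    {"source": "organic", "cd": 0.5, "dd": 0.25, "id": 0.25, "br": 0.5},
    {"source": "adwords", "cd": 0.25, "dd": 0.25, "id": 0.0, "br": 0.75},
]

total = engagement(segments)
# Assumption: dividing by the 4 terms per segment rescales the index to [0, 1].
normalised = total / (4 * len(segments))
print(total, normalised)
```

The normalisation is one possible way to keep the index between 0 and 1; the text does not say how this was done.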
The Click Depth index (Cd) measures how deep visits go into the site and is defined as:
$$Cd = \frac{\text{Sessions with at least 4 page views}}{\text{All sessions}}$$
The Duration Depth index (Dd) measures the intensity of the visits, captured by the duration of visits on the website. It is defined as:
$$Dd = \frac{\text{Sessions with a duration of at least 3 min}}{\text{All sessions}}$$
The Interaction Depth index (Id) captures the visitor's interaction with content or functionality designed to increase the level of attention. It is defined as:
$$Id = \frac{\text{Sessions where visitors complete an action}}{\text{All sessions}}$$
where an action can be defined as a goal in GA, from downloading a document to filling in a form or watching a video.
Financial metrics
Engagement with a website is important, but the metrics that really matter, especially for e-commerce sites, are sales or leads. These are captured by financial ratios. There are dozens of financial ratios to measure the efficiency of a sales channel, but we will focus on the following:
• CR, Conversion Rate
• RPC, Revenue Per Channel
• ROI, Return On Investment
The CR is simply defined as:
$$CR = \frac{\text{Sessions where visitors purchase a product}}{\text{All sessions}}$$
Typical CRs are low: 1% is considered very good for most sites, but the CR can be as low as 0.001%.
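The session-share ratios defined in this section (Cd, Dd, Id and CR) all have the same form: the fraction of sessions meeting a condition. A minimal sketch follows, assuming a hypothetical `Session` record whose field names are invented for the example; the thresholds of 4 page views and 3 minutes are the ones given in the definitions.

```python
from dataclasses import dataclass


@dataclass
class Session:
    page_views: int
    duration_s: float      # visit length in seconds
    completed_goal: bool   # e.g. a GA goal such as a download or form submission
    purchased: bool


def share(sessions, predicate):
    """Fraction of sessions satisfying predicate (0.0 for an empty list)."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if predicate(s)) / len(sessions)


def session_ratios(sessions):
    return {
        "Cd": share(sessions, lambda s: s.page_views >= 4),
        "Dd": share(sessions, lambda s: s.duration_s >= 180),
        "Id": share(sessions, lambda s: s.completed_goal),
        "CR": share(sessions, lambda s: s.purchased),
    }


# Four made-up sessions, for illustration only.
sessions = [
    Session(5, 240, True, True),
    Session(1, 10, False, False),
    Session(4, 60, False, False),
    Session(2, 200, True, False),
]
ratios = session_ratios(sessions)
print(ratios)
```

Because each ratio is a share of sessions, it is bounded between 0 and 1 by construction.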
The Revenue Per Channel (RPC) is the total value earned by a sales channel over a fixed period of time. The ROI of a channel is simply the ratio of its revenue to the total investment made in this channel:
$$ROI = \frac{RPC}{\text{Total cost}}$$
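A small sketch of how RPC and ROI could be computed per channel from daily records. The tuple layout and the example figures are assumptions made for illustration; channels with no spend (such as organic traffic) have no defined ROI under this definition, so the sketch reports `None` for them.

```python
from collections import defaultdict


def roi_per_channel(rows):
    """rows: iterable of (channel, revenue, cost). Returns {channel: (RPC, ROI)}."""
    revenue = defaultdict(float)
    cost = defaultdict(float)
    for channel, rev, spend in rows:
        revenue[channel] += rev
        cost[channel] += spend
    result = {}
    for channel in revenue:
        # ROI is undefined for a channel with no spend; report None in that case.
        roi = revenue[channel] / cost[channel] if cost[channel] > 0 else None
        result[channel] = (revenue[channel], roi)
    return result


# Made-up daily records: (channel, revenue, cost).
rows = [
    ("adwords", 120.0, 80.0),
    ("adwords", 40.0, 20.0),
    ("organic", 90.0, 0.0),
    ("facebook", 30.0, 40.0),
]
channel_roi = roi_per_channel(rows)
print(channel_roi)
```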
In Figure 1 we show the evolution of Engagement and ROI over time for the two main traffic sources.
Figure 1: Engagement over time (days), using a moving average.
In Figure 2, we plot the revenue per origin of traffic. The most important source of revenue was Facebook, while Google Organic ranks second and Adwords third. The most consistent channels are Direct traffic and the email newsletter.
Figure 2: Revenue distribution per channel (top 6).
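The smoothing used for the time series can be sketched as a trailing moving average. The window length is an assumption (the text does not state it), and the daily values are invented for the example.

```python
def moving_average(values, window):
    """Trailing moving average; the first window-1 points average what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Made-up daily engagement values.
daily_engagement = [0.2, 0.4, 0.3, 0.6, 0.5]
smoothed = moving_average(daily_engagement, window=3)
print([round(v, 3) for v in smoothed])
```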
3 Data visualization with unsupervised techniques
In this section we use several techniques for data exploration and visualization in order to detect patterns and features hidden in high-dimensional data. We use unsupervised techniques, from simpler ones like k-means to more elaborate ones like Self-Organized Maps (SOM) and Multidimensional Scaling (MDS).
3.1 Adwords Data
We start by characterizing the data with the box plots in Figure 3, where the number of conversions, the CTR and the CR are displayed for all ad groups in our campaign. Three ad groups account for the majority of conversions (sales): groups 9, 10 and 11. The average CTR is almost constant for most of the groups (around 6%), but in some cases we do not have enough data to evaluate it accurately. The average position is 1.68 and the average CR is 0.2%, showing greater variability than the CTR.
Figure 3: Boxplot of CTR (red), number of conversions (blue) and CR (green) for all Adwords groups
In Figure 4 we plot the weekly revenues and costs of the Adwords campaign over a period of 6 months. Initially the campaign was not very efficient, since we ran a trial period to test and optimize its content, targeting and keywords. After week 6, a boost in investment brought a more than proportional increase in sales.
Figure 4: Revenue and cost per week on Adwords campaigns.
Clustering
We then cluster the data using the k-means algorithm. K-means is one of the simplest and most widely used algorithms for unsupervised clustering. The only inputs are the number of clusters k and the metric used to calculate the distances between points. We tested the algorithm with two to five clusters using the Euclidean distance on the Adwords data. The optimum compromise between intra-cluster and inter-cluster distance was achieved at k = 3 clusters. Results are presented in Figure 5, where we selected CTR and number of clicks as representative axes. The three patterns are very clear in this figure, and the centroids are presented in Table 2. It can be seen that most conversions come from the green group, which corresponds to the greater number of visits and clicks. Number of page visits is also a strong indicator of revenue. Figure ** shows the clustering on page views and visitors. CTR, CPC and position are almost the same for the three groups.
Figure 5: K-means algorithm with 3 clusters for data set 1.
Table 2: Centres of the 3 clusters obtained by k-means for the Adwords data set (CTR expressed as a fraction).
Cluster  Cost  Clicks  Imp.  Revenue  CTR   CPC   Position
1        56.7  327     4739  85.1     0.07  0.14  1.79
2        81.7  474     6610  124.9    0.08  0.15  1.71
3        20.8  73      1194  14.1     0.06  0.17  1.30
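The clustering step and the comparison across values of k can be sketched with a plain implementation of Lloyd's k-means using Euclidean distance. This is an illustrative sketch, not the code used for the results above: the points are made up and assumed to be already scaled, and the deterministic farthest-point seeding is a choice made here so that the example is reproducible.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then keep adding the
    point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm with Euclidean distance. Returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Total squared distance of the points to their assigned centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Made-up, already-scaled 2-D points forming three obvious groups.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),
    (0.0, 5.0), (0.3, 5.1), (0.1, 4.8),
]
for k in range(2, 6):
    centroids, labels = kmeans(points, k)
    print(k, round(within_cluster_ss(points, centroids, labels), 3))
```

The within-cluster sum of squares always shrinks as k grows, so a sharp drop followed by a plateau is one common way to pick k. Since the raw Adwords variables have very different scales (impressions in thousands, CTR below 0.1), some rescaling before computing Euclidean distances is usually advisable.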
In Figure 6 we plot the graph of correlations for the Adwords data set, obtained with the R function qgraph. There are strong correlations between **.
Figure 6: Correlations with qgraph.
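A correlation graph of this kind is built from the matrix of pairwise correlations between variables, with strong correlations drawn as edges. The sketch below computes Pearson correlations and lists the pairs above a threshold; the column values and the 0.8 threshold are assumptions chosen for illustration and say nothing about which variables are correlated in the real data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(columns):
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up daily values for three Adwords variables.
columns = {
    "clicks": [10, 20, 30, 40],
    "cost": [2.0, 4.0, 6.0, 8.0],
    "position": [1.8, 1.2, 1.7, 1.3],
}
corr = correlation_matrix(columns)

# Keep only the pairs that would appear as strong edges in the graph.
names = list(columns)
strong = [
    (a, b, round(corr[a][b], 2))
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(corr[a][b]) >= 0.8
]
print(strong)
```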
3.2 PCA
Principal Component Analysis is one of the oldest and most widely used approaches to compress high-dimensional data into a small set of linear components. It has the disadvantage of being a linear model, but it is still very useful. In Figure 7 we plot the eigenvalues of the components in a two-dimensional plot. Two main principal components are clearly seen. Note that conversions are highly correlated with ad groups.
Figure 7: PCA for the Adwords data (left) and the Google Analytics data (right).
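The idea behind PCA can be sketched by extracting the first principal component of a covariance matrix with power iteration. This is a minimal illustration on made-up data, not the procedure used to produce Figure 7; in practice the variables would normally be standardised first and a library routine would return all components at once.

```python
from math import sqrt


def covariance(rows):
    """Sample covariance matrix of a list of equal-length observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[v - m for v, m in zip(row, means)] for row in rows]
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]


def first_component(cov, iterations=200):
    """Power iteration: dominant eigenvector and eigenvalue of a covariance matrix.
    Assumes the start vector is not orthogonal to the dominant eigenvector."""
    d = len(cov)
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


# Made-up observations lying exactly on the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
cov = covariance(rows)
direction, eigenvalue = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(direction, round(explained, 3))
```

Here the first component points along (1, 2) and explains all of the variance, because the toy data is perfectly collinear.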
3.3 SOM
The self-organizing map (SOM) is an unsupervised neural network proposed by Kohonen (Kohonen 2001) for visual cluster analysis. The neurons of the map are located on a regular grid embedded in a low-dimensional (usually 2 or 3) space, and are associated with the cluster prototypes. During the learning process, the neurons compete with each other through the best-matching principle, i.e., the input is projected to the nearest neuron using a defined distance metric. The winner neuron and its neighbours on the map are adjusted towards the input in proportion to the neighbourhood distance; consequently, neighbouring neurons are likely to represent similar patterns of the input data space. Because of this combination of clustering and spatialization through a topology-preserving projection, SOM is widely used in visual clustering applications.
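The competitive learning rule described above can be sketched in a few lines. This is a toy illustration under several assumptions: a 3 x 3 grid instead of the 6 x 5 map used for the campaign data, a Gaussian neighbourhood, linearly decaying learning rate and radius, a deterministic initialisation along the diagonal of the data's bounding box, and made-up two-dimensional inputs.

```python
from math import exp


def best_matching_unit(weights, x):
    """Index of the neuron whose prototype is closest to x (squared Euclidean distance)."""
    return min(
        range(len(weights)),
        key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.5):
    lo = [min(col) for col in zip(*data)]
    hi = [max(col) for col in zip(*data)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    last = len(grid) - 1
    # Deterministic start: prototypes spread along the diagonal of the bounding box.
    weights = [[l + (h - l) * n / last for l, h in zip(lo, hi)] for n in range(len(grid))]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                   # learning rate shrinks over time
        sigma = max(sigma0 * decay, 0.3)   # so does the neighbourhood radius
        for x in data:
            br, bc = grid[best_matching_unit(weights, x)]
            for n, (r, c) in enumerate(grid):
                # Neighbours of the winner on the grid are pulled towards x too.
                h = exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                weights[n] = [w + lr * h * (v - w) for w, v in zip(weights[n], x)]
    return grid, weights


# Two made-up groups of observations.
data = [
    (0.1, 0.1), (0.15, 0.1), (0.1, 0.2),
    (0.9, 0.9), (0.85, 0.9), (0.9, 0.8),
]
grid, weights = train_som(data, rows=3, cols=3)
for x in data:
    print(x, "->", grid[best_matching_unit(weights, x)])
```

After training, each observation is represented by the grid cell of its best-matching neuron, which is what a SOM plot displays.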
SOM is well suited to analysing the high-dimensional data of digital metrics. A range of research groups concentrate on the bankruptcy prediction problem, usually solved as a classification task that separates companies into distressed and healthy categories (binary) or into a number of predefined credit ratings (multi-class).
There, SOM is used to determine the class through visual exploration (Merkevicius, Garsva & Simutis 2004). An enhanced version of Learning Vector Quantization (LVQ) can boost the prediction performance of a multi-layer perceptron neural network (Neves & Vieira 2006). In combination with independent component analysis for dimensionality reduction, LVQ has been employed to recognize distressed French companies (Chen & Vieira 2009).
Figure 8: SOM for data set 1 (Adwords campaigns) on a grid of 6 x 5 = 30 cells.
3.4 MDS
SOM methods, presented previously, involve the estimation of a conditional probability, which is computationally expensive and hard to extract. Here we test the Multidimensional Scaling (MDS) algorithm. MDS is a non-linear approach, mostly used for visualization, that captures the level of similarity of the individual cases of a dataset. It is used to display the information contained in a distance matrix, evaluated according to some metric. The MDS algorithm places each object in an N-dimensional space such that the between-object distances are preserved as well as possible. Each object is then assigned coordinates in each of the N dimensions. The number of dimensions N of an MDS plot can exceed 2 and is specified a priori. Choosing N = 2 optimizes the object locations for a two-dimensional scatterplot (Figure 9).
Figure 9: Aggregation by MDS on data set 2. Colours represent revenue levels (black = lowest, light blue = highest).
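To make the idea of placing objects so that pairwise distances are preserved concrete, the sketch below implements classical (Torgerson) MDS, the simplest variant: it double-centres the squared distance matrix and takes the leading eigenvectors. The text does not say which MDS variant was run, so this is a stand-in rather than the method behind Figure 9; the four points and the fixed start vector of the power iteration are assumptions made for the example.

```python
from math import sqrt


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    n = len(matrix)
    v = [float(i + 1) for i in range(n)]  # fixed start vector, for reproducibility
    norm = sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(c * c for c in w))
        if norm < 1e-12:
            return 0.0, v
        v = [c / norm for c in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv)), v


def classical_mds(dist, dims=2):
    """Embed objects in `dims` dimensions from a matrix of pairwise distances."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring turns squared distances into a Gram (inner-product) matrix.
    gram = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        value, vec = dominant_eigenpair(gram)
        scale = sqrt(max(value, 0.0))
        for i in range(n):
            coords[i][k] = scale * vec[i]
        # Deflate: remove the component just extracted before finding the next.
        gram = [[gram[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords


def euclid(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up objects; only their pairwise distances are given to the algorithm.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0), (3.0, 1.0)]
dist = [[euclid(a, b) for b in points] for a in points]
coords = classical_mds(dist, dims=2)
print([[round(c, 3) for c in row] for row in coords])
```

The recovered coordinates match the original layout only up to rotation and reflection, which is why MDS plots are read in terms of relative positions.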
3.5 Heatmaps and ROI
We now investigate the return on investment (ROI) from the Adwords and Facebook campaigns. The Facebook campaign ran over the same period as the Adwords campaign, with a daily budget between 10 and 40 euros (Figure 10). The ROI is in general bigger than 1, meaning that revenue exceeds the advertising spend and the campaign is producing good results. When we consider the global performance (sales originating from all channels), the ROI almost doubles, considering as cost only the investment in Adwords and Facebook.
Figure 10: ROI over time (days), using moving averages: Adwords (red), Total (blue).
We now plot the ROI for the paid channels. Email is number one, as expected, due to the small cost of promotion.
ROI and Eng for Data 1. **
Heat maps
Heat maps are a good visualization method for data exploration and causality explanation. Here we lay out conversions and engagement on a calendar to spot trends visually. We use the ggplot2 library to create a calendar heat map with data from 6 months. We plot engagement and visits as well as transactions on the calendar to see how they interact along the timeline.
It is interesting to note that Tuesdays have the most visits, but Wednesday is the day on which most transactions occur. Visits increase towards the end of the year (shopping season) and then slow down at the start of the new year. Engagement has been improving over time.
Figure 11: Heatmap calendar for visits (top) and revenue (bottom) over the last 6 months.
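The data preparation behind a calendar heat map amounts to pivoting a daily series into a week-by-weekday grid, which also gives the weekday totals used to compare days. The sketch below shows this step only (not the plotting); the start date and the visit counts are invented for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def calendar_grid(daily):
    """daily: {date: value}. Returns {(iso_year, iso_week): [Mon..Sun values or None]}."""
    grid = defaultdict(lambda: [None] * 7)
    for day, value in daily.items():
        iso_year, iso_week, iso_weekday = day.isocalendar()
        grid[(iso_year, iso_week)][iso_weekday - 1] = value
    return dict(grid)


def weekday_totals(daily):
    """Sum of the daily values for each weekday, Monday first."""
    totals = [0.0] * 7
    for day, value in daily.items():
        totals[day.weekday()] += value
    return totals


start = date(2013, 1, 7)  # hypothetical start date (a Monday)
pattern = [50, 90, 60, 55, 40, 30, 20]  # made-up visits, Monday to Sunday
daily_visits = {start + timedelta(days=i): pattern[i % 7] for i in range(14)}

grid = calendar_grid(daily_visits)
totals = weekday_totals(daily_visits)
busiest = WEEKDAYS[totals.index(max(totals))]
print(busiest, grid)
```

Each row of the grid corresponds to one calendar week, so it can be passed directly to a tile-based plot.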
4 Supervised Learning for Revenue Prediction
In previous sections we explored the data patterns without concern for causality between observations (unsupervised learning). In this section we go a step further and use supervised learning to make predictions based on past records. This is very important as it provides explanation, “the why” instead of “the what”, as we enter the field of predictive analytics.
First we consider the problem from a broader perspective: can we predict the revenue from a certain channel by looking at the traffic data generated? If so, with how much accuracy and confidence? How does the behaviour of a user who finalizes a purchase differ from that of other users? To answer these questions we run supervised algorithms trained with past data and perform a classification analysis.
As a first step, we enrich our data by extracting extra metrics, drilled down by 5 dimensions (time, traffic source, Adwords ad group, operating system, and city). The metrics used are: number of visits, average pages per visit, average visit duration, bounce rate, visit depth, CTR, page load time, social interaction, and cost of ads on Adwords and Facebook. From these metrics we extract the additional performance ratios described in Section 2.2. Regarding the traffic sources, we selected only the top 10 performers. We consider a conversion to have occurred when at least one sale is concluded. All data is aggregated with a daily granularity.
We run the algorithms as a classification task, trying to predict whether a given session leads to a conversion. The data set contains 5680 sessions, of which 432 have conversions. We used Support Vector Machines and Random Forest algorithms since they can easily deal with categorical and continuous inputs, can be trained with relatively few examples, and are comparatively robust to overfitting.
Since many more visits lead to non-conversions than to conversions, we create a balanced data set by randomly eliminating entries that do not lead to conversions. We end up with 864 training examples. All data was normalized and the algorithm was tested using 10-fold cross-validation.
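The balancing and cross-validation steps can be sketched as random undersampling of the majority class followed by a 10-fold split. The session records, the `converted` field and the random seed are placeholders invented for the example; only the counts (5680 sessions, 432 conversions) come from the text.

```python
import random


def undersample(rows, is_positive, seed=0):
    """Keep every positive row and an equally sized random sample of negatives.
    Assumes there are at least as many negatives as positives."""
    rng = random.Random(seed)
    positives = [r for r in rows if is_positive(r)]
    negatives = [r for r in rows if not is_positive(r)]
    balanced = positives + rng.sample(negatives, len(positives))
    rng.shuffle(balanced)
    return balanced


def k_fold_splits(n_rows, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(start, n_rows, k)) for start in range(k)]
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


# Placeholder sessions with the class counts reported in the text.
sessions = [{"id": i, "converted": i < 432} for i in range(5680)]
balanced = undersample(sessions, lambda s: s["converted"])
splits = list(k_fold_splits(len(balanced), k=10))
print(len(balanced), [len(test) for _, test in splits])
```

Each fold serves once as the test set while the model is trained on the other nine.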
In Figure 13 we plot the ROC curve obtained over a period of 165 days. The AUC obtained with Random Forest was 0.84. For comparison, we also used SVM, which gave AUC = **. This is a somewhat surprising result given the small set of inputs. To isolate the effect of Adwords traffic, we also ran the algorithm without traffic from this source. The results improved slightly.
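For reference, the AUC can be computed directly from classifier scores as the probability that a randomly chosen converting session is scored above a randomly chosen non-converting one, with ties counting one half. The labels and scores below are made up and unrelated to the reported results.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for l, s in zip(labels, scores) if l]
    negatives = [s for l, s in zip(labels, scores) if not l]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = conversion) and classifier scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.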
Random Forest returns several measures of variable importance. The most reliable one is based on the decrease in classification accuracy when the values of a variable in a node of a tree are permuted randomly; this is the measure of variable importance we use.
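The permutation idea can be sketched in a model-agnostic form: shuffle one input column, re-score the model, and record the drop in accuracy. Note that this is a simplified variant and not the Random Forest implementation, which computes the measure tree by tree on out-of-bag samples. The threshold rule standing in for a trained model, the column names and the data are all assumptions made for the example.

```python
import random


def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, column, repeats=20, seed=0):
    """Mean drop in accuracy when the values of `column` are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats


# Made-up sessions; the "model" is a stand-in rule that only looks at visits.
visits = [10, 20, 30, 40, 60, 70, 80, 90]
bounce = [0.5, 0.2, 0.9, 0.4, 0.3, 0.8, 0.1, 0.6]
rows = [{"visits": v, "bounce": b} for v, b in zip(visits, bounce)]
labels = [v >= 50 for v in visits]


def predict(row):
    return row["visits"] >= 50


importance = {col: permutation_importance(predict, rows, labels, col) for col in ("visits", "bounce")}
print(importance)
```

A variable the model ignores gets an importance of zero, while shuffling a variable the model relies on lowers the accuracy.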
Table 3 presents the best discriminating indicators in predicting conversions: traffic origin and the number of visits – see also Figure 12.
Figure 12: dispersion of inputs for data set 2.
Figure 13: ROC curve for the conversion prediction with Random Forest and SVM algorithms. FPR: False positive rate, TPR: true positive rate.
Table 3: Best performing conversion prediction indicators for the two datasets.
All variables       All without Adwords
Traffic source      Number of visits
Number of visits    Bounce rate
Bounce rate         Visit length
Visit length        Time on site
5 Conclusions
In this work we have used a set of machine learning techniques for data exploration and predictive analytics. It was shown that exploratory tools can help to understand the dynamics of digital campaigns.
We used Random Forest algorithms (a collection of decision trees) and SVM to predict conversions with reasonable accuracy. The most important features are number of visits, origin of traffic and visit duration. Surprisingly, we found that CTR and CR have little influence as predictors of conversions.
6 References
• 1. Benjamin Edelman, Michael Ostrovsky and Michael Schwarz. "Internet Advertising and the Generalized Second-Price Auction: Selling Billions of Dollars Worth of Keywords". American Economic Review 97(1), 2007, pp. 242-259.
• 2. P. Maille, E. Markakis, M. Naldi, G. D. Stamoulis and B. Tuffin. Sponsored Search Auctions: An Overview of Research with Emphasis on Game Theoretic Aspects. To appear in the Electronic Commerce Research journal (ECR).
• 3. Andrei Broder and Vanja Josifovski. Introduction to Computational Advertising. Course, Stanford University, California.
• 4. Anand Rajaraman and Jeffrey D. Ullman. Mining of Massive Datasets. Cambridge University Press, 2012, Chapter 8: Advertising on the Web.
• 5. James Shanahan. Digital Advertising and Marketing: A review of three generations. Tutorial at WWW 2012.
• 7. IAB’s Internet Advertising Revenue Report. http://www.iab.net/AdRevenueReport
• 8. Web Analytics Demystified and NextStage Global. Measuring the Immeasurable: Visitor Engagement. http://www.webanalyticsdemystified.com/downloads/Web_Analytics_Demystified_and_NextStage_Global_-_Measuring_the_Immeasurable_-_Visitor_Engagement.pdf