Shrinking Big Data for Real-Time Marketing Strategy - A Statistical Report


Upload: manidipa-banerjee

Post on 23-Jan-2018


Category:

Data & Analytics



Shrinking Big Data for Real-time Marketing Strategy

A Statistical Analysis Report

(Using R – statistical Language)

(DeBois, 2015)

Authors: Manidipa Banerjee (MBA-MIS)

University of Massachusetts Dartmouth

Ankita Zaveri (MBA-Marketing)

University of Massachusetts Dartmouth

Abstract:

Marketing is increasingly data driven. To develop strategies, marketers need efficient tools to analyze the data that are valuable in the decision-making process. An online shopping experience involves customer interactions, the sentiments customers attach to products, and the prices of those products. To make sense of this semi-structured data, we need a tool that provides the right structure to analyze it, identify core competencies, and predict product value as well as market share. The R statistical language proves to be an efficient environment that provides the capabilities needed for marketing decisions. This report is based on data provided by an online retailer known as “DiamondStuds” (DiamondStuds, n.d.). A wide range of data, including details of products, revenue, transactions, orders, and social trends, is used for the evaluation.

I. Contents

II. Introduction

III. The Case Study

IV. Results and Discussion

1. Identifying Product Performance

2. Identifying Target market

3. Twitter Analysis

4. Traffic Analysis

V. Recommendations - Predictions

VI. Conclusion

VII. References

VIII. Appendices: R-Code

II. Introduction

“Big Data is the biggest game-changing opportunity for marketing and sales since the Internet went mainstream almost 20 years ago. That statement often prompts vigorous head nodding from executives, but is quickly followed by head scratching. ‘How can we make this happen?’” (Forbes, 2013)

Data reveals insights that help identify products, their market value, and social trends, and it creates opportunities in the decision-making process. The data journey starts with the consumer's decision and continues through product evaluation, the transaction, and the shipping process. With this enormous amount of data, effective analytics opens a wide range of possibilities that help retailers make valuable marketing decisions.

The R statistical language helps marketers make better decisions based on historical data and provides predictions from it. This project concerns marketing decisions that are based on statistics and algorithms programmed in R.

III. The Case Study

DiamondStuds.com is an online jewelry store that offers a wide range of customized diamond stud earrings. The company specializes in affordable diamond studs with a wide variety of options.

Its services include a Certificate of Authenticity and free appraisal report, safe packaging and insured products, free 30-day returns and exchanges, a lifetime warranty, and lifetime upgrades. The company uses extensive marketing tactics to gain more customers each year. Its sales have increased substantially every year and continue to do so.

The company makes most of its revenue during November and December (the holiday season), which makes this period very important. Of more than 2,000 orders in 2015, approximately 1,200 were placed in November and December. Because sales are so concentrated, the company is looking for a unique marketing plan for the 2016 holiday season.

Their current marketing plan includes Facebook target marketing, Google AdWords,

Email Marketing and SEO. They also offer Deal of the Day and Sign-up Discount offer on their

website.

An online business faces multiple marketing problems that require tough decisions. One problem with offering a wide variety is that it becomes hard to identify the star product. Customization makes it difficult for the company to distinguish between what is selling well and what only seems to be selling well. It is also difficult to analyze customer traffic in detail, segment it, and make predictions based on that information.

It would help the company to know which variables to consider, which products are most popular, and which sources bring in the most revenue. We believe that our analysis will help answer some of these tough questions and support the marketing plan as a whole. It will allow the company to make more informed decisions that could improve sales for the 2016 holiday season.

IV. Results and Discussion

A. Back-end Application

1. Identifying Product Performance

To sell diamond jewelry, one has to acquire sufficient knowledge of quality, timing, and price; nevertheless, the jewelry market is unpredictable and prone to change over short periods. To maximize profit, the product and its value play the crucial role, so it is important to know each product and its market worth. With a standardized price value, organizations can earn substantial profit, guided by their statistical reports for previous years. Such reports also provide an opportunity to identify the products that generate the most revenue.

Figure 1 below shows the products listed on the DiamondStuds.com online shopping website.

Figure 1: Back-end view of Products at DiamondStuds.com

Methodology:

The highlighted area shows the corresponding dates and product revenues. Using R, the data can be visualized to determine which products generated revenue during a particular time of year. Figure 2 below gives a clear picture of sales during the November-December period in 2014-2015.

Figure 2: Revenue Vs Sales Period

A geographical map is useful for locating the areas with high sales.

Figure 3: Locating High SALES area

The products sold worldwide during that period are listed in the figures below:

Figure 4: Identifying the top list

Figure 5: Identifying the bottom list

2. Identifying Target market

a) Interests- Affinity Categories

Affinity categories are used to reach potential customers, to make them aware of

your brand or product. These are users higher in the purchase funnel, near the

beginning of the process. While using AdWords you can add audience targeting to ad

groups in your campaigns to reach people interested in products and services similar to

those your business offers, even when these people are browsing websites, using apps,

or watching videos not directly related to your products and services. By doing so, you

can help boost your campaigns’ performance. Depending on your advertising goals and

the stage of the purchase process your customers are in, you can choose to add different

audiences to your ad groups. Affinity Categories are also used by Facebook to help you

reach specific audiences by looking at their interests, activities, the Pages they have liked

and closely related topics. These interests are combined to expand your ad's reach.


Figure 6

We used the following ten variables for cluster analysis of the Interests of Users as per their Affinity Categories.

Affinity Category            Sessions
% New Sessions               New Users
Pages / Session              Quantity
Avg. Session Duration        Transactions
Revenue                      Ecommerce Conversion Rate

Figure 7: Scatter Plot Sessions vs Revenue

The scatter plot above shows a relationship between the number of user sessions and revenue. It also shows the general distribution of the Affinity Interests, which form three clusters. We can use this information for further analysis to determine the ideal number of clusters and the placement of the Affinity Interests in relation to the relevant variables.

We then normalized all the variables except the first column and calculated the distance matrix, using Euclidean distance (the default). (#Normalize & calculate) This allowed us to create cluster dendrograms with complete linkage and with average linkage. (#Cluster Dendrogram) These diagrams show the clusters formed under each linkage and help us classify the interests accordingly.

Figure 9: Cluster Dendrogram (average linkage)

We then characterized clusters by creating a vector showing the cluster membership.

(#Characterizing clusters) This allowed us to plot the following silhouette plot.

Figure 10: Silhouette Plot


The silhouette plot above shows the 32 interests grouped into three clusters of 25, 6, and 1 members. This supports the conclusion that three clusters are appropriate for the analysis.
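One way to check the choice of three clusters is to compare the average silhouette width across several candidate cluster counts. The sketch below is only an illustration under stated assumptions: the `toy` matrix is synthetic data standing in for the normalized Interests table, and the candidate range 2 to 6 is our own choice. The object names `distance` and `hc.a` mirror those used in the appendix.

```r
library(cluster)  # provides silhouette()

set.seed(42)
# Hypothetical stand-in for the normalized Interests table:
# three well-separated groups of 20 points each
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 4, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 8, sd = 0.3), ncol = 2)
)
z <- scale(toy)
distance <- dist(z)                           # Euclidean distance by default
hc.a <- hclust(distance, method = "average")  # average linkage

# Average silhouette width for each candidate number of clusters
candidate_k <- 2:6
avg_width <- sapply(candidate_k, function(k) {
  sil <- silhouette(cutree(hc.a, k), distance)
  mean(sil[, "sil_width"])
})
names(avg_width) <- candidate_k

# The cluster count with the largest average width is the best supported
best_k <- candidate_k[which.max(avg_width)]
print(round(avg_width, 2))
print(best_k)
```

On real data the widths are usually much closer together than in this toy example, so the plot itself should still be inspected before settling on a cluster count.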

We then used K-means clustering to conclude that the variables contributing to cluster formation are Sessions, New Users, Pages / Session, Transactions, Revenue and Quantity. (#K-means Clustering)

We identify these variables by the difference between the highest and lowest values of their cluster means: a large difference indicates that the variable strongly influences the selected clusters. Using this information, we plotted the clusters to examine how Ecommerce Conversion Rate relates to Revenue. Ecommerce Conversion Rate is the percentage of visits that resulted in an e-commerce transaction.
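The comparison of cluster means can be sketched in a few lines. This is a minimal illustration, assuming a synthetic two-column data set in which only the hypothetical `Sessions` column separates the groups; `Noise` and the use of `nstart = 25` are our own additions, not part of the original analysis.

```r
set.seed(1)
# Hypothetical data: "Sessions" separates the two groups, "Noise" does not
toy <- data.frame(
  Sessions = c(rnorm(10, mean = 0), rnorm(10, mean = 10)),
  Noise    = rnorm(20)
)
z <- scale(toy)

# Several random starts make the k-means result more stable
kc <- kmeans(z, centers = 2, nstart = 25)

# Spread between the highest and lowest cluster mean for each variable;
# a larger spread suggests a stronger contribution to cluster formation
center_range <- apply(kc$centers, 2, function(col) max(col) - min(col))
print(sort(center_range, decreasing = TRUE))
```

Because the variables are normalized first, the spreads are on a common scale and can be compared directly across columns.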

The scatter plot below classifies users according to their spending patterns. The users in green have a high Ecommerce Conversion Rate and high Revenue, i.e. they are more likely to buy expensive products. The users in black have a high Ecommerce Conversion Rate and low Revenue, i.e. they are more likely to buy inexpensive products. The users in red have a low Ecommerce Conversion Rate and low Revenue, i.e. they are less likely to buy expensive products.

Figure 11: Scatter Plot of Clusters - Revenue vs Ecommerce Conversion Rate

(Cluster annotations in Figure 11: Less likely to buy expensive products; More likely to buy inexpensive products; More likely to buy expensive products)

From this distribution, the company can decide where to direct resources in its marketing plan. For example, we now know that Business Professionals, Outdoor Enthusiasts, Do-It-Yourselfers, and Thrill Seekers tend to have a higher ecommerce conversion rate but purchase less expensive products. The company can target this audience with customized marketing plans that cater to their needs.

Another example is the group of users who are less likely to buy expensive products. We now know that Family-Focused users, Fashionistas, Pet Lovers, and Cooking Enthusiasts have a lower ecommerce conversion rate and purchase less expensive products. The company can target this audience with marketing plans that focus on increasing their ecommerce conversion rate.

b) Interests- In-Market segments

Users in these segments are more likely to be ready to purchase products or services in the specified category. These are users lower in the purchase funnel, near the end of the process.

In AdWords, companies can select from these audiences to find customers who are in the market, which means that they are researching products and actively considering buying a service or product like those the company offers. In-market audiences are available to advertisers in all AdWords languages.

These audiences are designed for advertisers focused on getting conversions from customers most likely to make a purchase. In-market audiences can help drive remarketing performance and reach consumers close to completing a purchase. We used the following ten variables for cluster analysis of the Interests of Users as per their In-Market Segments.

In-Market Segment            Sessions
% New Sessions               New Users
Pages / Session              Avg. Session Duration
Transactions                 Revenue
Ecommerce Conversion Rate    Quantity

We applied the same method as for Interests - Affinity Categories and created the following graphs.

Figure 12: Scatter Plot Sessions vs Revenue

Figure 13: Cluster Dendrogram (average linkage)

Figure 14: Silhouette Plot

The scatter plot in Figure 12 shows a relationship between the number of user sessions and revenue, and it suggests that the In-Market Segments form three clusters. The silhouette plot above shows the 18 interests grouped into three clusters of 3, 10, and 5 members, which supports the conclusion that three clusters are appropriate for the analysis.

K-means clustering with 3 clusters of sizes 9, 6, 3

Available components:

[1] "cluster" "centers" "totss" "withinss" "tot.withinss"

[6] "betweenss" "size" "iter" "ifault"

As per the above analysis, the variables that contribute most to cluster formation are Sessions, New Users, Transactions, Revenue and Quantity. We identify these variables by the difference between the highest and lowest values of their cluster means: a large difference indicates that the variable strongly influences the selected clusters.

Using this information, we plotted the clusters to examine how Ecommerce Conversion Rate relates to Revenue. Ecommerce Conversion Rate is the percentage of visits that resulted in an e-commerce transaction.

Figure 15: Scatter Plot

The scatter plot above classifies the segmented users according to their spending patterns. The users in black have a mid Ecommerce Conversion Rate and high Revenue, i.e. they are likely to buy expensive products. The users in green have a high Ecommerce Conversion Rate and low Revenue, i.e. they are more likely to buy inexpensive products. The users in red have a low Ecommerce Conversion Rate and low Revenue, i.e. they are less likely to buy expensive products.

Using this distribution, the company can make decisions on segmented marketing. Because these users are already prepared to buy the product and are researching it, it is easier for the company to market to them and increase their ecommerce conversion rate. From the cluster analysis, the company can target the segments it prefers and influence its sales in those segments. For example, users looking for Dating Services are more likely to purchase diamond jewelry than those in Sports & Fitness. The company can run further analyses to target and influence that market segment.

We now know that Beauty Products & Services, Gift & Occasions, and Dating Services are the segments that are less likely to buy expensive products. The company can target this audience with marketing plans that focus on increasing their ecommerce conversion rate.

3. Twitter Analysis

We believe that Twitter analysis is an important aspect of online marketing for an e-commerce website such as DiamondStuds.com. Twitter analysis using a word cloud helps the company understand overall user sentiment and stay aware of its competitors in the market.

Figure 16: Tweet Clouds (1)

The first word cloud is compiled from 35 tweets containing “#DiamondStuds”. We created it with the ‘Dark2’ color palette, using words with a minimum frequency of 5. The highlighted terms include diamond jewelry, mom, gifts, and mother’s day, which shows that Mother’s Day is coming up and is a popular topic in relation to gifting jewelry. These words give an overview of general public opinion. Words like doyleanddoyle and londongold point to competing brands that are already popular in relation to diamond studs.

Figure 16: Tweet Clouds (2)

The second word cloud is compiled from 100 tweets containing “Diamonds”. We created it with the ‘Dark2’ color palette, using words with a minimum frequency of 10. This word cloud helps us analyze the sentiment of users towards diamonds in general. The highlighted terms include gemstones, flawless, and gianews. These words give an overview of general public opinion and indicate whether we need to watch for anything that could affect our trade in the future.

If the company builds an automated word cloud generator for analyzing overall user sentiment, it can choose a suitable action at any given moment. For example, if the company wants to launch a new product and Twitter analysis shows that the market is optimistic about diamond studs, it can go ahead with the launch. If instead the analysis shows that the market is weak or that few people are interested in purchasing diamond studs right now, the company can change its tactics to make the launch more attractive to customers.
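The counting step behind such a generator can be sketched in base R. This is only an illustration under stated assumptions: the `tweet_text` vector is made-up text standing in for the result of a Twitter search, the stop-word list is a tiny hypothetical one, and the minimum frequency of 2 is chosen to suit the toy data rather than the thresholds used in the report.

```r
# Hypothetical tweet texts standing in for the result of a Twitter search
tweet_text <- c(
  "Diamond studs make great gifts for mom",
  "Looking for diamond jewelry gifts this Mother's Day",
  "Flawless diamond studs on sale"
)

# Lower-case the text, replace punctuation with spaces, split into words
cleaned <- gsub("[^a-z' ]", " ", tolower(tweet_text))
words <- unlist(strsplit(cleaned, "\\s+"))
words <- words[nchar(words) > 0]

# Drop a small illustrative stop-word list, then count occurrences
stop_words <- c("for", "this", "on", "make")
words <- words[!words %in% stop_words]
wordFreq <- sort(table(words), decreasing = TRUE)
print(head(wordFreq, 5))

# Keep only terms at or above a minimum frequency,
# which is the role min.freq plays in wordcloud()
frequent <- names(wordFreq)[wordFreq >= 2]
print(frequent)
```

The resulting frequency table is what a word cloud visualizes, so the same counts can also feed a simple alert, for example when a competitor name or a negative term crosses the threshold.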

4. Traffic Analysis

Online shopping offers a variety of payment gateways, along with a wide range of referral sources that offer the products at the same price. These referral sources also bring a good amount of revenue to the parent company. Identifying them helps estimate the market share that they provide to the parent company. These referral sources can also be used as campaign platforms to attract more customers.

Figure 16 displays the list of valued referral sources.

We can also derive the ecommerce conversion rate from the data, that is, the percentage of visits that are converted into transactions.
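As a small worked example of this calculation, the sketch below computes the conversion rate per referral source and ranks the sources. The figures and source names are invented for illustration; the column names follow the variables used elsewhere in this report, and the filter on more than 10 transactions mirrors the one in the appendix code for Figure 17.

```r
# Hypothetical referral traffic; column names mirror the report's variables
trafficdata <- data.frame(
  Source       = c("google", "facebook", "email", "partner-blog"),
  Sessions     = c(12000, 8000, 1500, 400),
  Transactions = c(180, 40, 45, 2)
)

# Conversion rate: percentage of sessions that ended in a transaction
trafficdata$Ecommerce.Conversion.Rate <-
  100 * trafficdata$Transactions / trafficdata$Sessions

# Ignore sources with too few transactions to be reliable, then rank
ranked <- subset(trafficdata, Transactions > 10)
ranked <- ranked[order(-ranked$Ecommerce.Conversion.Rate), ]
print(ranked)
```

In this toy data the email source converts best even though it brings far fewer sessions than the others, which is the kind of contrast the ratio is meant to expose.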

Figure 17: depicts the e-Conversion Ratio


a) Product suggestions for e-shopping

A distinctive feature of online shopping is suggesting items related to what shoppers are looking for. It introduces customers to a wide range of products they have not yet seen, based on what other users viewed, and gives them information about custom price and availability.

To provide this function, an algorithm is needed that categorizes products according to their type and cost, so that viewers can see related products viewed by other users.

Methodology:

Figure 18 depicts the preliminary rules.

We can also view the number of items included in each rule, which we refer to as the “order” of the rule. Figure 19 shows the rules by order.

Figure 19: Number of Items in the Rules

Finally, with the help of the support, lift, and confidence parameters, 12 useful rules can be identified that help select the products and related items to display on the front end of the web application.
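To make the three parameters concrete, the sketch below computes them by hand for a single rule. The `orders` list and item names are hypothetical; the report itself mines rules with `apriori()`, which calculates the same measures over all candidate rules.

```r
# Hypothetical orders: each element lists the items bought together
orders <- list(
  c("studs", "gift_box"),
  c("studs", "gift_box", "upgrade"),
  c("studs"),
  c("gift_box"),
  c("studs", "gift_box")
)

# Support: share of orders that contain every item in `items`
support <- function(items) {
  mean(sapply(orders, function(o) all(items %in% o)))
}

# Metrics for the rule {studs} => {gift_box}
supp_rule <- support(c("studs", "gift_box"))  # orders containing both items
conf_rule <- supp_rule / support("studs")     # of orders with studs, share with gift_box
lift_rule <- conf_rule / support("gift_box")  # confidence relative to the baseline rate

print(c(support = supp_rule, confidence = conf_rule, lift = lift_rule))
```

A lift above 1 means the two items appear together more often than their individual popularity would predict, which is what makes a rule useful for product suggestions; a lift below 1, as in this toy data, means the pairing adds nothing beyond each item's own popularity.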

Figure 20: Depicts the 12 Association Rules


V. Recommendations - Predictions

Data collected from the back end of an e-shopping application is mainly used to study past performance, i.e. revenue acquired, quantity of products sold, and transaction conversion rate. It also helps identify products that are not receiving proper exposure to customers and are therefore declining in sales and, consequently, revenue.

Such products can be identified as shown in Figure 5, and predictions can be made for the quantity to be sold and the corresponding revenue. Figure 15 depicts the rules that can be used to trace the sales workflow of the products and their potential to generate revenue.

Figure 21: Product Quantity - Revenue - Prediction

VI. Conclusion

R provided valuable insights from the acquired data, ranging from product performance and target market segments to referral traffic sources and social trends. These insights can give the online retailer a data-backed strategy and predictions for future campaigns and product placement in the market. The data can also be used to estimate the possibility of capturing market share when new products are launched.

VII. References

DeBois, P. (2015, July 31). How to Shrink Big Data To Fit Your Marketing Strategy. Retrieved from cmswire: http://www.cmswire.com/analytics/how-to-shrink-big-data-to-fit-your-marketing-strategy/?utm_source=MainRSSFeed&utm_medium=Web&utm_campaign=RSS-News

DiamondStuds. (n.d.). Diamond online store. Retrieved from diamondstuds: https://www.diamondstuds.com/

Forbes. (2013, July 22). Big Data, Analytics And The Future Of Marketing And Sales. Retrieved from Forbes: http://www.forbes.com/sites/mckinsey/2013/07/22/big-data-analytics-and-the-future-of-marketing-sales/#3afb7b52344d

VIII. Appendices: R-Code

Figure 2: Revenue Vs Sales Period

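# Box plot of revenue by sales date; boxes are vertical by default,
# so the unsupported Vertical argument is dropped
boxplot(Revenue ~ Date, data = productdata, main = "Revenue VS Sales Date",
        xlab = "Date", ylab = "Revenue", col = terrain.colors(10))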

Figure 4: Identifying the top Product list:

myvars <-subset(productdata, Product.Revenue >5000, select=c(Product.Revenue,Product))

str(myvars)

View(myvars)

Figure 5: Identifying the bottom list:

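# Mean revenue per product, then the revenue distribution for each product
aggregate(Revenue ~ Product, productdata, mean)

boxplot(Revenue ~ Product, productdata)

# Assumption: the bottom list uses the same Revenue < 5000 cut-off as the Figure 21 code.
# The original filter compared the columns against the literal strings "Product" and "Product.Revenue".
productprice <- subset(productdata, Revenue < 5000, select = c(Product, Revenue, Quantity))

View(productprice)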

Figure 7: Scatter Plot Sessions vs Revenue

#Scatter plot with labels for points

plot(Revenue~Sessions, data = Interests)

with(Interests,text(Revenue~Sessions, labels=Affinity.Category,pos=3, cex=0.5))


#Normalize & Calculate

# Normalize

> z = Interests[,-c(1)]

> m <- apply(z,2,mean)

> s <- apply(z,2,sd)

> z <- scale(z,center=m,scale=s)

#calculate distance matrix (default is Euclidean distance)

> distance <- dist(z)

> print(distance, digits = 2)

Figure 9: Cluster Dendrogram (average linkage)

#Cluster Dendrogram

#Cluster Dendrogram (complete linkage)

hc.c <- hclust(distance)

plot(hc.c,hang=-1,labels=Interests$Affinity.Category)

#Cluster Dendrogram (average linkage)

hc.a<-hclust(distance,method="average")

plot(hc.a,hang=-1,labels=Interests$Affinity.Category)

Figure 10: Silhouette Plot

#Characterizing clusters

#Create a vector showing the cluster membership

> member.c = cutree(hc.c,3)

> table(member.c)

> member.a = cutree(hc.a,3)

> table(member.a)

> table(member.c,member.a)

        member.a
member.c  1  2  3
       1 15  0  0
       2 10  1  0
       3  0  5  1

> aggregate(z,list(member.a),mean)

> aggregate(Interests[,-c(1)],list(member.a),mean)

> library(cluster)

> plot(silhouette(cutree(hc.a,3), distance))

#K-means Clustering

> kc<-kmeans(z,3)

> kc

K-means clustering with 3 clusters of sizes 12, 14, 6

Cluster means:

Sessions X..New.Sessions New.Users Pages...Session Transactions

1 -0.3428882 -0.7494206 -0.3540434 -0.3589429 -0.2811070

2 -0.4785232 0.6950469 -0.4694847 -0.3339416 -0.5513192

3 1.8023304 -0.1229350 1.8035511 1.4970830 1.8486254

Within cluster sum of squares by cluster:

[1] 25.38145 35.21799 27.20816

(between_SS / total_SS = 68.5 %)

Figure 11: Scatter Plot of Clusters - Revenue vs Ecommerce Conversion Rate

plot(Revenue~ Ecommerce.Conversion.Rate, Interests,col = kc$cluster)

with(Interests,text(Revenue~ Ecommerce.Conversion.Rate, labels=Affinity.Category,pos=3, cex=0.5))

Figure 12: Scatter Plot Sessions Vs Revenue

# Normalize

> z = Interests[,-c(1)]

> m <- apply(z,2,mean)

> s <- apply(z,2,sd)

> z <- scale(z,center=m,scale=s)


#calculate distance matrix (default is Euclidean distance)

> distance <- dist(z)

> print(distance, digits = 2)

     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
 2 2.9
 3 4.8 7.1
 4 5.1 7.6 2.4
 5 4.8 7.1 3.8 4.0
 6 6.4 8.5 6.2 5.3 2.9
 7 3.4 5.7 4.1 4.1 1.6 3.2
 8 4.8 7.2 1.6 3.2 2.7 5.4 3.4
 9 5.0 7.2 3.0 3.3 1.8 4.0 2.8 2.1
10 4.6 6.7 1.5 3.7 4.8 7.4 4.9 2.4 4.0
11 3.2 5.5 2.6 3.8 2.4 5.0 2.2 1.9 2.6 2.9
12 3.2 5.6 1.7 3.1 3.4 5.9 3.2 1.9 3.0 1.8 1.4
13 1.9 1.4 6.4 6.7 6.0 7.3 4.6 6.4 6.3 6.2 4.7 4.8
14 2.0 4.5 3.5 3.4 3.3 4.9 2.1 3.5 3.4 3.9 2.2 2.3 3.5
15 6.1 8.2 5.3 5.2 1.6 2.2 2.8 4.0 2.7 6.3 3.9 5.0 7.1 4.7
16 4.7 7.2 1.9 2.4 2.0 4.4 2.7 1.2 1.6 3.1 2.1 2.2 6.3 3.1 3.4
17 5.3 7.8 3.3 3.4 1.1 3.2 2.4 2.2 1.9 4.5 2.6 3.3 6.7 3.7 2.2 1.4
18 2.8 4.7 3.3 3.9 3.0 5.2 2.5 3.0 2.5 3.6 1.9 2.1 3.9 1.8 4.3 2.9 3.5

Figure 13: Cluster Dendrogram (average linkage)

#Cluster Dendrogram (complete linkage)

hc.c <- hclust(distance)

plot(hc.c,hang=-1,labels=Interests$In.Market.Segment)

#Cluster Dendrogram (average linkage)

hc.a<-hclust(distance,method="average")

plot(hc.a,hang=-1,labels=Interests$In.Market.Segment)

#Characterizing clusters

#Create a vector showing the cluster membership

> member.c = cutree(hc.c,3)

> table(member.c)

member.c

1 2 3

3 13 2

> member.a = cutree(hc.a,3)

> table(member.a)

member.a

1 2 3

3 10 5

> table(member.c,member.a)

        member.a
member.c  1  2  3
       1  3  0  0
       2  0 10  3
       3  0  0  2

> aggregate(z,list(member.a),mean)

Group.1 Sessions X..New.Sessions New.Users Pages...Session Transactions

1 1 1.9459877 0.09052571 1.9377519 -0.4328091 1.8769426

2 2 -0.2173128 0.44471812 -0.2116305 -0.3503525 -0.2554043

3 3 -0.7329671 -0.94375167 -0.7393901 0.9603904 -0.6153569

Revenue Ecommerce.Conversion.Rate Quantity Avg..Session.Duration

1 1.8845803 -0.3757202 1.8852053 -0.4663532

2 -0.2424465 -0.4660330 -0.2565385 -0.4090817

3 -0.6458552 1.1574981 -0.6180462 1.0979754

aggregate(Interests[,-c(1)],list(member.a),mean)

Group.1 Sessions X..New.Sessions New.Users Pages...Session Transactions

1 1 64886.33 0.6740667 43200.33 3.703730 851.0

2 2 18791.40 0.6833700 12620.40 3.725137 235.5

3 3 7804.00 0.6469000 5111.80 4.065426 131.6

Revenue Ecommerce.Conversion.Rate Quantity Avg..Session.Duration

1 737293.1 0.01190 897.0 161.3333

2 215153.0 0.01147 245.9 162.5000

3 116124.8 0.01920 136.0 193.2000

> library(cluster)

> plot(silhouette(cutree(hc.a,3), distance))

> kc<-kmeans(z,3)

> kc

K-means clustering with 3 clusters of sizes 9, 6, 3

Cluster means:

Sessions X..New.Sessions New.Users Pages...Session Transactions

1 -0.2562221 0.62691169 -0.2458183 -0.4638935 -0.3025588

2 -0.5886608 -0.98563039 -0.6001485 0.9122448 -0.4846330

3 1.9459877 0.09052571 1.9377519 -0.4328091 1.8769426

Revenue Ecommerce.Conversion.Rate Quantity Avg..Session.Duration

1 -0.2919728 -0.5577461 -0.3018960 -0.5372607

2 -0.5043310 1.0244792 -0.4897586 1.0390676

3 1.8845803 -0.3757202 1.8852053 -0.4663532

Clustering vector:

[1] 3 3 1 1 2 2 2 1 2 1 1 1 3 1 2 1 2 1


Within cluster sum of squares by cluster:

[1] 29.339950 16.134436 4.722658

(between_SS / total_SS = 67.2 %)

Available components:

[1] "cluster" "centers" "totss" "withinss" "tot.withinss"

[6] "betweenss" "size" "iter" "ifault"

Figure: 15 Scatter Plot

plot(Revenue~ Ecommerce.Conversion.Rate, Interests,col = kc$cluster)

with(Interests,text(Revenue~ Ecommerce.Conversion.Rate, labels=In.Market.Segment,pos=3, cex=0.5))

Figure 16: Tweet Clouds (1 & 2)

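library(twitteR)    # searchTwitter(); requires prior authentication with setup_twitter_oauth()
library(tm)         # Corpus(), TermDocumentMatrix()
library(wordcloud)  # wordcloud(); also loads RColorBrewer for brewer.pal()

# The original listing does not show how tdm was built; the corpus steps
# below are an assumed, typical tm pipeline rather than the authors' exact code.
tweet_cloud <- function(term, min.freq) {
  tweets <- searchTwitter(term, n = 100, lang = "en")
  text <- sapply(tweets, function(t) t$getText())
  corpus <- Corpus(VectorSource(text))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  tdm <- TermDocumentMatrix(corpus)
  m <- as.matrix(tdm)
  wordFreq <- sort(rowSums(m), decreasing = TRUE)
  set.seed(1000)
  wordcloud(words = names(wordFreq), freq = wordFreq, min.freq = min.freq,
            random.order = FALSE, colors = brewer.pal(6, "Dark2"))
}

tweet_cloud("#diamondstuds", min.freq = 5)  # Tweet Cloud (1)
tweet_cloud("diamonds", min.freq = 10)      # Tweet Cloud (2)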

Figure 17: depicts the e-Conversion Ratio

library(scatterplot3d)

myvars <- subset(trafficdata, Transactions > 10,
                 select = c(Source, Ecommerce.Conversion.Rate, Average.Order.Value))

str(myvars)

# scatterplot3d() needs numeric coordinates, so Source is converted to numeric factor codes
scatterplot3d(as.numeric(factor(myvars$Source)), myvars$Ecommerce.Conversion.Rate,
              myvars$Average.Order.Value,
              main = "Ecommerce Conversion via Referral sources",
              xlab = "Source", ylab = "e-Conversion Rate",
              pch = 19, highlight.3d = TRUE, type = "h")

Figure: 18 depicts the preliminary rules.

Figure: 19 Number of Items in the Rules

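library(arules)     # apriori(), inspect()
library(arulesViz)  # plot() method for rules

# Mine the rules first so that `rules` exists before it is plotted
rules <- apriori(mydata, parameter = list(minlen = 1, maxlen = 5, supp = .7))

plot(rules, shading = "order", control = list(main = "Number of Items in the Rules"))

Figure 20: Depicts the 12 Association Rules

plot(rules)

inspect(rules)

plot(rules, shading = "order", control = list(main = "12 Association Rules"))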

Figure 21: Prediction

myvars <-subset(productdata, Revenue < 5000, select=c(Revenue,Product,Quantity))

str(myvars)

predictdata <- myvars

library(party)

# Quantity is the response, so it is dropped from the predictors
mytree <- ctree(Quantity ~ Revenue, predictdata)

plot(mytree,type="simple",main="Product, Quantity VS Revenue Prediction(Bottom list)")

End