Cluster Analysis - Keyword Clustering


Upload: justine-jes-thomas

Post on 17-Jan-2017

TRANSCRIPT

Page 1: Cluster Analysis -  Keyword Clustering

KEYWORD CLUSTERING

Understanding search behavior using R and Tableau

Page 2: Cluster Analysis -  Keyword Clustering

Introduction

■ Why is keyword clustering important?

– To understand what your visitors are trying to accomplish

– To identify the profitable keywords for the website

– To group the keywords into logical groups, such that work on one keyword positively impacts the results of the others in its group

■ Challenges?

– Google has made it difficult to analyze search keywords in recent years, because it passes “(not provided)” instead of the actual keywords

Page 3: Cluster Analysis -  Keyword Clustering

Concept: K-Means Clustering/Unsupervised Learning

■ Unsupervised: trying to understand the structure of our underlying data, rather than trying to optimize for a specific, pre-labeled criterion

– No assumptions on data (contrast with pre-defined relationships such as visitors from mobile or visitors from referral)

■ k-means clustering: method of partitioning data into ‘k’ subsets, where each data element is assigned to the closest cluster based on the distance of the data element from the center of the cluster.
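The assign-to-closest-center idea above can be sketched in a few lines. The deck itself works in R; the sketch below is plain Python, uses only the standard library, and is an illustration of the standard k-means loop (Lloyd's algorithm), not the code behind the slides. The function name `kmeans`, its parameters, and the sample points are assumptions made for this example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Partition `points` (lists of numbers) into k clusters."""
    rng = random.Random(seed)
    # Start from k data points chosen at random as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers will not move
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Made-up data: two obvious groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(points, k=2)
print(labels)
```

Which group is numbered 0 and which 1 depends on the random start, so only the grouping itself is meaningful, not the label values.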

Page 4: Cluster Analysis -  Keyword Clustering

Converting Text to Numeric Data

■ To use k-means clustering with text data, the text must first be converted to numeric data

■ R has packages that support this workflow: RSiteCatalyst pulls the search keywords from Adobe Analytics (SiteCatalyst), and RTextTools converts them into a document-term matrix (DTM)

■ In the document-term matrix (DTM), each row is a search term and each column is a 1/0 flag indicating whether a given word appears in that natural search term.
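To make the row/column layout concrete, here is a small standard-library Python sketch that builds a binary matrix of this kind. It is an illustration only and does not reproduce the R packages named above; the function name `document_term_matrix` and the sample search terms are made up for the example.

```python
def document_term_matrix(search_terms):
    """Return (vocabulary, rows); rows[i][j] is 1 if word j occurs in search term i."""
    tokenized = [term.lower().split() for term in search_terms]
    # One column per distinct word, in alphabetical order.
    vocabulary = sorted({word for words in tokenized for word in words})
    rows = [[1 if word in words else 0 for word in vocabulary] for words in tokenized]
    return vocabulary, rows


terms = ["cheap flights", "cheap hotels", "flights to paris"]  # made-up search terms
vocabulary, dtm = document_term_matrix(terms)
print(vocabulary)  # ['cheap', 'flights', 'hotels', 'paris', 'to']
for term, row in zip(terms, dtm):
    print(row, term)
# [1, 1, 0, 0, 0] cheap flights
# [1, 0, 1, 0, 0] cheap hotels
# [0, 1, 0, 1, 1] flights to paris
```

Each row is now a numeric vector, which is the form a distance-based method such as k-means needs.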

Page 5: Cluster Analysis -  Keyword Clustering

Keyword Augmentation

■ stemWords reduces a word to its root, a standardization method that avoids having multiple versions of a word referring to the same concept (e.g. argue, arguing, argued all reduce to ‘argu’)

■ removeStopwords eliminates common English words such as “they”, “he”, “always”

■ minWordLength sets the minimum number of characters that constitute a ‘word’; here it is set to 1

■ removePunctuation removes periods, commas, etc.
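The four options above can be mimicked in a short standard-library Python sketch. This is a rough illustration, not the behavior of the R options themselves: the stop word list is a tiny made-up sample, and `toy_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's. All names here (`STOPWORDS`, `SUFFIXES`, `toy_stem`, `clean_keyword`) are assumptions for the example.

```python
import string

# Tiny illustrative stop word list; real lists are much longer.
STOPWORDS = {"they", "he", "always", "the", "a", "to"}
# Crude suffix stripping, standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "e", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_keyword(text, min_word_length=1):
    # Like removePunctuation: drop periods, commas, etc.
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = text.lower().split()
    # Like removeStopwords: drop very common words.
    words = [w for w in words if w not in STOPWORDS]
    # Like stemWords: reduce each word to a root form.
    words = [toy_stem(w) for w in words]
    # Like minWordLength: keep words with at least this many characters.
    return [w for w in words if len(w) >= min_word_length]


print([toy_stem(w) for w in ["argue", "arguing", "argued"]])  # ['argu', 'argu', 'argu']
print(clean_keyword("They argued, always."))  # ['argu']
```

With the minimum word length at 1, the last filter keeps every remaining word, which matches the setting described on the slide.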

Page 6: Cluster Analysis -  Keyword Clustering

Inspecting Common Elements

Page 7: Cluster Analysis -  Keyword Clustering

Guessing at ‘k’: A First Run at Clustering

■ One downside to using k-means clustering as a technique is that the user must choose ‘k’, the number of clusters expected from the dataset

– ‘k’ can be chosen manually, by guessing (but this requires re-running the clustering until all keywords are clustered)

– ‘k’ can be chosen using the elbow method

Page 8: Cluster Analysis -  Keyword Clustering

Elbow method: Finding breakpoints in our cost plot

Once the slope of the cost plot flattens, each additional cluster is less effective at reducing the distance from each data point to its cluster center.

So while a single ‘best’ value of ‘k’ is not determined, the range of values of ‘k’ worth evaluating is.
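A cost curve of this kind can be sketched as follows. The block is a standard-library Python illustration, not the R code behind the slide: it takes "cost" to mean the total within-cluster sum of squared distances, which is a common choice but an assumption here, and the names `kmeans`, `total_cost`, `elbow_costs` and the sample points are made up for the example.

```python
import math
import random


def kmeans(points, k, rng, iterations=100):
    """Basic k-means; returns (labels, centers)."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def total_cost(points, labels, centers):
    """Total within-cluster sum of squared distances to the assigned center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def elbow_costs(points, max_k, restarts=5, seed=0):
    """Cost for each k from 1 to max_k (max_k must not exceed len(points))."""
    rng = random.Random(seed)
    costs = {}
    for k in range(1, max_k + 1):
        # k-means depends on its random start, so keep the best of a few runs.
        costs[k] = min(
            total_cost(points, *kmeans(points, k, rng)) for _ in range(restarts)
        )
    return costs


# Made-up data with two obvious groups, so the elbow should appear at k = 2.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
costs = elbow_costs(points, max_k=4)
for k, cost in costs.items():
    print(k, round(cost, 2))
```

On this data the cost falls sharply from k = 1 to k = 2 and changes little afterwards; that flattening is the breakpoint the slide describes. On real keyword data the bend is usually less sharp, which is why the method yields a range of candidate values rather than a single answer.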

Page 9: Cluster Analysis -  Keyword Clustering

Output from clustering activity

Naming the clusters and tagging them by theme

Page 10: Cluster Analysis -  Keyword Clustering

Tableau Report Snapshot