customer behaviour modelling - tech presentation

Post on 26-Jan-2017

Customer Behaviour Modelling
Insights from customer data

Redbubble
Anuj Luthra

Product Development

● Millions of users
● Lots of ideas
● Lots of unquantified & unvalidated assumptions
● What are the biggest problems?
● What should we pursue first = best opportunity
● We want to build the right thing

Existing Techniques

● User interviews and surveys
  ○ Interpretation of wants and needs is tricky
    ■ Not dependable
  ○ Expensive & time consuming
● Analytic tools (Google Analytics, Flurry) provide high-level views
  ○ Difficult to gauge the effect of each variable on its own: lots of factors are at play, so how much did a single thing affect the outcome?

What is lacking

● Ability to get insights from real user actions/visits
● Make it quick and cheap to support/reject assumptions
● Confidence, like probabilities, external factors and stuff :-)

What we do

● Statistical modelling of customer data and inference
● Quantification of the relative impact of user behaviours and visit attributes

“Let's put some science in data analysis”

Strongest Hypotheses

● Give a starting point
● Define the goal for measuring success
● Keeps you focussed and honest
● Hunches are powerful - use domain knowledge

Identify hypotheses

○ HypothesisA: “Users jumping along & looking at multiple search result pages are having a bad experience”

○ HypothesisB: “Users navigating to a listing from search results are having a good experience”

○ HypothesisC: “Users typing in keywords in search box multiple times are not having a good experience”

Measurable User Journeys

● Identify particular user journeys in a visit
  ○ hypothesisA: SPPPSPSP
  ○ hypothesisB: SLL
● Journeys don’t need to be exclusive - they are not!
● Lots of log parsing, mapreduce
● Usually the process varies for each business
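A minimal Ruby sketch of how journeys like these might be detected once each visit has been reduced to a string of event codes. The slides do not define the letters, so this assumes S is a search, P is a move to another results page, and L is a listing view. The `JOURNEYS` table, its regular expressions and the thresholds in them are hypothetical illustrations, not the rules used in the talk.

```ruby
# Hypothetical journey definitions. Assumed event codes (not defined in the
# slides): S = search, P = paging to another results page, L = listing view.
JOURNEYS = {
  paging_around_search_results: /P.*P/,    # two or more paging events in the visit
  click_through_to_listings:    /SP*L/,    # a search (plus optional paging) followed by a listing view
  using_search_box_too_much:    /S.*S.*S/  # three or more searches in the visit
}.freeze

# Returns the names of all journeys found in a visit's event string.
# Journeys are not exclusive, so one visit can match several.
def journeys_in(visit)
  JOURNEYS.select { |_name, pattern| visit.match?(pattern) }.keys
end

p journeys_in("SPPPSPSP")
p journeys_in("SLL")
```

Each journey name can then become one yes/no column per visit, which is the shape a regression needs as input.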

Data Preparation

● Start with a small sample size
● Focus more on quality
● Look out for anomalies & outliers
● Remove correlated variables - noise
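One way to spot correlated variables is to compute the Pearson correlation between each pair and drop one of any pair that moves almost in lockstep. The Ruby sketch below shows the idea; the sample numbers are made up, and the 0.9 cut-off is an assumed threshold, not one given in the talk.

```ruby
# Pearson correlation coefficient between two equally sized numeric arrays.
def pearson(xs, ys)
  raise ArgumentError, "arrays must have the same non-zero length" if xs.empty? || xs.size != ys.size

  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  cov   = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  cov / Math.sqrt(var_x * var_y)
end

# Made-up per-visit numbers, purely for illustration.
pages_viewed    = [1, 2, 3, 4, 5]
seconds_on_site = [12, 19, 33, 41, 48] # rises almost in step with pages_viewed
searches        = [2, 0, 3, 1, 2]

THRESHOLD = 0.9 # hypothetical cut-off for "too correlated to keep both"
puts pearson(pages_viewed, seconds_on_site).abs > THRESHOLD
puts pearson(pages_viewed, searches).abs > THRESHOLD
```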

Data Visualization

● Visualize your data
  ○ A simple histogram will tell you a lot of things
  ○ Scatter plots are good for identifying outliers
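As a rough illustration of how much a simple histogram reveals, this Ruby sketch bins a made-up "pages per visit" sample and prints a text histogram. The data, the bin width and the `histogram` helper are all assumptions for the example; the talk itself used R for visualization.

```ruby
# Counts how many values fall into each fixed-width bin, keyed by bin start.
def histogram(values, bin_width)
  values.group_by { |v| (v / bin_width).floor * bin_width }
        .transform_values(&:size)
        .sort.to_h
end

# Made-up sample with one suspicious outlier (250).
pages_per_visit = [1, 2, 2, 3, 3, 3, 4, 5, 7, 8, 12, 250]
bin_width = 5

histogram(pages_per_visit, bin_width).each do |bin_start, count|
  puts format("%4d-%-4d %s", bin_start, bin_start + bin_width - 1, "#" * count)
end
```

The lone bar far away from the rest is the kind of anomaly worth checking before modelling.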

Regression analysis

● Statistical process for estimating the relationships among variables
● Choice of method largely depends on the form of the data and the variable types
● Linear regression is your go-to method for initial pokes
● Poisson or logit models are also very useful tools for most ecommerce-related datasets
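For an "initial poke", a one-variable linear regression can be fitted by ordinary least squares in a few lines. The Ruby sketch below is only an illustration: `fit_line` is a hypothetical helper and the data is made up. A real analysis would use a statistics package, as the R example in this talk does.

```ruby
# Ordinary least squares fit of y = intercept + slope * x for one predictor.
def fit_line(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  slope = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) } /
          xs.sum { |x| (x - mean_x)**2 }
  { slope: slope, intercept: mean_y - slope * mean_x }
end

# Made-up data: listings viewed per visit vs. items added to cart.
listings_viewed = [0, 1, 2, 3, 4]
items_in_cart   = [0.1, 0.4, 0.5, 0.9, 1.1]
p fit_line(listings_viewed, items_in_cart)
```

The slope estimates how much the outcome changes per extra unit of the predictor. For a yes/no outcome such as "added to cart", a logit model is the better fit.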

Example (Using R)

glm(formula = addToCart ~ clickThroughToListings + pagingAroundSearchResults + usingSearchBoxTooMuch,
    family = "binomial", data = summary.df)

Independent Variables       Estimate  Std. Error  z value  Pr(>|z|)  Significance
clickThroughToListings       0.34065     0.12654    2.692   0.00710  **
pagingAroundSearchResults   -0.28925     0.08688   -3.329   0.00087  ***
usingSearchBoxTooMuch        0.12038     0.12608    0.955   0.33967

How to interpret signal

● Direction: the sign of the Estimate shows whether a behaviour raises (+) or lowers (-) the likelihood of adding to cart
● Significance: the p-value column and its stars show how confident we can be that the effect is not just noise
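The columns of the glm output can be recomputed from the estimates and standard errors alone, which makes the interpretation concrete. The Ruby sketch below derives the z value, a two-sided p-value under a standard normal, and the odds ratio (the exponential of the estimate) for each variable. The `interpret` helper is hypothetical, and the 0.05 cut-off is the conventional threshold, assumed here rather than stated in the talk.

```ruby
# Coefficients copied from the glm output in the slides: [estimate, standard error].
COEFFICIENTS = {
  "clickThroughToListings"    => [0.34065, 0.12654],
  "pagingAroundSearchResults" => [-0.28925, 0.08688],
  "usingSearchBoxTooMuch"     => [0.12038, 0.12608]
}.freeze

# Wald z statistic, two-sided p-value under a standard normal, and odds ratio.
def interpret(estimate, std_error)
  z = estimate / std_error
  p_value = Math.erfc(z.abs / Math.sqrt(2)) # equals 2 * (1 - Phi(|z|))
  { z: z, p_value: p_value, odds_ratio: Math.exp(estimate) }
end

COEFFICIENTS.each do |name, (estimate, std_error)|
  r = interpret(estimate, std_error)
  direction = estimate.positive? ? "raises" : "lowers"
  verdict   = r[:p_value] < 0.05 ? "significant" : "not significant" # assumed 5% level
  puts format("%-26s %s odds of addToCart (x%.2f), z=%.3f, p=%.5f, %s",
              name, direction, r[:odds_ratio], r[:z], r[:p_value], verdict)
end
```

Read this way, clicking through to listings is associated with higher odds of adding to cart and paging around search results with lower odds, while the search-box variable cannot be distinguished from noise at that threshold.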

Concrete Direction

● Now we know which user segments present a real opportunity to make improvements
● How big is the customer segment = problem size
● Knowing problem size helps in prioritizing
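Problem size can be approximated as the share of visits that contain a given journey. The Ruby sketch below assumes visits are already encoded as event strings, with S for a search, P for paging and L for a listing view. Those codes, the sample visits and the pattern are made up for illustration.

```ruby
# Share of visits that contain a given journey: a rough proxy for "problem size".
def segment_share(visits, pattern)
  return 0.0 if visits.empty?

  visits.count { |v| v.match?(pattern) }.to_f / visits.size
end

# Made-up visits; assumed codes: S = search, P = paging, L = listing view.
visits = %w[SPPPSPSP SLL SPL SL SPPP S]
paging_around = /P.*P/ # hypothetical rule: two or more paging events

puts format("%.0f%% of visits page around search results",
            100 * segment_share(visits, paging_around))
```

Combining this share with the direction and significance of the matching coefficient gives a simple basis for ranking opportunities.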

Summary

Methodology:
1. Gather Strongest Hypotheses
2. Construct Measurable User Journeys
3. Choose & Apply statistical methods
4. Support/reject hypothesis
5. Repeat-Refine

Toolkit

● BigQuery: Query parsing, mapreduce
● R: Data visualization, cleaning, augmentation, statistical methods
● Ruby: Scripting
● Coffee: ‘Cos

Found it interesting?

Come and talk to us
