Big Data @ Hippo - GetTogether 2014

Upload: frank-van-lankvelt

Post on 28-Jan-2018

follow the Hippo trail

Hippo GetTogether 2014

Big Data @ Hippo

Hippo GetTogether 2014 - Trouw

Frank van Lankvelt

Co-occurrence

Relating Attributes

Scary Math

Contingency Table

Documents A, B

            A        not A      total
B           x        20 - x     20
not B       40 - x   140 + x    180
total       40       160        200

x: visitors of A & B
40: visitors of A
20: visitors of B
200: total # visitors

P(x >= 8) ≈ 3%
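The probability quoted on this slide can be checked with a hypergeometric tail. This is a minimal sketch, assuming that visits to A and B are independent and that the margins are fixed (40 visitors of A, 20 visitors of B, 200 visitors in total). The function name `tail_probability` is illustrative and does not come from the talk.

```python
from math import comb

def tail_probability(total, visitors_a, visitors_b, observed):
    """P(X >= observed), where X is the number of visitors who saw both A and B.

    Assumes independence with fixed margins (hypergeometric distribution).
    """
    denominator = comb(total, visitors_b)
    upper = min(visitors_a, visitors_b)
    numerator = sum(
        comb(visitors_a, k) * comb(total - visitors_a, visitors_b - k)
        for k in range(observed, upper + 1)
    )
    return numerator / denominator

# Margins taken from the contingency table on the slide.
p = tail_probability(total=200, visitors_a=40, visitors_b=20, observed=8)
print(f"P(x >= 8) = {p:.3f}")
```

A small tail probability means that 8 or more shared visitors would be unusual if A and B were unrelated, so the two documents are likely related.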

Co-occurrence Insights

Insight: a high cohesion of page visits in the partner section, standing out from the regular ‘.com’ visitor cluster, suggests that visitors looking for a partner go through every single page and probably can’t find what they’re looking for.

Action: Hippo suggests improving navigation, search, or filtering.

[Figure: relatedness between attributes / URLs (●); labelled clusters: find partner, /fr, .com, .org, generic, release notes]

Recommendations

user - item (rating): collaborative filtering

users: Alice, Bob, Charlie
Star Wars: 3, 4
Finding Nemo: 3, 4
Sound of Music: 5, 1, 2

content (meta) data

                 genre       stars
Star Wars        sci-fi      Portman
Finding Nemo     animation   DeGeneres
Sound of Music   musical     Andrews

which documents are interesting for ME?

● find docs similar to visited documents
● find docs co-occurring with visited documents

Implementation

combine in search index:

Recommendation Query

● Content-based: (meta) data
● Collaborative Filtering: co-occurrence

Recommended For You

1. Collect IDs of viewed content
2. Calculate co-occurrences
3. Index, along with content, IDs of co-viewed documents
4. Search with recent IDs, similarity
5. Repeat with other collected data
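The steps above can be sketched in a few lines. The slides store the co-viewed IDs in a search index and query it. This sketch swaps in a plain in-memory dictionary for the index, and the function names and sample visits are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_index(visits):
    """visits: iterable of sets of content IDs, one set per visitor.

    Returns {doc_id: Counter of co-viewed doc IDs}.
    """
    index = defaultdict(Counter)
    for viewed in visits:
        for a, b in combinations(sorted(viewed), 2):
            index[a][b] += 1
            index[b][a] += 1
    return index

def recommend(index, recent_ids, limit=3):
    """Rank documents by how often they were co-viewed with recent_ids."""
    scores = Counter()
    for doc_id in recent_ids:
        scores.update(index.get(doc_id, {}))
    for doc_id in recent_ids:
        scores.pop(doc_id, None)  # never recommend what was just viewed
    return [doc for doc, _ in scores.most_common(limit)]

# Hypothetical visit data: one set of viewed content IDs per visitor.
visits = [
    {"demo", "whitepaper", "pricing"},
    {"demo", "whitepaper"},
    {"blog", "pricing"},
]
index = cooccurrence_index(visits)
print(recommend(index, ["demo"]))
```

Step 5 would repeat the same counting with other collected attributes instead of content IDs.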

Patterns

Beyond Co-occurrence

Patterns in the Data

customers that buy diapers often buy beer as well

(young dads rewarding themselves?)

Itemsets Rules

Find the patterns (association rule mining):

1. sets of items that are bought together (support): $P(\text{beer}, \text{diapers}) > 1\%$
2. subsets that are good predictors (lift): $\frac{P(\text{beer}, \text{diapers})}{P(\text{beer})\,P(\text{diapers})} > 4$
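Support and lift can be computed directly from a list of baskets. This is a minimal sketch with made-up toy baskets. The function names `support` and `lift` are illustrative, and the probabilities are estimated as simple fractions of baskets.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def lift(baskets, a, b):
    """P(a, b) / (P(a) * P(b)); a value of 1.0 means a and b look independent."""
    return support(baskets, {a, b}) / (support(baskets, {a}) * support(baskets, {b}))

# Toy data, invented for illustration only.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"milk"},
    {"bread"},
    {"milk", "bread"},
    {"chips"},
    {"milk", "chips"},
    {"bread", "chips"},
]

print("support:", support(baskets, {"beer", "diapers"}))
print("lift:", lift(baskets, "beer", "diapers"))
```

In this toy set, beer and diapers each appear in 2 of 8 baskets and always together, so the pair is far more frequent than independence would predict.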

http://www.onehippo.com/en/thankyou - Thank You

Beer? Diapers? Conversions!!!

http://www.onehippo.com/en/thankyou

will a visitor go there? P(conversion | request log)

what are the relevant “signals”?

which configuration performs best?

Patterns For Conversion

single item: referrer www.google.com

pattern/itemset: visited demo, 2014 week 4

correlations

Scary Data Structure

Finding Frequent Itemsets

1. Build Frequent Prefix Tree (FPGrowth)
2. Extract patterns relevant for conversion (using contingencies)
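The first step, building the frequent prefix tree, can be sketched as follows. This covers tree construction only, not the full FP-Growth mining pass. The `Node` class, the `min_count` parameter and the sample transactions are assumptions made for illustration.

```python
from collections import Counter

class Node:
    """One node of the prefix tree: an item, a visit count, and child nodes."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Insert each transaction's frequent items, most frequent first,
    so that transactions sharing a prefix share a path in the tree."""
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, count in freq.items() if count >= min_count}
    root = Node()
    for t in transactions:
        # Sort by descending frequency, then by name to break ties deterministically.
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root

# Hypothetical per-visitor item lists.
transactions = [
    ["demo", "week4", "google"],
    ["demo", "week4"],
    ["demo"],
    ["google"],
]
root = build_fp_tree(transactions, min_count=2)
print({item: child.count for item, child in root.children.items()})
```

Because frequent items sit near the root, common prefixes are stored once with a count, which keeps the tree compact.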

Pattern Contingency Table

                         converted   not converted
pattern matches
pattern does not match

converted
● visited /thankyou

sample pattern
● visited demo
● in 2014 week 4
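The four cells of such a table can be filled by counting visitors. This is a minimal sketch that assumes each visitor is represented as a set of item strings. The item encoding, the function name and the sample visitors are hypothetical.

```python
def pattern_contingency(visitors, pattern, goal):
    """Count visitors by (pattern matches?, converted?).

    visitors: list of sets of items; pattern: set of items; goal: the conversion item.
    """
    table = {(m, c): 0 for m in (True, False) for c in (True, False)}
    for items in visitors:
        matches = pattern <= items      # every item of the pattern is present
        converted = goal in items
        table[(matches, converted)] += 1
    return table

# Hypothetical visitors, using the sample pattern from the slide.
visitors = [
    {"visited demo", "2014 week 4", "visited /thankyou"},
    {"visited demo", "2014 week 4"},
    {"visited demo"},
    {"2014 week 4", "visited /thankyou"},
    {"visited blog"},
]
table = pattern_contingency(
    visitors,
    pattern={"visited demo", "2014 week 4"},
    goal="visited /thankyou",
)
print(table)
```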

Sub-Pattern Filtering

Problem: when pattern (A, B, C) is relevant, patterns (A), (B), (C), (A, B), (A, C), (B, C) (likely) also match. E.g. with C meta-data on page B.

Solution: test for independence using contingency!

Actionable Insights?

The found itemsets are quite numerous and seem to contain a lot of redundancy.

But they are certainly interesting, e.g. for a periodic evaluation.

Personalization

Putting Patterns to Use

Naive A/B Testing

The naive solution: route some traffic to alternative configuration

● A (old config): 80%
● B (new config): 20%

run for some time

see if B has relatively more conversions
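A naive split like this can be sketched with a deterministic hash of the visitor ID. Hashing is an assumption made here so that a returning visitor always sees the same configuration. The slides do not say how traffic is routed, and all names below are illustrative.

```python
import hashlib

def assign_variant(visitor_id, share_b=0.20):
    """Map a visitor ID to 'A' (old config) or 'B' (new config), deterministically."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "B" if bucket < share_b else "A"

def conversion_rates(events):
    """events: iterable of (variant, converted) pairs; returns the rate per variant."""
    totals = {"A": [0, 0], "B": [0, 0]}  # [visitors, conversions]
    for variant, converted in events:
        totals[variant][0] += 1
        totals[variant][1] += int(converted)
    return {v: (c / n if n else 0.0) for v, (n, c) in totals.items()}

# Hypothetical log of (variant, converted) observations.
events = [("A", True), ("A", False), ("A", False), ("A", False), ("B", True), ("B", False)]
print(conversion_rates(events))
```

Comparing the two rates by eye is exactly the manual check that the naive approach requires.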

Problems With Naive Solution

if B is drastically worse, 20% of traffic is LOST

marketer must regularly check and decide: when has a new config PROVEN itself?

number of concurrent experiments is LOW

no user context

Scary Math

Predict Conversion

Conversion rate depends on context:

x the patterns

w the “weights”

ϕ cdf of normal dist.
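The formula itself did not survive in the transcript. Given the three symbols listed, one plausible reading is a probit model, P(conversion | x) = ϕ(w · x). The sketch below assumes that reading, with x as binary indicators for the matched patterns. The `bias` term, the function names and the weights are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def conversion_probability(weights, matched_patterns, bias=0.0):
    """Probit prediction: cdf of the summed weights of the patterns that matched.

    weights: {pattern: weight}; matched_patterns: patterns present for this visitor.
    """
    score = bias + sum(weights.get(p, 0.0) for p in matched_patterns)
    return normal_cdf(score)

# Hypothetical weights; a real model would learn them from the training set.
weights = {"visited demo": 0.8, "referrer www.google.com": -0.2}
print(conversion_probability(weights, ["visited demo"], bias=-2.5))
```

A negative bias keeps the baseline probability low, which matches a setting where conversions are rare.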

Experimental Setup

Split data set (.org + .com)

1. training set: 189660 visitors, 435 conversions
2. test set: 27013 visitors, 40 conversions

Can We Predict Conversion?

1260 itemsets

ROC curve: TPR versus FPR

@ false positive rate 10%: 96% true positive rate
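A figure such as "true positive rate at a 10% false positive rate" can be read off an ROC curve computed from predicted scores. This is a minimal sketch with made-up scores and labels. It treats every example as its own threshold and does not merge tied scores.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) points, sweeping the decision threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def tpr_at_fpr(points, max_fpr):
    """Best true positive rate reachable without exceeding max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)

# Invented scores (predicted conversion probability) and labels (1 = converted).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
points = roc_points(scores, labels)
print(tpr_at_fpr(points, 0.10))
```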

Towards Actionable Insights

Use Automatic Relevance Determination to prune the patterns (optimize the prior)

[Figure: weights (w), with μ and σ marked and the labels relevant / irrelevant]

Top 20 Patterns For Conversion

1. referer.go.onehippo.com
2. .pathInfo./resources/whitepapers/forrester-market-overview-web-content-management-systems.html
3. .pathInfo./resources/whitepapers/cms---a-critical-solution-for-todays-ecommerce.html
4. .pathInfo./resources/whitepapers/hippo-cms-for-the-enterprise.html
5. .pathInfo./resources/whitepapers/web-content-management-in-the-cloud.html
6. .collectorData.channel.One Hippo English Site + .collectorData.audience.terms. + referer.www.onehippo.com
7. .collectorData.categories.terms.cms + .pathInfo./mobile-cms
8. .collectorData.channel.One Hippo English Site + .pathInfo./ressourcen/demo
9. .pathInfo./resources/videos/hippo-cms-grand-tour.html
10. .collectorData.channel.One Hippo English Site + .collectorData.audience.terms. + .collectorData.categories.terms.cms
11. .pathInfo./ressources/demo
12. .pathInfo./what_to_buy/compare.html
13. referer.www.cmswire.com
14. .pathInfo./resources/demo + .collectorData.categories.terms.mobile
15. .pathInfo./resources/whitepapers/understanding-hippo-cms-7-software-architecture.html
16. .pathInfo./resources/whitepapers/selecting-today’s-enterprise-web-content-management-system.html
17. .collectorData.channel.One Hippo English Site + referer.www.google.nl
18. referer.www.onehippo.com + .pathInfo./resources/videos/a-quick-overview-of-hippo-cms-in-just-under-3-minutes.html
19. .collectorData.categories.terms.repository + .pathInfo./resources/whitepapers/selecting-today’s-enterprise-web-content-management-system.html
20. .collectorData.categories.terms. + .collectorData.categories.terms.relevance

Actionable Insights!

we can find a small model that can be used for

● human interpretation
● automated personalization

Product Challenge

KISS: # parameters should be minimal

Parameters

Recommendations: 1 hyper-param

Personalization: idem

NICE!

Questions?