Snowplow at de Bijenkorf

Post on 22-Jul-2020


Page 1: Snowplow at de Bijenkorf · Introduction Why do event tracking? Architecture Use cases Lessons learned Questions. Introduction 3 ... ‒ HP Vertica ‒ Tableau ‒ Github ‒ RStudio

Snowplow at de Bijenkorf

2 Agenda

Introduction

Why do event tracking?

Architecture

Use cases

Lessons learned

Questions

3 Introduction

Niels Reijmer

‒ Project role: Data analyst

Andrei Scorus

‒ Project role: Main ETL developer

De Bijenkorf

A 145-year-old chain of high-end department stores

New course: closing 5 out of 12 stores

Focus on premium personal service!

4 Why do event tracking?

Stay ahead of the market

Adds flexibility to the analysis possibilities

Makes advanced analysis possible

Focus on premium service also online: recommendations

5 What is Snowplow?

The Snowplow enrichment process takes raw events from a collector and:

‒ Cleans up the data into a format that is easier to parse and analyse

‒ Enriches the data (e.g. infers the visitor's location from their IP address)

‒ Stores the cleaned, enriched data
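
The three steps above can be sketched in a few lines of Python. This is only an illustration, not Snowplow's actual enrichment code: the raw line format, the `GEO_BY_IP_PREFIX` lookup table and the function names are assumptions made for the example, and a real setup would use a proper IP-geolocation database.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical lookup table standing in for a real IP-geolocation database.
GEO_BY_IP_PREFIX = {"145.": "NL", "8.": "US"}


def clean(raw_line: str) -> dict:
    """Parse one raw collector line (assumed format: '<ip> <request-url>')."""
    ip, url = raw_line.split(" ", 1)
    params = parse_qs(urlparse(url).query)
    # parse_qs returns a list per key; keep the first value of each parameter.
    event = {key: values[0] for key, values in params.items()}
    event["user_ipaddress"] = ip
    return event


def enrich(event: dict) -> dict:
    """Add a country code inferred from the IP address (toy prefix lookup)."""
    ip = event["user_ipaddress"]
    country = next(
        (code for prefix, code in GEO_BY_IP_PREFIX.items() if ip.startswith(prefix)),
        None,
    )
    return {**event, "geo_country": country}


def store(event: dict, table: list) -> None:
    """Append the enriched event to an in-memory 'table'."""
    table.append(event)


events_table = []
raw = "145.12.0.7 /i?e=pv&url=https%3A%2F%2Fexample.com%2Fshoes&duid=abc123"
store(enrich(clean(raw)), events_table)
```

After this runs, `events_table` holds one flat record with the decoded page URL, the IP address and an inferred country, which is the shape that makes later analysis easier.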

6 Architectural overview

AWS

‒ S3

‒ Kinesis

‒ ElastiCache

‒ Elastic Beanstalk

‒ EC2

‒ DynamoDB

Open Source

‒ Snowplow Event Tracker

‒ Rundeck Scheduler

‒ Jenkins Continuous Integration

‒ Pentaho PDI

Other

‒ HP Vertica

‒ Tableau

‒ GitHub

‒ RStudio Server

7 Snowplow implementation

1. Tracker provided by Snowplow and fired using Google Tag Manager

2. Collector in Elastic Beanstalk

3. Enricher in Kinesis

4. Storage in the Vertica environment

5. Data modeling

6. Analytics in R, Tableau, etc.

8 Uses

Reporting

A/B test analysis

Personalisation on the website

Advanced analysis (next talk)

9 Snowplow table

10 Example of the data

Not very useful this way

11 Data and dashboards

Why Snowplow?

‒ Data can be combined in many ways due to the granularity

‒ Any question can be answered

We are not rebuilding Google Analytics

12 Example of a dashboard

13 A/B test data

CR: 2.0%

CR: 2.4% +20%

14 A/B test data

CR: 2.0%

CR: 2.02% +1%

Further analysis in Snowplow to determine what happened
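
The "+20%" and "+1%" figures on the two A/B slides are relative uplifts in conversion rate (CR). The sketch below reproduces that arithmetic and adds a standard pooled two-proportion z-test to show why a +1% difference may call for further analysis. The visitor and conversion counts are hypothetical, chosen only to match the conversion rates on the slides; the talk does not state sample sizes or which test was used.

```python
from math import erf, sqrt


def relative_uplift(cr_control: float, cr_variant: float) -> float:
    """Relative change of the variant's conversion rate versus control."""
    return (cr_variant - cr_control) / cr_control


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - cdf)


uplift_large = relative_uplift(0.020, 0.024)   # slide 13: 2.0% vs 2.4%
uplift_small = relative_uplift(0.020, 0.0202)  # slide 14: 2.0% vs 2.02%

N = 50_000  # hypothetical number of visitors per variant
p_large = two_proportion_p_value(1000, N, 1200, N)  # 2.0% vs 2.4%
p_small = two_proportion_p_value(1000, N, 1010, N)  # 2.0% vs 2.02%
```

Under these assumed sample sizes the 2.4% variant differs clearly from control, while the 2.02% variant is well within noise, so the overall conversion rate alone cannot explain what happened.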

15 Recommendations

16 Overview

Something to recommend: 50,000 products

Ability to make models: R, Python, SQL

Historical data on each individual user

Recognize users over sessions
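
As one illustration of how per-user history can feed a model, here is a minimal item co-occurrence recommender ("users who viewed X also viewed Y"). It is a generic technique, not necessarily the model used at de Bijenkorf, and the `views` data and function names are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical (user, product) view events taken from event-level history.
views = [
    ("u1", "coat"), ("u1", "scarf"),
    ("u2", "coat"), ("u2", "scarf"), ("u2", "boots"),
    ("u3", "coat"), ("u3", "boots"), ("u3", "scarf"),
]


def build_cooccurrence(view_events):
    """Count, per product, how many users also viewed each other product."""
    products_by_user = defaultdict(set)
    for user, product in view_events:
        products_by_user[user].add(product)

    co = defaultdict(Counter)
    for products in products_by_user.values():
        # Every ordered pair of distinct products seen by the same user.
        for a, b in permutations(products, 2):
            co[a][b] += 1
    return co


def recommend(co, product, k=2):
    """Return up to k products most often viewed together with `product`."""
    return [other for other, _ in co[product].most_common(k)]


cooccurrence = build_cooccurrence(views)
top_for_coat = recommend(cooccurrence, "coat", k=1)
```

A model like this depends on recognizing the same user across sessions; otherwise each visit looks like a new user with almost no history.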

17 Data availability

Aggregated individual data

‒ Limited rows per user

‒ Major cold-start problem

Individual data

‒ A lot of rows per user

‒ Minor cold-start problem

18 What data, for example?

19 Recognize the user in multiple sessions

For historical data processing:

‒ Snowplow has two user identifiers:

‒ domain_userid is set by the tracker in a first-party cookie

‒ user_id is entered on (soft) login

The challenge is to link the domain_userid to the user_id
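
One way to approach this linking for historical data is sketched below: whenever an event carries both identifiers, remember the pair, then backfill the user_id on earlier anonymous events from the same cookie. The event dictionaries, the `ts` field and the function names are assumptions for the example; the talk does not show the actual implementation.

```python
def build_id_map(events):
    """Map each domain_userid to the most recent user_id seen with it."""
    id_map = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        if event.get("user_id"):
            id_map[event["domain_userid"]] = event["user_id"]
    return id_map


def stitch(events, id_map):
    """Return copies of the events with user_id backfilled where a link is known."""
    return [
        {**e, "user_id": e.get("user_id") or id_map.get(e["domain_userid"])}
        for e in events
    ]


# Hypothetical events: cookie-a browses anonymously, then logs in.
events = [
    {"ts": 1, "domain_userid": "cookie-a", "user_id": None, "page": "/shoes"},
    {"ts": 2, "domain_userid": "cookie-a", "user_id": "user-42", "page": "/login"},
    {"ts": 3, "domain_userid": "cookie-b", "user_id": None, "page": "/bags"},
]

id_map = build_id_map(events)
stitched = stitch(events, id_map)
```

Here the anonymous `/shoes` view is attributed to `user-42` after the fact, while `cookie-b` stays anonymous because no login was ever observed for it.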

20 Recognize the user

Yali S.

2 weeks later

21 Identify user from email

[Diagram: 2 weeks later, a newsletter with id links Yali S. (Email ID xyz123) to User_id qWier586_kasd==]

22 Recognize user during the session

We can link the domain_userid to the user_id

But we need to have a user ID to use for the API call:

User is known from their account login

User is already known from their Snowplow ID

User is unknown

‒ Nothing...?
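
The three cases above amount to a fallback chain when choosing the ID for the recommender API call. A minimal sketch, assuming a hypothetical `id_map` from domain_userid to user_id (built from historical events) and a hypothetical function name:

```python
from typing import Optional


def resolve_user_id(
    login_user_id: Optional[str],
    domain_userid: Optional[str],
    id_map: dict,
) -> Optional[str]:
    """Pick the user ID for the API call, or None if the user is unknown."""
    # Case 1: the user is logged in, so the account ID is available directly.
    if login_user_id:
        return login_user_id
    # Case 2: not logged in, but the Snowplow cookie ID was linked earlier.
    if domain_userid and domain_userid in id_map:
        return id_map[domain_userid]
    # Case 3: unknown user; the slide leaves open what to do here.
    return None


known_links = {"cookie-a": "user-42"}  # hypothetical historical mapping
from_login = resolve_user_id("user-7", "cookie-a", known_links)
from_cookie = resolve_user_id(None, "cookie-a", known_links)
unknown = resolve_user_id(None, "cookie-z", known_links)
```

Returning `None` for the unknown case keeps the open question from the slide explicit: the caller has to decide what, if anything, to show.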

23 Recommender dataflow in detail

User is known from their account login

User is already known from their Snowplow ID

User is unknown

‒ Nothing...?

24 Lessons learned

Implementation

Fairly easy to start, but highly customizable

Documentation of your implementation should be in sync with the actual implementation.

Reporting, A/B testing

More insights possible, adds flexibility

Need additional tools/resources to make it usable for non-technical people

Google Analytics and Snowplow can complement each other

Recommender

Just start and keep it simple

25 Questions?