streaming predictions of user behavior in real-time

Post on 14-Feb-2016

Streaming Predictions of User Behavior in Real-Time

Ethan Dereszynski (Webtrends)

Eric Butler (Cedexis)

OSCON 2014

How come you never see a headline like "Psychic Wins Lottery"?

Jay Leno

Enabling Interesting Predictions:

Leverage Streaming Data

[Figure: data delivered to Streams over websockets; label: 1 second]

The best way to predict the future is to invent it.

Alan Kay

Session Data

- Each user “click” triggers an event
- Event information is captured by an embedded tag
- A session is a string of events that all correspond to a single “visit” to a web site
- A session ends when a visitor leaves the site, closes the browser, or goes idle for 30 minutes

[Figure: a session drawn as a sequence of events: Event 1, Event 2, Event 3]
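The 30-minute idle rule for ending a session can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function name `split_sessions` and the `(visitor_id, timestamp, action)` tuple layout are assumptions, and explicit signals such as a closed browser are not modeled.

```python
from datetime import datetime, timedelta

IDLE_TIMEOUT = timedelta(minutes=30)  # idle cutoff stated on the slide


def split_sessions(events):
    """Group (visitor_id, timestamp, action) tuples into sessions.

    A new session starts when a visitor is first seen, or when the same
    visitor has been idle for IDLE_TIMEOUT or longer. The tuple layout
    is hypothetical.
    """
    sessions = []      # list of (visitor_id, [(timestamp, action), ...])
    open_session = {}  # visitor_id -> index of that visitor's open session
    last_seen = {}     # visitor_id -> timestamp of the previous event
    for visitor, ts, action in sorted(events, key=lambda e: e[1]):
        idle = visitor in last_seen and ts - last_seen[visitor] >= IDLE_TIMEOUT
        if visitor not in open_session or idle:
            sessions.append((visitor, []))
            open_session[visitor] = len(sessions) - 1
        sessions[open_session[visitor]][1].append((ts, action))
        last_seen[visitor] = ts
    return sessions
```

In a streaming setting the same rule would more likely be applied with a per-visitor timer than with a sort over a finished batch; the batch form is used here only to keep the sketch short.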

Learning from Streaming Data

Sessions provide examples of visit behavior

Not all sessions are equally likely

- Many paths are rarely, if ever, taken
- Frequent paths suggest common ways visitors behave on a given site

Learning Models of Visitor Behavior

- Predict future actions
- Provides a rich, new feature to identify/segment users
- Identify users who have a common trajectory, or subtrajectory, through the web site
- More than just a label
- Behavior tells us something about how users achieve a goal on a web site

Event Data

JSON containing parameter/value pairs

Describes content of page (triggered by event)

Contains geo, device, referrer, etc.

50-100 parameters per page (event)

Challenges of Real Data

How do we describe each event?

- Number of parameters per event can be large
- Space of possible “events” is massive

Not all parameters are relevant to the user’s actions

[Figure: number of events, Client 1 vs. Client 2]

About Topic Models

- Each topic is a distribution over all words in the dictionary
- Each document is generated by a mixture of topics

D. Blei.   Probabilistic topic models.   Communications of the ACM, 55(4):77–84, 2012.

Abstraction Layer: Global/Local Topic – Latent Dirichlet Allocation (GLT-LDA)

Topic modeling technique for document clustering

- Documents assigned to a single topic (instead of a mixture)
- Global “Noise” topic explains redundant parameters

Clusters parameters into topics

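[Figure: plate diagram of the GLT-LDA model, with the following variables]

- $\phi_k$: distribution over parameters for topic $k$
- $\phi_G$: distribution over noise parameters
- $w_{j,i}$: $j$th parameter in event $i$
- $x_{j,i}$: noise indicator for the $j$th parameter in event $i$
- $\theta$: topic distribution
- $\lambda_i$: noise rate for document $i$
- $z_i$: topic label for document $i$

$x_{j,i} \sim \mathrm{Binomial}, \quad \lambda_i \sim \mathrm{Beta}, \quad w_{j,i},\ z_i \sim \mathrm{Multinomial}, \quad \theta,\ \phi_k,\ \phi_G \sim \mathrm{Dirichlet}$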

The Dataset

Collection of visitor traces, varying length

[Figure: one trace per visitor (Visitor 1, Visitor 2, …, Visitor n), each a sequence Event 1, Event 2, …, Event t]

Representing Behavior: Two Approaches

Enumerate the space of all possible paths and count

- This would require a very big table
- Most of the entries would be 0
- Not clear how to handle variable-length visits

Hidden Markov Model (HMM)

- Encodes visitor behavior in a probabilistic model
- Calculates likelihood (or probability) of specific trajectories
- Enables prediction of future actions a visitor may take on the site

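The Hidden Markov Model

Site visit (emission) probabilities:

$P(A_t \mid S_t = j) = \mathrm{Multinomial}(\beta_{j0}, \beta_{j1}, \ldots, \beta_{jM})$

Stochastic state transitions:

$P(S_t \mid S_{t-1} = j) = \mathrm{Multinomial}(\alpha_{j0}, \alpha_{j1}, \ldots, \alpha_{jK})$

[Figure: chain of hidden states $S_0, S_1, \ldots, S_t$, each emitting an observed action $A_0, A_1, \ldots, A_t$]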
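Given emission and transition tables like the ones above, the belief over the hidden state after each observed action can be tracked with the standard HMM forward (filtering) recursion. The sketch below is illustrative only: the names `init`, `trans`, `emit`, and `forward_filter` are assumptions, and the slide does not specify an initial state distribution, so one is supplied by the caller.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("observed sequence has zero probability under the model")
    return [w / total for w in weights]


def forward_filter(init, trans, emit, actions):
    """Return P(S_t = j | A_0..A_t) for every hidden state j.

    init[j]     : P(S_0 = j)               (assumed; not given on the slide)
    trans[j][k] : P(S_t = k | S_{t-1} = j)
    emit[j][a]  : P(A_t = a | S_t = j)
    actions     : observed action indices A_0..A_t (non-empty)
    """
    n = len(init)
    # Condition the initial distribution on the first observed action.
    belief = normalize([init[j] * emit[j][actions[0]] for j in range(n)])
    for a in actions[1:]:
        # Predict: push the belief through one state transition.
        predicted = [sum(belief[j] * trans[j][k] for j in range(n)) for k in range(n)]
        # Update: reweight by how likely each state is to emit the new action.
        belief = normalize([predicted[k] * emit[k][a] for k in range(n)])
    return belief
```

Because each update uses only the previous belief and the newest action, the cost per event is fixed, which suits a per-event streaming update.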

The Hidden Markov Model

Visitors arrive at a site with an intention

- The current intention specifies the probability they will take some action (trigger an event)
- After the page is selected, the intention transitions to a new value (could be the same as the previous intention)

[Figure: intention states Viewing Products, Product Comparison, Make Purchase; probability labels .6, .4, .7, .3]

The Hidden Markov Model

[Figure, later build of the same slide: intention states Viewing Products, Product Comparison, Make Purchase; probability labels .7/.3, .7/.3, .15/.85]

Predictive Model: Learning and Runtime

Offline:

- Session data is recorded into a batch file for training
- Trained with the expectation maximization (EM) algorithm

Online:

- The model is used to predict specific visitor actions
- CartAdd (add an item to the shopping cart)
- Purchase (complete the purchase funnel)
- Conditions predictions on observed actions the visitor has taken so far
- Updates predictions each time a new action is taken by the visitor
- Can be generalized to other predictive queries

Online Inference

Goal: Compute the probability that actions t+1 to t+5 contain at least a single purchase / cartAdd.

[Figure: chain of hidden states with one action each at times t, t+1, t+2, t+3, t+4, t+5; steps t+1 through t+5 are marked as the prediction window]
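One way to compute this quantity is to propagate the probability of having seen no target action through the window and take the complement. The sketch below assumes a belief over the current hidden state is already available (for example from forward filtering); the function name `prob_target_within` and its argument layout are hypothetical, not taken from the talk.

```python
def prob_target_within(belief, trans, emit, target_actions, horizon=5):
    """P(at least one target action among the next `horizon` actions).

    belief[j]      : P(S_t = j | actions observed so far)
    trans[j][k]    : P(S_{t+1} = k | S_t = j)
    emit[k][a]     : P(A = a | S = k)
    target_actions : action indices that count as a hit, e.g. the
                     indices assigned to CartAdd or Purchase
    """
    n = len(belief)
    # miss[k]: probability that state k emits a non-target action.
    miss = [1.0 - sum(emit[k][a] for a in target_actions) for k in range(n)]
    # alive[k]: joint probability of being in state k with no target seen yet.
    alive = list(belief)
    for _ in range(horizon):
        alive = [
            sum(alive[j] * trans[j][k] for j in range(n)) * miss[k]
            for k in range(n)
        ]
    # Whatever probability mass did not survive corresponds to at least one hit.
    return 1.0 - sum(alive)
```

As a sanity check, with a single hidden state that emits the target with probability 0.1, the result for a five-step window is 1 - 0.9^5, the familiar "at least one success in five independent trials" value.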

[Slide builds: the session below is revealed step by step. After each newly observed action, the next five actions (the prediction window) are shown as "?".]

Sequence    Time            Action
t = 0       19:38:47.182Z   Landing: Clicked Ad
t = 1       19:38:52.571Z   ListView
t = 2       19:39:01.941Z   ProductView
t = 3       19:39:15.467Z   Link
t = 4       19:43:08.296Z   Link
t = 5       19:50:23.952Z   ProductView
t = 6       19:50:47.646Z   AddedToCart
t = 7       19:51:01.273Z   ProductView
t = 8       19:51:11.691Z   Link
t = 9       19:51:20.499Z   Link
t = 10      19:51:27.320Z   ListView
t = 11      19:51:47.992Z   ProductView
t = 12      19:52:04.216Z   ListView
t = 13      19:52:11.398Z   ProductView
t = 14      19:52:20.873Z   Link
t = 15      19:54:18.080Z   ViewedCart
t = 16      19:55:32.557Z   StartCheckout
t = 17      19:57:13.246Z   CompletedPurchase
t = 18      19:57:39.698Z   ConfirmCheckout
t = 19-24   ?


Prediction Architecture:

- Validation Bolt: validates raw events from Kafka
- Prediction Bolt: augments events with prediction values and confidence labels
- Event Stream Bolt: dispatches individual events to Streams
- Session Stream Bolt: dispatches full sessions to Streams
- ROC Bolt: completed sessions are used to score the predictive model’s accuracy; the model receives new thresholds for confidence labels

[Figure: events and sessions reach Streams over websockets]
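The feedback step, in which completed sessions are scored and new thresholds are produced, might look roughly like the following. This is a sketch under stated assumptions: the slides do not say how thresholds are chosen, so maximizing TPR minus FPR (Youden's J statistic) is used here purely as one plausible rule, and the names `roc_points` and `pick_threshold` are hypothetical.

```python
def roc_points(scored):
    """Compute (threshold, TPR, FPR) for each distinct predicted score.

    scored: list of (predicted_probability, did_purchase) pairs taken
            from completed sessions (layout is an assumption).
    """
    positives = sum(1 for _, y in scored if y)
    negatives = len(scored) - positives
    points = []
    for threshold in sorted({p for p, _ in scored}, reverse=True):
        tp = sum(1 for p, y in scored if p >= threshold and y)
        fp = sum(1 for p, y in scored if p >= threshold and not y)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((threshold, tpr, fpr))
    return points


def pick_threshold(scored):
    """Return the threshold that maximizes TPR - FPR (Youden's J)."""
    return max(roc_points(scored), key=lambda pt: pt[1] - pt[2])[0]
```

A production version would likely keep running counts rather than rescanning all completed sessions, and might pick separate thresholds for each confidence label.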

Streams Demo

Results

Next Steps

Integrating visitor information across multiple visits

Automated re-training of predictive model

- Adjust to seasonal and trend effects

Generative models for Anomaly Detection

- What does a Likely/Unlikely session look like?

Richer models of visitor behavior

- Hierarchical models for behavior

Questions? Thank you!

Ethan.Dereszynski@webtrends.com

elbpdx@gmail.com
