webinar - pattern mining log data - vega (20160426)
TRANSCRIPT
Churn Prediction: Understanding your
customers and taking action.
@datoinc #churnPredictionDato
Hi! My name is …
Antoine Atallah, Principal Data Scientist
Dato toolkits team, novice powerlifter, Hawks fan.
Hi! My name is …
Karla Vega, Customer Success Manager
Aerospace engineer, dog trainer, running fan @vegakp
About Us!
Questions?
• (Now) We love questions. Feel free to interrupt with questions!
• (Later) Email us: [email protected], [email protected]
Webinar!
Extracting Insights from Data
Data Science Workflow
Ingest Transform Model Insight
Log Journey
Lots of data
Insights Profits
Mining Log Data
Logs are everywhere!
Different kinds of logs
• Raw logs
    • Each row contains an individual event for a user at a given time
• Aggregated logs
    • Each row contains the interactions for a user over a period of time
    • For instance, user activity over one-month rollups
    • This is the traditional data output of Business Intelligence infrastructures
• User side-data
    • Information about each user (demographics, etc.)
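To make the difference between raw and aggregated logs concrete, here is a minimal sketch that rolls raw event rows up into one-month counts per user. The row layout, the sample data, and the `monthly_rollup` helper are all assumptions for illustration; this is not the Dato toolkit's API.

```python
from collections import Counter
from datetime import date

# Hypothetical raw log: one row per (user, date, event).
raw_log = [
    ("alice", date(2016, 1, 3), "visit"),
    ("alice", date(2016, 1, 17), "purchase"),
    ("alice", date(2016, 2, 2), "visit"),
    ("bob", date(2016, 1, 9), "visit"),
]

def monthly_rollup(rows):
    """Count events per (user, year, month)."""
    counts = Counter()
    for user, day, _event in rows:
        counts[(user, day.year, day.month)] += 1
    return dict(counts)

# Aggregated log: one entry per user per month.
aggregated = monthly_rollup(raw_log)
```

The aggregated form is smaller, but it loses the exact timing of each event, which is why raw logs can expose more usage patterns.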
Logs contain usage patterns
Small Purchase
Large Purchase
Different kinds of usage patterns
Kinds of Patterns
Visits, Purchases, Events Frequency
Visits, Purchase Quantity
Changes in value over time
Change in time between visits, purchases, events
Time since last action or visit
Demographic information (age, gender, …)
Types of items purchased (seasonality, quality)
…
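A few of these patterns can be computed directly from one user's event dates. The sketch below is illustrative only: the `usage_patterns` helper and its feature names are hypothetical, and a real feature generator would cover many more patterns.

```python
from datetime import date

def usage_patterns(event_days, today):
    """Summarize one user's event dates into a few simple pattern features."""
    days = sorted(event_days)
    # Days elapsed between consecutive events.
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    return {
        "event_count": len(days),
        "days_since_last_event": (today - days[-1]).days if days else None,
        # Positive value: the latest gap is longer than the first one,
        # i.e. the user is coming back less often.
        "gap_change_days": gaps[-1] - gaps[0] if len(gaps) >= 2 else None,
    }

patterns = usage_patterns(
    [date(2016, 1, 1), date(2016, 1, 4), date(2016, 1, 14)],
    today=date(2016, 2, 1),
)
```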
Retaining customers/visitors is important
• Acquiring a new customer costs more than retaining an existing one
• Gives a pulse on the health of the business
• Can help take preventive actions and act before it’s too late
• Can help create more effective marketing campaigns
What is Churn Prediction
What is Churn
• Churn Prediction estimates the probability that a user will stop coming back (churn)
• Works by observing past user behavior
Churn Prediction
(Apr 2016)
Daily activity logs for Jan 2015 – April 2016
More Precisely
• Churn Prediction estimates the probability that a user will stop coming back (churn)
• Works by observing past user behavior
• We define a time boundary at which we want to predict churn
• Anyone with no activity in the N days (default is 30) after the boundary is considered to have churned
• The M days (default 60) before the boundary are used to generate features
• Multiple boundaries can be specified to extract more patterns
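The boundary logic above can be sketched in a few lines. This is a simplified illustration, not the toolkit's implementation: the `split_at_boundary` helper is hypothetical, and treating the boundary day itself as part of the "after" window is an assumption.

```python
from datetime import date, timedelta

def split_at_boundary(event_days, boundary, churn_days=30, lookback_days=60):
    """Return (feature_events, churned) for one user at one time boundary."""
    lookback_start = boundary - timedelta(days=lookback_days)
    churn_end = boundary + timedelta(days=churn_days)
    # Events in the M days before the boundary feed feature generation.
    feature_events = [d for d in event_days if lookback_start <= d < boundary]
    # No activity in the N days from the boundary onward means "churned".
    churned = not any(boundary <= d < churn_end for d in event_days)
    return feature_events, churned

boundary = date(2016, 1, 1)
alice = [date(2015, 11, 20), date(2015, 12, 15), date(2016, 1, 10)]
bob = [date(2015, 10, 1), date(2015, 12, 15)]

alice_features, alice_churned = split_at_boundary(alice, boundary)
bob_features, bob_churned = split_at_boundary(bob, boundary)
```

Here the first user comes back ten days after the boundary and is labeled as retained, while the second user never returns and is labeled as churned.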
Feature and Label Generation
(Apr 2016)
Daily activity logs for Jan 2015 – April 2016
How to use Churn Prediction
Choosing Time Boundaries
• Time Boundaries are moments in the past that are used to observe user behavior and generate labels
• The time before the boundary is used to observe patterns
• The time after the boundary is used to generate labels
Boundaries | Meaning
January 1st 2016 | Will use the patterns from before January 1st 2016 to predict User Churn after January 1st 2016
January 1st 2016, December 1st 2015 | Will use the patterns from before January 1st 2016 to predict User Churn after January 1st 2016; will use the patterns from before December 1st 2015 to predict User Churn after December 1st 2015. This will analyze more patterns and build a richer model
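One way to see why extra boundaries give a richer model: each boundary contributes its own set of training rows. The sketch below is a simplified assumption of how that could look; the `training_rows` helper is hypothetical, and it leaves out the churn and lookback periods to keep the focus on boundaries.

```python
from datetime import date

def training_rows(events_by_user, boundaries):
    """One row per (user, boundary): a count of earlier events and a churn label."""
    rows = []
    for boundary in boundaries:
        for user, days in events_by_user.items():
            before = [d for d in days if d < boundary]
            if not before:
                continue  # user not yet seen at this boundary
            rows.append({
                "user": user,
                "boundary": boundary,
                "events_before": len(before),
                # Simplification: churned means no activity at all from the boundary on.
                "churned": not any(d >= boundary for d in days),
            })
    return rows

events_by_user = {
    "alice": [date(2015, 11, 10), date(2015, 12, 20), date(2016, 1, 5)],
    "bob": [date(2015, 11, 15), date(2015, 12, 3)],
}
one_boundary = training_rows(events_by_user, [date(2016, 1, 1)])
two_boundaries = training_rows(events_by_user, [date(2016, 1, 1), date(2015, 12, 1)])
```

With two boundaries the same two users yield four training rows instead of two, and the second user appears once as retained (December boundary) and once as churned (January boundary).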
Choosing a Churn Period
• The Churn Period corresponds to how far in the future we want to predict.
• It also means that for training purposes, users who have not been active for this amount of time will be considered to have churned
Churn Period | Predicts
7 Days | Probability of each user leaving within the next week
30 Days | Probability of each user leaving within the next month
3 Months | Probability of each user leaving within the next quarter
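The choice of Churn Period changes the training labels themselves. As a rough sketch (the `churned_within` helper and the sample dates are assumptions), the same user can be labeled as churned under a 7-day period and as retained under a 30-day period:

```python
from datetime import date, timedelta

def churned_within(event_days, boundary, churn_period):
    """True if the user has no activity in [boundary, boundary + churn_period)."""
    end = boundary + churn_period
    return not any(boundary <= d < end for d in event_days)

# A user who visits roughly once a month.
events = [date(2015, 12, 20), date(2016, 1, 18)]
boundary = date(2016, 1, 1)

labels = {
    "7 days": churned_within(events, boundary, timedelta(days=7)),
    "30 days": churned_within(events, boundary, timedelta(days=30)),
}
```

A short Churn Period therefore tends to flag infrequent but loyal users, so it should match how often users are expected to come back.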
Choosing Lookback Periods
• Lookback Periods are how far in the past we look to extract user behavior patterns (features)
• Multiple lookback periods can be provided to generate richer features
Lookback Periods | Features
3 Days | Will use the 3 days before each Time Boundary to extract usage patterns
30 Days | Will use the 30 days before each Time Boundary to extract usage patterns
7 Days, 1 Month | Will use the week and the month before each Time Boundary to extract usage patterns
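A minimal sketch of how several lookback periods could turn into several features, assuming a simple event count per window. The `lookback_counts` helper and its feature names are hypothetical and only illustrate the idea.

```python
from datetime import date, timedelta

def lookback_counts(event_days, boundary, lookback_days=(7, 30)):
    """Count a user's events in each lookback window ending at the boundary."""
    features = {}
    for n in lookback_days:
        start = boundary - timedelta(days=n)
        features[f"events_last_{n}d"] = sum(start <= d < boundary for d in event_days)
    return features

features = lookback_counts(
    [date(2015, 12, 5), date(2015, 12, 28), date(2015, 12, 30)],
    boundary=date(2016, 1, 1),
)
```

Comparing the short and long windows hints at whether activity is picking up or tailing off as the boundary approaches.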
Choosing appropriate parameters
• If we want to predict Churn for this quarter, we might want to set:
    • Churn Period to be 3 Months (how far in the future we predict)
    • Lookback Periods to be 2, 4, 8, 16 weeks (how far in the past to extract patterns from)
    • Time Boundaries to be January 1st 2016, January 1st 2015, January 1st 2014
• Notice that we chose the same quarter each year for the Time Boundaries
• Choosing past data with the same underlying behavior will provide more accurate predictions
Choosing appropriate parameters
• If we want to predict Churn for this month, we might want to set:
    • Churn Period to be 1 Month (how far in the future we predict)
    • Lookback Periods to be 7, 14, 30, 60 days (how far in the past to extract patterns from)
    • Time Boundaries to be January 1st 2016, October 1st 2015, September 1st 2015, August 1st 2015
• In this case, we intentionally skipped over November and December 2015 since it is the holiday season and may exhibit very different behavior
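The monthly boundary choice above can be generated rather than typed by hand. This is a sketch under assumptions: the `monthly_boundaries` helper is hypothetical, and it expects `skip_months` to leave at least one month available.

```python
from datetime import date

def monthly_boundaries(latest, count, skip_months=()):
    """Walk back from `latest` over month starts, skipping the given months."""
    boundaries = []
    year, month = latest.year, latest.month
    while len(boundaries) < count:
        if month not in skip_months:
            boundaries.append(date(year, month, 1))
        # Step back one calendar month.
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
    return boundaries

# Skip the November and December boundaries (holiday season).
boundaries = monthly_boundaries(date(2016, 1, 1), count=4, skip_months={11, 12})
```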
Key Takeaways
• Label generation is greatly simplified (choose a Churn Period)
• Feature generation is greatly simplified (choose Lookback Periods and Time Boundaries)
• Choose Time Boundaries from periods that are representative of the time frame you want to predict
Interpreting the Results
Output of the model
• The Churn Prediction model returns a probability of churn for each provided user
Using the Probabilities
[Figure: Number of Users plotted against Churn Probability]
High Probability of Churn: Might be hard to rescue these users
Mid-Probability of Churn: We should try to rescue these users
Low-Probability of Churn: Send a thank-you note!
Using the Probabilities
• We can target different users, using their probability of Churn as a guideline
• Different marketing messages can be created based on the probability of Churn
• The highest-probability users are not always the best to target, depending on the cost of the action needed to retain them
• Adds a new dimension for understanding the user base
• Can be used to monitor the health of the user population over time
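As a rough illustration of acting on the probabilities, the sketch below buckets users into low, mid, and high segments and weighs the cost of a retention action against its expected benefit. The cut-offs, the `save_rate` idea, and all numbers are assumptions for the example, not values from the model.

```python
def segment(prob, low=0.3, high=0.7):
    """Bucket a churn probability; the 0.3 / 0.7 cut-offs are illustrative only."""
    if prob < low:
        return "low"
    if prob < high:
        return "mid"
    return "high"

def expected_gain(prob, customer_value, action_cost, save_rate):
    """Expected benefit of a retention action for one user.

    save_rate is the assumed chance that the action keeps a user who
    would otherwise have churned.
    """
    return prob * save_rate * customer_value - action_cost

churn_probs = {"alice": 0.10, "bob": 0.55, "carol": 0.92}
segments = {user: segment(p) for user, p in churn_probs.items()}

# Assumption: high-probability users are harder to rescue (lower save_rate).
bob_gain = expected_gain(0.55, customer_value=100, action_cost=10, save_rate=0.5)
carol_gain = expected_gain(0.92, customer_value=100, action_cost=10, save_rate=0.1)
```

Under these made-up numbers the mid-probability user is the better target, even though the high-probability user is more likely to churn.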
Demo
Summary
Log Data Mining ≠ Rocket Science
• Define time parameters to identify patterns and generate labels.
• Extract predictions to gain insights about your user population.
• Take action and help grow your healthy business.
Churn Prediction
SELECT questions FROM audience
WHERE difficulty = 'Easy'
Thanks!