assumptions: check yo'self before you wreck yourself

Assumptions: Check yo self, before you wreck yo self.

Erin Shellman @erinshellman Seattle Software Craftsmanship

August 28, 2014

Assumptions: Making an ass out of you and me.


I’m Erin, and I’m a data scientist.

How much should this cost?

What about these?

…and when?

Price optimization

1. Git yer Big Data!
2. Forecast demand
3. Optimize price
4. Profit!!!!!

Forecast demand: $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$

Optimize price: $\sum \text{revenue}$
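Steps 2 and 3 can be sketched in a few lines of R: fit a linear demand curve, predict units sold over a grid of candidate prices, and keep the price with the largest expected revenue (price × predicted units). This is a minimal sketch on simulated data; the `sales` data frame, its columns, and the $0 to $1000 price grid are illustrative assumptions, not the talk's data or its actual optimizer.

```r
# Minimal sketch: grid-search price optimization on simulated data.
# The data frame and its columns are hypothetical, for illustration only.
set.seed(42)
sales <- data.frame(price = runif(200, min = 50, max = 500))
# Simulated linear demand: fewer units sold as price rises, plus noise
sales$units_sold <- 120 - 0.2 * sales$price + rnorm(200, sd = 5)

# Step 2: forecast demand with simple linear regression
demand_fit <- lm(units_sold ~ price, data = sales)

# Step 3: expected revenue = price * predicted units, over a grid of prices
grid <- data.frame(price = 0:1000)
# Clamp at zero so the straight line cannot forecast negative sales
grid$predicted_units <- pmax(predict(demand_fit, newdata = grid), 0)
grid$expected_revenue <- grid$price * grid$predicted_units

best_price <- grid$price[which.max(grid$expected_revenue)]
print(best_price)
```

Because the simulated demand is linear, $a - bp$ with $a = 120$ and $b = 0.2$, revenue $p(a - bp)$ peaks at $a/(2b) = 300$, so the grid search should land close to $300.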

The key is a good forecast.

• Subset the data and focus on one category of product.
  • e.g., Alpine ski bindings.
• Prototype & validate in R.

$\text{Units Sold}_i = \alpha + \beta_1(\text{price}_i) + \epsilon_i$

Do the easiest thing
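A minimal sketch of "the easiest thing": subset to a single product category and fit the one-predictor model above with `lm()`. The `products` data frame, its columns, and the category labels are hypothetical stand-ins for the real catalog.

```r
# Minimal sketch: subset to one product category, then fit the simplest model.
# The data frame, its columns, and the category labels are hypothetical.
set.seed(1)
n <- 300
products <- data.frame(
  category = sample(c("Alpine ski bindings", "Ski poles"), n, replace = TRUE),
  price    = runif(n, min = 100, max = 500)
)
products$units_sold <- 80 - 0.1 * products$price + rnorm(n, sd = 4)

# Focus on a single category
bindings <- subset(products, category == "Alpine ski bindings")

# Units Sold_i = alpha + beta_1 * price_i + epsilon_i
slr <- lm(units_sold ~ price, data = bindings)
print(coef(slr))  # "(Intercept)" estimates alpha, "price" estimates beta_1
```

The fitted coefficients from this simplest model are what later, richer models have to beat.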


Assumptions of simple linear regression (SLR)

• We assume that the residuals:
  1. Are normally distributed, with mean zero.
  2. Are not autocorrelated.
  3. Are unrelated to the predictors.
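Each of the three assumptions can be turned into a number rather than a plot. The sketch below uses only base R; note that it swaps in the Ljung-Box test (`Box.test`) for the Breusch-Godfrey test so it needs no extra packages. The function name `check_slr_assumptions`, the 0.05 cutoff, and the `1e-8` tolerances are illustrative choices, not part of the talk.

```r
# Minimal sketch: turn the three SLR residual assumptions into pass/fail checks.
# Base R only; the Ljung-Box test stands in for the Breusch-Godfrey test.
check_slr_assumptions <- function(fit, covariate, alpha = 0.05) {
  res <- residuals(fit)
  # Shapiro-Wilk null hypothesis: the residuals are normal
  p_normal <- unname(shapiro.test(res)$p.value)
  # Ljung-Box null hypothesis: no autocorrelation (checked at lag 1)
  p_autocorr <- unname(Box.test(res, lag = 1, type = "Ljung-Box")$p.value)
  c(
    # 1. Normal, with mean zero
    normal             = p_normal > alpha,
    mean_zero          = abs(mean(res)) < 1e-8,
    # 2. Not autocorrelated
    no_autocorrelation = p_autocorr > alpha,
    # 3. Unrelated to the predictor. For a predictor that enters the model
    #    untransformed, OLS makes this true by construction; it is informative
    #    when `covariate` differs from the fitted regressor (price vs log(price + 1)).
    uncorrelated       = abs(cor(res, covariate)) < 1e-8
  )
}

# Usage on simulated (hypothetical) data
set.seed(7)
x <- runif(100, min = 0, max = 10)
y <- 3 + 2 * x + rnorm(100)
fit <- lm(y ~ x)
checks <- check_slr_assumptions(fit, covariate = x)
print(checks)
```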

Checking assumptions is hard

• For statistical methods, assumption testing traditionally relies on visually inspecting plots (and let’s be real, most people don’t even do that).
• …and boring!

[Figure: standard R regression diagnostic plots for the SLR fit: Residuals vs Fitted, Normal Q-Q, Scale-Location (standardized residuals vs fitted values), and Residuals vs Leverage with Cook's distance contours.]
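The four diagnostic panels shown above are what base R's `plot()` method for `lm` objects draws by default. A minimal sketch that writes them to a PDF, assuming simulated data and a hypothetical output file name:

```r
# Minimal sketch: draw the four standard lm() diagnostic panels to a PDF.
# The data and the file name are hypothetical.
set.seed(3)
x <- runif(200, min = 40, max = 120)
y <- 5 + 0.9 * x + rnorm(200, sd = 10)
fit <- lm(y ~ x)

out_file <- file.path(tempdir(), "slr_diagnostics.pdf")
pdf(out_file)
op <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels
# which = c(1, 2, 3, 5): Residuals vs Fitted, Normal Q-Q,
# Scale-Location, Residuals vs Leverage
plot(fit, which = c(1, 2, 3, 5))
par(op)
invisible(dev.off())
```

This is the "visually inspect plots" workflow; the test-based approach that follows replaces eyeballing with assertions.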

“Of all the practices you can leverage to assist your craftsmanship, you will get the most benefit from testing.”

Stephen Vance

test_that assumption!

```r
# tests/test_slr.R
# Assumes `model_object` (a fitted lm) and `data` already exist when the file
# runs. Uses the 2014-era testthat expect_that() syntax.
context("Check assumptions of SLR")

test_that("The residuals are normally distributed", {
  expect_that(shapiro.test(model_object$residuals)$p.value, is_more_than(0.05))
})

test_that("There is no autocorrelation", {
  # Breusch-Godfrey test from the lmtest package
  expect_that(lmtest::bgtest(model_object)$p.value, is_more_than(0.05))
})

test_that("The residuals are unrelated to the predictor", {
  expect_that(cor(model_object$residuals, data$covariates), equals(0))
})
```

Tests pass!

```
> test_file("./tests/test_slr.R")
Check assumptions of SLR : [1] "units_sold ~ price"
...
```

Psych.

```
> test_file("./tests/test_slr.R")
Check assumptions of SLR : [1] "units_sold ~ price"
1..

1. Failure(@test_slr.R#12): The residuals are normally distributed ------------------------
shapiro.test(model_object$residuals)$p.value not more than 0.05. Difference: 0.05
```

Linear? Eh.

• We assumed the functional form was linear, but there are several common forms that might fit the data better.

[Figure: four functional forms plotted against Price ($)]
• Linear: linear response to change in price.
• Log-log: much more sensitive to change in price.
• Linear-log: more gradual response to changes in price.
• Log-linear: sensitive initially, then gradual.
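The four forms can be fit side by side with `lm()`. One detail matters when the forecasts feed a revenue calculation: the two forms with `log(units_sold + 1)` on the left predict on the log scale, so their predictions have to be mapped back to units before multiplying by price. This is a minimal sketch on simulated data; the helper `predict_units` and the simulated demand curve are hypothetical.

```r
# Minimal sketch: fit the four functional forms and put every forecast back on
# the units-sold scale. Data are simulated; the +1 offsets mirror the talk's formulas.
set.seed(11)
d <- data.frame(price = runif(300, min = 20, max = 500))
# Hypothetical constant-elasticity demand, observed as integer counts
d$units_sold <- rpois(300, lambda = 200 * d$price^-0.5)

forms <- list(
  linear    = units_sold ~ price,
  loglog    = log(units_sold + 1) ~ log(price + 1),
  linearlog = units_sold ~ log(price + 1),
  loglinear = log(units_sold + 1) ~ price
)
fits <- lapply(forms, function(f) lm(f, data = d))

# Hypothetical helper: predict units sold, undoing log(units_sold + 1) if present
predict_units <- function(fit, newdata) {
  pred <- predict(fit, newdata = newdata)
  response <- deparse(formula(fit)[[2]])
  if (startsWith(response, "log(")) pred <- exp(pred) - 1
  unname(pred)
}

new_prices <- data.frame(price = c(50, 100, 200))
forecast_units <- sapply(fits, predict_units, newdata = new_prices)
print(round(forecast_units, 1))
```

Simply exponentiating a log-scale prediction tends to estimate something closer to the median than the mean of units sold, so log-response forecasts can run low even after back-transformation.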

```r
library(testthat)

# Automagically explore SLR with common functional forms
candidate_models = list(linear    = 'units_sold ~ price',
                        loglog    = 'log(units_sold + 1) ~ log(price + 1)',
                        linearlog = 'units_sold ~ log(price + 1)',
                        loglinear = 'log(units_sold + 1) ~ price')

# generate_forecast(), optimizer() and plot_price() are helpers defined
# elsewhere (not shown in the talk).
run = function(candidate_models, input_data) {
  forecasts = list()
  test_input = data.frame(price = 0:1000)

  # Forecast
  for (name in names(candidate_models)) {
    model = candidate_models[[name]]
    test_environment = new.env()

    # Generate the forecast
    forecasts[[model]] = generate_forecast(model, input_data)

    # Save off current value of things for testing
    assign("model", forecasts[[model]], envir = test_environment)
    assign("errors", forecasts[[model]]$residuals, envir = test_environment)
    assign("covariate", input_data$price, envir = test_environment)
    assign("label", model, envir = test_environment)

    save(test_environment, file = 'env_to_test.Rda')

    # Run assumption tests
    test_file("./tests/test_slr.R")

    #### OPTIMIZE PRICE!!! ####
    opt_results = optimizer(forecasts[[model]], test_input)

    # Multiply the predicted demand by the price for expected revenue
    opt_results$expected_revenue = test_input$price * opt_results$predicted_units_sold

    # One PDF per functional form, e.g. "loglog.pdf"; close the device when done
    pdf(paste0(name, ".pdf"))
    plot_price(opt_results)
    dev.off()
  }

  return(forecasts)
}
```

rut roh…

```
> run(candidate_models, slr_data)
Check assumptions of SLR : [1] "units_sold ~ price"
1..

1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

Check assumptions of SLR : [1] "log(units_sold + 1) ~ log(price + 1)"
1.2

1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

2. Failure(@test_slr.R#24): The residuals are unrelated to the predictor ---------------------------
cor(test_environment$errors, test_environment$covariate) not equal to 0
Mean absolute difference: 0.05545615

Check assumptions of SLR : [1] "units_sold ~ log(price + 1)"
1.2

1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

2. Failure(@test_slr.R#24): The residuals are unrelated to the predictor ---------------------------
cor(test_environment$errors, test_environment$covariate) not equal to 0
Mean absolute difference: 0.04201906

Check assumptions of SLR : [1] "log(units_sold + 1) ~ price"
1..

1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05
```

[Figure: price-optimization results plotted against Price ($), one panel per candidate model (Linear, Log-log, …)]

Optimal Price = $322

Optimal Price > $1000

Optimal Price = $∞

Optimal Price = $779
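An optimum at the edge of the price grid (or at infinity) is not necessarily a software bug. The slides do not say which model produced which optimum, so the following is offered only as one plausible mechanism. For a log-log form, ignoring the +1 offsets, $\log(\text{Units Sold}) = \alpha + \beta_1 \log(p)$, so demand is $e^{\alpha} p^{\beta_1}$ and revenue is:

$$
R(p) = p \cdot e^{\alpha} p^{\beta_1} = e^{\alpha} p^{1+\beta_1},
\qquad
\frac{dR}{dp} = (1 + \beta_1)\, e^{\alpha} p^{\beta_1}
$$

For every $p > 0$ the sign of $dR/dp$ is the sign of $1 + \beta_1$. If demand is inelastic ($\beta_1 > -1$), revenue rises without bound and an optimizer runs to the top of whatever price range it is given; if $\beta_1 < -1$, revenue falls and it picks the lowest price. A constant-elasticity demand curve never has an interior revenue maximum.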

We are just getting warmed up!

In conclusion, these forecasts suck.

[Figure: data by ability level (Beginner-Intermediate, Intermediate-Advanced, Advanced-Expert) plotted against Price ($), and a time series by Date from 2011-06-01 to 2014-02-01]

TIME?!

Try something a little smarter

$\text{Units Sold}_i = \alpha + \beta_1(\text{price}_i) + \beta_2(\text{ability}_i) + \beta_3(\text{month}_i) + \epsilon_i$
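In R, ability level and month would most naturally enter this model as categorical predictors (factors), so each level gets its own coefficient rather than a single slope. That encoding is an assumption; the slides do not say how `ability` and `month` were coded. A minimal sketch on simulated data, with a hypothetical winter bump:

```r
# Minimal sketch: add ability level and month as categorical predictors.
# Data, column names, and the winter effect are hypothetical.
set.seed(21)
n <- 600
ski <- data.frame(
  price   = runif(n, min = 100, max = 1000),
  ability = factor(sample(c("Beginner-Intermediate", "Intermediate-Advanced",
                            "Advanced-Expert"), n, replace = TRUE)),
  month   = factor(sample(month.abb, n, replace = TRUE), levels = month.abb)
)
winter <- ski$month %in% c("Nov", "Dec", "Jan", "Feb")
ski$units_sold <- 50 - 0.03 * ski$price + 15 * winter + rnorm(n, sd = 5)

richer <- lm(units_sold ~ price + ability + month, data = ski)
# 1 intercept + 1 price + 2 ability contrasts + 11 month contrasts = 15
print(length(coef(richer)))
```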

[Figure: results plotted against Price ($), one panel per ability level: Beginner-Intermediate, Intermediate-Advanced, Advanced-Expert]

Yeah, but who cares?

• Do we need to throw everything out just because some assumptions are violated?

• What is our goal?

• Is it still better than what we did previously?
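"Better than what we did previously" can be measured directly: hold out some data, then compare the simple benchmark against the richer model on forecast error. A minimal sketch using root-mean-square error on an 80/20 split; the data, the `winter` flag, and the `rmse` helper are all hypothetical.

```r
# Minimal sketch: compare the simple benchmark and a richer model on held-out data.
# Data, the winter flag, and the 80/20 split are hypothetical choices.
set.seed(5)
n <- 500
d <- data.frame(
  price  = runif(n, min = 100, max = 1000),
  winter = rbinom(n, size = 1, prob = 0.4)
)
d$units_sold <- 40 - 0.02 * d$price + 12 * d$winter + rnorm(n, sd = 4)

train_rows <- sample(n, size = 0.8 * n)
train   <- d[train_rows, ]
holdout <- d[-train_rows, ]

# Root-mean-square forecast error on new data
rmse <- function(fit, newdata) {
  sqrt(mean((newdata$units_sold - predict(fit, newdata = newdata))^2))
}

benchmark <- lm(units_sold ~ price, data = train)
richer    <- lm(units_sold ~ price + winter, data = train)
scores <- c(benchmark = rmse(benchmark, holdout), richer = rmse(richer, holdout))
print(round(scores, 2))
```

A model whose residuals fail a normality test can still win this comparison, which is one way to decide whether a violated assumption actually matters for the goal.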

Wrap it up.

1. Do the easiest thing first, and do it well. It’s how you’re going to learn the domain, and it’s your benchmark for improvement.

2. Test your assumptions, and invest time in building the tools needed to do that effectively.

3. Be cool, stay in school.

Nathan Decker, Brian Pratt & the Evo crew 🎿

Jason Gowans & Bryan Mayer 👬

Elissa “Downtown” Brown, forecasting genius 💁

John Foreman, MailChimp 🐵

#nordstromdatalab 📈

Thanks bros!!

Click-bait!

1. Data Carpentry: http://mimno.infosci.cornell.edu/b/articles/carpentry/

2. Getting started with testthat: http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf

3. Clean Code: http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882/

4. Quality Code: http://www.amazon.com/Quality-Code-Software-Principles-Practices/dp/0321832981

5. Revenue Management: http://www.amazon.com/Practice-Management-International-Operations-Research/dp/0387243763/

6. Pricing and Revenue Optimization: http://www.amazon.com/Pricing-Revenue-Optimization-Robert-Phillips-ebook/dp/B005JTDOVE/

7. Original G, Rob Hyndman: https://www.otexts.org/fpp and http://robjhyndman.com/hyndsight/
