s6 w2 linear regression
TRANSCRIPT
![Page 1: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/1.jpg)
Linear Regression
Purpose – Determine if one or more IVs can predict a DV
Examples:
• Does your height (IV) predict how much money you will spend (DV)?
• Does the number of store managers (IV) predict how often the machine will break down (DV)?
• Do the number of clicks (IV1) and the number of comments (IV2) on the blog predict the size of revenue (DV)?
![Page 2: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/2.jpg)
Choosing the right test for your research

| Research Question | Inferential Statistics |
| --- | --- |
| Compare means of 2 numeric variables | T test |
| Relate 2 categorical variables | Pearson Chi Square |
| Relate 2 numeric variables | Pearson Correlation r |
| Use 1+ IVs to explain 1 numeric DV | Regression |
![Page 3: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/3.jpg)
Where’s the crystal ball? I want to see the future!
Correlation tells us how X relates to Y (in the past)
Simple Regression tells us how X predicts Y (in the future)
• E.g., Does AvgDailyClicks predict DirectSalesRevenue?
Multiple Regression tells us how X1, X2, X3, … predict Y
• E.g., Do NumberBlogAuthors & AvgDailyClicks predict SponsorRevenue?
![Page 4: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/4.jpg)
Linear Regression Assumptions
• The relationship between the Xs and Y is linear
• If you have 2 or more Xs, they are not perfectly correlated with each other
• Xs are not correlated with external variables
• Independence – Any two observations should be independent from each other
• Errors are normally distributed
• And a few others
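A quick way to screen for the "Xs are not perfectly correlated" assumption is to compute Pearson's r between each pair of predictors. The sketch below is only an illustration: the variable names and all data values are made up, and `pearson_r` is a hypothetical helper written for this example.

```python
import math

def pearson_r(a, b):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical predictors (made-up values, for illustration only)
clicks = [100, 200, 300, 400, 500]
clicks_doubled = [200, 400, 600, 800, 1000]  # same information as clicks
comments = [5, 3, 8, 2, 7]

r_redundant = pearson_r(clicks, clicks_doubled)  # perfectly correlated pair
r_usable = pearson_r(clicks, comments)           # weakly correlated pair
print(f"clicks vs clicks_doubled: r = {r_redundant:.2f}")
print(f"clicks vs comments:       r = {r_usable:.2f}")
```

An r of exactly 1 or -1 between two Xs means one of them adds no new information, so only one of the pair should go into the model.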
![Page 5: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/5.jpg)
Simple Regression
Example: Does Number of Stupid Customers predict Self Checkout Error Rate?
When we use X to predict Y:
• X = the predictor = the independent variable (IV)
• Y = the predicted value = the dependent variable (DV) (the value of Y depends on the predictor X)
• You’re basically building a linear model between X and Y:
Y = Constant + B*X + error
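To make the model Y = Constant + B*X + error concrete, here is a minimal sketch of how the constant and slope can be estimated by ordinary least squares. SPSS does this for you; the function name and the data values below are assumptions made up for illustration.

```python
def fit_simple_regression(x, y):
    """Return (constant, slope) of the least-squares line Y = constant + B*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope B = co-variation of X and Y divided by variation of X
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    constant = mean_y - slope * mean_x
    return constant, slope

# Hypothetical data (made-up values, for illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

constant, slope = fit_simple_regression(x, y)
# error = actual Y minus the Y predicted by the line
errors = [yi - (constant + slope * xi) for xi, yi in zip(x, y)]
print(f"Y = {constant:.2f} + {slope:.2f}*X")
```

The errors are the part of each Y that the line does not account for; with a least-squares fit they sum to (approximately) zero.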
![Page 6: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/6.jpg)
Basic Geometry: Linear Function
Y = Constant + B*X + error
Y = 1 + 2*X
[Figure: graph of the line Y = 1 + 2*X, with Constant = 1 and Slope B = 2. Source: Wikipedia]
![Page 7: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/7.jpg)
What do Armani and regression have in common?
Model Audition: Fitting the best straight line between X & Y
Who is the best fitting model? (Hint: Not Kate Moss)
Line that’s closest to all dots
![Page 8: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/8.jpg)
Kate Moss expressed mathematically:
DirectSalesRevenue = (constant) + B*AvgDailyClicks + error
Goodness of Fit (R²): How well does the line fit the data? (How well does Kate fit the average woman?)
[Figure: scatterplot with fitted regression line, labeled with (constant) and Slope B]
Distances to regression line = error
Good fit = small errors
![Page 9: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/9.jpg)
Kate Moss as a lousy regression model:
Large errors, poor goodness of fit, small R²
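The link between "small errors" and a large R² can be shown with a short calculation: R² is 1 minus the ratio of squared errors to the total squared variation in Y. This is a sketch with made-up numbers; `r_squared` and both sets of predicted values are hypothetical and chosen only to contrast a close fit with a loose one.

```python
def r_squared(y, y_pred):
    """R² = 1 - (sum of squared errors) / (total sum of squares of Y)."""
    mean_y = sum(y) / len(y)
    ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_error / ss_total

# Hypothetical observed Y values and two sets of predictions (all made up)
y = [2.0, 4.0, 6.0, 8.0]
close_predictions = [2.1, 3.9, 6.2, 7.8]  # small errors
loose_predictions = [4.0, 5.0, 5.0, 6.0]  # large errors

r2_good = r_squared(y, close_predictions)
r2_poor = r_squared(y, loose_predictions)
print(f"small errors -> R² = {r2_good:.3f}")
print(f"large errors -> R² = {r2_poor:.3f}")
```

The closer the predictions sit to the observed dots, the closer R² gets to 1.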
![Page 10: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/10.jpg)
Reading the SPSS Regression Output
Y = Constant + B*X + error
DirectSalesRevenue = 19.466 - .003*AvgDailyClicks + error
Constant is significantly greater than zero
Slope (-.003) is significantly less than zero
Goodness of Fit (R²): Model explains 59% of the variation in DirectSalesRevenue
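Once the constant and slope are read off the output, the fitted equation can be used to predict Y for any X. The sketch below plugs in the two coefficients from this slide (19.466 and -.003); the click counts are hypothetical inputs, and the units of DirectSalesRevenue are not stated in the slides.

```python
CONSTANT = 19.466  # constant reported on the slide
SLOPE = -0.003     # slope B for AvgDailyClicks reported on the slide

def predict_direct_sales_revenue(avg_daily_clicks):
    """Predicted DirectSalesRevenue = constant + B * AvgDailyClicks."""
    return CONSTANT + SLOPE * avg_daily_clicks

# Hypothetical click counts, chosen only for illustration
for clicks in (0, 1000, 2000):
    predicted = predict_direct_sales_revenue(clicks)
    print(f"AvgDailyClicks = {clicks}: predicted revenue = {predicted:.3f}")
```

Each prediction is the point on the fitted line; actual values will differ from it by the error term.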
![Page 11: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/11.jpg)
Reporting Regression in plain English
The average number of daily clicks significantly predicted direct sales revenue, b = -.003, t(39) = 14.72, p < .001. The average number of daily clicks also explained a significant proportion of variance in direct sales revenue, R² = .59, F(1, 38) = 42.64, p < .001. These findings suggest that websites with more average daily clicks tend to have lower direct sales revenue.
![Page 12: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/12.jpg)
Why is regression useful for predicting the future?
Y = 200X (R² = 45%)
Given any X, we can predict the value of Y; the model explains 45% of the variation in Y
![Page 13: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/13.jpg)
Additional Notes
Assumptions: Xs are somewhat independent; Y values are independent; Y values are normally distributed; errors are normally distributed; X-Y relations are linear; no outliers
• Example: Time series data are NOT independent – stock price today depends on stock price yesterday, which depends on stock price the day before, etc.
Multiple regression is just an extension of simple regression
• Use multiple Xs (e.g., both AvgDailyClicks and NumberAuthors) to predict Y
• When you have a condition (e.g., customer choice depends on gender; brand awareness depends on comm. channel; number of applications depends on program of study), you need to create an interaction term (covered next class)
When an X is categorical (e.g., whether the blog host is Google or WordPress): Code X in numbers – e.g., 0 is Google, 1 is WordPress
When Y is categorical (e.g., whether the blog won the Outstanding Blog Award): Code Y in numbers – e.g., 0 is No, 1 is Yes, and use Logistic Regression
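The coding scheme for a categorical X (0 is Google, 1 is WordPress) can be sketched as follows. With a single 0/1 predictor, the least-squares constant equals the mean of Y in the group coded 0, and the slope equals the difference between the two group means. The revenue values and helper names below are made up for illustration.

```python
def fit_simple_regression(x, y):
    """Return (constant, slope) of the least-squares line Y = constant + B*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Coding scheme from the slide: 0 is Google, 1 is WordPress
HOST_CODES = {"Google": 0, "WordPress": 1}

# Hypothetical blogs and revenues (made-up values, for illustration only)
hosts = ["Google", "WordPress", "Google", "WordPress", "Google", "WordPress"]
revenue = [10.0, 14.0, 12.0, 15.0, 11.0, 16.0]

x = [HOST_CODES[h] for h in hosts]  # categorical X turned into numbers
constant, slope = fit_simple_regression(x, revenue)

print(f"constant = {constant:.1f} (average revenue when X = 0, Google)")
print(f"slope B  = {slope:.1f} (WordPress average minus Google average)")
```

Because the codes are only labels, swapping them (1 is Google, 0 is WordPress) would flip the sign of B without changing the conclusion.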
![Page 14: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/14.jpg)
Y = Constant + B1*X1 + B2*X2 + error for Your Project
What is your Y (the value you want to predict)?
Is your Y categorical? Do you need Logistic Regression? See the instructor for help
What is your X (your predictor variable)? How many Xs do you have?
Are any of your Xs categorical? Do you have a coding scheme?
Do you have a condition? (e.g., customer choice depends on gender; brand awareness depends on comm. channel; number of applications depends on program of study) See the instructor for help
![Page 15: S6 w2 linear regression](https://reader036.vdocuments.net/reader036/viewer/2022062513/5559d3acd8b42a98208b4ce0/html5/thumbnails/15.jpg)
Choosing the right test for your research

| Research Question | Inferential Statistics |
| --- | --- |
| Compare means of 2 numeric variables | T test |
| Relate 2 numeric variables | Pearson Correlation r |
| Relate 2 categorical variables | Pearson Chi Square |
| Use 1+ IVs to explain 1 numeric DV | Regression |