linear regression using spss

Linear Regression Analysis using SPSS Statistics

Dr Athar Khan
MBBS, MCPS, DPH, DCPS-HCSM, DCPS-HPE, MBA, PGD-Statistics
Associate Professor
Liaquat College of Medicine & Dentistry

Introduction

• Linear regression is the next step up after correlation.
• It is used when we want to predict the value of a variable based on the value of another variable.
• The variable we want to predict is called the dependent variable (or sometimes, the outcome variable).
• The variable we are using to predict the other variable's value is called the independent variable (or sometimes, the predictor variable).

05/01/23 DR ATHAR KHAN - LCMD

Introduction

• For example, exam performance can be predicted from revision time, or cigarette consumption from smoking duration.
• If you have two or more independent variables, rather than just one, you need to use multiple regression.
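The idea of predicting one variable from another can be sketched outside SPSS. The following is a minimal illustration in Python, not the SPSS procedure itself; the data values and the helper name `fit_line` are hypothetical and chosen only for this example.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: revision time (hours) and exam score (%)
revision_hours = [2, 4, 6, 8, 10]   # independent (predictor) variable
exam_score = [52, 57, 67, 73, 81]   # dependent (outcome) variable

intercept, slope = fit_line(revision_hours, exam_score)
predicted = intercept + slope * 7   # predicted score after 7 hours of revision

print(f"score = {intercept:.1f} + {slope:.1f} * hours")
print(f"predicted score for 7 hours: {predicted:.1f}")
```

With a single predictor this is simple linear regression; adding further predictors would require multiple regression, which this sketch does not cover.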


Assumptions

• Assumption #1: Your two variables should be measured at the continuous level (i.e., they are either interval or ratio variables).


Assumptions

• Assumption #2: There needs to be a linear relationship between the two variables.
• Create a scatter plot using SPSS Statistics and then visually inspect it to check for linearity.
• If the relationship displayed in your scatter plot is not linear, you will have to either run a non-linear regression analysis, perform a polynomial regression or "transform" your data.
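Visual inspection of the scatter plot remains the main check. As a rough numeric companion, the sketch below shows how "transforming" the data can straighten a curved relationship: it compares the Pearson correlation before and after a natural-log transform of the dependent variable. The data and the helper name `pearson_r` are hypothetical, and a high correlation alone does not prove linearity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with a curved (roughly exponential) relationship
x = [1, 2, 3, 4, 5, 6]
y = [2.7, 7.4, 20.1, 54.6, 148.4, 403.4]

r_raw = pearson_r(x, y)

# Natural-log transform of the dependent variable
log_y = [math.log(value) for value in y]
r_log = pearson_r(x, log_y)

print(f"r before transform:    {r_raw:.3f}")
print(f"r after log transform: {r_log:.3f}")
```

If the transformed relationship looks linear on a new scatter plot, the regression can be run on the transformed variable instead.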


Assumptions

• Assumption #3: There should be no significant outliers.
• An outlier is an observed data point whose dependent variable value is very different from the value predicted by the regression equation.
• As such, an outlier will be a point on a scatterplot that is (vertically) far away from the regression line, indicating that it has a large residual. A residual is the difference between an observed value and the value predicted by the regression equation.
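One way to make "far away from the regression line" concrete is to divide each residual by the standard error of the estimate and flag the large ones. The sketch below does this on hypothetical data; the cut-off of 2 is an illustrative assumption for this small sample, not a value taken from the slides, and the helper names are made up for the example.

```python
import math


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def standardized_residuals(x, y):
    """Residuals divided by the standard error of the estimate."""
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    std_error = math.sqrt(sse / (len(x) - 2))  # n - 2 degrees of freedom
    return [e / std_error for e in residuals]


# Hypothetical data: the point at x = 7 lies far above the general trend
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 25.0, 16.1, 17.9, 20.0]

CUTOFF = 2.0  # illustrative assumption, not a rule from the slides

z = standardized_residuals(x, y)
flagged = [(xi, yi) for xi, yi, zi in zip(x, y, z) if abs(zi) > CUTOFF]
print("possible outliers:", flagged)
```

A flagged point is only a candidate; whether it is a data-entry error or a genuine observation still has to be judged from the data.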



Residual

In regression analysis, the difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is called the residual (e). Each data point has one residual.

Residual = Observed value - Predicted value
e = y - ŷ

Both the sum and the mean of the residuals are equal to zero. That is, Σe = 0 and ē = 0.


http://stattrek.com/Help/Glossary.aspx?Target=Regression
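The definition e = y - ŷ can be checked numerically. The sketch below fits a least-squares line (with an intercept) to hypothetical data, computes one residual per data point, and confirms that their sum and mean are zero up to floating-point rounding. The data and the helper name `fit_line` are assumptions made for the example.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data
x = [2, 4, 6, 8, 10]
y = [52, 57, 67, 73, 81]

intercept, slope = fit_line(x, y)

# Predicted value (y-hat) and residual (e = y - y-hat) for each data point
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

total = sum(residuals)
mean_residual = total / len(residuals)

for xi, yi, pi, ei in zip(x, y, predicted, residuals):
    print(f"x={xi:>2}  y={yi:>2}  y_hat={pi:5.1f}  e={ei:+.1f}")
print(f"sum of residuals:  {total:.10f}")
print(f"mean of residuals: {mean_residual:.10f}")
```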

Assumptions

• Assumption #4: You should have independence of observations, which you can easily check using the Durbin-Watson statistic.
• If observations are made over time, it is likely that successive observations are related.
• Autocorrelation means that subsequent observations are related. If there is none, the Durbin-Watson statistic should be between 1.5 and 2.5.
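SPSS reports the Durbin-Watson statistic directly. For readers who want to see what it measures, the statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals; it ranges from 0 to 4, with values near 2 indicating no first-order autocorrelation. The sketch below is a minimal Python illustration on hypothetical residuals, which must be listed in observation (time) order.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals from a fitted regression, in time order
residuals = [0.4, -0.3, -0.5, 0.2, 0.6, -0.1, -0.4, 0.1]

dw = durbin_watson(residuals)
within_rule_of_thumb = 1.5 <= dw <= 2.5  # range quoted on the slide

print(f"Durbin-Watson = {dw:.2f}")
print("within 1.5 to 2.5:", within_rule_of_thumb)
```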

Assumptions

• Assumption #5: Data needs to show homoscedasticity, which is where the variances along the line of best fit remain similar as you move along the line.
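Homoscedasticity is usually judged from a plot of residuals against predicted values. As a crude numeric companion, the sketch below compares the average squared residual in the upper half of the predictor range with that in the lower half; a ratio far from 1 suggests the spread is changing along the line. This is an informal check under stated assumptions, not a formal test, and the data and helper names are hypothetical.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_spread_ratio(x, y):
    """Mean squared residual for the upper half of x divided by the lower half."""
    intercept, slope = fit_line(x, y)
    pairs = sorted(zip(x, y))  # order the points by the predictor
    residuals = [yi - (intercept + slope * xi) for xi, yi in pairs]
    half = len(residuals) // 2
    lower = residuals[:half]
    upper = residuals[-half:]
    ms_lower = sum(e ** 2 for e in lower) / len(lower)
    ms_upper = sum(e ** 2 for e in upper) / len(upper)
    return ms_upper / ms_lower


# Hypothetical data whose scatter widens as x increases (a "fan" shape)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.3, 11.0, 15.5, 14.0, 20.5, 17.0]

ratio = residual_spread_ratio(x, y)
print(f"upper/lower residual spread ratio: {ratio:.1f}")
```

For data that satisfy the assumption, the ratio would be expected to stay reasonably close to 1.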


Assumptions

• Assumption #6: Finally, the residuals (errors) of the regression line should be approximately normally distributed.
• Two common methods to check this assumption are a histogram (with a superimposed normal curve) and a Normal P-P Plot.
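A Normal P-P Plot compares the observed cumulative proportion of each residual with the cumulative probability expected under a normal distribution; points close to the diagonal support the assumption. The sketch below computes those coordinates in Python. The plotting position (rank - 0.5) / n is an assumption of this sketch and may differ from the formula SPSS uses; the residuals are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def pp_plot_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    n = len(residuals)
    centre = mean(residuals)
    spread = stdev(residuals)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for rank, e in enumerate(sorted(residuals), start=1):
        observed = (rank - 0.5) / n  # assumed plotting position
        expected = standard_normal.cdf((e - centre) / spread)
        points.append((observed, expected))
    return points


# Hypothetical residuals from a fitted regression
residuals = [-1.6, -0.9, -0.4, -0.1, 0.0, 0.2, 0.5, 0.8, 1.5]

points = pp_plot_points(residuals)

# Largest vertical distance from the diagonal (observed == expected)
max_gap = max(abs(observed - expected) for observed, expected in points)

for observed, expected in points:
    print(f"observed={observed:.3f}  expected={expected:.3f}")
print(f"largest gap from the diagonal: {max_gap:.3f}")
```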



If the beta coefficient is not statistically significant, no statistical significance can be interpreted from that predictor. If the beta coefficient is significant, examine the sign of the beta.


If the beta coefficient is positive, then for every 1-unit increase in the predictor variable, the dependent variable will increase by the unstandardized beta coefficient value.
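The interpretation of the unstandardized coefficient can be illustrated with a small calculation. The sketch below fits a line to hypothetical data, computes the standard error and t statistic of the slope (the quantities behind the significance test), and shows that a 1-unit increase in the predictor changes the prediction by exactly the slope. The p-value itself needs the t distribution with n - 2 degrees of freedom and is left to the SPSS output; all names and data here are assumptions for the example.

```python
import math


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: revision time (hours) and exam score (%)
hours = [2, 4, 6, 8, 10]
scores = [52, 57, 67, 73, 81]

intercept, slope = fit_line(hours, scores)
n = len(hours)

# Standard error of the slope: sqrt(MSE / Sxx), with MSE = SSE / (n - 2)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(hours, scores)]
mse = sum(e ** 2 for e in residuals) / (n - 2)
mean_x = sum(hours) / n
sxx = sum((xi - mean_x) ** 2 for xi in hours)
se_slope = math.sqrt(mse / sxx)
t_stat = slope / se_slope


def predict(h):
    """Predicted exam score for h hours of revision."""
    return intercept + slope * h


# Effect of one extra hour of revision on the predicted score
gain = predict(7) - predict(6)

print(f"unstandardized coefficient (B): {slope:.2f}")
print(f"standard error: {se_slope:.3f}, t = {t_stat:.2f}")
print(f"predicted gain for one extra hour: {gain:.2f}")
```

A negative coefficient would be read the same way, with the dependent variable decreasing by that amount for each 1-unit increase.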

