Linear Regression Analysis using SPSS Statistics
Dr Athar Khan
MBBS, MCPS, DPH, DCPS-HCSM, DCPS-HPE, MBA, PGD-Statistics
Associate Professor
Liaquat College of Medicine & Dentistry
Introduction
• Linear regression is the next step up after correlation.
• It is used when we want to predict the value of a variable based on the value of another variable.
• The variable we want to predict is called the dependent variable (or sometimes, the outcome variable).
• The variable we are using to predict the other variable's value is called the independent variable (or sometimes, the predictor variable).
205/01/23 DR ATHAR KHAN - LCMD
Introduction
• For example, you could use linear regression to understand whether exam performance can be predicted based on revision time, whether cigarette consumption can be predicted based on smoking duration, and so forth.
• If you have two or more independent variables, rather than just one, you need to use multiple regression.
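To make the idea of prediction concrete, here is a minimal sketch in Python (not SPSS syntax) of fitting a least-squares line with one independent variable. The data values, the variable names and the function fit_simple_regression are made up for illustration and do not come from the slides.

```python
# Hypothetical data (assumption): revision time in hours and exam score in %.
revision_hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
exam_scores = [52.0, 55.0, 61.0, 64.0, 70.0, 73.0]


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_regression(revision_hours, exam_scores)

# Predict the dependent variable for a new value of the independent variable.
predicted_score = intercept + slope * 7.0
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
print(f"predicted score after 7 hours of revision: {predicted_score:.1f}")
```

The intercept and slope computed here play the role of the constant and the unstandardized coefficient in a regression output.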
Assumptions
• Assumption #1: Your two variables should be
measured at the continuous level (i.e., they are
either interval or ratio variables).
Assumptions
• Assumption #2: There needs to be a linear relationship between the two variables.
• Create a scatter plot using SPSS Statistics, then visually inspect it to check for linearity.
• If the relationship displayed in your scatter plot is not linear, you will have to either run a non-linear regression analysis, perform a polynomial regression or "transform" your data.
Assumptions
• Assumption #3: There should be no significant outliers.
• An outlier is an observed data point whose dependent variable value is very different from the value predicted by the regression equation.
• As such, an outlier will be a point on a scatterplot that is (vertically) far away from the regression line, indicating that it has a large residual. A residual is the difference between an observed value and the value predicted by the regression line.
Residual
In regression analysis, the difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is called the residual (e). Each data point has one residual.
Residual = Observed value - Predicted value
e = y - ŷ
For a least-squares line with an intercept, both the sum and the mean of the residuals are equal to zero. That is, Σe = 0 and ē = 0.
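The definition e = y - ŷ can be checked numerically. The following is a small Python sketch, using made-up data, that fits a least-squares line, computes one residual per data point, and confirms that the residuals sum to zero up to floating-point rounding.

```python
# Hypothetical data (assumption), chosen only to illustrate the calculation.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# Predicted values (y-hat) and residuals e = y - y-hat, one per data point.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

residual_sum = sum(residuals)
residual_mean = residual_sum / n
print("residuals:", [round(e, 3) for e in residuals])
print("sum is zero up to rounding:", abs(residual_sum) < 1e-9)
```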
Assumptions
• Assumption #4: You should have independence of observations, which you can easily check using the Durbin-Watson statistic.
• If observations are made over time, it is likely that successive observations are related.
• If there is no autocorrelation (autocorrelation is where subsequent observations are related), the Durbin-Watson statistic should be close to 2; as a rule of thumb, between 1.5 and 2.5.
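For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, with values near 2 suggesting no first-order autocorrelation. The Python sketch below computes it directly; the two residual series and the helper within_rule_of_thumb are invented for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences / sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def within_rule_of_thumb(d, low=1.5, high=2.5):
    """Apply the 1.5 to 2.5 rule of thumb quoted on the slide."""
    return low <= d <= high


# Hypothetical residual series (assumption), ordered in time.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips: negative autocorrelation
trending = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]     # slow drift: positive autocorrelation

dw_alternating = durbin_watson(alternating)
dw_trending = durbin_watson(trending)
print(round(dw_alternating, 3), within_rule_of_thumb(dw_alternating))
print(round(dw_trending, 3), within_rule_of_thumb(dw_trending))
```

Both example series fall outside the 1.5 to 2.5 band: the alternating series gives a value well above 2 and the trending series a value well below 2.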
Assumptions
• Assumption #5: Data needs to show homoscedasticity, which is where the variances along the line of best fit remain similar as you move along the line.
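Homoscedasticity is usually judged by eye from a residual plot. As a rough numeric companion, the Python sketch below compares the mean squared residual in the lower and upper halves of the predictor range. This is a crude variance-ratio check loosely in the spirit of the Goldfeld-Quandt test, not a procedure described in the slides, and the function name and residual values are hypothetical.

```python
def mean_squared(values):
    """Average of the squared values."""
    return sum(v ** 2 for v in values) / len(values)


def variance_ratio(x, residuals):
    """Mean squared residual for the upper half of x divided by that for the lower half."""
    pairs = sorted(zip(x, residuals))  # order the residuals by predictor value
    half = len(pairs) // 2
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[-half:]]
    return mean_squared(upper) / mean_squared(lower)


# Hypothetical residuals (assumption) for predictor values 1 to 8.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
even_spread = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # similar spread throughout
fanning_out = [0.5, -0.5, 0.5, -0.5, 2.0, -2.0, 2.0, -2.0]   # spread grows with x

ratio_even = variance_ratio(x, even_spread)
ratio_fanning = variance_ratio(x, fanning_out)
print("even spread ratio:", ratio_even)
print("fanning out ratio:", ratio_fanning)
```

A ratio near 1 is consistent with similar variances along the line; a ratio far from 1 suggests the spread changes as you move along it.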
Assumptions
• Assumption #6: Finally, residuals (errors) of the regression line are approximately normally distributed.
• Two common methods to check this assumption include using either a histogram (with a superimposed normal curve) or a Normal P-P Plot.
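A Normal P-P plot pairs the observed cumulative proportion of each sorted residual with the cumulative probability expected under a normal distribution; points close to the diagonal suggest approximate normality. The Python sketch below computes those pairs. The plotting position (i + 0.5) / n is an assumption chosen for simplicity and may differ from the formula SPSS uses, and the residual values are invented.

```python
from statistics import NormalDist, mean, pstdev


def pp_plot_points(residuals):
    """Return (observed, expected) cumulative probabilities for a Normal P-P plot."""
    model = NormalDist(mean(residuals), pstdev(residuals))
    ordered = sorted(residuals)
    n = len(ordered)
    # Observed: plotting position (i + 0.5) / n (an assumed convention).
    # Expected: normal cumulative probability of each sorted residual.
    return [((i + 0.5) / n, model.cdf(e)) for i, e in enumerate(ordered)]


# Hypothetical residuals (assumption), symmetric around zero.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
points = pp_plot_points(residuals)

# Largest vertical distance of any point from the diagonal.
max_gap = max(abs(observed - expected) for observed, expected in points)
for observed, expected in points:
    print(f"observed {observed:.2f}  expected {expected:.3f}")
print("largest gap from the diagonal:", round(max_gap, 3))
```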
If the beta coefficient is not statistically significant, no statistical significance can be interpreted from that predictor. If the beta coefficient is significant, examine the sign of the beta.
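If the beta coefficient is positive, then for every 1-unit increase in the predictor variable, the dependent variable is predicted to increase by the unstandardized beta coefficient value. If it is negative, the dependent variable is predicted to decrease by that amount.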
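The "per 1-unit increase" reading can be verified with a line of arithmetic. In the Python sketch below, the intercept and slope are hypothetical values standing in for the constant and the unstandardized B from a coefficients table.

```python
# Hypothetical coefficients (assumption), as they might appear in a coefficients table.
intercept = 47.2  # constant
slope = 4.4       # unstandardized B for the predictor


def predict(x):
    """Predicted value of the dependent variable for predictor value x."""
    return intercept + slope * x


# Raising the predictor by exactly one unit changes the prediction by the slope.
change_per_unit = predict(4.0) - predict(3.0)
direction = "increase" if slope > 0 else "decrease"
print(f"a 1-unit rise in the predictor gives a predicted {direction} of {abs(change_per_unit):.1f}")
```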