Continuous Regression Analysis – Session 6
Data Collection and Data Analysis in Information Systems Research
Ph.D. Seminar Presentation
Martin Wolf (09.05.2008)
Supervisor: Dr. Oliver Hinz
Chair of Business Administration, esp. Information Management
Prof. Dr. Wolfgang König
Johann Wolfgang Goethe University



Agenda (Part I)

1. Goals of Regression Analysis
2. Underlying Assumptions
3. Exemplary Regression Analysis (SPSS)
4. Summary
5. Questions


Goals of Regression Analysis

Examines the linear dependency between one (bivariate regression) or more (multiple regression) independent variable(s) and one dependent variable (explanatory approach)

Application of the least squares method to minimize the error between the sample data and the linear model

Domain of interest: analysis of time series, prediction of causal relationships, root cause analysis (e.g. individual differences – computer skill)

$\hat{y}_i = b_0 + \sum_{k} b_k \cdot x_{k,i}$ (regression function)

Least Squares Method

(Figure; source: Skiera 2005)
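The least squares idea can be illustrated with a minimal sketch for the bivariate case. The data below are hypothetical toy values and the function names `fit_line` and `sum_squared_residuals` are illustrative choices, not part of the original slides. The closed-form estimates are b1 = S_xy / S_xx and b0 = mean(y) - b1 * mean(x).

```python
from statistics import mean


def fit_line(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals of y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared vertical distances between the data and the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data (not from the presentation).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
ssr = sum_squared_residuals(x, y, b0, b1)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, SSR = {ssr:.4f}")
```

Any other line through the same data, for example one with a shifted intercept, yields a larger sum of squared residuals than the fitted one.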


Regression Results

Regression coefficients

R²: goodness of fit

F-ratio: significance of the overall model

t-test: significance of the individual regression coefficients


Underlying Assumptions (I)

Linear dependency between the independent variables and the dependent variable

Dependent and independent variables have to be measured at metric level (except dummy variables)

Independent variables have to be uncorrelated (no multicollinearity)
-> Collinearity statistics, tolerance >= 0.1
-> Correlation matrix

Residuals have to be uncorrelated (no autocorrelation)
-> Durbin-Watson coefficient ≈ 2
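As a rough illustration of the autocorrelation check, the Durbin-Watson coefficient can be computed directly from the ordered residuals as d = Σ(e_t - e_{t-1})² / Σ e_t². The sketch below uses hypothetical residual series; the function name `durbin_watson` is an illustrative choice.

```python
def durbin_watson(residuals):
    """Durbin-Watson coefficient: about 2 for uncorrelated residuals,
    towards 0 for positive and towards 4 for negative autocorrelation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series (not from the presentation).
alternating = [1.0, -1.0, 1.0, -1.0]            # strong negative autocorrelation
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]  # strong positive autocorrelation

dw_alternating = durbin_watson(alternating)
dw_persistent = durbin_watson(persistent)
print(dw_alternating, dw_persistent)
```

The alternating series lands well above 2 and the persistent series well below 2, which is the pattern the rule of thumb "≈ 2" is meant to rule out.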


Underlying Assumptions (II)

Residuals have to follow a normal distribution
-> Kolmogorov-Smirnov test
-> Plots (normal probability plot, histogram)
-> n > 50: central limit theorem

No heteroscedasticity of the residuals
-> e.g. White's general test for heteroscedasticity
-> Plot (standardized residuals against standardized predicted values)

Data set has to represent a random sample

No outliers (check DFBETA, standard deviation as distance measure)
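The "standard deviation as distance measure" check can be sketched as follows: each residual is divided by the standard error of the estimate, and cases beyond a chosen number of standard deviations are flagged. The threshold of 3, the helper names, and the residual values are assumptions made for illustration; DFBETA is not covered by this sketch.

```python
import math


def standardized_residuals(residuals, n_predictors):
    """Divide each residual by the standard error of the estimate,
    sqrt(SSE / (n - k - 1)), with k independent variables."""
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    std_error = math.sqrt(sse / (n - n_predictors - 1))
    return [e / std_error for e in residuals]


def flag_outliers(residuals, n_predictors, threshold=3.0):
    """Return the indices of cases whose standardized residual exceeds the threshold."""
    z = standardized_residuals(residuals, n_predictors)
    return [i for i, zi in enumerate(z) if abs(zi) > threshold]


# Hypothetical residuals: 19 small values and one large one.
residuals = [0.01 if i % 2 == 0 else -0.01 for i in range(19)] + [0.2]
outliers = flag_outliers(residuals, n_predictors=1)
print(outliers)
```

Flagged cases are candidates for closer inspection rather than automatic deletion.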


Exemplary Regression Analysis

Example Data Set: Consequences of a reduction of weekly working time from 40 to 38.5 hours across 80 industries in Baden-Württemberg (1985)

Research Question: How does a change in working time influence employment?

Variables:

av85.10    ∆ employment (compared to 1984)
uv85.10    ∆ revenue (compared to 1984)
stv85.10   ∆ overtime hours (compared to 1984)
azv        dichotomous variable (reduction of working time)


SPSS Syntax File

* Compute linear regression, save standardized residuals.
* Calculate Durbin-Watson coefficient (check for autocorrelation).
* Calculate collinearity statistics (check for multicollinearity).
* Generate histogram and normal P-P plot of the residuals (check for normality).
* Generate scatterplot of standardized residuals against standardized predicted values (check for heteroscedasticity).
* Display model summary.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT av85.10
  /METHOD=ENTER uv85.10 stv85.10 azv
  /SAVE ZRESID
  /RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
  /SCATTERPLOT=(*ZRESID ,*ZPRED ).

* Kolmogorov-Smirnov test of the residuals.
* (Check whether the residuals follow a normal distribution).
NPAR TESTS
  /K-S(NORMAL)=ZRE_1
  /MISSING ANALYSIS.


SPSS Output File

Variables Entered/Removed(b)

Model   Variables Entered            Variables Removed   Method
1       azv, uv85.10, stv85.10(a)    .                   Enter

a All requested variables entered.
b Dependent Variable: av85.10

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       ,709(a)   ,502       ,482                ,04454                       1,873

a Predictors: (Constant), azv, uv85.10, stv85.10
b Dependent Variable: av85.10

ANOVA(b)

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   ,152             3    ,051          25,551   ,000(a)
Residual     ,151             76   ,002
Total        ,303             79

a Predictors: (Constant), azv, uv85.10, stv85.10
b Dependent Variable: av85.10
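The goodness-of-fit figures in the Model Summary can be roughly reproduced from the ANOVA table. The sketch below takes the sums of squares and degrees of freedom from the output above (decimal commas written as decimal points); because the printed inputs are rounded to three decimals, the recomputed values only approximately match the SPSS output.

```python
import math

# Values read from the ANOVA table (rounded as printed by SPSS).
ss_regression = 0.152
ss_residual = 0.151
df_regression = 3   # number of independent variables
df_residual = 76    # n - k - 1

ss_total = ss_regression + ss_residual
n = df_regression + df_residual + 1

# R squared: share of the total variation explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R squared penalizes the number of independent variables.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual

# F-ratio: explained mean square over residual mean square.
f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)

# Standard error of the estimate: square root of the residual mean square.
std_error_estimate = math.sqrt(ss_residual / df_residual)

print(f"n = {n}")
print(f"R squared          = {r_squared:.3f}")
print(f"Adjusted R squared = {adjusted_r_squared:.3f}")
print(f"F-ratio            = {f_ratio:.2f}")
print(f"Std. error         = {std_error_estimate:.4f}")
```

The small deviations from the reported F-ratio and standard error are consistent with rounding of the sums of squares in the table.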


Coefficients(a)

             Unstandardized Coefficients   Standardized Coefficients                     Collinearity Statistics
Model 1      B        Std. Error           Beta                        t        Sig.    Tolerance   VIF
(Constant)   -,029    ,010                                             -2,979   ,004
uv85.10      ,389     ,059                 ,564                        6,606    ,000    ,900        1,111
stv85.10     -,354    ,137                 -,241                       -2,591   ,011    ,757        1,321
azv          ,044     ,012                 ,361                        3,742    ,000    ,703        1,423

a Dependent Variable: av85.10

Collinearity Diagnostics(a)

                                              Variance Proportions
Model   Dimension   Eigenvalue   Condition Index   (Constant)   uv85.10   stv85.10   azv
1       1           2,036        1,000             ,05          ,03       ,04        ,06
        2           1,154        1,329             ,01          ,38       ,17        ,03
        3           ,666         1,748             ,01          ,56       ,23        ,13
        4           ,144         3,764             ,93          ,02       ,56        ,78

a Dependent Variable: av85.10

Residuals Statistics(a)

                       Minimum   Maximum   Mean     Std. Deviation   N
Predicted Value        -,0943    ,1226     ,0086    ,04388           80
Residual               -,07976   ,16050    ,00000   ,04369           80
Std. Predicted Value   -2,345    2,598     ,000     1,000            80
Std. Residual          -1,791    3,603     ,000     ,981             80

a Dependent Variable: av85.10
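A short sketch of how the columns of the Coefficients table relate to each other: the t value is the coefficient divided by its standard error, and VIF is the reciprocal of the tolerance. All numbers are read from the output above (decimal commas written as decimal points) and are rounded, so recomputed t values deviate slightly. The `predict` helper and its input values are hypothetical, and the reading of the variables as relative changes is an assumption.

```python
# (B, Std. Error, Tolerance) as printed in the Coefficients table.
coefficients = {
    "(Constant)": (-0.029, 0.010, None),
    "uv85.10": (0.389, 0.059, 0.900),
    "stv85.10": (-0.354, 0.137, 0.757),
    "azv": (0.044, 0.012, 0.703),
}

t_values = {}
vif_values = {}
for name, (b, std_error, tolerance) in coefficients.items():
    # t statistic: coefficient relative to its standard error.
    t_values[name] = b / std_error
    # Variance inflation factor: reciprocal of the tolerance.
    if tolerance is not None:
        vif_values[name] = 1.0 / tolerance


def predict(uv, stv, azv):
    """Fitted regression function using the unstandardized coefficients."""
    return (
        coefficients["(Constant)"][0]
        + coefficients["uv85.10"][0] * uv
        + coefficients["stv85.10"][0] * stv
        + coefficients["azv"][0] * azv
    )


# Hypothetical industry: revenue change 0.05, no change in overtime,
# working time reduced (azv = 1).
predicted_change = predict(uv=0.05, stv=0.0, azv=1)

for name in coefficients:
    print(name, round(t_values[name], 3), vif_values.get(name))
print("predicted change in employment:", round(predicted_change, 5))
```

All tolerances are well above the 0.1 threshold named in the assumptions, so the output gives no indication of problematic multicollinearity.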


One-Sample Kolmogorov-Smirnov Test

                                             Standardized Residual
N                                            80
Normal Parameters(a,b)     Mean              ,0000000
                           Std. Deviation    ,98082889
Most Extreme Differences   Absolute          ,074
                           Positive          ,074
                           Negative          -,036
Kolmogorov-Smirnov Z                         ,659
Asymp. Sig. (2-tailed)                       ,778

a Test distribution is Normal.
b Calculated from data.
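The two bottom rows of the Kolmogorov-Smirnov table can be approximately reproduced by hand. The sketch below assumes that Z is the largest absolute difference scaled by sqrt(N) and that the asymptotic significance follows the classical Kolmogorov series Q(z) = 2 Σ (-1)^(k-1) exp(-2 k² z²); whether SPSS uses exactly this formula is an assumption, although the numbers agree. The function name `kolmogorov_p_value` is illustrative.

```python
import math


def kolmogorov_p_value(z, terms=100):
    """Asymptotic two-sided p-value of the Kolmogorov-Smirnov statistic z.

    Uses a truncated alternating series, so it is only reliable for z
    that is not very close to zero.
    """
    if z <= 0:
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


n = 80                 # number of residuals
d_absolute = 0.074     # most extreme absolute difference (rounded in the output)

z_from_d = math.sqrt(n) * d_absolute   # close to the reported Z of 0.659
p_value = kolmogorov_p_value(0.659)    # close to the reported 0.778

print(f"Z from D = {z_from_d:.3f}")
print(f"asymptotic p-value = {p_value:.3f}")
```

A p-value this far above 0.05 means the test does not reject normality of the standardized residuals.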


Summary

Regression analysis is a means of root cause analysis and prediction, provided that a linear dependency can be assumed

Requires a sufficiently large random sample for a significant model (at least 5 observations per independent variable)

Strict assumptions have to be fulfilled




Agenda (Part II)

1. Background
2. Research Question
3. Utilized Model
4. Results
5. Summary (Pros and Cons)
6. Questions


Background

Introduction of a vessel traffic service (VTS) for the lower Mississippi in late 1977 in order to prevent rammings and collisions of vessels

VTS is an example of a Decision Support System (DSS)

Literature: utilization serves as a surrogate of system success, but has so far been measured only as a dichotomous variable, with inconsistent results


Research Question

Is there a linear causal relationship between DSS usage and system performance (fewer vessel accidents)?


Utilization as an Intervening Variable

(Figure showing forward linkages and backward linkages; source: Trice and Treacy 1988)


Utilized Linear Regression Model


Model Summary

Explanatory Variables         Coefficients
Lagged Accidents Rate (-1)    -0.2091**
Length of DSS Use             -7.6515*
Traffic Level                 0.0599
DSS Utilization               -5.0437*
River Stage                   -0.3084
Dec-Jan Weather               1.0529*
Oct-Nov Weather               1.1119*

R²                            0.4283
D-W Statistic                 1.9933
D.F.                          133
F-Ratio                       11.0727

** p<0.01; * p<0.05


Results

DSS utilization and length of DSS use are both significantly negatively related to the objective performance criterion (number of vessel accidents)


Pros

Objective justification of the DSS introduction (IT is an enabler)

Utilization of a broad model

Relatively high fit of the model

High significance of the model


Cons

No exact specification of the units of the coefficients (-> standardized coefficients)

Peak utilization was aggregated for DSS usage

No specification of how the weather indicator was derived

Assumptions were not addressed

Momentum already showed a decreasing trend


Literature

Le Blanc and Kozar (1990): An Empirical Investigation of the Relationship Between DSS Usage and System Performance: A Case Study of a Navigation Support System. In: MIS Quarterly, 14(3), pp. 263-277.


Thank you very much for your attention!
