Continuous Regression Analysis – Session 6
Data Collection and Data Analysis in Information Systems Research
Ph.D. Seminar Presentation
Martin Wolf (09.05.2008)
Supervisor: Dr. Oliver Hinz
Chair of Business Administration, esp. Information Management
Prof. Dr. Wolfgang König
Johann Wolfgang Goethe University
Agenda (Session 7)
Agenda (Part I)
1. Goals of Regression Analysis
2. Underlying Assumptions
3. Exemplary Regression Analysis (SPSS)
4. Summary
5. Questions
Goals of Regression Analysis
Examines the linear dependency between one (bivariate regression) or more (multiple regression) independent variable(s) and one dependent variable (explanatory approach)
Application of the least squares method to minimize the error between the sample data and the linear model
Domain of interest: analysis of time series, prediction of causal relationships, root cause analysis (e.g. individual differences – computer skill)
Regression function: $\hat{y}_i = b_0 + \sum_{k} b_k \cdot x_{k,i}$
Least Squares Method
[Figure: least squares method, axes X and y (Source: Skiera 2005)]
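As an illustration of the least squares method in the bivariate case, the following sketch estimates b0 and b1 by minimizing the sum of squared residuals. It is a minimal pure-Python example. The function name `fit_bivariate` and the sample values are assumptions made for illustration and are not part of the seminar material.

```python
def fit_bivariate(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form least squares solution for one independent variable
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical sample that lies exactly on the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_bivariate(x, y)

# Residuals: difference between observed and predicted values
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

With more than one independent variable the same idea applies, but the coefficients are usually obtained by solving the normal equations in matrix form, which is what a package such as SPSS does internally.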
Regression Results
Regression coefficients
R²: Goodness of fit
F-Ratio: Significance of the overall model
T-test: Significance of the regression coefficients
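For reference, the usual textbook definitions of these quantities for a model with $k$ independent variables and $n$ observations can be written as follows. The symbols $SS_{reg}$, $SS_{res}$, $SS_{tot}$ (regression, residual and total sum of squares) and $s_{b_j}$ (standard error of coefficient $b_j$) are notation introduced here, not taken from the slides.

```latex
R^2 = \frac{SS_{reg}}{SS_{tot}} = 1 - \frac{SS_{res}}{SS_{tot}}, \qquad
F = \frac{SS_{reg} / k}{SS_{res} / (n - k - 1)}, \qquad
t_j = \frac{b_j}{s_{b_j}}
```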
Underlying Assumptions (I)
Linear dependency between independent variables and dependent variable
Dependent and independent variables have to be provided at metric level (except dummy variables)
Independent variables have to be uncorrelated (no multicollinearity)
-> Collinearity Statistics, Tolerance >= 0.1
-> Correlation Matrix
Residuals have to be uncorrelated (no autocorrelation)
-> Durbin-Watson coefficient ≈ 2
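The two diagnostics named above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the residuals are made-up values, and `r_squared_j` stands for the R² obtained when one independent variable is regressed on all the others.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    return numerator / denominator


def tolerance_and_vif(r_squared_j):
    """Tolerance = 1 - R_j^2, VIF = 1 / tolerance."""
    tolerance = 1.0 - r_squared_j
    if tolerance <= 0:
        raise ValueError("perfect multicollinearity: tolerance is zero")
    return tolerance, 1.0 / tolerance


# Hypothetical residuals that alternate in sign (negative autocorrelation)
dw = durbin_watson([1.0, -1.0, 1.0, -1.0])

# Hypothetical auxiliary R^2 of 0.1 for one independent variable
tolerance, vif = tolerance_and_vif(0.1)
```

Values of the Durbin-Watson statistic near 2 indicate no first-order autocorrelation; values towards 0 or 4 indicate positive or negative autocorrelation, respectively.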
Underlying Assumptions (II)
Residuals have to follow a normal distribution
-> Kolmogorov-Smirnov test
-> Plots (normality, histogram)
-> n > 50 -> central limit theorem
No heteroscedasticity of the residuals
-> e.g. White's general test for heteroscedasticity
-> Plot (standardized residuals against standardized predictors)
Data set has to represent a random sample
No outliers (check DFBETA, standard deviation as distance measure)
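The outlier check with the standard deviation as distance measure can be sketched as follows. This is only an illustration: the function name, the threshold of 3 standard deviations and the residual values are assumptions, and DFBETA is not covered here.

```python
from statistics import mean, stdev


def flag_outliers(residuals, threshold=3.0):
    """Return the indices of residuals more than `threshold` standard deviations from the mean."""
    center = mean(residuals)
    spread = stdev(residuals)
    if spread == 0:
        return []
    return [
        i for i, e in enumerate(residuals)
        if abs((e - center) / spread) > threshold
    ]


# Hypothetical residuals: twenty small values and one large one at index 20
residuals = [0.01, -0.01] * 10 + [0.16]
outliers = flag_outliers(residuals)
```

A flagged case is a candidate for closer inspection rather than automatic removal.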
Exemplary Regression Analysis
Example data set: consequences of a reduction of weekly work time from 40 to 38.5 hours within 80 industries in Baden-Württemberg (1985)
Research question: How does a change in work time influence employment?
Variables:
av85.10: ∆ employment (compared to 1984)
uv85.10: ∆ revenue (compared to 1984)
stv85.10: ∆ overtime hours (compared to 1984)
azv: dichotomous variable (reduction of work time)
SPSS Syntax File
* Compute linear regression, save standardized residuals.
* Calculate Durbin-Watson coefficient (check for autocorrelation).
* Calculate collinearity statistics (check for multicollinearity).
* Generate histogram and normal P-P plot of the residuals (check for normality).
* Generate scatterplot of ZRESID against ZPRED (check for heteroscedasticity).
* Display model summary.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT av85.10
  /METHOD=ENTER uv85.10 stv85.10 azv
  /SAVE ZRESID
  /RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
  /SCATTERPLOT=(*ZRESID ,*ZPRED).

* Kolmogorov-Smirnov test of the saved standardized residuals.
* (Check if residuals follow a normal distribution).
NPAR TESTS
  /K-S(NORMAL)=ZRE_1
  /MISSING ANALYSIS.
SPSS Output File
Variables Entered/Removed(b)
Model  Variables Entered          Variables Removed  Method
1      azv, uv85.10, stv85.10(a)  .                  Enter
a All requested variables entered.
b Dependent Variable: av85.10

Model Summary(b)
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .709(a)  .502      .482               .04454                      1.873
a Predictors: (Constant), azv, uv85.10, stv85.10
b Dependent Variable: av85.10

ANOVA(b)
Model          Sum of Squares  df  Mean Square  F       Sig.
1  Regression  .152            3   .051         25.551  .000(a)
   Residual    .151            76  .002
   Total       .303            79
a Predictors: (Constant), azv, uv85.10, stv85.10
b Dependent Variable: av85.10
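As a plausibility check, the adjusted R² and the F-ratio can be recomputed from the reported R² of 0.502 with n = 80 observations and k = 3 independent variables. The sketch below uses the standard formulas; the function names are chosen for illustration. Because it starts from the rounded R², the F-ratio only approximately matches the 25.551 that SPSS computes from unrounded sums of squares.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_ratio(r2, n, k):
    """F-ratio of the overall model, expressed through R^2."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Values taken from the Model Summary and ANOVA tables
r2, n, k = 0.502, 80, 3
adj = adjusted_r_squared(r2, n, k)  # rounds to the reported .482
f = f_ratio(r2, n, k)               # close to the reported 25.551
```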
Coefficients(a)
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficients; Tolerance, VIF: collinearity statistics)
Model          B      Std. Error  Beta   t       Sig.  Tolerance  VIF
1  (Constant)  -.029  .010               -2.979  .004
   uv85.10     .389   .059        .564   6.606   .000  .900       1.111
   stv85.10    -.354  .137        -.241  -2.591  .011  .757       1.321
   azv         .044   .012        .361   3.742   .000  .703       1.423
a Dependent Variable: av85.10

Collinearity Diagnostics(a)
                                             Variance Proportions
Model  Dimension  Eigenvalue  Condition Index  (Constant)  uv85.10  stv85.10  azv
1      1          2.036       1.000            .05         .03      .04       .06
       2          1.154       1.329            .01         .38      .17       .03
       3          .666        1.748            .01         .56      .23       .13
       4          .144        3.764            .93         .02      .56       .78
a Dependent Variable: av85.10

Residuals Statistics(a)
                      Minimum  Maximum  Mean    Std. Deviation  N
Predicted Value       -.0943   .1226    .0086   .04388          80
Residual              -.07976  .16050   .00000  .04369          80
Std. Predicted Value  -2.345   2.598    .000    1.000           80
Std. Residual         -1.791   3.603    .000    .981            80
a Dependent Variable: av85.10
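To make the coefficient table concrete, the sketch below plugs the unstandardized coefficients into the regression function and recomputes the t-values as B divided by its standard error. The input values passed to `predict` are hypothetical, and the recomputed t-values differ slightly from the SPSS output because the displayed coefficients are rounded to three decimals.

```python
# Unstandardized coefficients and standard errors from the Coefficients table
COEFFICIENTS = {"const": -0.029, "uv85.10": 0.389, "stv85.10": -0.354, "azv": 0.044}
STD_ERRORS = {"const": 0.010, "uv85.10": 0.059, "stv85.10": 0.137, "azv": 0.012}


def predict(uv, stv, azv):
    """Predicted change in employment (av85.10) for given predictor values."""
    return (
        COEFFICIENTS["const"]
        + COEFFICIENTS["uv85.10"] * uv
        + COEFFICIENTS["stv85.10"] * stv
        + COEFFICIENTS["azv"] * azv
    )


def t_value(name):
    """t statistic of a coefficient: estimate divided by its standard error."""
    return COEFFICIENTS[name] / STD_ERRORS[name]


# Hypothetical industry: revenue change 0.05, no change in overtime, work time reduced
y_hat = predict(uv=0.05, stv=0.0, azv=1)

# Holding the other variables fixed, azv = 1 shifts the prediction by its coefficient
effect_of_azv = predict(0.05, 0.0, 1) - predict(0.05, 0.0, 0)
```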
One-Sample Kolmogorov-Smirnov Test
                                          Standardized Residual
N                                         80
Normal Parameters(a,b)    Mean            .0000000
                          Std. Deviation  .98082889
Most Extreme Differences  Absolute        .074
                          Positive        .074
                          Negative        -.036
Kolmogorov-Smirnov Z                      .659
Asymp. Sig. (2-tailed)                    .778
a Test distribution is Normal.
b Calculated from data.
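The relation between the figures in this table can be retraced with a short sketch, assuming the usual definitions: the Kolmogorov-Smirnov Z is the largest absolute difference scaled by the square root of n, and the asymptotic significance follows from the Kolmogorov series. The function names are illustrative. Since the displayed difference .074 is rounded, the recomputed Z deviates slightly from .659.

```python
import math


def ks_z(d_max, n):
    """Kolmogorov-Smirnov Z: largest absolute difference times sqrt(n)."""
    return math.sqrt(n) * d_max


def ks_asymptotic_p(z, terms=100):
    """Two-tailed asymptotic p-value from the Kolmogorov series."""
    if z <= 0:
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
        for k in range(1, terms + 1)
    )
    # Clamp to [0, 1] because the truncated series can overshoot for very small z
    return min(1.0, max(0.0, 2.0 * total))


z_from_d = ks_z(0.074, 80)   # close to the reported Z of .659
p = ks_asymptotic_p(0.659)   # close to the reported significance of .778
```

A significance of .778 is far above the usual 5% level, so the hypothesis of normally distributed residuals is not rejected.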
Summary
Regression analysis is a means of root cause analysis and prediction, provided a linear dependency can be assumed
Requires an extensive random sample for a significant model (at least five observations per independent variable)
Strict assumptions have to be fulfilled
Literature
Cohen, Jacob; Cohen, Patricia; West, Stephen G.; Aiken, Leona S. (2003): Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd edition. Lawrence Erlbaum Associates, New Jersey, USA.
Backhaus, Klaus; Erichson, Bernd; Plinke, Wulff; Weiber, Rolf (2003): Multivariate Analysemethoden, 10th edition. Springer Verlag, Berlin Heidelberg, Germany.
Chatterjee, Samprit; Hadi, Ali S.; Price, Bertram (2000): Regression Analysis by Example, 3rd edition. John Wiley & Sons, Inc., New York, USA.
McClendon, McKee J. (2002): Multiple Regression and Causal Analysis. Reissued by Waveland Press, Inc., Prospect Heights, Illinois, USA.
Brosius, Felix (2006): SPSS 14. Das mitp-Standardwerk. Redline GmbH, Heidelberg, Germany.
Schnell, Rainer; Hill, Paul B.; Esser, Elke (1999): Methoden der empirischen Sozialforschung, 6th edition. R. Oldenbourg Verlag, München, Germany.
Questions/Discussion
Agenda (Part II)
1. Background
2. Research Question
3. Utilized Model
4. Results
5. Summary (Pros and Cons)
6. Questions
Background
Introduction of a vessel traffic service (VTS) for the lower Mississippi in late 1977 in order to prevent rammings and collisions of vessels
VTS is an example of a Decision Support System (DSS)
Literature: utilization serves as a surrogate for success, but has only been measured as a dichotomous variable and has not yielded consistent results
Research Question
Is there a linear causal relationship between DSS usage and system performance (fewer vessel accidents)?
Utilization as an Intervening Variable
[Figure: utilization as an intervening variable, with backward linkages and forward linkages (Source: Trice and Treacy 1988)]
Utilized Linear Regression Model
Model Summary
Explanatory Variables       Coefficients
Lagged Accidents Rate (-1)  -0.2091**
Length of DSS Use           -7.6515*
Traffic Level               0.0599
DSS Utilization             -5.0437*
River Stage                 -0.3084
Dec-Jan Weather             1.0529*
Oct-Nov Weather             1.1119*
R²                          0.4283
D-W Statistic               1.9933
D.F.                        133
F-Ratio                     11.0727
** p<0.01; * p<0.05
Results
Significant negative correlation of DSS utilization and length of DSS use with the objective performance criterion (number of vessel accidents)
Pros
Objective justification of DSS introduction (IT is an enabler)
Utilization of a broad model
Relatively high fit of the model
High significance of the model
Cons
No exact specification of the dimensions (units) of the coefficients (-> standardized coefficients)
Peak utilization was aggregated for DSS usage
No specification of how the weather indicator was derived
Assumptions were not addressed
Momentum already showed a decreasing trend
Literature
Blanc and Kozar (1990): An Empirical Investigation of the Relationship Between DSS Usage and System Performance: A Case Study of a Navigation Support System. In: MISQ, 14(3), pp. 263-277.
Questions/Discussion
Thank you very much for your attention!