Handbook of Statistical Modeling for the Social and Behavioral Sciences

Edited by

Gerhard Arminger Bergische Universität Wuppertal Wuppertal, Germany

Clifford C. Clogg Late of Pennsylvania State University University Park, Pennsylvania

and

Michael E. Sobel University of Arizona Tucson, Arizona

PLENUM PRESS • NEW YORK AND LONDON

Contents

Contributors vii

Foreword by Donald B. Rubin ix

Preface xii

1 Causal Inference in the Social and Behavioral Sciences 1

Michael E. Sobel

1 Introduction 1
2 Deterministic Causation in Philosophy 4
3 Probabilistic Causation: Variations on a Deterministic Regularity Account 10
    3.1 Philosophical Treatments 10
    3.2 Granger Causation in Economics 14
4 Causation and Statistics: An Experimental Approach 17
5 Causal Inference in "Causal Models" 27
6 Discussion 32

2 Missing Data 39

Roderick J. A. Little and Nathaniel Schenker

1 Introduction 39
    1.1 Examples 40
    1.2 Important Concepts 42
    1.3 Naive Approaches 44
    1.4 More Principled Approaches 46
2 Weighting Adjustments for Unit Nonresponse 46
3 Maximum Likelihood Assuming Ignorable Nonresponse 48
    3.1 Maximum-Likelihood Theory 48
    3.2 The Expectation-Maximization Algorithm 49
    3.3 Some Important Ignorable Maximum-Likelihood Methods 51
4 Nonignorable Nonresponse Models 55
    4.1 Introduction 55
    4.2 Probit Selection Model 56
    4.3 Normal Pattern-Mixture Models 58
5 Multiple Imputation 59
    5.1 Imputation 59
    5.2 Theoretical Motivation for Multiple Imputation 62
    5.3 Creating a Multiply Imputed Data Set 63
    5.4 Analyzing a Multiply Imputed Data Set 65
6 Other Bayesian Simulation Methods 66
    6.1 Data Augmentation 67
    6.2 The Gibbs Sampler 67
    6.3 The Use of Iterative Simulation to Create Multiple Imputations 68
7 Discussion 69

3 Specification and Estimation of Mean Structures: Regression Models 77

Gerhard Arminger

1 Introduction 77
2 The Linear Regression Model 80
    2.1 Model Specification 80
    2.2 Estimation of Regression Coefficients 84
    2.3 Regression Diagnostics 89
    2.4 Multivariate Linear Regression 97
3 Maximum Likelihood Estimation 100
    3.1 Loglikelihood function 100
    3.2 Properties of the ML Estimator 101
    3.3 Likelihood Ratio, Wald, and Lagrange Multiplier Tests 104
    3.4 Restrictions on Parameters 108
4 ML Estimation Under Misspecification 111
5 Pseudo-ML Estimation 113
    5.1 Mean Structures 113
    5.2 The Linear Exponential Family 114
    5.3 Properties of PML Estimators 121
    5.4 Computation of PML Estimators With Fisher Scoring 124
    5.5 PML Wald and PML Lagrange Multiplier Tests 128
    5.6 Regression Diagnostics Under PML Estimation 129
6 Quasi Generalized PML Estimation 131
    6.1 Specification of Mean and Variance 131
    6.2 Properties of PML Estimation With Nuisance Parameters 132
    6.3 Computation of QGPML Estimators 135
    6.4 QGPML Wald, Lagrange Multiplier, and Likelihood Ratio Tests 135
    6.5 Regression Diagnostics Under QGPML Estimation 136
7 Univariate Nonlinear Regression Models 139
    7.1 Models for Count Data 139
    7.2 Standard Nonlinear Regression Models 143
    7.3 Models For Dichotomous Outcomes 146
    7.4 Quantit Models for Censored Outcomes 150
    7.5 Generalized Linear Models 153
8 Multivariate Nonlinear Regression Models 160
    8.1 Models for Ordered Categorical Variables 160
    8.2 Models for Doubly Censored and Classified Metric Outcomes 164
    8.3 Unordered Categorical Variables 166
    8.4 Generalized Estimating Equations for Mean Structures 172
9 Software 177

4 Specification and Estimation of Mean- and Covariance-Structure Models 185

Michael W. Browne and Gerhard Arminger

1 Introduction 185
    1.1 Background and Notation 186
    1.2 Scaling Considerations for Mean, Covariance, and Correlation Structures 187
    1.3 Fitting the Moment Structure 188
2 Large Sample Properties of Estimators 194
    2.1 Lack of Fit of the Model and the Assumption of Population Drift 195
    2.2 Reference Functions and Correctly Specified Discrepancy Functions 195
3 Computational Aspects 200
4 Examples of Moment Structures 203
    4.1 The Factor Analysis Model 203
    4.2 Structural Equation Models 205
    4.3 Other Mean and Covariance Structures 216
5 Mean and Covariance Structures with Nonmetric Dependent Variables 220
    5.1 Unconditional and Conditional Mean and Covariance Structures 221
    5.2 Inclusion of Threshold Models 223
    5.3 Conditional Polyserial and Polychoric Covariance and Correlation Coefficients 226
    5.4 Estimation 227
    5.5 Multigroup Analysis 232
    5.6 Example: Achievement in and Attitude toward High School Mathematics 232
6 Software 241

5 The Analysis of Contingency Tables 251

Michael E. Sobel

1 Introduction 251
2 Introductory Examples 253
    2.1 Some Models for Univariate Distributions 253
    2.2 Measuring Association in the Two-by-Two Table: The Odds Ratio 259
3 Odds Ratios for Two- and Three-Way Tables 264
    3.1 Odds Ratios for Two-Way Tables 264
    3.2 Odds Ratios for Three-Way Tables 265
4 Models for the Two-Way Table 266
    4.1 Basic Models 266
    4.2 Models for Square Tables 270
    4.3 Models for Ordinal Variables 274
5 Models for the Three-Way Table 282
    5.1 Basic Models 282
    5.2 Collapsibility in Models for the Three-Way Table 285
    5.3 Models for Tables with a One-to-One Correspondence among Categories 288
    5.4 Models for Tables With Ordered Variables 289
6 Higher-Way Tables 291
7 Estimation Theory 293
8 Residual Analysis and Model-Selection Procedures 298
9 Software 300
    9.1 GLIM 300
    9.2 BMDP 301
    9.3 SAS 301
    9.4 SPSS 302
    9.5 GAUSS 302
    9.6 CDAS 302
    9.7 S-Plus 303

6 Latent Class Models 311

Clifford C. Clogg†

1 Introduction 311
2 Computer Programs 312
3 Latent Class Models and Latent Structure Models 313
4 Basic Concepts and Notation 315
5 The Model Defined and Alternative Forms 317
    5.1 Measuring Fit 318
    5.2 Alternative Forms of the Model 319
6 An Example: Latent Classes in the American Occupational Structure 321
    6.1 Standard Latent Class Models for Two-Way Tables 321
    6.2 Some Related Models 324
7 Research Contexts Giving Rise to Latent Classes and Latent Class Models 327
    7.1 Medical Diagnosis 327
    7.2 Measuring Model Fit with Latent Class Evaluation Models 328
    7.3 Rater Agreement 330
    7.4 Latent Class Models for Missing Categories 332
8 Exploratory Latent Class Analysis and Clustering 333
9 Predicting Membership in Latent Classes 336
10 Latent Class Models in Multiple Groups: Categorical Covariates in Latent Class Analysis 340
11 Scaling, Measurement, and Scaling Models as Latent Class Models 343
    11.1 Ordinal X 343
    11.2 Classical Scaling Models 344
    11.3 The Rasch Model and Related Models 348
    11.4 Extending Latent Class Models to Other Scaling Contexts 351
12 Conclusion 352

† Deceased

7 Panel Analysis for Metric Data 361

Cheng Hsiao

1 Introduction 361
2 A General Framework 367
    2.1 The Basic Model 367
    2.2 A Bayes Solution 368
3 Two Extreme Cases: All Cross-Sectional Units Have the Same Behavioral Pattern versus Different Units Have Different Behavioral Patterns 374
    3.1 A Common Model for All Cross-Sectional Units 374
    3.2 Different Models for Different Cross-Sectional Units 374
4 Variable Intercept Model 375
5 Error Components Models 376
6 Random Coefficients Models 382
7 Mixed Fixed and Random Coefficients Models 384
8 Random or Fixed Effects (Parameters) 386
    8.1 An Example 386
    8.2 Some Basic Considerations 388
    8.3 Correlations between Effects and Included Explanatory Variables 390
    8.4 Hypothesis Testing or Model Selection 393
9 Conclusion 395

8 Panel Analysis for Qualitative Variables 401

Alfred Hamerle and Gerd Ronning

1 Introduction 401
2 Some Regression Models for Binary Outcomes 402
    2.1 Probit Model, Logit Model, Linear Probability Model, and Maximum Likelihood Estimation 402
    2.2 Generalized Least Squares Estimation When There Are Repeated Observations 407
    2.3 A Note on Interpretation 409
    2.4 Models for Limited Dependent Variables 409
3 Binary Regression Models for Panel Data 411
    3.1 The Fixed Effects Logit Model 413
    3.2 Random Effects Models 417
    3.3 Random Coefficients Models 422
    3.4 Probit Models With Autocorrelated Errors 423
    3.5 Autoregressive Probit Models 429
    3.6 Panel Models for Ordinal Data 431
4 Markov Chain Models 433
5 Tobit Models for Panel Data 435
6 Models for Count Data 437
    6.1 Poisson Distribution and Negative Binomial Distribution 437
    6.2 Mixtures of Poisson Distributions 438
    6.3 The Poisson Model 438
    6.4 A Model with Overdispersion 439
    6.5 Maximum Quasi-likelihood Estimation Under Overdispersion 441
    6.6 An Example with Cross-Sectional Data 442
    6.7 Panel Models for Count Data 444

9 Analysis of Event Histories 453

Trond Petersen

1 Introduction 453
2 Motivation 455
3 The Hazard-Rate Framework 456
    3.1 Basic Concepts 456
    3.2 Discrete-Time Formulations 457
    3.3 Continuous-Time Formulations 458
4 Time-Independent Covariates 465
5 Time-Dependent Covariates 469
6 Observability of the Dependent Variable 476
7 Repeated Events 478
8 Multistate Processes: Discrete State Space 481
9 Multistate Processes: Continuous State Space 483
10 Estimation Procedures 488
11 Unobserved Heterogeneity 492
12 Time-Aggregation Bias 495
13 Continuous- Versus Discrete-Time Models 498
14 Structural Models for Event Histories 500
15 Sampling Plans 501
    15.1 A Conditional Likelihood for ta, given tb 504
    15.2 Likelihood for tb and Joint Likelihood for ta and tb 505
    15.3 Full Likelihood in tb, ta, and x 508
16 Left Censoring 511
17 Conclusion 512


10 Random Coefficient Models 519

Nicholas T. Longford

1 Introduction 519
    1.1 An Illustration 522
    1.2 Clustered Design 523
2 Models With a Single Explanatory Variable 524
    2.1 Patterns of Variation 526
    2.2 Contextual Models 529
    2.3 Terminology: A Review 530
    2.4 Applications 531
3 The General Two-Level Model 533
    3.1 Categorical Variables and Variation 536
    3.2 Multivariate Regression as a Random Coefficient Model 536
    3.3 Contextual Models 537
    3.4 Random Polynomials 538
    3.5 Fixed and Random Parts 538
    3.6 Model Identification 539
4 Estimation 540
    4.1 The Fisher Scoring Algorithm 544
    4.2 Diagnostics 546
    4.3 Model Selection 546
5 Multiple Levels of Nesting 547
    5.1 Estimation 549
    5.2 Proportion of Variation Explained in Multilevel Models 549
6 Generalized Linear Models 551
    6.1 Estimation 552
    6.2 Quasi-likelihood 553
    6.3 Extensions for Dependent Data 554
    6.4 Estimation for Models With Dependent Data 555
7 Factor Analysis and Structural Equations 557
    7.1 Factor Analysis 557
    7.2 Structural Equation Models 561
8 Example: Wage Inflation in Britain 562
9 Software 568
    9.1 ML3 569
    9.2 VARCL 569
    9.3 HLM 570
    9.4 Outlook 570

Index 579