Psychometric and Machine Learning Approaches to Diagnostic Assessment
by
Oscar Gonzalez
A Dissertation Presented in Partial Fulfillment
of the Requirements for the Degree
Doctor of Philosophy
Approved May 2018 by the
Graduate Supervisory Committee:
Michael C. Edwards, Co-Chair
David P. MacKinnon, Co-Chair
Kevin J. Grimm
Yi Zheng
ARIZONA STATE UNIVERSITY
August 2018
ABSTRACT
The goal of diagnostic assessment is to discriminate between groups. In many
cases, a binary decision is made conditional on a cut score from a continuous scale.
Psychometric methods can improve assessment by modeling a latent variable using item
response theory (IRT), and IRT scores can subsequently be used to determine a cut score
using receiver operating characteristic (ROC) curves. Psychometric methods provide
reliable and interpretable scores, but the prediction of the diagnosis is not the primary
product of the measurement process. In contrast, machine learning methods, such as
regularization or binary recursive partitioning, can build a model from the assessment
items to predict the probability of diagnosis. Machine learning predicts the diagnosis
directly, but does not provide an inferential framework to explain why item responses are
related to the diagnosis. It remains unclear whether psychometric and machine learning
methods have comparable accuracy or if one method is preferable in some situations. In
this study, Monte Carlo simulation methods were used to compare psychometric and
machine learning methods on diagnostic classification accuracy. Results suggest that
classification accuracy of psychometric models depends on the diagnostic-test correlation
and prevalence of diagnosis. Also, machine learning methods that reduce prediction error
have inflated specificity and very low sensitivity compared to the data-generating model,
especially when prevalence is low. Finally, machine learning methods that use ROC
curves to determine probability thresholds have comparable classification accuracy to the
psychometric models as sample size, number of items, and number of item categories
increase. Therefore, results suggest that machine learning models could provide a viable
alternative for classification in diagnostic assessments. Strengths and limitations for each
of the methods are discussed, and future directions are considered.
DEDICATION
This document is dedicated to my parents, who gave me the trust and the confidence to
pursue a college career. This document is also dedicated to my amazing friends Amber,
James, Matt, and Peter, who also finished graduate degrees during 2018. Thank you all!
Para mis padres, que me dieron todo su apoyo y confianza para conseguir una carrera
universitaria. Para mis mejores amigos, Amber, James, Matt y Peter, quienes también
terminaron posgrados en este año. ¡Gracias a todos!
Small victories, be humble, move forward.
ACKNOWLEDGMENTS
I thank my committee members, Dr. David P. MacKinnon, Dr. Michael C. Edwards, Dr.
Yi Zheng, and Dr. Kevin J. Grimm for their help in the development of this document.
Their comments were challenging and made the project better. Dave, I cannot thank you
enough for saving my career after losing Roger. Your tireless approach to learning
inspired me to pursue an academic career. Mike, thank you for your support of my
psychometric interests, professional guidance, and mentorship. Although I am often
indecisive, you have taught me to keep in mind my life priorities and seek out new
opportunities. Kevin, thank you for our endless conversations, both personal and
professional. Your advice shaped my future behavior as a faculty member and my
approach to academics. Yi, thank you for all of your insight about my projects and
professional advice. Thank you for making sure I push the limits of my psychometric
knowledge and consider applications of my work outside of psychology. Finally, my
research for this dissertation (and during most of my graduate studies) was supported by
the National Science Foundation Graduate Research Fellowship under Grant No. DGE-
1311230.
TABLE OF CONTENTS
Page
LIST OF TABLES ........................................................................................................... vii
LIST OF FIGURES ........................................................................................................ xvi
1 INTRODUCTION ...........................................................................................................1
Diagnostic Assessment and Classification Accuracy .............................................3
Psychometric Methods for Diagnostic Assessment ................................................7
Machine Learning Approaches to Diagnostic Assessment ...................................18
Simulation Study ...................................................................................................26
2 METHOD .......................................................................................................................28
Data-Generation ....................................................................................................28
Data Analysis ........................................................................................................30
3 RESULTS .......................................................................................................................36
Estimation of the Psychometric Models ................................................................37
Classification Accuracy of the Psychometric Models ...........................................37
Estimation of the Machine Learning Models ........................................................40
Classification Accuracy of Machine Learning Models with ROC Classifiers .....41
Comparing Psychometric and Machine Learning Models ....................................44
Scoring Machine Learning Items for Person Parameter Recovery .......................48
4 DISCUSSION .................................................................................................................50
Revisiting Study Hypotheses ................................................................................51
Limitations and Future Directions ........................................................................55
Conclusion .............................................................................................................59
REFERENCES .................................................................................................................61
APPENDIX
A. FLOWCHART OF THE DATA-GENERATING PROCEDURE ..................64
B. TABLES ...........................................................................................................66
C. FIGURES .........................................................................................................68
D. R SYNTAX TO GENERATE DATA .............................................................71
E. R SYNTAX TO ANALYZE DATA ................................................................77
F. SUPPLEMENTAL WRITE-UP OF SIMULATION RESULTS .....................85
LIST OF TABLES
Table Page
1. Median of classification accuracy indices by model ..................................................67
1.1. IRT Nonconvergence (out of 500) in the 10-item condition ............................... 135
1.2. IRT Nonconvergence (out of 500) in the 30-item condition ................................136
1.3. Mean Squared Error for the Theta Estimate in the 10-item condition ...................137
1.4. Mean Squared Error for the Theta Estimate in the 30-item condition ..................138
1.5. Correlation between True and Theta Estimate in the 10-item condition ..............139
1.6. Correlation between the True and Theta Estimate in the 30-item condition ........140
1.7. Mean Squared Error for the Slopes in the 2PL Model ..........................................141
1.8. Variance Explained for the Slopes in the 2PL Model ...........................................142
1.9. Mean Squared Error for the Threshold Parameter in the 2PL Model ...................143
1.10. Variance Explained for the Threshold Parameter in the 2PL Model ..................144
1.11. Mean Squared Error for the Slopes in the Graded Response Model ..................145
1.12. Variance Explained for the Slopes in the Graded Response Model ...................146
1.13. Mean Square Error for the First Threshold in the Graded Response Model ......147
1.14. Mean Square Error for the Second Threshold in the Graded Response Model ...148
1.15. Mean Square Error for the Third Threshold in the Graded Response Model ......149
1.16. Mean Square Error for the Fourth Threshold in the Graded Response Model ....150
1.17. Correlation between True and Estimated First Threshold in the Graded Response
Model .................................................................................................................151
1.18. Correlation between True and Estimated Second Threshold in the Graded
Response Model ................................................................................................152
1.19. Correlation between True and Estimated Third Threshold in the Graded Response
Model .................................................................................................................153
1.20. Correlation between True and Estimated Fourth Threshold in the Graded
Response Model ................................................................................................154
2.1. Classification Rate of Data-generating Theta in Conditions with 10 items ......... 155
2.2. Classification Rate of Data-generating Theta in Conditions with 30 items ..........156
2.3. Sensitivity Rate of Data-generating Theta in Conditions with 10 items .............. 157
2.4. Sensitivity Rate of Data-generating Theta in Conditions with 30 items .............. 158
2.5. Specificity of Data-generating Theta in Conditions with 10 items....................... 159
2.6. Specificity of Data-generating Theta in Conditions with 30 items....................... 160
2.7. Classification Rate of Estimated Thetas in Conditions with 10 items .................. 161
2.8. Classification Rate of Estimated Thetas in Conditions with 30 items .................. 162
2.9. Sensitivity of Estimated Thetas in Conditions with 10 items ............................... 163
2.10. Sensitivity of Estimated Thetas in Conditions with 30 items ............................. 164
2.11. Specificity of Estimated Thetas in Conditions with 10 items ............................. 165
2.12. Specificity of Estimated Thetas in Conditions with 30 items ............................. 166
2.13. Classification Rate of Raw Summed Score in Conditions with 10 items .......... 167
2.14. Classification Rate of Raw Summed Score in Conditions with 30 items ...........168
2.15. Sensitivity of Raw Summed Score in Conditions with 10 items ........................ 169
2.16. Sensitivity of Raw Summed Score in Conditions with 30 items ........................ 170
2.17. Specificity of Raw Summed Score in Conditions with 10 items ........................ 171
2.18. Specificity of Raw Summed Score in Conditions with 30 items ........................ 172
3.1. Proportion of the CART Models with a Bayes Classifier that did not Assign Cases
to the Minority Class in Conditions with 10 items ............................................173
3.2. Proportion of the CART Models with a Bayes Classifier that did not Assign Cases
to the Minority Class in Conditions with 30 items .............................................174
3.3. Proportion of the Random Forest Models with a Bayes Classifier that did not
Assign Cases to the Minority Class in Conditions with 10 items ......................175
3.4. Proportion of the Random Forest Models with a Bayes Classifier that did not
Assign Cases to the Minority Class in Conditions with 30 items ......................176
3.5. Proportion of the Lasso Models with a Bayes Classifier that did not Assign Cases
to the Minority Class in Conditions with 10 items .............................................177
3.6. Proportion of the Lasso Models with a Bayes Classifier that did not Assign Cases
to the Minority Class in Conditions with 30 items .............................................178
3.7. Proportion of the Relaxed Lasso Models with a Bayes Classifier that did not
Assign Cases to the Minority Class in Conditions with 10 items ......................179
3.8. Proportion of the Relaxed Lasso Models with a Bayes Classifier that did not
Assign Cases to the Minority Class in Conditions with 30 items ......................180
3.9. Proportion of the Logistic Models with a Bayes Classifier that did not Assign
Cases to the Minority Class in Conditions with 10 items ..................................181
3.10. Proportion of the Logistic Models with a Bayes Classifier that did not Assign
Cases to the Minority Class in Conditions with 30 items ..................................182
3.11. Proportion of the Lasso Models with a ROC Classifier that did not Assign Cases
to the Minority Class in Conditions with 10 items .............................................183
3.12. Proportion of the Lasso Models with a ROC Classifier that did not Assign Cases
to the Minority Class in Conditions with 30 items ........................................... 184
4.1. Classification Accuracy of CART for Models with 20% Prevalence and greater
than N=250 ....................................................................................................... 185
4.2. Classification Rate of Random Forest with a Bayes Classifier in Conditions with
Prevalence of .20 .............................................................................................. 186
4.3. Sensitivity of Random Forest with a Bayes Classifier in Conditions with
Prevalence of .20 ............................................................................................... 187
4.4. Specificity of Random Forest with a Bayes Classifier in Conditions with
Prevalence of .20 ............................................................................................... 188
4.4A. Classification Accuracy of Lasso Logistic Regression with a Bayes Classifier for
Models with Prevalence of .20, Diagnosis-test Correlation of .7, Five-category
Items, and Sample Size greater than N=250 ..................................................... 189
4.5. Classification Rate of Relaxed Lasso Logistic Regression with a Bayes Classifier
for conditions with Five-category Items and a Diagnosis-test Correlation
of .70 ..................................................................................................................189
4.6. Sensitivity of Relaxed Lasso Logistic Regression with a Bayes Classifier for
conditions with Five-category Items and a Diagnosis-test Correlation of .70 ...190
4.7. Specificity of Relaxed Lasso Logistic Regression with a Bayes Classifier for
conditions with Five-category Items and a Diagnosis-test Correlation of .70 ...191
4.8. Classification Rate of Logistic Regression with a Bayes Classifier in Conditions
with 30 items ......................................................................................................192
4.9. Sensitivity of Logistic Regression with a Bayes Classifier in Conditions with 30
items .................................................................................................................. 193
4.10. Specificity of Logistic Regression with a Bayes Classifier in Conditions with 30
items .................................................................................................................. 194
4.11. Classification Rate of Random Forest with a ROC Classifier in Conditions with
10 items ............................................................................................................. 195
4.12. Classification Rate of Random Forest with a ROC Classifier in Conditions with
30 items ............................................................................................................. 196
4.13. Sensitivity of Random Forest with a ROC Classifier in Conditions with
10 items ............................................................................................................ 197
4.14. Sensitivity of Random Forest with a ROC Classifier in Conditions with
30 items ............................................................................................................ 198
4.15. Specificity of Random Forest with a ROC Classifier in Conditions with
10 items ............................................................................................................ 199
4.16. Specificity of Random Forest with a ROC Classifier in Conditions with
30 items ............................................................................................................ 200
4.17. Classification Rate of Logistic Regression with a ROC Classifier in Conditions with
10 items ............................................................................................................. 201
4.18. Classification Rate of Logistic Regression with a ROC Classifier in Conditions
with 30 items ..................................................................................................... 202
4.19. Sensitivity of Logistic Regression with a ROC Classifier in Conditions with 10
items .................................................................................................................. 203
4.20. Sensitivity of Logistic Regression with a ROC Classifier in Conditions with 30
items .................................................................................................................. 204
4.21. Specificity of Logistic Regression with a ROC Classifier in Conditions with 10
items .................................................................................................................. 205
4.22. Specificity of Logistic Regression with a ROC Classifier in Conditions with 30
items .................................................................................................................. 206
4.23. Classification Rate of Lasso Logistic Regression with a ROC Classifier in
Conditions with Diagnosis-test Correlation of .70 ........................................... 207
4.24. Sensitivity of Lasso Logistic Regression with a ROC Classifier in Conditions with
Diagnosis-test Correlation of .70 ...................................................................... 208
4.25. Specificity of Lasso Logistic Regression with a ROC Classifier in Conditions
with Diagnosis-test Correlation of .70 ............................................................. 209
4.26. Classification Rate of Relaxed Lasso with a ROC Classifier in Conditions with
Diagnosis-test Correlation of .70 ...................................................................... 210
4.27. Sensitivity of Relaxed Lasso with a ROC Classifier in Conditions with Diagnosis-
test Correlation of .70 ....................................................................................... 211
4.28. Specificity of Relaxed Lasso with a ROC Classifier in Conditions with Diagnosis-
test Correlation of .70 ....................................................................................... 212
6.1. Classification Rate Differences between Random Forest with ROC Classifier and
Data-generating Theta in Conditions with 10 items .......................................... 213
6.2. Classification Rate Differences between Random Forest with ROC Classifier and
Data-generating Theta in Conditions with 30 items ......................................... 214
6.3. Sensitivity Differences between Random Forest with ROC Classifier and Data-
generating Theta in Conditions with 10 items ................................................... 215
6.4. Sensitivity Differences between Random Forest with ROC Classifier and Data-
generating Theta in Conditions with 30 items ................................................... 216
6.5. Specificity Differences between Random Forest with ROC Classifier and Data-
generating Theta in Conditions with 10 items ................................................... 217
6.6. Specificity Differences Between Random Forest with ROC Classifier and Data-
generating Theta in Conditions with 30 items ................................................... 218
6.7. Classification Rate Differences between Logistic Regression with ROC Classifier
and Data-generating Theta in Conditions with 10 items ................................... 219
6.8. Classification Rate Differences between Logistic Regression with ROC Classifier
and Data-generating Theta in Conditions with 30 items ................................... 220
6.9. Sensitivity Differences between Logistic Regression with ROC Classifier and
Data-generating Theta in Conditions with 10 items .......................................... 221
6.10. Sensitivity Differences between Logistic Regression with ROC Classifier and
Data-generating Theta in Conditions with 30 items .......................................... 222
6.11. Specificity Differences between Logistic Regression with ROC Classifier and
Data-generating Theta in Conditions with 10 items .......................................... 223
6.12. Specificity Differences between Logistic Regression with ROC Classifier and
Data-generating Theta in Conditions with 30 items .......................................... 224
6.13. Classification Rate Difference between Lasso with a ROC Classifier and Data-
generating Theta in Conditions with Diagnostic-test Correlation of .70 ......... 225
6.14. Sensitivity Difference between Lasso with a ROC Classifier and Data-generating
Theta in Conditions with Diagnostic-test Correlation of .70 ........................... 226
6.15. Specificity Difference between Lasso with a ROC Classifier and Data-generating
Theta in Conditions with Diagnostic-test Correlation of .70 ............................227
6.16. Classification Rate Difference between Relaxed Lasso with a ROC Classifier and
Data-generating Theta in Conditions with Diagnostic-test Correlation
of .70 ................................................................................................................. 228
6.17. Sensitivity Difference between Relaxed Lasso with a ROC Classifier and Data-
generating Theta in Conditions with Diagnostic-test Correlation of .70 ......... 229
6.18. Specificity Difference between Relaxed Lasso with a ROC Classifier and Data-
generating Theta in Conditions with Diagnostic-test Correlation of .70 ......... 230
7.1. Number of Replications in Conditions with Binary Items where CART did not
Choose at least Two Items (where Theta Score was not Estimated) ................ 231
7.2. Number of Replications in Conditions with Polytomous Items where CART did not
Choose at least Two Items (where Theta Score was not Estimated) ................ 232
7.3. Correlation between True and Estimated Theta from CART in Conditions with
Binary Items ...................................................................................................... 233
7.4. Correlation between True and Estimated Theta from CART in Conditions with
Polytomous Items ............................................................................................. 234
7.5. Mean Squared Error of Estimated Theta from CART in Conditions with Binary
Items ................................................................................................................. 235
7.6. Mean Squared Error of Estimated Theta from CART in Conditions with Polytomous
Items ................................................................................................................. 236
7.7. Number of Replications in Conditions with Binary Items where Lasso did not
Choose at least Two Items (where Theta Score was not Estimated) ................ 237
7.8. Number of Replications in Conditions with Polytomous Items where Lasso did not
Choose at least Two Items (where Theta Score was not Estimated) ................ 238
7.9. Correlation between True and Estimated Theta from Lasso in Conditions with
Diagnosis-test Correlation of .70 ...................................................................... 239
7.10. Mean Squared Error of Estimated Theta from Lasso in Conditions with
Diagnosis-test Correlation of .70 ...................................................................... 240
7.11. Mean Squared Error of Estimated Theta from Random Forest in Conditions with
Binary Items ..................................................................................................... 241
7.12. Mean Squared Error of Estimated Theta from Random Forest in Conditions with
Polytomous Items ..............................................................................................242
7.13. Correlation between True and Estimated Theta from Random Forest in Conditions
with Binary Items .............................................................................................. 243
7.14. Correlation between True and Estimated Theta from Random Forest in Conditions
with Polytomous Items ...................................................................................... 244
7.A. Average Number of Items Chosen by CART in Conditions with 10 items ........ 245
7.B. Average Number of Items Chosen by CART in Conditions with 30 items ........ 246
7.C. Average Number of Items Chosen by Lasso in Conditions with 10 items ......... 247
7.D. Average Number of Items Chosen by Lasso in Conditions with 30 items ......... 248
LIST OF FIGURES
Figure Page
1. Variance explained and unconditional η² effect sizes for the predictors of
classification accuracy in machine learning models with ROC classifiers .........69
2. Random Forest variable importance measures for machine learning algorithms with
ROC classifiers .....................................................................................................70
2.1. Variable Importance Measures for Data-generating Model ................................ 249
2.2. Variable Importance Measures of the Raw Summed Scores ............................... 250
2.3. Variable Importance Measures of Estimated Theta ............................................. 251
4.1. Regression Tree to predict CART Classification Rate ......................................... 252
4.2. Regression Tree to predict CART sensitivity ...................................................... 253
4.3. Regression Tree to predict CART specificity ...................................................... 254
4.4. Variable Importance Measures of CART ............................................................. 255
4.5. Regression Tree to predict Random Forest Classification Rate .......................... 256
4.6. Regression Tree to predict Random Forest Specificity ....................................... 257
4.7. Variable Importance Measures of Random Forest with Bayes Classifier ........... 258
4.8. Regression Tree to predict Lasso Logistic Regression Sensitivity ...................... 259
4.9. Variable Importance Measures of Lasso Logistic Regression with Bayes
Classifier ........................................................................................................... 260
4.10. Regression Tree to predict Relaxed Lasso Sensitivity ........................................ 261
4.11. Regression Tree to predict Relaxed Lasso Specificity ....................................... 262
4.12. Variable Importance Measures of Relaxed Lasso Logistic Regression with Bayes
Classifier ........................................................................................................... 263
4.13. Regression Tree to predict Logistic Regression Classification Rates ............... 264
4.14. Regression Tree to predict Logistic Regression Sensitivity .............................. 265
4.15. Regression Tree to predict Logistic Regression Specificity ............................... 266
4.16. Variable Importance Measures of Logistic Regression with Bayes Classifier .. 267
4.17. Regression Tree to predict Sensitivity of Random Forest with a ROC
Classifier ........................................................................................................... 268
4.18. Variable Importance Measures of Random Forest with ROC Classifier ........... 269
4.19. Regression Tree to predict Sensitivity for Logistic Regression with a ROC
Classifier ........................................................................................................... 270
4.20. Variable Importance Measures for Logistic Regression with ROC Classifier .. 271
4.21. Regression Tree to predict Classification Rates for Lasso Logistic Regression
with ROC Classifier ........................................................................................ 272
4.22. Regression Tree to predict Specificity for Lasso Logistic Regression with ROC
Classifier ........................................................................................................... 273
4.23. Variable Importance Measures of Lasso Logistic Regression with ROC
Classifier ........................................................................................................... 274
4.24. Regression Tree to predict Classification Rate of Relaxed Lasso with ROC
Classifier ........................................................................................................... 275
4.25. Regression Tree to predict Sensitivity of Relaxed Lasso with ROC Classifier ...276
4.26. Regression Tree to predict Specificity of Relaxed Lasso with ROC Classifier ..277
4.27. Variable Importance Measures of Relaxed Lasso with ROC Classifier ............ 278
7.1. Regression Tree for Theta MSE of CART .......................................................... 279
7.2. Regression Tree for the Correlation between Estimated and Data-generating Theta
for CART ......................................................................................................... 280
7.3. Variable Importance Measures for CART Person Parameter Recovery ............. 281
7.4. Regression Tree for Theta MSE in the Lasso Logistic Regression Model .......... 282
7.5. Regression Tree for the Correlation between Estimated and Data-generating Theta
for Lasso Logistic Regression .......................................................................... 283
7.6. Variable Importance Measures for the Lasso Logistic Regression Person Parameter
Recovery ........................................................................................................... 284
7.7. Regression Tree for Theta MSE in the Random Forest Model ........................... 285
7.8. Regression Tree for the Correlation between Estimated and True Theta for Random
Forest Model ..................................................................................................... 286
7.9. Variable Importance Measures for the Random Forest person parameter
recovery ............................................................................................................ 287
1. Introduction
An important property of psychological assessments and medical instruments is
the ability to accurately screen or diagnose individuals for disorders or illnesses. For
example, a psychiatrist might assess whether a patient has major depressive disorder, a
physician might diagnose a patient with cancer, or a psychologist might evaluate whether
a teenager is at risk of suicide. Other examples include the Child Behavior Checklist
(CBCL), a measure of emotional, behavioral, and social problems in children, used to
screen children for autism spectrum disorders (Achenbach & Rescorla, 2013); the Center
for Epidemiologic Studies Depression (CES-D) Scale, a measure of depressive
symptoms, used to screen for depression in older adults (Lewinsohn, Seeley, Roberts, &
Allen, 1997); and the Parent General Behavior Inventory (P-GBI), a behavior inventory,
used to screen children for pediatric bipolar disorder (Youngstrom, Frazier, Demeter,
Calabrese, & Findling, 2009). In all of these situations, the assessment reduces to making
a binary decision, such as deciding to follow up with a patient (or not), or to make a
diagnosis (or not; Liu, 2012). Therefore, the assessment needs to maximize the
proportion of patients correctly identified by the assessment as having the diagnosis
(known as sensitivity), and the proportion of patients correctly identified by the
assessment as not having the diagnosis (known as specificity). There are approaches in
both psychometrics and machine learning to maximize diagnostic classification accuracy
(Gibbons, Weiss, Franke, & Kupfer, 2016; James et al., 2013; Liu, 2012; Youngstrom,
2013; Zweig & Campbell, 1993). However, it is less clear: (1) which approach to use and
(2) how the approaches compare to one another in the same context. A simulation study
could provide relative advantages and disadvantages of using psychometric and machine
learning methods for diagnostic classification.
The proposed simulation investigates the performance of determining a diagnosis
from an assessment using psychometric and machine learning approaches. The
psychometric methods used in this study aggregate item information (as summed scores
or item response theory scale scores), and then a cut score is determined by using receiver
operating characteristic (ROC) curves. In contrast, machine learning methods treat
diagnostic classification as a prediction problem, where the predictors of the diagnosis
are the responses to the items comprising the assessment. The machine learning methods
then assign each case either to the class that is most probable (Bayes classifier) or to a
class conditional on a probability threshold determined by ROC curves (ROC classifier).
The proposed simulation consists of two studies. The first study compares
classification accuracy using the psychometric and machine learning methods. The main
outcomes of the first part of the study are the difference in classification accuracy,
sensitivity, and specificity across the methods. The goal of the first study is to understand
how classification performance across the methods is influenced by sample size,
prevalence of the diagnosis, the diagnosis-test correlation, and assessment structure. The
second study investigates the variable selection property of machine learning models and
if items selected for prediction could recover the latent variable score on the assessment.
The goal of the second study is to estimate the correlation between the IRT scores
based on the items selected by machine learning methods and the data-generating IRT
scores. In this case, ceiling and floor effects are of interest.
This dissertation has six parts. First, basic diagnostic classification studies and
classification accuracy measures are introduced. Second, psychometric approaches to
classification using IRT and ROC curve analysis are described. Third, machine learning
approaches to classification using Lasso logistic regression, classification and regression
trees, and the random forest algorithm are described. Fourth, study goals and the
simulation methods are discussed. Fifth, Monte Carlo simulation results are presented.
Finally, the strengths and limitations of psychometric and machine learning models are
discussed, and future directions are considered.
1.1 Diagnostic Assessment and Classification Accuracy
In the simplest case, a diagnosis is a binary outcome, and the diagnostic
assessment determines if the patient is likely to be diagnosed or not diagnosed. Typically,
a diagnostic assessment is composed of a series of items, and a cut score has to be
determined so that clinicians or researchers can gauge the likelihood that a person has
the diagnosis. Diagnostic assessments are imperfect, but they correlate with a gold
standard that dictates the diagnosis. Gold standards tend to be costly, extensive, invasive,
or all of these, so administering a gold standard may be inconvenient. Diagnostic
accuracy studies are research studies that investigate how well diagnostic assessments
discriminate between patients with and without a condition (Zhou, Obuchowski, &
McClish, 2011; Zweig & Campbell, 1993). Diagnostic accuracy studies are usually
composed of three parts: (1) a sample of participants who take the diagnostic assessment,
(2) an approach to score the diagnostic assessment, and (3) a gold standard, independent
of the diagnostic assessment, that indicates the true condition (Zhou et al., 2011). The
proposed simulation study mimics the scenario of a diagnostic accuracy study.
For example, consider a diagnostic accuracy study on how the Revised Hamilton
Scale for Depression (HRSD-R) discriminates between clinically depressed and non-
depressed participants (McFall & Treat, 1999). The gold standard latent depression
scores θ_Diag (a clinical interview or a more thorough assessment) are presented on the
horizontal axis of Figure A. A cut score on θ_Diag (dashed vertical line in Figure A) has
been determined to classify participants as depressed (right of the vertical dashed line) or
not depressed (left of the vertical dashed line). Suppose that θ_Diag correlates highly, but
not perfectly, with scores from the cheaper and less invasive HRSD-R. The vertical axis
represents the estimated scores of the HRSD-R, θ_Assmt. Researchers would then
investigate the optimal HRSD-R cut score (dashed horizontal line in Figure A) to
maximize correct classification of participants as depressed or not depressed. The HRSD-
R is not perfect, so it may misclassify participants as depressed or not depressed. For
example, decreasing the HRSD-R cut score (sliding down the horizontal dashed line in
Figure A) would correctly identify most participants who are depressed (high sensitivity),
but would also identify many non-clinically depressed participants as depressed (low
specificity). On the other hand, increasing the HRSD-R cut score (sliding up the
horizontal dashed line in Figure A) would correctly rule out a depression diagnosis for
most non-depressed participants (high specificity), but would miss many participants who
are depressed (low sensitivity). A common strategy is to study the trade-off between
correctly classifying participants as depressed or not (using ROC curves, introduced
later), but determining a cut score is a subjective decision. It is important to note that the
HRSD-R is not meant to replace the gold standard, but to provide an idea of who should
be followed up in primary care settings or in situations where screening needs to be done
for a large number of people.

Figure A. Relationship between the diagnosis and the assessment. The four regions of
the plot correspond to true positives (TP), false positives (FP), true negatives (TN), and
false negatives (FN).
1.1.1 Classification Accuracy
A confusion table describes the relationship between the diagnosis by the gold
standard and predicted diagnosis by the assessment. Below is the general form of a
confusion table (Table A) in terms of diagnosed and not diagnosed, but the binary
variable can take any class or label.
Table A. Confusion table for diagnostic classification

                       Predicted Not Diagnosed   Predicted Diagnosed   Prediction Error
True Not Diagnosed               A                         B               B/(A+B)
True Diagnosed                   C                         D               C/(C+D)
Use Error                    C/(A+C)                   B/(B+D)     Error = (B+C)/(A+B+C+D)
Using Table A and Figure A, D is the number of diagnosed patients that were correctly
predicted by the assessment (true positives; TP). A is the number of non-diagnosed
patients that were correctly predicted by the assessment (true negatives; TN). B is the
number of non-diagnosed patients that were incorrectly predicted by the assessment
(false positives; FP). Finally, C is the number of diagnosed patients that were incorrectly
predicted by the assessment (false negatives; FN). The main diagonal of Table A contains
the number of cases correctly predicted by the assessment, and the off-diagonal of Table
A contains the number of cases incorrectly predicted by the assessment.
Below are some statistics from the confusion table that estimate model accuracy
and that are used in this study. Apparent prevalence (AP) is the proportion of participants
predicted as diagnosed by the assessment, (B + D)/(A + B + C + D). Sensitivity (Se) is the
proportion of diagnosed participants who were correctly predicted as diagnosed by the
assessment, D/(C + D), and specificity (Sp) is the proportion of non-diagnosed
participants who were correctly predicted as non-diagnosed by the assessment, A/(A + B).
The complement of sensitivity is the false negative rate (FNR), which is the proportion of
diagnosed participants who were predicted as non-diagnosed by the assessment, C/(C + D)
or 1 − Se. The complement of specificity is the false positive rate (FPR), which is the
proportion of non-diagnosed participants who were predicted as diagnosed by the
assessment, B/(A + B) or 1 − Sp. With estimates of Se and Sp, one can estimate the true
prevalence in the population, P = (AP + Sp − 1)/(Se + Sp − 1). Se and Sp are independent
of the prevalence. All of these statistics vary as the assessment cut score changes. Finally,
the probability of a correct result is the probability of obtaining a true positive or a true
negative, (A + D)/(A + B + C + D). Its complement, the misclassification rate, is the
probability of obtaining a false positive or a false negative, (B + C)/(A + B + C + D). Both
of these statistics depend on the prevalence P. Two limitations of the probability of a
correct result are that: (1) false positives and false negatives are weighted the same, and
(2) assessments with different sensitivities and specificities can give the same estimate,
preventing direct comparisons.
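The statistics above follow directly from the four cells of Table A. The sketch below (plain Python; the function name and cell ordering are illustrative, not from this study) computes them, including the true-prevalence estimate from AP, Se, and Sp:

```python
def confusion_stats(tn, fp, fn, tp):
    """Accuracy statistics from Table A, with cells A=TN, B=FP, C=FN, D=TP."""
    total = tn + fp + fn + tp
    se = tp / (tp + fn)            # sensitivity, D/(C+D)
    sp = tn / (tn + fp)            # specificity, A/(A+B)
    ap = (fp + tp) / total         # apparent prevalence, (B+D)/total
    correct = (tn + tp) / total    # probability of a correct result
    prev = (ap + sp - 1) / (se + sp - 1)  # estimated true prevalence
    return {"Se": se, "Sp": sp, "FNR": 1 - se, "FPR": 1 - sp,
            "AP": ap, "P": prev, "correct": correct, "error": 1 - correct}
```

For example, with A = 80, B = 20, C = 10, D = 90, sensitivity is .90, specificity is .80, and the probability of a correct result is .85.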
1.1.2 Some Clarifications
In this study, the term theta or θ refers to an IRT score. Also, the term diagnostic
assessment refers to making a binary decision about a diagnosis, and false positives and
false negatives are weighted the same. In other research, diagnostic assessment refers
only to assessments that focus on high specificity (Gibbons et al., 2013). This study also
focuses on methods for binary diagnoses, although there are models for diagnoses with
more categories or severities (Zhou et al., 2011). Finally, this study assumes that the gold
standard is reliable, unbiased, and that it yields the true diagnosis. However, see Zhou et
al. (2011) for cases when there are problems with the gold standard.
1.2. Psychometric Methods for Diagnostic Assessment
Psychometric methods for diagnostic assessment focus on improving the
measurement of the construct targeted in the assessment so that precise latent variable
scores can be estimated and misclassifications decrease. Traditionally, a diagnostic
assessment would use a sum score and conduct ROC curve analysis to find a cut score on the
assessment. A modern psychometric approach for diagnostic assessment would consist
of: 1) fitting an IRT model to the assessment items, 2) estimating IRT scores, and 3)
conducting an ROC curve analysis to study how sensitivity and specificity change as a
function of shifting the assessment cut score. This section introduces IRT and ROC curve
analysis.
1.2.1 Item Response Theory
Item response theory (IRT) refers to a family of mathematical models used to
analyze item responses measuring a latent construct. IRT underlies much of large-scale
test development and scale construction (Thissen & Wainer, 2001). In contrast to
classical test theory, the most important properties of IRT are that item properties and
person scores on the intended construct are on the same scale, and that assessment takers
do not need to take the same items to have comparable scores.
IRT Models. A fundamental concept of IRT is the item characteristic curve (ICC)
or trace line. The trace line describes the probability of endorsing an item, or getting an
item correct, as a function of a participant’s level on the latent construct (θ) and
properties of the items. The two IRT models considered in this study are models for
dichotomous responses and polytomous (ordered-categorical) responses. The two-
parameter logistic model (2PL; Birnbaum, 1968) describes the trace lines for
dichotomous items using a logistic (S-shaped) curve with two item parameters,
P(u_i = 1 | θ, a_i, b_i) = 1 / {1 + exp[−a_i(θ − b_i)]}   (2.1)
where θ is the latent variable score, 𝑎𝑖 is the discrimination parameter for item i, and 𝑏𝑖 is
the severity parameter for item i. The θ is a person parameter because it describes where
the participant stands on the intended construct, and a and b are item parameters because
they describe item properties. The severity parameter b is the item location along the
range of θ where a person has a .50 probability of endorsing the item (see Figure B). The
b parameters typically vary from -2 to 2 and are on the same scale as θ. Items with lower
b values are easier to endorse than items with higher b values. The discrimination
parameter a is proportional to the slope of the curve at the value of the severity
parameter. Items with higher a values can separate participants close to the b value better
than items with lower a values. The a parameters usually vary between 0 and 3. A
negative discrimination value indicates that the item has a different valence than the
majority of the items and it might need to be reverse-coded.
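Eq. 2.1 maps directly to code. A minimal sketch in Python (the function name is illustrative, not from this study):

```python
from math import exp

def p_2pl(theta, a, b):
    """Eq. 2.1: probability of endorsing item i, given theta and the item's
    discrimination (a) and severity (b) parameters."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))
```

At θ = b the probability is exactly .50, and a larger a value makes the curve steeper around b, so the item separates nearby participants better.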
On the other hand, the graded response model (GRM; Samejima, 1969) is used to
describe trace lines for polytomous items,
P(u_i = k | θ, b_ik, a_i) = 1 / {1 + exp[−a_i(θ − b_ik)]} − 1 / {1 + exp[−a_i(θ − b_i,k+1)]} .   (2.2)
The GRM uses cumulative probability differences to obtain the probability of answering
category k on the item. Specifically, the GRM compares the probability of responding in
category k or higher with the probability of responding in category k + 1 or higher. For
example, if an item has four response categories {0, 1, 2, 3}, the comparisons are {0} vs.
{1, 2, 3}, {0, 1} vs. {2, 3}, and {0, 1, 2} vs. {3}. So, there are k − 1 b-parameters for an
item with k categories. By definition, the probability of responding in the lowest category {0} or
higher is 1, and the probability of responding higher than the highest category {3} is 0.
Therefore, taking the difference between the probability of answering up to k and k+1
provides an estimate of the probability of responding k to the item. In this case, a is the
discrimination parameter, and b is the category boundary location and is related to the
level of θ where answering k is most likely (Edelen & Reeve, 2007).
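A sketch of Eq. 2.2 in Python (the 0-based category coding and function names are illustrative): the probability of responding exactly in category k is the difference between two adjacent cumulative curves.

```python
from math import exp

def cum_prob(theta, a, b):
    """Cumulative curve: probability of responding in a given category or higher."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))

def p_grm(theta, a, bs, k):
    """Eq. 2.2: probability of responding exactly in category k (0-based),
    given the K - 1 ordered boundary parameters in bs."""
    upper = 1.0 if k == 0 else cum_prob(theta, a, bs[k - 1])    # P(response >= k)
    lower = 0.0 if k == len(bs) else cum_prob(theta, a, bs[k])  # P(response >= k + 1)
    return upper - lower
```

Because the boundaries are ordered, the category probabilities are positive and sum to one across the K categories.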
Figure B. IRT trace line

The IRT models just described assume unidimensionality, where a single latent
variable explains the relationships among the items. Closely related to unidimensionality,
IRT models also assume local independence, where the joint probability of endorsing two
items is the product of the probabilities of endorsing each of the items (or, that items are
not related to each other conditional on θ). Violations of local independence (or local
dependence, LD) could be due to unmodeled dimensionality or content/location
similarity between items. Violations of unidimensionality have been previously addressed
using either bifactor or multidimensional item response theory (MIRT) models (Reckase,
2009). The impact of LD can be investigated by analyzing whether the model is
essentially unidimensional, using either measures of factor strength or local dependence
statistics. The rest of the study assumes that the assessment is unidimensional and, in
most conditions, that the items are locally independent.
Information from IRT Models. Item information functions (IIFs) are derived
from item parameters to indicate where each item discriminates the most across the range
of θ, or where each item can be most useful in estimating the θ score precisely (see
Figure C). IIFs are additive, so all of the items in the measure can define a test
information function (TIF) that describes where the measure discriminates the most across
the range of θ. For example, an assessment can provide more precise θ scores for
participants close to one standard deviation above the θ mean than in any other range.
Thus, IIFs and TIFs are used in scale development to select items so that assessments
have specific measurement properties.
IRT Calibration. In this study, marginal maximum likelihood (MML) with the
EM algorithm (Bock & Aitkin, 1981) is used to estimate item parameters for
dichotomous and polytomous IRT models. The item parameter estimation step is known
as calibration. The probability of answering response vector x of size I for person n is,
p(x_n | θ_n, β) = ∏_{i=1}^{I} Pr(X_ni = x_ni | θ_n, β_i)   (2.5)
In this case, Xni is a response for person n on item i, β is a vector of item parameters of
the IRT model, and θn is the person parameter. Therefore, the probability of answering
each response vector depends on item parameters and person parameter. The likelihood
function is,
L(β, θ; x) = ∏_{n=1}^{N} p(x_n | θ_n, β)   (2.6)
There are consistency problems when maximizing the likelihood function in Eq.
2.6 because the number of θ parameters increases as the sample size increases. To circumvent
this problem, MML-EM assumes that item parameters are fixed and that persons are
randomly sampled from a population. Therefore, the θ parameters are treated as random
effects and are integrated (marginalized) over during estimation (de Ayala, 2009).
Figure C. IRT trace line and information function

Formally,

p(x | β) = ∫_{−∞}^{∞} p(x | θ, β) g(θ) dθ   (2.7)
where g(θ) is the distribution of the θ’s. The distribution g(θ) is assumed to be normally
distributed. By marginalization, x would only depend on the item parameters. The EM
algorithm is then used to find the maximum of the likelihood function. The EM algorithm
is iterative: it first estimates the expected number of examinees who give response k
to item i (E-step), and then finds the item parameters that maximize the likelihood function
of observing those numbers from the E-step (M-step; Yen & Fitzpatrick, 2006). The process
continues until some convergence criterion is satisfied. Estimation may be complicated
when the likelihood function is complex or when different item parameters lead to similar
model-implied probabilities. In these cases, Bayesian methods could be used for item
calibration (Levy & Mislevy, 2016), but they are not discussed in this study.
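The integral in Eq. 2.7 is approximated numerically in practice. A minimal sketch for 2PL items under a standard normal g(θ), using a simple normalized grid over [−4, 4] (a stand-in for the quadrature schemes real IRT software uses; names are illustrative):

```python
from math import exp, pi, sqrt

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + exp(-a * (theta - b)))

def marginal_prob(x, items, n_quad=81):
    """Approximate Eq. 2.7: the marginal probability of response pattern x,
    integrating the conditional likelihood over g(theta) = N(0, 1)."""
    total, weight_sum = 0.0, 0.0
    for q in range(n_quad):
        theta = -4.0 + 8.0 * q / (n_quad - 1)
        g = exp(-theta * theta / 2.0) / sqrt(2.0 * pi)  # standard normal density
        like = 1.0
        for u, (a, b) in zip(x, items):
            p = p_2pl(theta, a, b)
            like *= p if u == 1 else 1.0 - p
        total += like * g
        weight_sum += g
    return total / weight_sum  # weights normalized to sum to one
```

Because the weights are normalized, the marginal probabilities of all possible response patterns sum to one, and x depends only on the item parameters, as described above.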
IRT Scoring. After items are calibrated, the item parameters are treated as fixed
and response patterns can be scored to estimate θ (van der Linden & Pashley, 2010).
Three IRT scoring methods are maximum likelihood (ML) scoring, maximum a
posteriori (MAP) scoring, and expected a posteriori (EAP) scoring. The ML score estimate
is the mode of the likelihood function with respect to θ (the product of the trace lines; Eq.
2.6) and reflects the range of θ where a participant’s response pattern is most likely
(Thissen & Orlando, 2001). The mode of the likelihood is found by iterative methods,
such as the Newton-Raphson procedure. Items with higher discrimination parameters
have a greater impact on the likelihood, thus a greater impact on the estimated θ score.
Score precision can be quantified by the ML standard error, defined by the second partial
derivative of the loglikelihood (the inverse square root of the information function from
Eq. 2.3 or 2.4), and it describes the spread of the likelihood function. A limitation of ML scoring is that
the maximum for the likelihood function is not defined for all-correct or all-incorrect
response patterns – the participant has a higher or lower θ than what the items measure.
Therefore, other scoring methods need to be considered.
Both expected a posteriori (EAP) and maximum a posteriori (MAP) scoring
overcome the ML scoring limitations by including prior information (a prior distribution)
of θ. The product of the likelihood of each of the participant’s responses and the prior
distribution define a joint likelihood (or posterior distribution) that provides finite θ
estimates. The MAP[θ] estimate is the mode of that joint likelihood, also estimated using
iterative methods.
On the other hand, EAP scoring uses the mean of the joint likelihood as a score
estimate and is defined by the ratio of two integrals,
EAP[θ] = [ ∫_{−∞}^{∞} θ ∏_{i=1}^{nitems} p_i(u_i | θ) φ(θ) dθ ] / [ ∫_{−∞}^{∞} ∏_{i=1}^{nitems} p_i(u_i | θ) φ(θ) dθ ] .   (2.8)
The numerator is the sum of the θ values weighted by the posterior and the denominator is the
sum of those weights (Thissen & Orlando, 2001). Uncertainty in the EAP[θ] estimate is
quantified by the standard deviation of the posterior,
SD[θ] = ( [ ∫_{−∞}^{∞} (θ − EAP[θ])² ∏_{i=1}^{nitems} p_i(u_i | θ) φ(θ) dθ ] / [ ∫_{−∞}^{∞} ∏_{i=1}^{nitems} p_i(u_i | θ) φ(θ) dθ ] )^{1/2} .   (2.9)
The integrals for EAP[θ] and SD[θ] can be approximated by q quadrature points
along the range of θ. EAP [θ] is the most common method to derive θ scores (Thissen &
Steinberg, 2009), so they are used in this study. EAP[θ] estimates are considered
shrunken estimates because they pull θ towards the mean of the prior and have a smaller
variance than the likelihood. Previous research suggests that EAP[θ] scores could prevent
poor score estimation when the likelihood is bimodal and are more robust to IRT model
misspecifications than other scoring methods (Thissen & Wainer, 2001).
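Eqs. 2.8 and 2.9 can be approximated with quadrature points, as noted above. A sketch for 2PL items under a standard normal prior (illustrative, not the exact routine used by IRT software):

```python
from math import exp, pi, sqrt

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + exp(-a * (theta - b)))

def eap_score(x, items, n_quad=81):
    """EAP[theta] (Eq. 2.8) and posterior SD (Eq. 2.9) by quadrature."""
    thetas = [-4.0 + 8.0 * q / (n_quad - 1) for q in range(n_quad)]
    posterior = []
    for theta in thetas:
        w = exp(-theta * theta / 2.0) / sqrt(2.0 * pi)  # prior phi(theta)
        for u, (a, b) in zip(x, items):
            p = p_2pl(theta, a, b)
            w *= p if u == 1 else 1.0 - p               # times the likelihood
        posterior.append(w)
    denom = sum(posterior)
    eap = sum(t * w for t, w in zip(thetas, posterior)) / denom
    sd = sqrt(sum((t - eap) ** 2 * w for t, w in zip(thetas, posterior)) / denom)
    return eap, sd
```

Unlike ML scoring, the estimate is finite even for all-correct or all-incorrect patterns, and it is shrunken toward the prior mean.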
Overall, IRT provides a framework to calibrate item parameters and score item
responses to estimate θ. IRT scores have the property that they can be compared across
participants that have not taken the same items. This property is useful for study 2,
discussed below.
1.2.2 Receiver Operating Characteristic Curves
Receiver operating characteristic (ROC) curve analysis provides a valuable tool to
evaluate classification accuracy of a diagnostic assessment or a statistical model (Egan,
1975; Pepe, 2003; Zweig & Campbell, 1993). ROC analysis, rooted in signal detection
theory, was developed during World War II to investigate electronic signal detection in
radars, but the methodology was quickly adapted to study classification accuracy in the
areas of medicine, psychology, psychophysics, epidemiology, and radiology (McFall &
Treat, 1999; Zou, O’Malley, & Mauri, 2007). ROC curves describe the accuracy of a
diagnostic assessment by obtaining sensitivity and specificity information along cut
scores of the diagnostic assessment. An ROC curve is presented in Figure D, where
sensitivity is plotted on the y-axis and the false positive rate (1 – specificity) on the x-axis
for each cut score (defined by the unique values of the assessment). In other words, a 2x2
confusion table (as in Table A) is computed for each decision point, and sensitivity and
specificity are estimated and plotted. The empirical (non-parametric) ROC curve is then
estimated by connecting the sensitivity and specificity coordinates, describing the trade-
off between those two statistics. The empirical ROC curve goes from the lower left
corner, where the sensitivity and the false positive rate are 0 (maximally conservative
cutoff), to the upper right corner where the sensitivity and false positive rates are 1
(maximally liberal cutoff; McFall & Treat, 1999). The shape and height of the ROC
curve indicate how well the diagnostic assessment discriminates. A 45-degree line (the
chance diagonal) can be used as a reference because it represents an assessment that
makes diagnoses at random (the true positive rate and the false positive rate are the
same). The estimation of the empirical ROC curve does not involve distributional
assumptions, so the curve is jagged, with a staircase shape. Smooth ROC curves can be
estimated by making distributional assumptions, such as a binormal model where both
distributions of scores are assumed to be normal. The relationship between the two
distributions under the binormal model reduces to two numbers: the standardized
difference in their means and the ratio of their standard deviations (Zhou et al.,
2011). This study uses empirical ROC curves, so distributional assumptions are not
discussed (see McFall & Treat, 1999). Overall, ROC curves let researchers choose where
along the curve they want to operate when making decisions based on the assessment.
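The empirical curve just described can be traced by sweeping a cut score over the unique assessment values and recomputing the 2x2 table at each one. A sketch (Python; names and the >= decision rule are illustrative):

```python
def empirical_roc(scores, diagnosis):
    """Empirical ROC coordinates (FPR, Se), one pair per candidate cut score.
    A case is predicted positive when its score is >= the cut."""
    n_pos = sum(diagnosis)
    n_neg = len(diagnosis) - n_pos
    points = [(0.0, 0.0)]                       # maximally conservative cutoff
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, d in zip(scores, diagnosis) if s >= cut and d == 1)
        fp = sum(1 for s, d in zip(scores, diagnosis) if s >= cut and d == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points                               # ends at (1, 1), maximally liberal
```

Connecting the returned coordinates yields the staircase-shaped empirical curve from (0, 0) to (1, 1).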
Figure D. ROC curve, plotting sensitivity (%) against 100 − specificity (%)

Statistics from the ROC curve. A one-number summary statistic for diagnostic
accuracy is the area under the ROC curve (AUC). The AUC indicates the likelihood of
making a correct classification when two cases (one per group) are chosen at random; it
can also be interpreted as the average sensitivity (specificity) across all values of specificity
(sensitivity). An AUC of .50 (which is the AUC of the 45-degree line) indicates that
assessment decisions are made at random, and AUCs between .5 and 1.0 indicate that
the assessment performs better than chance. It is difficult to provide minimal effect sizes
for AUC values because two assessments may have the same AUC but discriminate in
different clinically relevant regions.
of sensitivity or specificity at fixed values of the other, can be estimated for
predetermined areas of interest, but these are not used in this study.
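The probabilistic interpretation of the AUC suggests a direct estimate: compare every diagnosed case with every non-diagnosed case. A sketch (ties counted as one half; names are illustrative):

```python
def auc_pairwise(scores, diagnosis):
    """AUC as the probability that a randomly chosen diagnosed case scores
    higher than a randomly chosen non-diagnosed case (ties count 1/2)."""
    pos = [s for s, d in zip(scores, diagnosis) if d == 1]
    neg = [s for s, d in zip(scores, diagnosis) if d == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise estimate equals the area under the empirical ROC curve computed by the trapezoid rule, which is why the two interpretations coincide.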
There are three different ROC curve indices that can be used to determine a cut
score (Liu, 2012). The closest-to-(0,1) criterion chooses the cut score with the minimum
Euclidean distance between the ROC curve and the ideal (0,1) point of the graph (perfect
sensitivity and perfect specificity). The closest-to-(0,1) criterion is,

Closest-to-(0,1): Min{ √( [1 − Sp]² + [1 − Se]² ) } .   (2.10)
Furthermore, a cut score can be determined by maximizing the Youden index, which is
the sum of specificity and sensitivity minus one,
Youden index: Max{ Se + Sp − 1 } .   (2.11)
Graphically, the Youden index is the point in the ROC curve that is farthest away from
the chance diagonal. Finally, a cut score can be determined by the concordance
probability, which is the product of sensitivity and specificity,
Concordance Probability: Max{ Se × Sp } .   (2.12)
Graphically, the concordance probability is the area of a rectangle under the ROC curve
given a specific cut score. The width of the rectangle is the specificity and the height of
the rectangle is the sensitivity. These three ROC curve statistics are estimated in this
study.
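Given the sensitivity and specificity at each candidate cut score, Eqs. 2.10 through 2.12 each reduce to a one-line optimization. A sketch over parallel lists (names are illustrative):

```python
from math import sqrt

def choose_cuts(cuts, se, sp):
    """Cut scores selected by the closest-to-(0,1) criterion (Eq. 2.10),
    the Youden index (Eq. 2.11), and the concordance probability (Eq. 2.12)."""
    dist = [sqrt((1 - q) ** 2 + (1 - e) ** 2) for e, q in zip(se, sp)]
    youden = [e + q - 1 for e, q in zip(se, sp)]
    concord = [e * q for e, q in zip(se, sp)]
    return {"closest_to_01": cuts[dist.index(min(dist))],
            "youden": cuts[youden.index(max(youden))],
            "concordance": cuts[concord.index(max(concord))]}
```

The three criteria often, but not always, agree; they differ most when the ROC curve is asymmetric.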
The ROC curve has several important properties and makes several assumptions.
The ROC curve is invariant to monotonic transformations of the score scale, so the ROC
curve only depends on the ranks of the assessment results. Furthermore, ROC curves can be
estimated for each new assessment and compared visually, instead of comparing the
sensitivity and specificity of just one threshold per assessment. ROC curves
are also invariant to the prevalence of the condition and changes in the cut scores (that is,
all cut scores are represented). One assumption is that the scores of the diagnostic
assessment can be ordered in magnitude so that sensitivity and specificity can be
determined by shifting the threshold. Furthermore, it is assumed that the ROC curve is
the same for all subsamples of the dataset, and that the researchers have access to the true
diagnosis. Finally, researchers might want to weigh true positives, false positives, true
negatives, and false negatives differently depending on costs and risks of the clinical
decision. There are utility functions to differentially weight these outcomes, but this study
assumes that true positives and true negatives should be equally weighted.
1.2.3 Summary
The psychometric approach to diagnostic classification is to fit an IRT model to
the assessment, estimate IRT θ scores, and use ROC analysis to determine the θ cut score
that maximizes sensitivity and specificity. As in any IRT model, the score precision
depends on meeting model assumptions. This section discussed IRT models, parameter
estimation, and score estimation. Also, this section discussed background on ROC curve
analysis with accuracy statistics to decide cut scores. Next, machine learning approaches
to classification are discussed.
1.3. Machine Learning Approaches to Diagnostic Assessment
Researchers aim to identify important predictors of medical or psychological
phenomena, such as suicidal behavior, depression, or vocabulary development.
Parametric models, such as linear or logistic regression, have been developed to identify
important predictors, but they suffer from several limitations (Strobl, Malley, & Tutz,
2009). Researchers might not be able to use parametric models when the number of
predictors exceeds the number of cases (without using dimension reduction techniques) or
interpret complex models with many predictors or many higher-order interactions when
predicting an outcome. Using a parametric model also presumes that one knows the
correct functional form of the relation modeled (e.g., a linear relation between predictors
and outcome). Finally, current approaches to inferential statistics may find an optimal
solution to the dataset, but the solution may not generalize to other datasets (an
overfitting problem). Machine learning provides a series of exploratory statistical models
that focus on prediction, instead of inference. The aim of machine learning models is to
use the patterns found in one dataset to predict outcomes in an independent sample. Two machine learning
models that overcome the previous limitations are Lasso logistic regression and CART,
along with their extensions.
1.3.1 The Lasso for Logistic Regression
The Lasso for logistic regression is a variable selection method to choose the most
important predictors of a binary response. The section below introduces the Lasso for
OLS regression, discusses logistic regression, combines both Lasso and logistic
regression, and discusses an extension of Lasso logistic regression, the relaxed Lasso.
Lasso. The Lasso is a machine learning extension of ordinary least squares (OLS)
regression used to improve prediction and interpretability of the regression model
(Tibshirani, 1996). Coefficient estimates from OLS regression are unbiased when the
number of cases exceeds the number of predictors, but prediction based on the model
tends to vary when the model is fit to another dataset. In other words, the OLS
coefficients may change drastically if the dataset is changed slightly. Also, OLS
regression might lead to a complex solution when there are many relevant and irrelevant
predictors in the model. The Lasso involves fitting a model with all of the predictors, but
adds a penalty to the OLS loss function so that the regression coefficients are shrunken
towards zero or to exactly zero. Shrinking (or regularizing) the regression coefficients
would induce some bias in the model solution (coefficients may not be optimal for that
specific dataset), but prediction would be less variable when those coefficients are used in
other datasets. Also, the Lasso could be used for variable selection because regression
coefficients might be estimated to be exactly zero, therefore some of the predictors would
drop from the final model. Therefore, the Lasso might provide some benefits to select the
best items from an assessment and give them appropriate weights to predict a diagnosis.
The loss function from the Lasso is similar to the OLS loss function, except that
the Lasso loss function adds a penalty to the size of the coefficients, such as,
Lasso = Σ_{i=1}^{n} ( y_i − β_0 − Σ_{j=1}^{p} β_j x_ij )² + λ Σ_{j=1}^{p} |β_j| = RSS + λ Σ_{j=1}^{p} |β_j| .   (3.1)
The first part of the loss function is the residual sum of squares from the OLS loss
function, and the second part is the shrinkage penalty on the absolute size of
the Lasso coefficients. The impact of the penalty term is controlled by the nonnegative
tuning parameter λ that is determined by k-fold cross-validation so that λ yields the
smallest prediction error (as in the mean squared error) in the left-out fold. If λ is zero,
then the penalty parameter would not have any impact and one would get the OLS
coefficients, but the impact of the penalty increases as λ approaches ∞ and the
coefficients are shrunken or estimated to be zero. As a result, the Lasso coefficients may
be more interpretable for models with many predictors. It is important to note that it is
best to standardize the predictors before using the Lasso because coefficient size is
dependent on the scale of the predictors.
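The study's simulations use the glmnet R package; purely as an illustration, a Python sketch (scikit-learn's LassoCV on synthetic data; all variable names and values are hypothetical) of standardizing the predictors and choosing λ by k-fold cross-validation might look like:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
# Only the first three predictors are truly related to the outcome.
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

# Standardize the predictors first: the penalty depends on coefficient scale.
Xs = StandardScaler().fit_transform(X)

# 5-fold cross-validation chooses the lambda (called alpha in scikit-learn)
# that yields the smallest prediction error in the left-out fold.
lasso = LassoCV(cv=5).fit(Xs, y)
print(lasso.alpha_, np.round(lasso.coef_, 2))
```

Coefficients of the irrelevant predictors are shrunken toward (often exactly to) zero, illustrating the variable-selection property described above.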
Logistic regression. In diagnostic classification, consider predicting the
probability of a binary diagnosis as a function of predictors,
p(Y = 1 | 𝑿) = b_0 + b_1 X_1 + b_2 X_2. (3.2)
Probability is bounded by 0 and 1, so it might not be appropriate to use OLS regression
because a linear function might make predictions outside of that bound. One can turn to
an S-shaped, logistic function bounded by 0 and 1. The logistic function is,
Logistic function: p(Y = 1 | 𝑿) = \frac{e^{b_0 + b_1 X_1 + b_2 X_2}}{1 + e^{b_0 + b_1 X_1 + b_2 X_2}}. (3.3)

After manipulation, there is a linear relationship between the natural logarithm of the
odds, p(Y = 1 | X) / (1 - p(Y = 1 | X)), and the predictors,

\ln\left( \frac{p(Y = 1 | X)}{1 - p(Y = 1 | X)} \right) = b_0 + b_1 X_1 + b_2 X_2. (3.4)
The regression coefficients indicate that for a one-unit change in X_j, there is a b_j-unit
change in the logit (log-odds) of the outcome. The regression
coefficients are estimated using maximum likelihood. Below is the loglikelihood
equation of logistic regression,
l(\beta) = \sum_{i=1}^{N} \left\{ y_i \beta^T x_i - \log\left(1 + e^{\beta^T x_i}\right) \right\}. (3.5)
The estimated coefficients \hat{\beta} can then be used to compute predicted probabilities (via Eq. 3.3),
and in turn predicted categories of the outcome.
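As a hedged sketch (Python's scikit-learn standing in for the glm fit used later in this study; the data and coefficient values are made up), fitting a logistic regression by maximum likelihood and recovering predicted probabilities from the logistic function might look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 2))
# True model: logit p = -1 + 1.2*X1 + 0.8*X2.
eta = -1.0 + 1.2 * X[:, 0] + 0.8 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# A very large C makes the default penalty negligible, approximating plain
# maximum likelihood estimation of the coefficients.
model = LogisticRegression(C=1e6).fit(X, y)

# Predicted probabilities follow the S-shaped logistic function (Eq. 3.3),
# so they stay between 0 and 1 by construction.
probs = model.predict_proba(X)[:, 1]
print(np.round(model.intercept_, 2), np.round(model.coef_, 2))
```

The fitted coefficients approximate the data-generating values, and each coefficient shifts the log-odds by its own size per one-unit change in the predictor.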
Lasso logistic regression. Just as in linear regression, the Lasso penalty can
be applied to logistic regression. Below is the penalized version of the loglikelihood for
logistic regression,

l(\beta)_{L1} = \sum_{i=1}^{N} \left\{ y_i \beta^T x_i - \log\left(1 + e^{\beta^T x_i}\right) \right\} - \lambda \sum_{j=1}^{p} |\beta_j|. (3.6)
By maximizing the previous loglikelihood function, researchers can use the Lasso for
variable selection when they are predicting a diagnosis. Similar to Lasso in OLS
regression, the tuning parameter λ is chosen by k-fold cross-validation. With Lasso
logistic regression, the probability of diagnosis could be predicted by items in the
diagnostic assessment, which in turn would derive a model with the most important
predictors and their weights to predict the diagnosis.
Relaxed Lasso logistic regression. The estimates of the non-zero Lasso
coefficients are biased towards zero, and as sample size increases, they do not approach
their true values. To overcome this bias, an unrestricted logistic regression model could
be fitted again only with the variables selected by the Lasso. This approach is known as
the relaxed Lasso (Meinshausen, 2007), and this Lasso extension is also carried out in
this study.
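The two steps of the relaxed Lasso can be sketched as follows (a Python/scikit-learn approximation of the procedure rather than the study's actual glmnet code; data, seeds, and effect sizes are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

rng = np.random.default_rng(3)
n, p = 400, 10
X = rng.normal(size=(n, p))
eta = -0.5 + 1.5 * X[:, 0] + 1.0 * X[:, 1]   # only two items matter
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# Step 1: Lasso (L1-penalized) logistic regression with the penalty chosen
# by 5-fold cross-validation; coefficients may be set exactly to zero.
lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5).fit(X, y)
keep = np.flatnonzero(lasso.coef_.ravel())

# Step 2 (relaxed Lasso): refit an unrestricted logistic regression using
# only the selected items, so the surviving coefficients are not shrunken.
relaxed = LogisticRegression(C=1e6).fit(X[:, keep], y)
print(keep, np.round(relaxed.coef_.ravel(), 2))
```

The refit removes the shrinkage bias on the selected coefficients while preserving the variable selection performed by the Lasso.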
1.3.2 Classification and Regression Trees
Classification and regression trees (CART) is a machine learning, nonparametric
regression method that can identify important predictors of an outcome by exploring and
partitioning the complete predictor space into regions where values of the outcome are
most similar to each other (Breiman, Friedman, Olshen, & Stone, 1984). Trees are useful
and easy to interpret, they do not depend on a functional form, and they can probe higher-
order predictor interactions. Recently, the CART algorithm has been proposed as a
method to adaptively select items from a longer scale to predict a diagnosis (Gibbons et
al., 2016; McArdle, 2013). In this section, the CART algorithm is described and its
application to predict a diagnosis is discussed.
The CART algorithm. CART uses binary recursive partitioning for prediction.
In general, CART selects the best binary split among all possible predictor values to
partition the dataset into two regions where the outcome is most similar. Then, the
procedure is carried out recursively in each derived region until the CART algorithm
meets a stopping rule. Splits in the trees are interpreted as interactive effects of predictors
on the outcome. Machine learning is rooted in cross-validation, so the algorithm typically
builds the model in a training dataset and then prediction performance is evaluated in a
validation or holdout dataset. The CART algorithm can be used to predict quantitative
outcomes (regression trees) or qualitative outcomes (classification trees). This discussion
will focus on classification because this study deals with predicting a diagnosis.
Classification trees. First, consider the predictor space 𝑋, which is a matrix of p
predictors in the training dataset. Consider every possible binary split along the observed
values of a predictor 𝑋𝑗 (a predictor with u unique values yields u − 1 candidate splits):
each candidate split value 𝑠 creates two regions, one for cases
where 𝑋𝑗 < 𝑠 and another for cases where 𝑋𝑗 ≥ 𝑠. In other words, the dataset is split into
two parts, one that meets the conditional statement based on 𝑋𝑗 and 𝑠 and one that
does not. This procedure is applied to all p predictors in the
𝑋 predictor space, and the best split value on the best predictor among all p predictors is chosen to
split the predictor space 𝑋. In this case, the best split is defined as the one that minimizes
the misclassification rate E in each region. The misclassification rate is the proportion of
cases where the observed and predicted value differ,
E = 1 - \max_k(\hat{p}_{mk}), (3.7)
where \hat{p}_{mk} is the proportion of the observations in the training dataset in the mth region
that belong to the kth category. Once the best split has been decided, the procedure is
carried out again in the defined regions until the algorithm meets a stopping rule, often
based on a maximum number of splits or minimum number of cases in each region. The
predicted value is the mode in the region.
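The node-level misclassification rate in Eq. 3.7 and the search for the best split can be illustrated with a toy example (Python; the data and the `split_error` helper are made up for illustration, not part of the study's code):

```python
import numpy as np

# Toy region: 8 nondiagnosed (0) and 2 diagnosed (1) training cases.
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# Node misclassification rate, Eq. 3.7: E = 1 - max_k(p_mk).
p_mk = np.bincount(y) / y.size        # class proportions in the region
E = 1 - p_mk.max()                    # predicted value is the mode (class 0)

def split_error(x, y, s):
    """Weighted misclassification after splitting into x < s and x >= s."""
    def node_error(v):
        return 0.0 if v.size == 0 else 1 - np.bincount(v).max() / v.size
    left, right = y[x < s], y[x >= s]
    return (left.size * node_error(left) + right.size * node_error(right)) / y.size

# A single predictor; the best cut point separates the two classes exactly.
x = np.array([1, 2, 2, 3, 3, 4, 5, 5, 6, 7])
best_s = min(np.unique(x), key=lambda s: split_error(x, y, s))
print(E, best_s)
```

Here the split at x = 6 drives the weighted misclassification to zero, so it would be chosen over every other candidate cut point.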
Splitting rules. Splits depend on the predictor type. For a continuous predictor,
candidate splits are considered along all unique values of the predictor. Suppose an item
score is a predictor of the diagnosis, so a proposed partition could be defined by
assessment takers who scored less than 3 on the item and assessment takers who scored 3
or higher. On the other hand, splits on unordered categorical predictors are defined in a
one (or some)-versus-rest form over all combinations of categories. Consider a specific
clinician as a predictor of diagnosis. A proposed partition can be defined by patients who
have Johnny as their doctor and those who do not (such as those who have Sally or Jane
as their doctor). There are 2^(k−1) − 1 candidate splits when a predictor has k
categories. Finally, predictors that have already defined a split can be used again to define the
best split in further regions.
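A small sketch (Python; the `candidate_splits` helper is hypothetical) enumerating the 2^(k−1) − 1 candidate splits of an unordered categorical predictor:

```python
from itertools import combinations

def candidate_splits(categories):
    """All 2^(k-1) - 1 two-group partitions of k unordered categories."""
    cats = sorted(categories)
    anchor, rest = cats[0], cats[1:]
    splits = []
    # Fix one category on the left side so mirror splits are not counted twice.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {anchor, *combo}
            if len(left) < len(cats):      # both sides must be nonempty
                splits.append(left)
    return splits

doctors = ["Johnny", "Sally", "Jane"]      # k = 3 clinicians
print(candidate_splits(doctors))           # 2^(3-1) - 1 = 3 candidate splits
```

With three clinicians this yields exactly three one-versus-rest partitions, matching the 2^(k−1) − 1 count in the text.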
Model evaluation. After growing the classification tree in a training dataset, the
tree performance can be evaluated using a holdout dataset. The splits determined by the
CART algorithm during the training phase can be used to predict the outcome in the
holdout dataset. Misclassification rate and variance explained can be estimated from the
predicted values and the observed values in the holdout dataset. The misclassification
rate and variance explained in the holdout dataset are expected to differ from those
computed in the training dataset, and the size of that difference reflects the robustness of
the model across datasets.
CART criticisms and pruning. CART is considered a greedy algorithm because
once it decides on a predictor to split on, that initial decision is not reconsidered in further
regions even though a different predictor could have been a better predictor further down
the model. CART is single-minded because it only looks for regions where the outcome
is most homogeneous. As a result, CART could overfit the training dataset and provide
an unstable solution that may not generalize to other datasets. Therefore, the final CART
model might not be useful in making inferences in other samples. Cost-complexity
pruning is a CART extension designed to overcome these limitations. Cost-complexity pruning
consists of overgrowing a tree in a training dataset (growing a deep tree that learns all of
the idiosyncrasies of the dataset) and then pruning back branches of the tree with the goal
of reducing prediction error in the holdout dataset. Cost-complexity pruning introduces a
penalty parameter that controls the trade-off between tree size and outcome prediction
(James et al., 2013). The size of the tree is determined by k-fold cross-validation, where
the number of regions remaining in the final tree is the size that minimizes the prediction
error in the left-out fold during cross-validation. It is hypothesized that the final model
would be more generalizable to other datasets.
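Cost-complexity pruning can be sketched with scikit-learn (a Python stand-in for the tree R package used in this study; the data and settings are illustrative):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
n = 600
X = rng.integers(0, 5, size=(n, 10)).astype(float)  # ten 5-category items
eta = -1.0 + 0.8 * (X[:, 0] - 2) + 0.6 * (X[:, 1] - 2)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# Overgrow a deep tree, then obtain the sequence of cost-complexity penalties.
deep = DecisionTreeClassifier(random_state=0).fit(X, y)
alphas = deep.cost_complexity_pruning_path(X, y).ccp_alphas

# Choose the penalty by 5-fold cross-validation on classification accuracy.
scores = [cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                          X, y, cv=5).mean() for a in alphas]
best_alpha = float(alphas[int(np.argmax(scores))])
pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print(deep.get_n_leaves(), "->", pruned.get_n_leaves())
```

The deep tree learns the idiosyncrasies of the training data; the cross-validated penalty prunes it back to a size expected to generalize better.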
1.3.3 Random Forest: A CART Extension
The random forest algorithm is a CART extension that can improve tree
stability, prediction, and the generalizability of the results (Breiman, 2001).
Random forest is an ensemble method that grows decision trees using the CART
algorithm (outlined above) in bootstrapped datasets and averages over all of the models
to make a prediction. In this case, the series of trees, or forest, is used for simultaneous
prediction rather than just a single tree. However, when there is a strong predictor of the
outcome, the tree in each bootstrap dataset might look similar. To increase tree diversity
in bootstrapped datasets, a random sample of candidate predictors to split on is chosen
every time the tree makes a split. Random forest adds stability to the CART
solution because it mitigates multicollinearity problems: when two predictors are
strongly correlated, both have opportunities to appear across
the trees. Also, more diverse trees are associated with lower prediction error in testing
datasets. An important limitation of random forests is that the interpretability of the single
tree is lost in exchange for prediction. Although variable importance measures have been
used to identify important predictors in random forests, previous work suggests that there
are methods to approximate ensemble results with a single tree (Gibbons et al., 2013).
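A minimal random forest sketch (Python's scikit-learn rather than the randomForest R package used later in this study; data are synthetic), showing the per-split random subset of predictors and the variable importance measures mentioned above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
n = 500
X = rng.integers(0, 5, size=(n, 10)).astype(float)
eta = -1.0 + 0.9 * (X[:, 0] - 2) + 0.7 * (X[:, 1] - 2)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# 500 trees grown on bootstrap samples; max_features="sqrt" draws a random
# subset of candidate items at every split to increase tree diversity.
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                random_state=0).fit(X, y)

# Variable importance measures flag the items driving the predictions,
# since no single interpretable tree remains in the ensemble.
importance = forest.feature_importances_
print(np.argsort(importance)[::-1][:3])
```

The two data-generating items should dominate the importance ranking, illustrating how importance measures partially compensate for the lost interpretability of a single tree.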
1.3.4 Summary
Overall, the machine learning methods use data-driven approaches to predict a
diagnosis using assessment items. The Lasso logistic regression selects the most
important items to predict the diagnosis, weights the items by the Lasso coefficients, and
yields a predicted probability of diagnosis. On the other hand, CART can grow a tree
where items are predictors of the classification (Gibbons, Weiss, Frank, & Kupfer, 2016;
McArdle, 2013; Yan, Lewis, & Stocking, 2004), so items could be administered
adaptively depending on the participant’s response. Finally, random forest uses a similar
approach to CART to build the models, but the solution is more stable and variable
importance measures are used to understand which are the most important predictors of
the diagnosis. Previous research on the performance of machine learning algorithms to
predict diagnoses (Lu & Petkova, 2014) suggests that Lasso logistic regression might be
a better classifier than CART and random forest, even though the Lasso dropped many
items from the model (estimated many item coefficients to be exactly zero). The biggest difference between this
simulation study and Lu and Petkova’s work is that the data-generating model in this
simulation mimics a psychometric assessment with an underlying latent variable, while
their simulation used ten continuous predictors as the assessment.
1.4 Simulation Study
The goal of the present study is to compare the performance of psychometric and
machine learning algorithms for diagnostic assessment. Psychometric methods for
diagnostic assessment focus on improving the measurement precision of an assessment,
and then determine a diagnosis. Strengths of the psychometric methods are that they may
produce reliable and interpretable scores, along with providing a theoretical framework of
inference for how the diagnosis is related to the construct measured by the assessment.
Potential problems with the psychometric methods might be the sensitivity to violations
of assumptions, influence of sample size, and that prediction of the diagnosis is not direct
but a by-product of the measurement process (Gibbons et al., 2013). On the other hand,
the machine learning methods for diagnostic assessment build a model to predict the
probability of having the diagnosis. A strength of the machine learning assessments is
that items are selected into the model based on their relation to the diagnosis, so
prediction is done directly. Potential problems with the machine learning methods are
that they assume items are perfect predictors, when in fact they could suffer from
measurement error and bias. Also, the machine learning methods do not provide a
framework for inference about how or why the items are related to the diagnosis, so even if
the outcome changes to a closely related construct, the model would not be optimal. In
machine learning terminology, the machine learning methods are considered supervised
learning methods because the predictors are trained to predict an outside outcome (the
diagnosis). For each predictor observation, there is an observed output to predict (James
et al., 2013). On the other hand, early stages of psychometric methods are based on
unsupervised learning (dimensionality assessment) because exploratory factor analysis
builds the model based on item similarities without referencing an outside criterion, so
interpretations of the clusters are theoretical.
In this study, it is hypothesized that classification accuracy would increase as
sample size, number of item categories, number of items, prevalence of the diagnosis,
and the diagnosis-test correlation increase. Furthermore, it is hypothesized that, on
average, machine learning methods would have higher classification accuracy than
psychometric methods in conditions where there are violations of psychometric
assumptions (such as local dependence) and small sample sizes because the theta scores
may not be estimated accurately. However, it is hypothesized that psychometric methods
would have higher classification accuracy than machine learning methods when the
prevalence is low. That is, psychometric methods balance sensitivity and specificity of
the diagnosis, while machine learning methods may just predict that the lowest occurring
class never happens. Within the machine learning methods, it is hypothesized that the
random forest algorithm may be the best performing classifier because of the diversity in
the ensemble used to make the classification decision. Finally, it is also hypothesized
that, as more items are chosen by the machine learning algorithm, the items could recover
the true IRT θ score. In this simulation, it is assumed that the predictors in the machine
learning algorithms have been theoretically vetted as indicators of a latent construct.
2. Method
This section describes the data-generation, simulation factors, models of analysis,
and general procedures of this study. Appendix A also shows a simulation flowchart used
to generate data. Data were generated, analyzed, and summarized in the R (base)
statistical environment, with code presented in Appendices D and E.
2.1 Data-Generation
Data-generating theta and true diagnosis. For each participant, data were
generated by drawing two positively correlated variables, θAssmt and θdiag, from a bivariate
normal distribution. In this case, the variable θdiag was the score on a construct, which was
dichotomized to yield the participant’s true diagnosis. Thresholds for θdiag were derived
from the quantiles of the normal distribution to yield a specific prevalence of diagnosed
participants. Participants with θdiag below the threshold did not have the diagnosis
(diagnosis = 0), and participants with θdiag above the threshold had the diagnosis
(diagnosis = 1). The variable θAssmt represents the participant’s data-generating IRT score
on the assessment, which in turn was used to simulate the item responses.
Item-level responses. IRT item parameters were sampled at each replication.
Binary items only have one threshold, so the b parameters were simulated from the
distribution N(0, 1), and the a parameters were simulated from the distribution N(1.7, .3).
Polytomous items have k-1 thresholds for items with k categories, so the first b parameter
was simulated from the distribution N(-.6, 1), and the remaining b parameters were
sequentially higher than the previous one by adding a random number from the
distribution U(.5, .9). To prevent collapsibility problems, items included in this study had
each of their categories endorsed at least five times. The a parameters for polytomous
items were also simulated from the distribution N(1.7, .3). With θAssmt and the item
parameters, the IRT models in either Eq. 2.1 or Eq. 2.2 were used to obtain the expected
probability of endorsing each category for each item. Finally, the item response was
obtained by comparing the expected probability of the model to a random proportion
from the distribution U(0,1). These item responses were then used to predict the
diagnosis.
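The data-generating steps above can be sketched as follows (a Python approximation of the R data generation; the binary 2PL form assumed here follows the logistic IRT model referenced as Eq. 2.1, and all seeds and values are illustrative):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(6)
n, n_items = 1000, 10
rho, prevalence = 0.5, 0.10   # diagnosis-test correlation and prevalence

# Draw correlated theta_assmt and theta_diag from a bivariate normal.
cov = [[1.0, rho], [rho, 1.0]]
theta_assmt, theta_diag = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Dichotomize theta_diag at the normal quantile yielding the target prevalence.
threshold = NormalDist().inv_cdf(1 - prevalence)
diagnosis = (theta_diag > threshold).astype(int)

# Simulate binary item responses: sample 2PL item parameters, compute the
# endorsement probability, and compare it to a U(0, 1) draw.
a = rng.normal(1.7, 0.3, size=n_items)
b = rng.normal(0.0, 1.0, size=n_items)
p = 1.0 / (1.0 + np.exp(-a * (theta_assmt[:, None] - b)))
items = (rng.uniform(size=(n, n_items)) < p).astype(int)
print(diagnosis.mean(), items.shape)
```

The observed prevalence in the generated data should sit close to the target value, and the item matrix supplies the predictors for the machine learning models.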
Simulation factors. There were six factors varied in this simulation: training
sample size (N=250, 500, 1000), number of items (10, 30), number of item categories (2,
5), diagnosis-test correlation (.30, .50, .70), prevalence of the diagnosis (.05, .10, .20),
and violation of the local independence assumption of IRT (no violation or for 5 items in
30% of the sample). Specifically, this simulation examined surface local dependence,
where participants might respond to a set of items identically because of similar content
or location (Edwards, Houts, & Cai, in press). In other words, the IRT model does not
determine the responses to the item, but the item response is conditional on the response
to another item. In the condition of LD=.30, data were generated so that 30% of the
participants had identical responses to the first five items, regardless of the expected
response to the item given by the θAssmt value.
Overall, there were 216 conditions, with 500 replications per condition.
Conditions were analyzed only if at least 50% of the models across replications converged, or at
least 50% of the models across replications assigned any cases to the minority class.
simulation outcomes for study 1 (classification rate, sensitivity, and specificity) and for
study 2 (person parameter recovery) were evaluated in a testing sample of N=5,000. It is
important to note that the total number of cases simulated per replication was the sum
the testing and training sample sizes. Simulation outcomes were analyzed using linear
regression and exploratory methods, such as binary recursive partitioning and the random
forest algorithm (RFA), to identify which simulation factors were important predictors of
the simulation outcomes. For clarification, binary recursive partitioning and RFA for
simulation outcomes should not be confused with conditions using CART and random
forest to assign cases to diagnoses. Assumptions of linear regression were checked by
looking at Cook’s distance for influential points; q-q plots for normality; residual plots
and histograms for functional form; and plots of standardized residuals as a function of
the outcome value for the assumption of homogeneity of variance. Preliminary checks for
each analysis suggest that the assumptions of regression were largely met.
2.2 Data Analysis
Models of analysis. All of the models used the same training sample for each
replication. For the psychometric models, the training sample was used for item
calibration and to determine the cut score. For the machine learning models, the training
sample was used to develop the predictive models. Most machine learning models were
developed using 10-fold cross-validation, except in conditions with either a small sample
size (N=250) or low prevalence (5%) where both 5-fold cross-validation and stratified
sampling were used to guarantee that every fold had both diagnosed and nondiagnosed
cases. As previously mentioned, the machine learning models could predict a diagnosis
using either a Bayes classifier or a ROC classifier. The Bayes classifier assigns cases to
the most probable class. On the other hand, the ROC classifier estimates a ROC curve
from the predicted probabilities of diagnosis and then determines a probability threshold
to balance sensitivity and specificity with the indices discussed in section 1.2.2. Finally,
classification accuracy and parameter recovery for both psychometric and machine
learning models were evaluated with the testing sample of N=5,000. It was assumed that
true positives and true negatives for the diagnoses were equally important, so they were
equally weighted.
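The contrast between the Bayes classifier and a ROC classifier can be sketched as follows (Python with scikit-learn standing in for the pROC R package; here the ROC threshold uses the Youden index, one of the three indices from section 1.2.2, and the data are synthetic):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(7)
n = 1000
X = rng.normal(size=(n, 3))
eta = -2.2 + 1.0 * X[:, 0] + 0.8 * X[:, 1]   # low-prevalence diagnosis
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

probs = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# Bayes classifier: assign the most probable class (probability > .5).
bayes_pred = (probs > 0.5).astype(int)

# ROC classifier: choose the probability threshold that maximizes the
# Youden index J = sensitivity + specificity - 1, then classify with it.
fpr, tpr, thresholds = roc_curve(y, probs)
youden_threshold = thresholds[np.argmax(tpr - fpr)]
roc_pred = (probs > youden_threshold).astype(int)
# At low prevalence the ROC threshold typically sits below .5, so the ROC
# classifier flags more cases (higher sensitivity, lower specificity).
print(bayes_pred.sum(), roc_pred.sum())
```

This illustrates why the two classifiers can diverge sharply when the diagnosis is rare, which is central to the hypotheses of this study.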
1. Data-generating theta: Use the data-generating theta (from the data-generation
step) in a ROC analysis (using the pROC R package; Robin et al., 2011) to
determine cut scores to predict the true diagnosis either to minimize the closest-
to-(0,1) criterion (Eq. 2.10), maximize the Youden index (Eq. 2.11), or maximize
the concordance probability (Eq. 2.12).
a. Predict the class based on data-generating theta in the testing sample from
the cut scores determined from the training sample.
2. Raw summed score: Estimate a total score by summing all of the items and use
in a ROC analysis (in the pROC R package) to determine cut scores either to
minimize the closest-to-(0,1) criterion (Eq. 2.10), maximize the Youden index
(Eq. 2.11), or maximize the concordance probability (Eq. 2.12).
a. Predict the class based on the raw summed score in the testing sample
from the cut scores determined from the training sample.
3. Estimated theta: Fit a unidimensional IRT model to calibrate the items, and then
estimate an IRT EAP[𝜃] score (using MML-EM in the mirt R package;
Chalmers, 2012). The estimated theta is then used in a ROC analysis (in the pROC
R package) to determine cut scores either to minimize the closest-to-(0,1)
criterion (Eq. 2.10), maximize the Youden index (Eq. 2.11), or maximize the
concordance probability (Eq. 2.12).
a. Estimate theta scores for participants in the testing sample using the item
parameters from the training sample, and then predict the class using the
estimated theta cut score determined from the training sample.
4. Logistic regression. Predict the probability of diagnosis in the training sample
from item responses using logistic regression (Eq. 3.5, using the glm R package;
R Core Team, 2017).
a. Bayes classifier: Predict the probability of diagnosis in the testing dataset
using the logistic regression model from the training sample and assign the
class with probability > .5.
b. ROC classifier: Estimate an ROC curve from the predicted probabilities in
the training sample and determine a probability threshold using the ROC
indices described above. Predict probability of diagnosis in the testing
dataset using the logistic regression model from the training sample and
assign the class depending on the probability threshold.
5. Lasso logistic regression: Predict the probability of diagnosis in a training
sample from item responses using a logistic regression, regularizing the
parameters with the L1 norm (Eq. 3.6, using the glmnet R package; Friedman,
Hastie, & Tibshirani, 2010). Use cross-validation to determine the penalty
parameter that minimizes prediction error in the left-out fold. To generalize to
other samples, a penalty parameter one-standard-error away from the one that
minimized the prediction error was chosen to regularize the parameters.
a. Bayes classifier: Predict the probability of diagnosis in the testing dataset
using the Lasso logistic regression model from the training sample and
assign the class with probability > .5.
b. ROC classifier: Estimate an ROC curve from the predicted probabilities in
the training sample and determine a probability threshold using the ROC
indices described above. Predict probability of diagnosis in the testing
dataset using the Lasso logistic regression model from the training sample
and assign the class depending on the probability threshold.
6. Relaxed Lasso logistic regression. Predict the probability of diagnosis using the
Lasso from Model 5 and save the predictors remaining in the model. Re-run the
analysis using an unrestricted logistic regression model (Model 4) and estimate
the predicted probability of diagnosis.
a. Bayes classifier: Predict the probability of diagnosis in the testing dataset
using the relaxed Lasso logistic regression model from the training sample
and assign the class with probability > .5.
b. ROC classifier: Estimate an ROC curve from the predicted probabilities in
the training sample and determine a probability threshold using the ROC
indices described above. Predict probability of diagnosis in the testing
sample using the relaxed Lasso logistic regression model from the training
sample and assign the class depending on the probability threshold.
7. Classification and regression trees: Predict the probability of diagnosis in the
training sample from the item responses using a classification tree (tree R
package; Ripley, 2016). Overgrow the tree in a training sample (deviance of
.0001) and then prune back using cost-complexity pruning. Determine the size of
the tree by cross-validation to find the tree size that minimizes misclassification in
the left-out fold.
a. Bayes classifier: Predict the diagnosis in a testing sample using the pruned
tree from the training sample and assign the class with probability > .5.
b. CART with a ROC classifier was not carried out because CART yields the
same probability to each case in the node, limiting the number of possible
probability thresholds during the ROC analysis.
8. Random Forest: Predict the probability of diagnosis in the training sample from
the item responses using the average prediction of 500 classification trees grown
in 500 bootstrapped datasets from the training sample (using the
randomForest R package; Liaw & Wiener, 2002). However, at each node a
random subset of the items was considered as candidate predictors to split on.
Random sampling of items increases diversity of the trees, thus improving
prediction.
a. Bayes classifier: Predict the diagnosis in a testing sample using the
random forest model from the training sample and assign the class with
probability > .5.
b. ROC classifier: Estimate an ROC curve from the predicted probabilities in
the training sample and determine a probability threshold using the ROC
indices described above. Predict probability of diagnosis in the testing
sample using the random forest model from the training sample and assign
the class depending on the probability threshold.
Overall, there are eight different models (three psychometric models and five machine
learning models), and four out of the five machine learning models predicted the
diagnosis with two different classifiers. In total there were 12 models of analysis in this
study. The performance of each model is compared relative to the classification accuracy
of data-generating theta (Model 1).
Study 1: Classification accuracy. The goal of study 1 is to investigate the
classification accuracy across the psychometric and machine learning approaches. The
main outcomes of the simulation were classification rates, sensitivity, and specificity
(outlined in section 1.1.1) per model, and also differences in classification accuracy
across models. The outcomes were analyzed using the binary recursive partitioning
approach outlined in Gonzalez, O’Rourke, Wurpts, and Grimm (2017), and also linear
regression, using the six simulation factors as predictors.
Study 2: Recovery of the person parameter. One of the properties of Lasso
logistic regression and the CART algorithm is that they select the most important
predictors of the outcome. The goal of study 2 is to investigate if the items selected by
these two machine learning algorithms could recover the data-generating theta. In theory,
the person parameter should be recovered because of the linking property of IRT scores.
However, it is of interest to investigate if there are ceiling or floor effects when an IRT
score is estimated from the remaining set of items. In this case, the unique items chosen
by the machine learning algorithms were scored using the estimated item parameters
from Model 3, and the scores were compared to the data-generating theta using mean
squared error (MSE) and the correlation between estimated and data-generated theta.
Both MSE and the correlation between estimated and data-generating theta were
analyzed using the binary recursive partitioning approach outlined in Gonzalez et al.
(2017), and using linear regression with the six simulation factors as predictors.
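The two recovery outcomes, MSE and the correlation with the data-generating theta, can be computed as below (a toy Python sketch in which theta_hat merely stands in for the EAP scores estimated from the selected items in the actual study):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5000
theta_true = rng.normal(size=n)
# Stand-in for theta estimated from the subset of selected items: the true
# theta plus estimation error (fewer items would mean a larger error term).
theta_hat = 0.9 * theta_true + rng.normal(0.0, 0.4, size=n)

# The two recovery outcomes used in study 2.
mse = np.mean((theta_hat - theta_true) ** 2)
corr = np.corrcoef(theta_hat, theta_true)[0, 1]
print(round(mse, 3), round(corr, 3))
```

Lower MSE and a correlation approaching one would indicate that the items retained by the machine learning algorithms recover the person parameter well.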
3. Results
This study produced a large number of results, so this section integrates them across
models and classification indices, highlighting the most important findings. A comprehensive write-
up, details, and tables on each of these analyses are presented in Appendix F. The
summary of classification indices in Table 1, along with partial-η2 and variable
importance measures, were used to determine what results to present and which effects to
probe. This section starts by providing results on the estimation and classification
accuracy of the psychometric models. After that, the estimation of the machine learning
models with Bayes and ROC classifiers are described. Then, the classification accuracy
within and across machine learning models with ROC classifier is discussed. Next, the
machine learning models with ROC classifier are compared to the classification accuracy
of data-generating theta. Finally, recovery of the person parameter by CART and Lasso
algorithms is discussed.
3.1. Estimation of the Psychometric Models
IRT parameter estimation. Convergence and parameter recovery information
suggest that there were no apparent estimation problems. There were 129 models (out of
108,000 models; 216 conditions, 500 replications per condition) that did not converge,
mainly from conditions with either binary items, ten items, or a sample size of 250.
Parameter recovery. For theta (person parameter) recovery, MSE decreased as
the number of categories increased. For conditions with five-category items, MSE
decreased as sample size increased. The correlation between the estimated theta and the
data-generating theta increased as the number of item categories and the number of items
increased. For item parameters, the MSE decreased and variance explained increased as
sample size and number of items increased and local dependence decreased.
3.2. Classification Accuracy of the Psychometric Models
Effect of ROC index. For the psychometric models, the cut scores in the ROC
analysis were determined by the Youden index, closest-to-(0,1) criterion, and the
concordance probability. However, the performance of these three indices was not
practically different from each other as a function of simulation factors (R2=.002 - .009).
Therefore, the results for the psychometric models are presented based on the Youden
index.
Comparing classification accuracy across psychometric models. Results
suggest that there were no practical differences in classification accuracy between using
data-generating theta and the estimated theta (classification rate R2 = .003; sensitivity R2
= .001; and specificity R2 = .002); between using data-generating theta and the raw
summed scores (classification rate R2 = .004; sensitivity R2 = .001; and specificity R2 =
.003); and between using the estimated theta and the raw summed scores (classification
rate R2 = .001; sensitivity R2 = .001; and specificity R2 = .001). It is hypothesized that
these models perform similarly because of the high reliability of the raw summed score
and the recovery of the IRT person parameter. Therefore, results are presented only for
data-generating theta and estimated theta.
Classification accuracy with data-generating theta. In this study, classification
rates ranged from .60 to .80; sensitivity ranged from .59 to .83; and specificity ranged
from .59 to .80. Classification accuracy increased as the diagnosis-test correlation
increased.
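The three accuracy indices follow directly from a cut score applied to the continuous scores. A brief illustrative sketch (the scores, diagnoses, and cut below are made up):

```python
def accuracy_indices(scores, diagnoses, cut):
    """Classification rate, sensitivity, and specificity when cases
    scoring at or above the cut are classified as diagnosed (0/1 coding)."""
    tp = fp = tn = fn = 0
    for s, d in zip(scores, diagnoses):
        pred = 1 if s >= cut else 0
        if pred == 1 and d == 1:
            tp += 1          # diagnosed case correctly flagged
        elif pred == 1 and d == 0:
            fp += 1          # non-diagnosed case flagged
        elif pred == 0 and d == 0:
            tn += 1          # non-diagnosed case correctly cleared
        else:
            fn += 1          # diagnosed case missed
    total = tp + fp + tn + fn
    return (tp + tn) / total, tp / (tp + fn), tn / (tn + fp)

# toy example: five theta scores, three diagnosed cases, cut at 0.0
rate, sens, spec = accuracy_indices(
    [1.2, 0.8, 0.3, -0.4, -1.1], [1, 1, 0, 1, 0], 0.0)
```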
Linear Regression. In a regression predicting classification rate of the data-
generating theta from simulation factors, the variance explained was R2=.387.
Classification rates increased as the diagnosis-test correlation increased (b=.177, s.e.
=.005, t=34.429, p<.001, partial-η2=.375) and as the prevalence decreased (b=-.024,
s.e.=.005, t=-4.670, p<.001, partial-η2=.029). In a regression predicting sensitivity from the
simulation factors, the variance explained was R2=.317. Sensitivity increased as the
diagnosis-test correlation increased (b=.185, s.e. =.007, t=26.710, p<.001, partial-
η2=.312). In a regression predicting specificity from the simulation factors, the variance
explained was R2=.277. Specificity increased as the diagnosis-test correlation increased
(b=.176, s.e. =.007, t=27.194, p<.001, partial-η2=.264) and as the prevalence decreased
(b=-.025, s.e.=.006, t=-3.860, p<.001, partial-η2=.022).
Binary recursive partitioning. For the prediction of classification accuracy of the
data-generating theta in the training sample, binary recursive partitioning only made
splits based on the diagnosis-test correlation. In an RFA model grown in a training
sample to predict classification accuracy of the data-generating theta, the most important
variables were the diagnosis-test correlation and prevalence. Using the RFA model to
predict classification rate in the testing sample (a holdout sample), the MSE was .007,
and the variance explained was .385. Also, the MSE in the prediction of sensitivity was
.013, and the variance explained was .315. Finally, MSE in the prediction of specificity
was .011 and the variance explained was .274.
Classification accuracy with estimated theta. In this study, classification rates
ranged from .58 to .80; sensitivity ranged from .58 to .82; and specificity ranged from .59
to .79. Classification accuracy seemed to increase as the diagnosis-test correlation
increased.
Linear Regression. In a regression predicting classification rate of the estimated
theta from the simulation factors, the variance explained by the predictors was R2=.337.
Classification rates increased as the diagnosis-test correlation increased (b=.146, s.e.
=.005, t=26.964, p<.001, partial-η2=.321) and as the prevalence decreased (b=-.032,
s.e.=.005, t=-5.887, p<.001, partial-η2=.021). In a regression predicting sensitivity from
the simulation factors, the variance explained by the predictors was R2=.280. Sensitivity
increased as the diagnosis-test correlation increased (b=.175, s.e. =.007, t=24.010,
p<.001, partial-η2=.272). In a regression predicting specificity from the simulation
factors, the variance explained by the predictors was R2=.236. Specificity increased as the
diagnosis-test correlation increased (b=.144, s.e. =.007, t=21.177, p<.001, partial-η2=.220) and as the prevalence decreased (b=-.039, s.e.=.007, t=-5.793, p<.001, partial-
η2=.017).
Binary recursive partitioning. For the prediction of classification accuracy of the
estimated theta in the training sample, binary recursive partitioning only made splits
based on the diagnosis-test correlation. In an RFA model grown in a training sample to
predict the classification accuracy of the estimated theta, the most important variables
were diagnosis-test correlation and prevalence. Using the RFA model to predict
classification rate in the testing sample (a holdout sample), the MSE was .007, and the
variance explained was .334. Also, MSE in the prediction of sensitivity was .014, and the
variance explained was .281. Finally, the MSE in the prediction of specificity was .012,
and the variance explained was .233.
Brief Summary. Across psychometric models, the best predictors of
classification accuracy were the diagnosis-test correlation and prevalence. Specifically,
classification accuracy increased as the diagnosis-test correlation increased and
prevalence decreased. More details can be found in Appendix F, sections 1 and 2.
3.3. Estimation of the Machine Learning Models
Model building. All of the machine learning models with a Bayes classifier only
assigned cases to the majority class, except in conditions with high diagnosis-test
correlation, high prevalence, high sample size, and 30 five-category items. As shown in
Table 1, these models also had very low sensitivity and inflated specificity compared to
the sensitivity and specificity of data-generating theta. Therefore, machine learning
models with a Bayes classifier are only discussed in appendix F, sections 3 and 4.
On the other hand, both random forest and logistic regression using ROC
classifiers had a high likelihood of assigning cases to the minority class across all
conditions. Lasso logistic regression and relaxed Lasso logistic regression only had
greater than a 50% chance of assigning cases to the minority class in conditions with a
diagnosis-test correlation of .70. Therefore, only these four models are discussed in the
next part of this section.
3.4. Classification Accuracy of Machine Learning Models with ROC Classifiers.
For the machine learning methods examined, Figure 1 shows the variance
explained and the partial eta-squared effect sizes for the predictors of classification
accuracy across methods, and Figure 2 shows the RFA variable importance measures for
each of the predictors. Further details are provided in appendix F, section 5.
Effect of ROC index. For random forest, there were medium differences in
classification rate (R2=.149-.150), sensitivity (R2 = .122-.124), and specificity (R2 =.134-
.135) as a function of simulation factors, across the three ROC indices. For logistic
regression, there were small differences in classification rate (R2=.017-.022) and
specificity (R2 =.014-.016), and small to medium differences in sensitivity (R2 = .016-
.137) as a function of simulation factors, across the three ROC indices. For Lasso logistic
regression, there were small differences in classification rate (R2=.008-.017), sensitivity
(R2 = .003-.006), and specificity (R2 =.004-.005) as a function of simulation factors,
across the three ROC indices. Finally, for the relaxed Lasso logistic regression, there
were small differences in classification rate (R2=.006-.014), sensitivity (R2 = .003-.005),
and specificity (R2 =.003-.005) as a function of simulation factors, across the three ROC
indices. In these analyses, the Youden index had the highest sensitivity across the vast
majority of conditions, so the results presented are based on the Youden index to increase
sensitivity.
Logistic regression classification accuracy with ROC classifier. According to
partial-η2 effect sizes (Figure 1) and RFA variable importance measures (Figure 2), by far
the most important predictor of classification accuracy was diagnosis-test correlation. On
average, classification rate, sensitivity, and specificity increased as the diagnosis-test
correlation increased. Also, classification rates increased as sample size increased. For
sensitivity, there were significant three-way interactions between sample size, prevalence,
and diagnosis-test correlation, and between sample size, prevalence, and number of
items. However, the importance of other predictors beyond diagnosis-test correlation was
not reflected in the variable importance measures. Finally, specificity decreased as
prevalence increased, although the importance of prevalence is not reflected in the
variable importance measures. According to Cohen’s f, variance explained by the
simulation factors was greater than a large effect size.
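The Cohen's f referenced here is computed from variance explained as f = sqrt(R² / (1 − R²)), with .40 the conventional benchmark for a large effect. For example:

```python
import math

def cohens_f(r2):
    """Cohen's f effect size computed from variance explained (R^2)."""
    return math.sqrt(r2 / (1.0 - r2))

# R^2 values in the .30-.40 range already exceed the f = .40 "large" benchmark
f_large = cohens_f(0.30)
```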
Lasso and relaxed Lasso classification accuracy with ROC classifier. There
were no practical differences in classification accuracy between Lasso logistic regression
and relaxed Lasso logistic regression as a function of simulation factors (classification
rate R2=.004; sensitivity R2=.004; specificity R2=.004). Therefore, only estimates from
the relaxed Lasso are discussed in the rest of the document. Predictors of classification
accuracy were the main effects of item categories, sample size, number of items, and
prevalence. According to Cohen’s f, variance explained by the simulation factors was
between a small and a medium effect size. On average, classification accuracy increased
as sample size, number of items, and number of item categories increased, and
classification accuracy decreased as prevalence increased. As mentioned before, these
results apply only to conditions where the diagnosis-test correlation is .70.
Random Forest classification accuracy with ROC classifier. According to
partial-η2 effect sizes and RFA variable importance measures, the most important
predictor of classification accuracy of the random forest model was the diagnosis-test
correlation. Additionally, important predictors of sensitivity were number of items,
prevalence, and number of item categories. On average, classification rates and
specificity increased as the diagnosis-test correlation increased. On average, sensitivity
increased as the diagnosis-test correlation, number of item categories, prevalence, and
number of items increased, and sensitivity decreased as sample size increased. However,
as the number of items and item categories increased, differences in sensitivity across
sample size and prevalence decreased.
Comparing accuracy across machine learning models with ROC classifier.
There were differences between relaxed Lasso logistic regression and logistic regression
on classification rates (R2=.120), sensitivity (R2=.393), and specificity (R2=.111).
Differences in classification accuracy of relaxed Lasso logistic regression and logistic
regression varied as a function of sample size, number of items, and prevalence.
Conditions with 30 items favored relaxed Lasso logistic regression, and conditions with
ten items favored logistic regression. Across number of items, differences in
classification accuracy decreased as prevalence and sample size increased.
Also, there were differences between relaxed Lasso logistic regression and
random forest on classification rates (R2=.120), sensitivity (R2=.393), and specificity
(R2=.111). According to variable importance measures, the differences in classification
accuracy across relaxed Lasso logistic regression and random forest varied as a function
of number of item categories, prevalence and number of items. Across prevalence,
conditions with 30 five-category items had classification rates and specificity that favored
random forest, along with conditions with ten binary items with 5% prevalence.
Conditions with 30 binary items and most conditions with ten binary items had
classification rates and specificity that favored relaxed Lasso logistic regression.
Differences in classification rates decreased as prevalence and number of items increased,
and as the number of item categories decreased. For sensitivity, most conditions favored
relaxed Lasso logistic regression, especially conditions with ten binary items and low
prevalence.
Finally, there were differences in classification rate (R2=.184), sensitivity
(R2=.351), and specificity (R2=.170) between logistic regression and random forest. For
classification rates, most conditions with ten five-category items favored logistic
regression, while conditions with 30 five-category items favored random forest. For five-
category items, differences between methods increased as sample size increased, and
decreased as prevalence increased. Conditions with ten binary items and low prevalence
favored random forest, while conditions with 30 binary items and sample size greater
than 500 favored logistic regression. For sensitivity, conditions with 30 items or
conditions with ten five-category items favored random forest. Conditions with ten binary
items favored logistic regression, where conditions with high prevalence and high sample
size had larger sensitivity for logistic regression. For specificity, most conditions favored
logistic regression, except for conditions with ten binary items and low prevalence.
3.5. Comparing Psychometric and Machine Learning Models
Using a one-number summary for classification accuracy, the four machine
learning models were able to recover true prevalence from estimates of apparent
prevalence, sensitivity, and specificity. Therefore, each classification accuracy index
was compared separately. More information can be found in appendix F, section 6.
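Recovering true prevalence from apparent prevalence, sensitivity, and specificity can be done with the standard Rogan-Gladen correction, which inverts the relation between apparent and true prevalence. A brief sketch (the numeric values are illustrative):

```python
def true_prevalence(apparent_prev, sensitivity, specificity):
    """Rogan-Gladen estimator: invert
    apparent = sens * prev + (1 - spec) * (1 - prev)."""
    return (apparent_prev + specificity - 1.0) / (sensitivity + specificity - 1.0)

# a test with sens = .80 and spec = .90 applied to a 5%-prevalence
# population flags 13.5% of cases; the correction recovers .05
recovered = true_prevalence(0.135, 0.80, 0.90)
```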
Machine learning models vs data-generating theta. The vast majority of
conditions had higher classification accuracy for the data-generating model than for the
logistic regression model. Differences in classification rate between logistic regression and
data-generating theta ranged from -.09 to .01; differences in sensitivity ranged from -
.30 to .02; and differences in specificity ranged from -.11 to .03. Although there were
significant differences in classification accuracy across simulation factors (classification
rate, R2=.067; sensitivity, R2=.037; specificity, R2=.028), none of the predictors had a
partial-η2 > .01.
Also, the vast majority of conditions had a higher classification rate for the data-
generating model than for the relaxed Lasso logistic regression model. Differences in
classification rate between relaxed Lasso and data-generating theta ranged from -.10 to
.01; sensitivity differences ranged from -.04 to .01; and specificity differences ranged
from -.10 to -.02. Differences in classification rate (R2=.082) and specificity (R2=.028)
decreased as sample size and prevalence increased. Classification rate differences also
decreased as number of items increased. None of the predictors of the difference in
sensitivity (R2=.055) had a partial η2 > .01.
Finally, the vast majority of conditions had higher classification rates for the data-
generating model than for the random forest model, except in conditions with ten binary
items, prevalence of .05, and sample size of 500 or 1,000. Differences in classification
rate between the random forest model and data-generating theta ranged from -.09 to
.01; differences in sensitivity ranged from -.30 to .02; and differences in specificity
ranged from -.11 to .03. On average, differences in classification rate (R2=.156)
decreased as the number of items, prevalence, and number of item categories increased.
For both binary and five-category items, there were greater differences in classification
rates across method in conditions with 10 items than in conditions with 30 items.
Differences in sensitivity (R2=.270) increased as sample size increased, and differences in
sensitivity decreased as prevalence, number of items, and number of item categories
increased. For conditions with 10 items, the difference in sensitivity increased as sample
size increased, and the difference in sensitivity decreased as prevalence and number of
categories increased. In most conditions with 30 items or conditions with five-category
items, random forest had higher sensitivity than data-generating theta. On average,
differences in specificity (R2=.140) decreased as prevalence decreased. Similar to
classification rates, random forest had higher specificity than data-generating theta in
conditions with 10 binary items, a prevalence of .05, and a sample size of 500 or 1,000. For conditions
with 30 items, specificity differences decreased as the number of item categories
increased. For conditions with binary items, differences in specificity decreased as the
number of items increased.
Machine learning models vs estimated theta. The vast majority of conditions
had higher classification accuracy for estimated theta than the logistic regression model.
Differences in classification rate between logistic regression and estimated theta ranged
from -.10 to .02; differences in sensitivity ranged from -.28 to .03; and differences in
specificity ranged from -.10 to .03. Although there were significant differences in
classification rate (R2=.040) and specificity (R2=.028) across simulation factors, none of
the predictors had a partial η2 > .01. For sensitivity (R2=.064), differences in sensitivity
increased as the number of items increased in conditions with low prevalence and small
sample size.
Also, the vast majority of conditions had a higher classification rate for estimated
theta than for the relaxed Lasso logistic regression model. Differences in classification
rate between relaxed Lasso and estimated theta ranged from -.06 to .00; sensitivity
differences ranged from -.01 to .03; and specificity differences ranged from -.06 to .00.
Differences in classification rate (R2=.054) and specificity (R2=.037) decreased as sample
size and prevalence increased. Classification rate differences also decreased as number of
items increased. None of the predictors of difference in sensitivity (R2=.010) had a partial
η2 > .01.
Similar to the results for data-generating theta, the vast majority of conditions had
higher classification rates for the estimated theta model than the random forest model,
except in conditions with ten binary items, prevalence of .05, and sample size of 500 or
1,000. Differences in classification rate between the random forest model and estimated
theta ranged from -.12 to .20; differences in sensitivity ranged from -.42 to .05; and
differences in specificity ranged from -.13 to .24. On average, differences in classification
rate (R2=.167) decreased as the diagnosis-test correlation, number of items, prevalence,
and number of item categories increased. For binary items, conditions with low
prevalence and 10 items favored random forest compared to the estimated thetas.
Differences in sensitivity (R2=.256) decreased as prevalence, sample size, number of
items, and number of item categories increased. In conditions with 10 binary items and
low prevalence, differences increased as sample size increased. In most conditions with
30 items or conditions with five-category items, random forest had a slightly higher
sensitivity than the estimated thetas. On average, differences in specificity (R2=.151)
decreased as prevalence, number of items, number of item categories, and sample size
increased. Similar to classification rates, random forest had higher specificity than
estimated theta in conditions with 10 binary items, a prevalence of .05, and a sample size of 500 or
1,000. For conditions with 30 items, specificity differences decreased as the number of
item categories increased. For conditions with binary items, differences in specificity
decreased as the number of items increased.
3.6. Scoring Machine Learning Items for Person Parameter Recovery
The CART algorithm and Lasso logistic regression select the most important
items in the prediction of the diagnosis. This section investigates the recovery of data-
generating theta when the algorithms selected at least two items, and then the items were
scored using the estimated item parameters from the IRT model in section 3.2. On the
other hand, random forest does not do variable selection, so parameter recovery was
studied by scoring half of the items with the highest variable importance. Results should
be interpreted conditional on how many items the algorithm selected. More information
can be found in appendix F, section 7.
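Scoring the retained items with previously estimated item parameters can be sketched as EAP scoring under a 2PL with a standard normal prior. A minimal version for binary items (the item parameters below are illustrative, not the study's estimates):

```python
import math

def eap_score(responses, a, b, n_quad=61):
    """EAP theta for binary responses under a 2PL, scoring only the
    items the algorithm retained (a, b cover that subset)."""
    num = den = 0.0
    for k in range(n_quad):
        theta = -4.0 + 8.0 * k / (n_quad - 1)        # quadrature grid
        w = math.exp(-0.5 * theta * theta)           # N(0,1) prior (unnormalized)
        like = 1.0
        for u, aj, bj in zip(responses, a, b):
            p = 1.0 / (1.0 + math.exp(-aj * (theta - bj)))
            like *= p if u == 1 else 1.0 - p
        num += theta * w * like
        den += w * like
    return num / den

# score three retained items for two response patterns
a_est = [1.5, 1.2, 1.0]     # discriminations (illustrative)
b_est = [-0.5, 0.0, 0.5]    # difficulties (illustrative)
high = eap_score([1, 1, 1], a_est, b_est)
low = eap_score([0, 0, 0], a_est, b_est)
```

Endorsing all retained items pulls the posterior mean above zero, and endorsing none pulls it below, as expected with fewer items yielding coarser estimates.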
Person Parameter Recovery by CART. Only conditions with prevalence greater
than 5% and a sample size greater than 250 were analyzed because those conditions were
likely to assign cases to the minority class. According to variable importance measures,
important predictors of theta MSE and the correlation between true and estimated theta
were the diagnosis-test correlation and number of items. The CART algorithm would
choose almost all of the items in conditions with low diagnosis-test correlation and low
number of items. Therefore, in those conditions theta MSE was close to zero and the
correlation between data-generating and estimated theta was close to 1. Conditions with
a diagnosis-test correlation of .70 would choose between 2-7 items, and the correlation
between data-generating and estimated theta was between .62-.87.
Person Parameter Recovery by Lasso. In contrast to CART, Lasso selected
more items as the diagnosis-test correlation increased. Only conditions with a diagnosis-
test correlation of .70 were analyzed because those conditions were likely to assign cases
to the minority class. According to variable importance measures, important predictors of
theta MSE and the correlation between data-generating and estimated theta were sample
size, prevalence, number of items, and number of item categories. On average, MSE
decreased and the correlation between data-generating and estimated theta increased as
sample size, prevalence, and number of item categories increased, and as the number of
items decreased. The Lasso chose between 2-7 items in the 10-item condition and
between 7-13 items in the 30-item condition, and the correlation between data-
generating and estimated theta was between .70-.96.
Person Parameter Recovery by Random Forest. According to variable
importance measures, important predictors of theta MSE and the correlation between
data-generating and estimated theta were number of items and number of item categories.
On average, theta MSE decreased and the correlation between data-generating and
estimated theta increased as the number of items and number of categories increased. The
correlation between data-generating and estimated theta ranged between .80 and .91
when the 5 (out of 10) most important items were scored, and the correlation ranged
between .86 and .95 when the 15 (out of 30) most important items were scored.
4. Discussion
Diagnostic assessments are important in psychology because they are cheaper and
less invasive than a gold standard that dictates diagnoses. After an assessment has been
administered, it is important to determine whether a case should be classified as diagnosed or
not. This simulation compared methods for diagnostic classification using psychometric
and machine learning approaches. Psychometric methods predict diagnoses using either
an estimated IRT score or raw summed score from assessment items. Then, cut scores for
psychometric models are determined by receiver operating characteristic (ROC) curves.
The machine learning methods used in this simulation were classification and regression
trees, random forest, logistic regression, Lasso logistic regression, and relaxed Lasso
logistic regression. The machine learning methods predict the probability of diagnosis
from item responses, so they assign a diagnosis either by looking at the class that is most
probable to reduce prediction error (Bayes classifier), or using ROC curves to find a
probability threshold to assign diagnoses (ROC classifier).
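The distinction between the two classifiers amounts to where the probability threshold sits: the Bayes classifier fixes it at .5, while the ROC classifier moves it to a value chosen from the ROC curve. A minimal illustration (the fitted probabilities and threshold below are made up):

```python
def bayes_classify(probs):
    """Bayes classifier: predict the most probable class (threshold .5)."""
    return [1 if p >= 0.5 else 0 for p in probs]

def roc_classify(probs, threshold):
    """ROC classifier: threshold chosen from the ROC curve (e.g., Youden)."""
    return [1 if p >= threshold else 0 for p in probs]

# with low prevalence, fitted probabilities rarely exceed .5, so the
# Bayes classifier predicts the majority (non-diagnosed) class for everyone,
# while a ROC threshold can still flag the most probable cases
probs = [0.40, 0.30, 0.08, 0.05, 0.02]
bayes_preds = bayes_classify(probs)
roc_preds = roc_classify(probs, 0.10)
```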
Overall, this study suggests that the important factors for classification accuracy
in psychometric models are the diagnosis-test correlation and prevalence of the diagnosis.
Machine learning models with a Bayes classifier had high specificity and very low
sensitivity across conditions. The low sensitivity across conditions suggests that these
methods should not be used in practice. However, machine learning models with a ROC
classifier had comparable classification accuracy to the psychometric models as the
number of items, number of item categories, and sample size increased. Therefore, results
suggest that machine learning approaches could be a viable alternative to psychometric
models for diagnostic assessments. Next, the main conclusions and limitations from this
study are discussed in reference to the study hypotheses.
4.1 Revisiting Study Hypotheses
Hypothesis 1. It was hypothesized that classification accuracy would increase as
sample size, number of items, number of item categories, prevalence of the diagnosis, and
the diagnosis-test correlation increased. Hypothesis 1 is largely supported. The results
suggest that classification accuracy in psychometric models depends on the diagnosis-test
correlation and prevalence, independent of other simulation factors. For machine learning
models with a ROC classifier, classification accuracy in logistic regression mostly
depended on the diagnosis-test correlation. Classification accuracy of the random forest
algorithm increased as the diagnosis-test correlation, number of items, and number of
item categories increased. In other words, classification accuracy improved in the random
forest algorithm when the items provided more candidate splits. Classification accuracy
of relaxed Lasso logistic regression and Lasso logistic regression was not significantly
different, and models had higher than a 50% chance of assigning cases to the minority
class only in conditions with a diagnosis-test correlation of .70. Classification accuracy of
relaxed Lasso and Lasso logistic regression increased as sample size, number of items,
and number of item categories increased, and as prevalence decreased. Diagnosis-test
correlation was an important predictor for both random forest and logistic regression, so
it is not surprising that the simulation factors explain less variance for both of the Lasso
models than for the random forest and logistic regression models. Finally, machine
learning algorithms using the Bayes classifier were more likely to predict the minority
class as the simulation factors increased, even though classification accuracy was poor
(discussed in Hypothesis 3).
Hypothesis 2. It was hypothesized that, on average, machine learning methods
would have higher classification accuracy than psychometric methods in conditions
where there are violations of psychometric assumptions (such as local dependence) and
small sample sizes because the IRT θ scores might not be estimated accurately. This
hypothesis could only be partially tested because classification accuracy was not
significantly affected in conditions with violations of local independence. Although local
dependence affected item parameter recovery, local dependence was not a significant
predictor of classification accuracy in any model. A future direction would be to
investigate the effect of larger violations of local dependence or investigate other model
misspecifications of the IRT model (further discussed below). In conditions with small
sample sizes, most machine learning models with the Bayes classifier did not assign cases
to the minority class. On the other hand, there were small sample conditions where
machine learning models with ROC classifiers outperformed data-generating theta in
classification accuracy. These conditions also had a prevalence of 5%. For example, there
was a .011 to .012 difference in specificity favoring relaxed Lasso logistic regression in
conditions with 30 five-category items. Also, there were conditions with either 10 binary
items, 5% prevalence, and small sample size or 10 binary items, 5% prevalence, and a
diagnosis-test correlation of .30 and .50 where differences in specificity favored logistic
regression. Finally, conditions with 10 five-category items with low diagnosis-test
correlation had differences in sensitivity that favored the random forest algorithm (largest
differences ranged from .03 to .04). This is also found in conditions with 10 binary items,
but only when there is high prevalence. For conditions with 30 items and a diagnosis-test
correlation of .30, all conditions slightly favored random forest.
There are several future directions for this research. First, the differences in
classification accuracy were small. It would be interesting to investigate if the difference
between classification accuracy for each combination of simulation factors is statistically
significant or just chance. Significance tests could either use an analytical standard error
of the difference in classification indices or use bootstrapped confidence intervals for
classification indices to see if they include zero. A larger number of replications per
condition could also lead to greater precision of the difference in classification accuracy.
Moreover, results suggest that further simulations should focus on comparing these
methods with small sample sizes, where machine learning models could have a
hypothetical advantage. However, there is a possibility that the results with small samples
are not reliable because different procedures were taken to get the models to work. As
mentioned in the Methods section, conditions with either a small sample size or small
prevalence used five-fold cross-validation (instead of ten-fold cross-validation) and
stratified sampling to guarantee that each fold had both diagnosed and non-diagnosed
cases. There is a possibility that using fewer folds might reduce the variability in the
classification accuracy results. An alternative approach would be to use leave-one-out
cross-validation for models with a small sample size or low prevalence. Also, there is a
possibility that machine learning algorithms could over-predict the majority class in
conditions with small sample size or low prevalence, thus leading to higher classification
rates and specificity than in the psychometric methods. Therefore, sensitivity might be a
better classification accuracy index to evaluate models with small sample sizes and low
prevalence. In this case, random forest had higher sensitivity than the data-generating
model in conditions with more pieces of information (more predictors, and each predictor
with many candidate splits). Overall, it would be interesting for other simulations to
evaluate conditions with small sample sizes and low prevalence.
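The stratified sampling described above can be sketched as follows: a simplified fold builder that deals the indices of each class round-robin across folds so every fold receives both diagnosed and non-diagnosed cases (real implementations also shuffle within class):

```python
def stratified_folds(labels, k):
    """Split case indices into k folds, spreading the minority class
    across folds so every fold has diagnosed and non-diagnosed cases."""
    folds = [[] for _ in range(k)]
    for cls in (1, 0):                                   # minority class first
        idx = [i for i, y in enumerate(labels) if y == cls]
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)                     # round-robin deal
    return folds

# 100 cases at 5% prevalence: every one of the 5 folds receives
# exactly one diagnosed case
labels = [1] * 5 + [0] * 95
folds = stratified_folds(labels, 5)
```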
Hypothesis 3. It was hypothesized that psychometric methods would have higher
classification accuracy than machine learning methods when the prevalence is low. That
is, psychometric methods balance sensitivity and specificity of the diagnosis, while the
machine learning methods might simply predict that the rarer class never occurs.
Hypothesis 3 is largely supported. For the machine learning algorithms with a
Bayes classifier, specificity was inflated and sensitivity was too low. This is not
surprising: in conditions with low prevalence (5%), the Bayes classifier would predict
that there should not be a diagnosis, and the algorithm would be correct 95% of the time.
Potential improvements to the machine learning models with Bayes classifiers are
discussed in the section below. When the machine learning models used a ROC classifier,
there was a balance between specificity and sensitivity, so classification accuracy was
within the range of data-generating theta (See Table 1 and Hypothesis 2).
Hypothesis 4. Within the machine learning methods, it was hypothesized that the
random forest algorithm might be the best performing classifier because of the diversity
in the ensemble used to make a joint decision. There is evidence to support this
hypothesis. As previously mentioned, Lasso logistic regression and the relaxed Lasso
logistic regression with ROC classifiers were likely to assign cases to the minority class
only when there was a diagnosis-test correlation of .70. On the other hand, the random
forest algorithm with a ROC classifier worked for all of the conditions. Conditions with
30 five-category items were likely to favor the random forest algorithm over the logistic
regression methods. In other words, the random forest algorithm performs better when
the predictors have more pieces of information.
Hypothesis 5. It was also hypothesized that, as more items are chosen by the
machine learning algorithms, the items would recover the true IRT θ score. There is
evidence to support this hypothesis, although it is obvious that more items should lead
to acceptable recovery. Perhaps the hypothesis should have been that by scoring the
remaining items chosen by the machine learning algorithms, the true IRT θ score would
be recovered and that there would not be ceiling or floor effects. Perhaps the interesting
result is that, for CART, conditions with low diagnosis-test correlation would select more
items to predict the diagnosis, which increases parameter recovery. As diagnosis-test
correlation increases, fewer items are chosen, but parameter recovery is still acceptable.
For the Lasso, recovery of the person parameter with 2-13 items seemed acceptable.
Results from random forest suggest that using half of the items (either 5 or 15) recovers
theta well.
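The idea that a well-chosen subset of items can recover θ can be sketched with a toy EAP scorer under the 2PL (Python rather than the dissertation's R/mirt; the item parameters, item counts, and sample size here are hypothetical):

```python
import math, random

random.seed(7)

# Hypothetical 2PL items: (discrimination a, difficulty b).
items = [(random.uniform(1.2, 2.2), random.uniform(-2, 2)) for _ in range(30)]

def p_correct(theta, a, b):
    # 2PL response probability
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def simulate(theta, item_set):
    return [int(random.random() < p_correct(theta, a, b)) for a, b in item_set]

def eap(responses, item_set):
    # EAP score: posterior mean of theta over a quadrature grid, N(0,1) prior.
    grid = [g / 10 for g in range(-40, 41)]
    post = []
    for t in grid:
        like = math.exp(-t * t / 2)  # prior density (up to a constant)
        for u, (a, b) in zip(responses, item_set):
            p = p_correct(t, a, b)
            like *= p if u else (1 - p)
        post.append(like)
    z = sum(post)
    return sum(t * w for t, w in zip(grid, post)) / z

# Correlation between true theta and EAP scores from only half the items.
thetas = [random.gauss(0, 1) for _ in range(200)]
half = items[:15]
est = [eap(simulate(t, half), half) for t in thetas]

mt, me = sum(thetas) / len(thetas), sum(est) / len(est)
cov = sum((t - mt) * (e - me) for t, e in zip(thetas, est))
var_t = sum((t - mt) ** 2 for t in thetas)
var_e = sum((e - me) ** 2 for e in est)
r = cov / math.sqrt(var_t * var_e)
print(round(r, 2))
```

With 15 reasonably discriminating items, the true-to-estimated correlation is high, consistent with the finding that half of the items recovered theta well.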
4.2 Limitations and Future Directions
This section is broken into three parts. The first part covers ways in which this
simulation study could be improved. The second part discusses direct extensions of this
simulation study. Finally, the third part describes external factors that influence the
simulation and potentially influence the classification accuracy of machine learning and
psychometric models.
Improving the simulation. There were several limitations to the simulation.
One of the goals of the simulation was to study the influence of violating psychometric
model assumptions on classification accuracy, specifically the influence of violating the
local independence assumption. In this study, surface local dependence was introduced
by constraining 30% of the sample to have identical responses to the first five items,
regardless of the item parameters and latent variable score. The violation of local
independence affected item parameter recovery during IRT estimation, but it did not
influence classification accuracy across psychometric or machine learning models.
Therefore, a future direction would be to include a stronger manipulation of local
dependence or examine underlying local dependence by introducing unmodeled
dimensions to the item responses. Furthermore, the classification accuracies of the
estimated IRT scores and the summed scores were not significantly different from each
other. In this case, estimated IRT scores and summed scores might have led to similar
classification accuracy because both were highly correlated and the items were highly
reliable. Therefore, a future direction would be to investigate how classification accuracy
changes as a function of summed score reliability. It is important to note that estimated
theta scores should be preferred over summed scores because, in expectation, they should
approach the data-generating theta and could be used for linking item responses when
cases do not take the same items. In the same vein, the distribution of item parameters
mimicked those found in traditional assessments, but this simulation ensured that in
each replication there were at least five responses per category to avoid having to
collapse sparse response categories. Sparse categories may influence the estimation of
the IRT model and the recovery of the person parameter, so a future direction would be
to use a more realistic approach to simulate item responses. Also, the machine learning
algorithms with a Bayes classifier were not likely to assign cases to the minority class.
A potential future direction to improve the results of the Bayes classifier would be to
introduce misclassification-cost weights to the algorithms to differentially weight false
positives and false negatives so that the majority class is not over-predicted. Another
potential future direction would be to examine the test information function from the
assessment items to investigate whether there is information close to the cut score. If
there is not a lot of precision around the cut score, then there is a possibility that the
machine learning models would not choose items to predict the
diagnosis. Similarly, more reliable predictors, such as a total score or item parcels, could
be included in the machine learning models as additional predictors to increase the
chances of choosing a predictor. In this case, item parcels may be preferred over total
scores to take advantage of the variable selection property of machine learning
algorithms. A potential problem is that CART is biased toward selecting predictors with
many unique values to split on. A potential solution would be to use the conditional
inference trees algorithm to prevent this selection bias (Strobl et al., 2009). Overall,
performance of machine learning algorithms in the presence of both item responses and
item parcels still has to be evaluated.
Direct extensions of the simulation. The simulation in this study could be
extended by either adding other data-generating models or using other algorithms for
diagnostic classification. The data-generating model in this study was a psychometric
model, but the dataset could have been simulated so that it favored the machine learning
approaches. For example, the main predictors of the diagnosis could have been some, but
not all, of the items or a complex interaction among the items. Also, model error could
have been introduced to the data-generating model so that psychometric models were not
substantially favored. Furthermore, other models could have been used to generate the
data so that they would not favor either the IRT or machine learning models, such as a
diagnostic classification model (DCM) or a model with causal indicators. Finally,
alternative models could have been used to predict the diagnosis. For example,
computerized adaptive testing and computerized adaptive diagnosis could have been used
to estimate the latent variable score of the participants while also selecting fewer items
than those found in the assessment. Boosting, a supervised learning algorithm that uses
binary recursive partitioning to fit trees to residual structures, could also have been used
to predict the diagnosis from item responses. K-means clustering, an unsupervised
learning algorithm that uses distance measures to find grouping structures in a dataset,
could have been used to investigate whether it could differentiate between the diagnosed
and non-diagnosed groups.
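As a sketch of the k-means idea, here is a toy two-cluster k-means on a single score (Python; the group means and sample sizes are hypothetical and deliberately well separated, so real item-response data would be messier):

```python
import random

random.seed(3)

# Illustrative data: a total score for non-diagnosed vs. diagnosed groups.
neg = [random.gauss(10, 2) for _ in range(180)]
pos = [random.gauss(20, 2) for _ in range(20)]
data = [(x, 0) for x in neg] + [(x, 1) for x in pos]

# Plain 1-D k-means with k = 2, initialized at the extremes.
c = [min(x for x, _ in data), max(x for x, _ in data)]
for _ in range(25):
    groups = ([], [])
    for x, _ in data:
        groups[abs(x - c[0]) > abs(x - c[1])].append(x)  # nearer centroid
    c = [sum(g) / len(g) for g in groups]

# The higher centroid's cluster should mostly capture the diagnosed group.
hi = int(c[1] > c[0])
agree = sum((abs(x - c[hi]) < abs(x - c[1 - hi])) == bool(y) for x, y in data)
print(round(agree / len(data), 2))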
External factors influencing the simulation. There are additional factors in the
simulation that could have influenced the diagnostic classification of machine learning
and psychometric methods. For example, it would be interesting to study the influence of
missing data on diagnostic classification. Psychometric methods could deal with missing
data by using full information maximum likelihood. On the other hand, missing data
handling varies by machine learning algorithm. Also, it would be interesting to study the
influence of class imbalance corrections on classification accuracy across methods. There
are approaches in machine learning to overcome class imbalance problems, such as
oversampling the minority class, undersampling the majority class, or creating synthetic
cases of the minority class using nearest-neighbor algorithms (SMOTE). It is
hypothesized that using some of the approaches to overcome class imbalance should lead
to higher classification accuracy for the machine learning algorithms with a Bayes
classifier. Finally, the influence of reducing assessment length on the classification
accuracy of psychometric models should be investigated. In this study, many of the
machine learning models chose fewer items to predict the diagnosis than the items
available, while the psychometric models used all of the items. A psychometric approach
to creating static short forms would be to choose a set of items that maximize item
information in a certain range of the latent variable or to maximize item information
close to a cut score. After items are selected, IRT scores could be estimated and
compared to a cut score to determine diagnosis. As previously mentioned, another
approach to reduce the number of administered items is to use computerized adaptive
testing, where administered items are tailored to the participant depending on their
previous responses.
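A minimal sketch of the simplest imbalance correction, random oversampling of the minority class, is shown below (SMOTE additionally interpolates synthetic cases between nearest neighbors; the class counts here are illustrative):

```python
import random

random.seed(11)

# Illustrative imbalanced training labels: 5% diagnosed.
y = [1] * 50 + [0] * 950

# Random oversampling: resample minority cases with replacement until the
# classes are balanced, then train on the expanded index set.
minority = [i for i, lab in enumerate(y) if lab == 1]
majority = [i for i, lab in enumerate(y) if lab == 0]
extra = [random.choice(minority) for _ in range(len(majority) - len(minority))]
balanced = list(range(len(y))) + extra

counts = {0: 0, 1: 0}
for i in balanced:
    counts[y[i]] += 1
print(counts)  # {0: 950, 1: 950}
```

Training a Bayes classifier on the balanced index set raises the prior probability of the minority class, so the 0.5 probability threshold no longer defaults to the majority class.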
4.3 Conclusion
The results of this study suggest that there is a possibility that machine learning
algorithms using a ROC classifier could be used for diagnostic assessment, and that they
could provide a viable alternative to the psychometric models. However, it is important to
consider the implications of taking either a psychometric or a machine learning approach
to diagnostic classification. The psychometric approach focuses on improving the
measurement of the construct in the assessment by introducing a latent variable.
Therefore, the model used for classification could be generalizable because it is grounded
in theory. However, there might be estimation problems when there are violations of
model assumptions. The psychometric approach uses all of the items to estimate the
latent variable, and then a cut score is determined to balance sensitivity and specificity.
So, the prediction of the diagnosis is indirect because the diagnosis is not part of the
measurement process. On the other hand, machine learning builds a model to predict the
probability of diagnosis. Therefore, there is a direct prediction of the diagnosis and the
model is outcome-specific. Consequently, it is difficult to use the framework for
inference, but it might be more robust to violations of psychometric assumptions.
Keeping these goals of diagnostic assessment in mind could help researchers decide what
approach to take, and the results of this simulation could suggest under what
circumstances each model could perform well. Finally, perhaps the most important
finding of this study is that there is a lot of overlap between psychometrics and machine
learning, so understanding the advantages and disadvantages of each approach could
help each of these fields learn from the other.
References
Achenbach, T., & Rescorla, L. (2013). Achenbach system of empirically based
assessment. In Encyclopedia of autism spectrum disorders (pp. 31-39). Springer New
York.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and
regression trees. Wadsworth and Brooks.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s
ability. In F. M. Lord and M. R. Novick, Statistical theories of mental test scores.
Reading, MA: Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item
parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.
Chalmers, R. P., (2012). mirt: A Multidimensional Item Response Theory Package for
the R Environment. Journal of Statistical Software, 48(6), 1-29.
de Ayala, R. J. (2009). The theory and practice of item response theory. New York:
Guilford Press.
Edelen, M. O., & Reeve, B. B. (2007). Applying item response theory (IRT) modeling to
questionnaire development, evaluation, and refinement. Quality of Life Research, 16, 5-
18.
Edwards, M. C., Houts, C. R., & Cai, L. (In Press). A Diagnostic Procedure to Detect
Departures from Local Independence in Item Response Theory Models. Psychological
Methods.
Egan, J. P. (1975). Signal detection theory and ROC analysis. New York: Academic
Press.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized
Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.
Gibbons, R. D., Hooker, G., Finkelman, M. D., Weiss, D. J., Pilkonis, P. A., Frank, E.,
Moore, T., & Kupfer, D. J. (2013). The CAD-MDD: a computerized adaptive diagnostic
screening tool for depression. The Journal of Clinical Psychiatry, 74, 669-674.
Gibbons, R. D., Weiss, D. J., Frank, E., & Kupfer, D. (2016). Computerized adaptive
diagnosis and testing of mental health disorders. Annual Review of Clinical
Psychology, 12, 83-104.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical
learning. New York: Springer.
Levy, R., & Mislevy, R. J. (2016). Bayesian psychometric modeling. CRC Press.
Lewinsohn, P. M., Seeley, J. R., Roberts, R. E., & Allen, N. B. (1997). Center for
Epidemiologic Studies Depression Scale (CES-D) as a screening instrument for
depression among community-residing older adults. Psychology and Aging, 12, 277-287.
Liaw, A. & Wiener, M. (2002). Classification and Regression by randomForest. R News,
2, 18-22.
Liu, X. (2012). Classification accuracy and cut point selection. Statistics in
Medicine, 31(23), 2676-2686.
Lu, F., & Petkova, E. (2014). A comparative study of variable selection methods in the
context of developing psychiatric screening instruments. Statistics in Medicine, 33(3),
401-421.
McArdle, J. (2013b). Adaptive testing of the Number Series Test using standard
approaches and a new decision tree analysis approach. In J. McArdle & G. Ritschard
(Eds.), Contemporary issues in exploratory data mining in the behavioral sciences (pp.
312-344). New York: Routledge.
McFall, R. M., & Treat, T. A. (1999). Quantifying the information value of clinical
assessments with signal detection theory. Annual Review of Psychology, 50(1), 215-241.
Meinshausen, N. (2007). Relaxed lasso. Computational Statistics & Data Analysis, 52,
374–393. doi:10.1016/j.csda.2006.12.019
Reckase, M. (2009). Multidimensional item response theory. New York, NY: Springer.
Ripley, B. (2016). tree: Classification and Regression Trees. R package version 1.0-37.
https://CRAN.R-project.org/package=tree
Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J-C., & Müller, M.
(2011). pROC: an open-source package for R and S+ to analyze and compare ROC
curves. BMC Bioinformatics, 12, p. 77.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded
scores. Psychometrika Monograph No. 17, 34 (4, Pt. 2)
Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning:
Rationale, application, and characteristics of classification and regression trees, bagging,
and random forests. Psychological Methods, 14, 323-348.
Thissen D., Orlando, M. (2001) Item Response Theory for items scored in two categories.
In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 73-140). Mahwah: Lawrence
Erlbaum Associates, Publishers.
Thissen, D., & Steinberg, L. (2009). Item response theory. In Millsap, R. E., & Maydeu-
Olivares, A. (Eds.). The Sage handbook of quantitative methods in psychology (pp. 148-
177).
Thissen, D., & Wainer, H. (Eds.). (2001). Test scoring. Mahwah: Lawrence Erlbaum
Associates, Publishers.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the
Royal Statistical Society. Series B (Methodological), 267-288.
van der Linden, W. J., & Pashley, P. J. (2010). Item selection and ability estimation in
adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive
testing (pp. 3–30). New York: Springer.
Yan, D., Lewis, C., & Stocking, M. (2004). Adaptive testing with regression trees in the
presence of multidimensionality. Journal of Educational and Behavioral Statistics, 29,
293-316.
Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.)
Educational Measurement, 4th Edn., (pp. 111-153). Westport, CT: Greenwood
Publishing Group.
Youngstrom, E. A. (2013). A primer on receiver operating characteristic analysis and
diagnostic efficiency statistics for pediatric psychology: we are ready to ROC. Journal of
Pediatric Psychology, 39, 204-221.
Youngstrom, E. A., Frazier, T. W., Demeter, C., Calabrese, J. R., & Findling, R. L.
(2008). Developing a ten item mania scale from the Parent General Behavior Inventory
for children and adolescents. The Journal of Clinical Psychiatry, 69, 831-839.
Zhou, X. H., McClish, D. K., & Obuchowski, N. A. (2011). Statistical methods in
diagnostic medicine. New York: John Wiley & Sons.
Zou, K. H., O’Malley, A. J., & Mauri, L. (2007). Receiver-operating characteristic
analysis for evaluating diagnostic tests and predictive models. Circulation, 115, 654-657.
Zweig, M. H., & Campbell, G. (1993). Receiver-operating characteristic (ROC) plots: a
fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39, 561-577.
APPENDIX A
FLOWCHART OF THE DATA-GENERATING PROCEDURE
APPENDIX B
TABLES
Table 1
Median of classification accuracy indices by model

                                            Classification Rate   Sensitivity      Specificity
Prevalence                                   .05   .10   .20      .05   .10   .20   .05   .10   .20
Data-generating Theta                        .73   .71   .68      .74   .72   .71   .73   .71   .69
Estimated Theta                              .72   .69   .67      .73   .72   .70   .72   .69   .67
Raw Summed Score                             .71   .69   .67      .74   .72   .70   .71   .68   .67
CART ^                                        -     -    .74       -     -    .26    -     -    .86
RF, Bayes classifier                          -     -    .80       -     -    .14    -     -    .96
Lasso Logistic, Bayes classifier %            -     -    .82       -     -    .16    -     -    .99
Relaxed Lasso Logistic, Bayes classifier #   .95   .90   .82      .13   .21   .37   .99   .98   .94
Logistic Regression, Bayes classifier @      .94   .89   .79      .05   .08   .19   .98   .98   .94
RF, ROC classifier                           .67   .65   .64      .71   .71   .70   .67   .64   .64
Lasso Logistic, ROC classifier ***           .74   .73   .72      .81   .79   .76   .74   .72   .71
Relaxed Lasso, ROC classifier ***            .75   .73   .73      .81   .79   .76   .74   .73   .72
Logistic Regression, ROC classifier          .67   .66   .65      .72   .71   .69   .67   .66   .65

Note: (***) Only for conditions with diagnosis-test correlation of .70. For data-generating thetas,
classification rates were [.80, .77, .75], sensitivity was [.83, .81, .78], and specificity was [.80, .77, .75].
^ Only for conditions with N greater than 250.
% Only for conditions with five-category items, diagnosis-test correlation of .70, and N greater than 250.
# Only for conditions with five-category items and diagnosis-test correlation of .70.
@ Only for conditions with thirty items.
- Condition not analyzed.
APPENDIX C
FIGURES
Figure 1. Variance explained and unconditional η2 effect sizes for the predictors of
classification accuracy in machine learning models using ROC classifiers. Note: Neither
local dependence nor four/five-way interactions had partial-η2 > .010 in any model. Blank
cells had conditional predictors with partial-η2 < .010. Colon ( : ) denotes interactions.
Orange cells are predictors not included in the model. cr = classification rate; se =
sensitivity; sp = specificity; rex = relaxed lasso; log = logistic regression; nitem = number
of items; ncat = number of item categories; cor = diagnosis-test correlation; ss = sample
size; prev = prevalence.
Figure 2. Random Forest Variable Importance Measures for machine learning algorithms
with ROC classifiers. Top row is relaxed lasso logistic regression (only for conditions
with diagnosis-test correlation of .70), middle row is random forest, and bottom row is
logistic regression.
Columns show classification rate, sensitivity, and specificity.
APPENDIX D
R SYNTAX TO GENERATE DATA
library(MASS)
library(mirt)
library(stringr)
#Data
N=c(1000,2000,3000) #sample size
tcor=c(.7,.8,.9) #corr
prev=c(.2,.3) #prevalence
nCat <- c(2,5) #number of categories
ld <-c(0,.3)
rep<- 50 #replications
n <- 50 #Number of items
nn1=str_pad(1:rep,3,pad='0')
#-----------------------------------
#ENVIRONMENT SET
for(ii in 1:length(N)){
for(jj in 1:length(tcor)){
for(kk in 1:length(prev)){
for(ll in 1:length(nCat)){
for(mm in 1:length(ld)){
dir1=paste0('n',N[ii],collapse='_')
dir2=paste0('cor',tcor[jj],collapse='_')
dir3=paste0('prev',prev[kk],collapse='_')
dir4=paste0('cat',nCat[ll],collapse='_')
dir5=paste0('ld',ld[mm],collapse='_')
dir6=paste0("D:/simWorld/Oscar_diss/",dir1,"/",dir2,"/",dir3,"/",dir4,"/",dir5)
dir.create(dir6,recursive=T,showWarnings=FALSE)
#}}}}
setwd(dir6)
#=====================================
#====================================
for(rr in 1:rep){
#TRUE Theta
p1=mvrnorm(N[ii],rep(0,2),Sigma=matrix(c(1,tcor[jj],tcor[jj],1),ncol=2,nrow=2))
colnames(p1)=c('test','diag_true')
p2=data.frame(p1)
cut1=qnorm(1-prev[kk],lower.tail=T)
p2$diag=ifelse(p2$diag_true<cut1,0,1)
#TRUE Theta
theta <- p2$test
if(nCat[ll]==2){
#2PL
a.1<-rnorm(n,1.7,.3)
b.1<-rnorm(n,0,1)
p <- matrix(0,N[ii],n)
u <- matrix(NA,N[ii],n)
rsave=matrix(NA,N[ii],n)
for (i in 1:N[ii]) {
for (j in 1:n) {
#Draw a random number to determine categories
r <- runif(1, 0, 1)
rsave[i,j]=r
p[i, j] <- 1 / (1 + exp(-
a.1[j] * (theta[i] - b.1[j])))
if (r <= p[i, j]) {
u[i,j] <- 1
} else
{u[i,j] <- 0}
}}
colnames(u)=paste0("V",1:n)
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'),row.names=FALSE)
#surface local dependence
if(ld[mm]==0){
p3=cbind(p2,u)
write.csv(p3,file=paste0('rep',nn1[rr],".csv"))
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'),row.names=FALSE)
}
else{
ld_prop=N[ii]*ld[mm]
u[1:ld_prop,1:5]=u[1:ld_prop,1]
p3=cbind(p2,u)
write.csv(p3,file=paste0('rep',nn1[rr],".csv"))
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'))
}
}
###############
##CATEGORICAL##
###############
else{
#GRM
#Fixed item parameters
a.1 <- rlnorm(n, .25, .5)
b.1 <- matrix(0, n, (nCat[ll] - 1))
###Generate threshold parameters for b.1; for the GRM there are 4 thresholds for 5 categories
###Using this method is for b dist as N(0,1) since b.1[,1] mean is -.6
###and b.1[,2] to b.1[,4] are all .2
b.1[, 1] <- rnorm(n, -.6, 1)
for(j in 1:n) {
b.1[j, 2] <- b.1[j,1] + runif(1, .5, .9)
b.1[j, 3] <- b.1[j,2] + runif(1, .5, .9)
b.1[j, 4] <- b.1[j,3] + runif(1, .5, .9)
}
### 5 category item responses
p <- array(0,c(N[ii],n,nCat[ll]))
pstar <- array(1,c(N[ii],n,nCat[ll]))
u <- matrix(0,N[ii],n)
for (i in 1:N[ii]) {
for (j in 1:n) {
#Draw a random number to determine categories
r <- runif(1, 0, 1)
for (k in 2:nCat[ll]) {
pstar[i, j, k] <- 1 / (1 +
exp(-a.1[j] * (theta[i] - b.1[j, (k-1)])))
p[i,j,(k-1)] <- pstar[i, j,
(k-1)] - pstar[i, j, k]
}
p[i, j, nCat[ll]] <- pstar[i, j, 5]
#probability of last category or higher is that category
if (r <= p[i, j, 1]) {
u[i,j] <- 1
} else
if (r <= p[i,j,1] +
p[i,j,2]) {
u[i,j] <- 2
} else
if (r <= p[i,j,1] +
p[i,j,2] + p[i,j,3]) {
u[i,j] <- 3
} else
if (r <= p[i,j,1] + p[i,j,2] + p[i,j,3] + p[i,j,4]) {
u[i,j] <- 4
} else if (r <= 1) {
u[i,j] <- 5
}
}
}
colnames(u)=paste0("V",1:n)
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'),row.names=FALSE)
#surface local dependence
if(ld[mm]==0){
p3=cbind(p2,u)
write.csv(p3,file=paste0('rep',nn1[rr],".csv"))
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'),row.names=FALSE)
}
else{
ld_prop=N[ii]*ld[mm]
u[1:ld_prop,1:5]=u[1:ld_prop,1]
p3=cbind(p2,u)
write.csv(p3,file=paste0('rep',nn1[rr],".csv"))
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'),row.names=FALSE)
}
}
}
}}}}}
setwd("D:/simWorld/")
APPENDIX E
R SYNTAX TO ANALYZE DATA
library(pROC)
library(mirt)
N=c(1000,2000,3000) #sample size
tcor=c(.7,.8,.9) #corr
prev=c(.2,.3) #prevalence
nCat <- c(2,5) #number of categories
ld <-c(0,.3)
rep<- 1 #replications
n <- 50 #Number of items
counter=0
direct=list(NULL)
#-----------------------------------
#ENVIRONMENT SET
for(ii in 1:length(N)){
for(jj in 1:length(tcor)){
for(kk in 1:length(prev)){
for(ll in 1:length(nCat)){
for(mm in 1:length(ld)){
counter=1+counter
dir1=paste0('n',N[ii],collapse='_')
dir2=paste0('cor',tcor[jj],collapse='_')
dir3=paste0('prev',prev[kk],collapse='_')
dir4=paste0('cat',nCat[ll],collapse='_')
dir5=paste0('ld',ld[mm],collapse='_')
dir6=paste0("D:/simWorld/Oscar_diss/",dir1,"/",dir2,"/",dir3,"/",dir4,"/",dir5)
dir.create(dir6,recursive=T,showWarnings=FALSE)
direct[[counter]]=dir6
}}}}}
direct2=do.call(rbind,direct)
#---------------------------------
##READ DATA##
for(q in 1:3){
#for(q in 4:length(direct2)){
setwd(direct2[q])
listf1=list.files(pattern="rep")
datamat=list(NULL)
for(qq in 1:length(listf1)){
datamat[[qq]]=read.csv(listf1[qq],header=TRUE)
}
listpars=list.files(pattern="truepar")
datapar=list(NULL)
for(qq in 1:length(listpars)){
datapar[[qq]]=read.csv(listpars[qq],header=TRUE)
}
#--------------------------------------
sum1=list(NULL)
counter2=0
for(nn in 1:length(datamat)){
data1=data.frame(nn)
counter2=counter2+1
training=datamat[[nn]][1:(nrow(datamat[[nn]])/2),]
testing=datamat[[nn]][((nrow(datamat[[nn]])/2)+1):nrow(datamat[[nn]]),]
#TRUE Theta
p2=testing
n1=roc(p2$diag~p2$test)
n2=coords(n1,"all")
youden=n2[2,]+n2[3,]-1
cto1=sqrt((1-n2[2,])^2+(1-n2[3,])^2)
concor=n2[2,]*n2[3,]
c1=which(youden==max(youden))
c2=which(cto1==min(cto1))
c3=which(concor==max(concor))
c1=c1[sample(1:length(c1),1)]
c2=c2[sample(1:length(c2),1)]
c3=c3[sample(1:length(c3),1)]
testing$youdenpred=ifelse(testing$test<n2[1,c1],0,1)
testing$cto1pred=ifelse(testing$test<n2[1,c2],0,1)
testing$concorpred=ifelse(testing$test<n2[1,c3],0,1)
data1$tyouden=n2[1,c1]
data1$tcto1=n2[1,c2]
data1$tconcor=n2[1,c3]
#----------------------------------
#ESTIMATED THETA
library(mirt)
u=testing[,paste0("V",1:50)]
#remove constants
u=u[,apply(u, 2, var, na.rm=TRUE) != 0]
l1=mirt(u,1)
data1$conv=extract.mirt(l1,'converged')
if(extract.mirt(l1,'converged')==TRUE){
l2=fscores(l1)
testing$eap_mirt=c(l2)
n3=roc(testing$diag~testing$eap_mirt)
n4=coords(n3,"all")
estyouden=n4[2,]+n4[3,]-1
estcto1=sqrt((1-n4[2,])^2+(1-n4[3,])^2)
estconcor=n4[2,]*n4[3,]
c4=which(estyouden==max(estyouden))
c5=which(estcto1==min(estcto1))
c6=which(estconcor==max(estconcor))
c4=c4[sample(1:length(c4),1)]
c5=c5[sample(1:length(c5),1)]
c6=c6[sample(1:length(c6),1)]
testing$estyoudenpred=ifelse(testing$eap_mirt<n4[1,c4],0,1)
testing$estcto1pred=ifelse(testing$eap_mirt<n4[1,c5],0,1)
testing$estconcorpred=ifelse(testing$eap_mirt<n4[1,c6],0,1)
data1$col_u=ncol(u)
data1$eyouden=n2[1,c4]
data1$ecto1=n2[1,c5]
data1$econcor=n2[1,c6]
} else {
testing$eap_mirt=NA
testing$estyoudenpred=NA
testing$estcto1pred=NA
testing$estconcorpred=NA
data1$col_u=ncol(u)
data1$eyouden=NA
data1$ecto1=NA
data1$econcor=NA}
############################
#item parameters
library(plyr)
par1=coef(l1,IRTpars=TRUE)
par1[[n+1]]<-NULL
par2=lapply(par1,as.data.frame)
par3=rbind.fill(par2)
datapar2<-cbind(datapar[[nn]],par3) #index by nn, the current replication; qq was a leftover from the read loop
write.csv(datapar2,file=paste0("est_par",nn,".csv"))
############################
#----------------------------------
#ESTIMATE SUMSCORE
testing$sumscore=rowSums(u)
n5=roc(testing$diag~testing$sumscore)
n6=coords(n5,"all")
rawyouden=n6[2,]+n6[3,]-1
rawcto1=sqrt((1-n6[2,])^2+(1-n6[3,])^2)
rawconcor=n6[2,]*n6[3,]
c7=which(rawyouden==max(rawyouden))
c8=which(rawcto1==min(rawcto1))
c9=which(rawconcor==max(rawconcor))
c7=c7[sample(1:length(c7),1)]
c8=c8[sample(1:length(c8),1)]
c9=c9[sample(1:length(c9),1)]
testing$rawyoudenpred=ifelse(testing$sumscore<n6[1,c7],0,1)
testing$rawcto1pred=ifelse(testing$sumscore<n6[1,c8],0,1)
testing$rawconcorpred=ifelse(testing$sumscore<n6[1,c9],0,1)
data1$ryouden=n6[1,c7]
data1$rcto1=n6[1,c8]
data1$rconcord=n6[1,c9]
#-----------------------------
#TREE
#Decide on training/testing datasets
library(tree)
u2=training[,paste0("V",1:50)]
tr1=tree(as.factor(training$diag)~.,data=u2,control=tree.control(nrow(u2),mindev=.0001))
#plot(tr1)
#text(tr1,pretty=0)
cvt=cv.tree(tr1,K=10)
#plot(cvt)
cvt2=which(cvt$dev==min(cvt$dev))
cvt2=cvt2[sample(1:length(cvt2),1)]
prtr=prune.tree(tr1,best=cvt$size[cvt2])
#plot(prtr)
#text(prtr,pretty=0)
#prevent single nodes
if(cvt$size[cvt2]>1){
tr2=predict(prtr,newdata=u)
testing$treepred=ifelse(tr2[,2]<.5,0,1)
} else{
testing$treepred=rep(0,nrow(testing))}
ll1=prtr$frame[,1]
est1=unique(as.numeric(gsub("\\D", "", ll1)))
est1=est1[!is.na(est1)]
tree_chose=matrix(0,nrow=1,ncol=50)
colnames(tree_chose)=paste0("treeV",1:50)
tree_chose[est1]=1
data1=cbind(data1,tree_chose)
#--------------------------
#RandomForests
library(randomForest)
rf1=randomForest(y=as.factor(training$diag),x=u2)
testing$rfpred=predict(rf1,newdata=u)
imp1=importance(rf1)
k1=sort(importance(rf1),decreasing=TRUE,index=TRUE)
k2=cbind(k1$ix,1:50)
rf_chose=matrix(0,nrow=1,ncol=50)
colnames(rf_chose)=paste0("rfV",1:50)
rf_chose[k2[,1]]=k2[,2]
data1=cbind(data1,rf_chose)
#---------------------------
#LOGISTIC LASSO
library(glmnet)
#glmmod<-glmnet(x=as.matrix(u2),y=as.factor(training$diag),alpha=1,family='binomial')
cv.glmmod <-
cv.glmnet(x=as.matrix(u2),y=as.factor(training$diag),
alpha=1,family='binomial')
plot(cv.glmmod,main=direct2[[q]])
best.lambda<-cv.glmmod$lambda.1se
lasso_prob <- predict(cv.glmmod,newx =
as.matrix(u2),s=best.lambda,type='response')
k1=ifelse(lasso_prob<.5,0,1)
testing$lassopred=c(k1)
l11=coef(cv.glmmod,s=best.lambda)
las_select=which(l11!=0)-1
las=las_select[-1]
lasso_chose=matrix(0,nrow=1,ncol=50)
colnames(lasso_chose)=paste0("lassoV",1:50)
lasso_chose[las]=1
data1=cbind(data1,lasso_chose)
#####
sum1[[nn]]=data1
write.csv(testing,file=paste0("testing",nn,".csv"))
}
sum2=do.call(rbind,sum1)
write.csv(sum2,file=paste0('summary',q,'.csv')) #q indexes the current condition directory
}
APPENDIX F
SUPPLEMENTAL WRITE-UP OF SIMULATION RESULTS
This section is divided into seven parts. The first part covers the estimation and
parameter recovery of the IRT models. The second part discusses the classification
accuracy of psychometric methods. The third part covers the estimation of the machine
learning models. The fourth part goes over the classification accuracy of the machine
learning models using the Bayes classifier (minimizing prediction error). The fifth part
discusses the classification accuracy of machine learning models using the ROC
classifier. The sixth part compares the psychometric and machine learning methods for
classification. Finally, the seventh part covers the recovery of the data-generating theta by
the items chosen by the machine learning models.
Clarifications. In this section, the term theta refers to the latent variable score
(person parameter) estimated in IRT analysis. The term classification accuracy refers to
classification rates, sensitivity, and specificity, jointly. However, each outcome is
analyzed individually using both linear regression and binary recursive partitioning. In
each analysis, there is a possibility that a model does not converge or that a model does
not assign cases to the minority class. Conditions analyzed were chosen to obtain a
fully-crossed design. At the beginning of the analysis of each model, there is a description
the conditions analyzed for each model, and at the end of the section there are brief
conclusions about the analysis. Unless stated otherwise, regression models included all
possible predictors and predictor interactions for prediction. Only effects with
partial-η2 > .01 were interpreted. The predictors were treated as categorical variables, so the
regression coefficients reported are for the group most different from the reference group.
For example, the predictor of diagnosis-test correlation has three values (.30, .50, .70),
and in this case the reference group in the model is .30 and the regression coefficient
reported is from the group of .70.
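As a reminder of the effect-size criterion, partial-η2 for an effect is SS_effect / (SS_effect + SS_error). A toy one-factor computation in Python (the outcome values and group labels below are made up for illustration):

```python
# Three hypothetical groups of the diagnosis-test correlation factor, each
# with four made-up classification-rate outcomes.
groups = {
    .30: [.60, .62, .61, .59],
    .50: [.66, .68, .67, .65],
    .70: [.74, .76, .75, .73],
}
all_y = [v for vals in groups.values() for v in vals]
grand = sum(all_y) / len(all_y)

# Between-group (effect) and within-group (error) sums of squares.
ss_effect = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())
ss_error = sum((x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v)
partial_eta2 = ss_effect / (ss_effect + ss_error)
print(round(partial_eta2, 3))
```

Effects with partial-η2 below the .01 threshold used throughout this section would be ignored in the interpretation.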
1. Estimation of the Psychometric Methods
IRT Convergence. There were 129 models (out of 108,000 models; 216
conditions, 500 replications per condition) that did not converge. Table 1.1 and Table 1.2
suggest that the models that did not converge came from conditions with
either binary items, ten items, or a sample size of 250.
Theta parameter recovery. Table 1.3 and Table 1.4 show the mean squared
error (MSE) between the data-generating theta and the estimated theta. On average, MSE
decreased as the number of categories increased. For conditions with five-category items,
MSE decreased as sample size increased. Table 1.5 and Table 1.6 suggest that the
correlation between the estimated theta and the data-generating theta increased as the
number of item categories and the number of items increased.
Item parameter recovery. Table 1.7, 1.8, 1.9, and 1.10 show the mean squared
error (MSE) and the variance explained between the true and estimated item parameters
for the 2PL model. MSE seemed to decrease and variance explained seemed to increase as
sample size and number of items increased and local dependence decreased. Table 1.11,
1.12, 1.13, 1.14, 1.15, 1.16, 1.17, 1.18, 1.19, and 1.20 show the MSE and the variance
explained between the true and estimated item parameters for the GRM model. MSE
seemed to decrease and variance explained seemed to increase as sample size and number
of items increased and local dependence decreased.
Brief Summary. Convergence and parameter recovery information suggest that
there were no apparent estimation problems. All of the conditions are used to compare
psychometric models.
2. Classification Accuracy of the Psychometric Models
Effect of ROC index. For the psychometric models, the cut scores in the ROC
analysis were determined by the Youden index, closest-to-(0,1) criterion, and the
concordance probability. However, the results from these three indices were not
practically different from each other: regression models predicting the differences in
classification rate, sensitivity, and specificity from the simulation factors explained
little variance (R2 = .002-.009). Therefore, the results for the psychometric models are
presented based on the Youden index.
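The three cut-score indices can be sketched directly from a set of candidate cut scores and their sensitivity/specificity pairs. The ROC points below are hypothetical, chosen only to illustrate how each index selects a cut score:

```python
import math

# Candidate cut scores with hypothetical sensitivity/specificity values.
# Each tuple: (cut score, sensitivity, specificity)
roc_points = [(-1.0, .95, .40), (-0.5, .88, .55), (0.0, .78, .70),
              (0.5, .62, .84), (1.0, .45, .93)]

# Youden index: maximize sensitivity + specificity - 1
youden = max(roc_points, key=lambda p: p[1] + p[2] - 1)

# Closest-to-(0,1): minimize the distance to perfect classification
# in ROC space, i.e., sqrt((1 - sens)^2 + (1 - spec)^2)
closest = min(roc_points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))

# Concordance probability: maximize sensitivity * specificity
concord = max(roc_points, key=lambda p: p[1] * p[2])

# In this toy example, all three indices select the same cut score,
# consistent with the finding that the indices rarely differ in practice.
print(youden[0], closest[0], concord[0])
```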
Classification accuracy with data-generating theta. Table 2.1 and 2.2 show
that classification rates ranged from .60 to .80; Table 2.3 and 2.4 show that sensitivity
ranged from .59 to .83; and Table 2.5 and 2.6 show that specificity ranged from .59 to
.80. Classification accuracy seemed to increase as the diagnosis-test correlation increased.
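Throughout these results, classification rate, sensitivity, and specificity follow the usual 2x2 confusion-table definitions. A minimal sketch with a hypothetical confusion table:

```python
# Classification accuracy metrics from a hypothetical 2x2 confusion table.
tp, fn = 45, 15   # diagnosed cases: correctly vs. incorrectly classified
tn, fp = 160, 40  # non-diagnosed cases: correctly vs. incorrectly classified

n = tp + fn + tn + fp
classification_rate = (tp + tn) / n   # proportion of all correct decisions
sensitivity = tp / (tp + fn)          # proportion correct among the diagnosed
specificity = tn / (tn + fp)          # proportion correct among the non-diagnosed

print(round(classification_rate, 3), sensitivity, specificity)
```

Note that when prevalence is low, the classification rate is dominated by the non-diagnosed group, which is why sensitivity and specificity are reported separately.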
Linear Regression. In a regression predicting classification rate of the data-
generating theta from number of items, number of item categories, diagnosis-test
correlation, prevalence, sample size, and local dependence, the variance explained by the
predictors was R2=.387. Classification rates increased as the diagnosis-test correlation
increased (b=.177, s.e. =.005, t=34.429, p<.001, partial-η2=.375) and as the prevalence
decreased (b=-.024, s.e.=.005, t=-4.670, p<.001, partial-η2=.029). In a regression predicting
sensitivity from the simulation factors, the variance explained by the predictors was
R2=.317. Sensitivity increased as the diagnosis-test correlation increased (b=.185, s.e.
=.007, t=26.710, p<.001, partial-η2=.312). In a regression predicting specificity from the
simulation factors, the variance explained by the predictors was R2=.277. Specificity
increased as the diagnosis-test correlation increased (b=.176, s.e. =.007, t=27.194,
p<.001, partial-η2=.264) and as the prevalence decreased (b=-.025, s.e.=.006, t=-3.860,
p<.001, partial-η2=.022).
Regression Trees. A regression tree was grown on the training dataset to predict
classification accuracy of the data-generating theta, but the algorithm only made splits
based on the diagnosis-test correlation, so the regression tree is not presented. In a
random forest model grown in a training dataset to predict classification accuracy of the
data-generating theta, the most important variables were the diagnosis-test correlation
and prevalence (see Figure 2.1). Using the random forest model to predict classification
rate in the testing dataset, the MSE was .007, and the variance explained was .385. Also,
the MSE in the prediction of sensitivity was .013, and the variance explained was .315.
Finally, MSE in the prediction of specificity was .011 and the variance explained was
.274.
Conclusion. Important predictors of classification rates, sensitivity, and
specificity were diagnosis-test correlation and prevalence. This model is the baseline for
the following analyses.
Classification accuracy with estimated theta. Table 2.7 and 2.8 show that
classification rates ranged from .58 to .80; Table 2.9 and 2.10 show that sensitivity
ranged from .58 to .82; and Table 2.11 and 2.12 show that specificity ranged from .59 to
.79. Classification accuracy seemed to increase as the diagnosis-test correlation
increased.
Linear Regression. In a regression predicting classification rate of the estimated
theta from number of items, item categories, diagnosis-test correlation, prevalence,
sample size, and local dependence, the variance explained by the predictors was R2=.337.
Classification rates increased as the diagnosis-test correlation increased (b=.146, s.e.
=.005, t=26.964, p<.001, partial-η2=.321) and as the prevalence decreased (b=-.032,
s.e.=.005, t=-5.887, p<.001, partial-η2=.021). In a regression predicting sensitivity from
the simulation factors, the variance explained by the predictors was R2=.280. Sensitivity
increased as the diagnosis-test correlation increased (b=.175, s.e. =.007, t=24.010,
p<.001, partial-η2=.272). In a regression predicting specificity from the simulation
factors, the variance explained by the predictors was R2=.236. Specificity increased as the
diagnosis-test correlation increased (b=.144, s.e. =.007, t=21.177, p<.001, partial-
η2=.220) and as the prevalence decreased (b=-.039, s.e.=.007, t=-5.793, p<.001, partial-
η2=.017).
Regression Trees. A regression tree was grown on the training dataset to predict
classification accuracy of the estimated theta, but the algorithm only made splits based on
the diagnosis-test correlation, so the regression tree is not presented. In a random forest
model grown in a training dataset to predict the classification accuracy of the estimated
IRT theta, the most important predictor was diagnosis-test correlation, followed by
prevalence (see Figure 2.2). Using the random forest model to predict classification rate
in the testing dataset, MSE was .007, and the variance explained was .334. Also, MSE in
the prediction of sensitivity was .014, and the variance explained was .281. Finally, the
MSE in the prediction of specificity was .012, and the variance explained was .233.
Conclusion. Important predictors of classification rates, sensitivity, and
specificity of the estimated theta were diagnosis-test correlation and prevalence.
The estimated theta also had classification accuracy comparable to that of the data-generating theta.
Classification accuracy of the raw summed score. Table 2.13 and 2.14 show
that classification rates ranged from .57 to .79; Table 2.15 and 2.16 show that sensitivity
ranged from .59 to .83; and Table 2.17 and 2.18 show that specificity ranged from .57 to
.79. Classification accuracy seemed to increase as the diagnosis-test correlation
increased.
Linear Regression. In a regression predicting classification rate of the raw
summed score from number of items, item categories, diagnosis-test correlation,
prevalence, sample size, and local dependence, the variance explained by the predictors
was R2=.320. Classification rates increased as the diagnosis-test correlation increased
(b=.143, s.e. =.006, t=26.040, p<.001, partial-η2=.306) and as the prevalence decreased
(b=-0.018, s.e.=.006, t=-3.202, p=.001, partial-η2=.015). In a regression predicting
sensitivity from the simulation factors, the variance explained by the predictors was
R2=.281. Sensitivity increased as the diagnosis-test correlation increased (b=.182, s.e.
=.007, t=24.973, p<.001, partial-η2=.274). In a regression predicting specificity from the
simulation factors, the variance explained by the predictors was R2=.223. Specificity
increased as the diagnosis-test correlation increased (b=.142 s.e. =.007, t=20.423,
p<.001, partial-η2=.208) and as the prevalence decreased (b=-.024, s.e.= .007, t=-3.472,
p<.001, partial-η2=.013).
Regression Tree. A regression tree was grown on the training dataset to predict
classification accuracy of the raw summed scores, but the algorithm only made splits
based on the diagnosis-test correlation, so the regression tree is not presented. In a
random forest model grown in a training dataset to predict the classification accuracy of
the raw summed scores, the most important variables were diagnosis-test correlation and
prevalence (see Figure 2.3). Using the random forest model to predict classification rate
in the testing dataset, the MSE was .008, and the variance explained was .317. Also, MSE
in the prediction of sensitivity was .014, and the variance explained was .281. Finally, the
MSE in the prediction of specificity was .013, and the variance explained was .219.
Conclusion. Important predictors of classification rates, sensitivity, and
specificity of the raw summed score were diagnosis-test correlation and prevalence. The
raw summed score also had classification accuracy comparable to that of the data-generating theta.
Comparing classification accuracy across psychometric models. The
diagnosis-test correlation and prevalence were the only predictors used in this analysis.
Results suggest that there were no practical differences in classification accuracy between
using data-generating theta and the estimated theta as a function of simulation factors
(classification rate R2 = .003; sensitivity R2 = .001; and specificity R2 = .002). Also, there
were no practical differences in classification accuracy between using data-generating
theta and the raw summed scores as a function of simulation factors (classification rate R2
= .004; sensitivity R2 = .001; and specificity R2 = .003). Finally, there were no practical
differences in classification accuracy between using the estimated theta and the raw
summed scores as a function of simulation factors (classification rate R2 = .001;
sensitivity R2 = .001; and specificity R2 = .001).
Brief Summary. Across psychometric models, the best predictors of
classification rates, sensitivity, and specificity were the diagnosis-test correlation and
prevalence. Specifically, classification accuracy increased as the diagnosis-test
correlation increased and prevalence decreased. Also, there was no significant
difference in classification accuracy among the data-generating theta, the estimated theta,
and the raw summed scores. Therefore, models using the data-generating theta are used in
subsequent analyses.
3. Estimation of the Machine Learning Methods
Model building. This section discusses conditions where machine learning
models only assigned participants to the majority class. There were two ways to
determine class assignment in machine learning models. First, cases could be assigned to
the class to which they most likely belong (i.e., a predicted probability greater than
50%), which reduces prediction error. This method is referred to as the Bayes classifier. The
second type of class assignment uses ROC curves to determine a probability threshold for
class assignment to balance sensitivity and specificity (as in the psychometric models).
As previously mentioned, only conditions with at least 50% of models assigning cases to
the minority class were investigated and discussed.
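The two assignment rules can be contrasted in a short sketch. Given predicted probabilities of diagnosis, the Bayes classifier applies a fixed .50 threshold, while the ROC classifier picks the probability threshold that balances sensitivity and specificity (here, by maximizing their sum, as the Youden index does). The probabilities and labels below are hypothetical:

```python
# Contrast of the two class-assignment rules on hypothetical
# predicted probabilities (the minority class is coded 1).
probs  = [.05, .10, .15, .20, .30, .35, .40, .45, .55, .60]
labels = [0,   0,   0,   0,   0,   0,   1,   1,   1,   1]

def assign(probs, threshold):
    return [1 if p >= threshold else 0 for p in probs]

def sens_spec(pred, labels):
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, labels))
    return tp / sum(labels), tn / (len(labels) - sum(labels))

# Bayes classifier: fixed .50 threshold, minimizes prediction error
bayes_pred = assign(probs, .50)

# ROC classifier: choose the threshold maximizing sensitivity + specificity
best = max(sorted(set(probs)),
           key=lambda t: sum(sens_spec(assign(probs, t), labels)))
roc_pred = assign(probs, best)

print(sens_spec(bayes_pred, labels))  # Bayes: misses minority cases
print(sens_spec(roc_pred, labels))    # ROC: balances sensitivity/specificity
```

In this toy example the Bayes rule captures only half of the minority class while the ROC threshold recovers all of it, mirroring the low-sensitivity pattern reported for the Bayes classifier below.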
CART with Bayes classifier. Table 3.1 and 3.2 show that at least 80% of the
models with 5% prevalence did not assign cases to the minority class. Also, at least 54%
of the models in conditions with N=250 did not assign participants to the minority class.
Assignment to the minority class increased as prevalence, number of items, and number
of categories increased. To estimate the full-factorial ANOVA, only conditions with 20%
prevalence and N=500 or N=1000 were analyzed.
Random Forest with Bayes classifier. Table 3.3 and 3.4 show that at least 65% of
the models in conditions with a 5% prevalence and 10 binary items did not assign cases
94
the minority class. Similar patterns were found in conditions with a 5% prevalence and
30 items, but only when r = .30. At least 50% of the models in conditions with 10%
prevalence and 10 binary items did not assign cases to the minority class when sample
size was 500 or 1,000 and the diagnosis-test correlation was .30 or .50. Otherwise, in
conditions with binary items, the number of models that assigned cases to the minority
class increased as sample size, diagnosis-test correlation, number of items, and
prevalence increased. For five-category items, the number of models that predicted the
minority class increased as sample size, diagnosis-test correlation, and prevalence
increased and as the number of items decreased. To estimate the full-factorial ANOVA,
only conditions with prevalence of 20% were analyzed.
Lasso Logistic Regression with Bayes classifier. Table 3.5 and 3.6 suggest that
most models did not assign cases to the minority class. Conditions where at least 50% of
models assigned cases to the minority class had polytomous items, a sample size of 500
or 1,000, prevalence of 20%, and a diagnosis-test correlation of .70. In conditions with
binary items, models assigned cases to the minority class when there was 20%
prevalence, 30 items, diagnosis-test correlation of .70, and a sample size of 1,000. To
estimate the full-factorial ANOVA, the eight conditions analyzed had at least 500 people,
20% prevalence, five-category items, and a diagnosis-test correlation of .7.
Relaxed Lasso Logistic Regression with Bayes classifier. Table 3.7 and 3.8 show
that at least 50% of conditions with a diagnosis-test correlation of .30 did not assign cases
to the minority class. Also, most conditions that did not assign cases to the minority class
had binary items or 10 items. To estimate a full-factorial ANOVA, conditions with five-category items and a diagnosis-test correlation of .70 were analyzed (even though in four
out of 36 conditions at least 50% of the models did not assign cases to the minority class).
Logistic Regression with Bayes classifier. Table 3.9 and 3.10 show that at least
50% of models did not assign cases to the minority class in conditions with 10 binary
items. For conditions with 10 five-category items, at least 50% of the models did not
assign cases to the minority class in conditions with a sample size of 500 or 1,000 and
diagnosis-test correlation of .30 or .50. For conditions with 30 items, most conditions had
at least 50% of models assign cases to the minority class except when there were binary
items, a sample size of 1,000, prevalence of 5%, and diagnosis-test correlation of .30. To
estimate the full-factorial ANOVA, only conditions with 30 items were analyzed (even
though in two out of 108 conditions at least 50% of the models did not assign cases to the
minority class).
Random Forest with ROC classifier. There were no problematic conditions for
models using random forest with a ROC classifier, so all conditions were analyzed.
Lasso Logistic Regression with ROC classifier. Table 3.11 and 3.12 show that at
least 50% of models did not assign cases to the minority class in conditions with a
diagnosis-test correlation of .30. All of the conditions with a diagnosis-test correlation of
.70 had greater than a 50% probability of assigning cases to the minority class. To
estimate the full-factorial ANOVA, only conditions with a diagnosis-test correlation of
.70 were analyzed.
Relaxed Lasso Logistic Regression with ROC classifier. Exact same patterns as
from the Lasso Logistic Regression using a ROC classifier were observed, so no Tables
are presented.
Logistic Regression with ROC classifier. There were no problematic conditions
for models using logistic regression with a ROC classifier, so all conditions were
analyzed.
Brief Summary. In many conditions, the machine learning models did not assign
any cases to the minority class. The conditions retained for analysis generally had a high
diagnosis-test correlation, high prevalence, a large sample size, and 30 five-category
items. Assignment to the minority class increased when models used a ROC classifier
instead of a Bayes classifier.
4. Classification Accuracy of the Machine Learning Models using the Bayes
classifier
CART classification accuracy with Bayes classifier. Only conditions with 20%
prevalence and N=500 or N=1000 were analyzed. Table 4.1 suggests that classification
rate for CART ranges between .70 and .80, sensitivity ranges between .07 and .36, and
specificity ranges between .83 and .95. Classification accuracy seemed to increase as the
diagnosis-test correlation increased, except for specificity in conditions with 10 items,
where specificity decreased as the diagnosis-test correlation increased.
Linear Regression. In the regression predicting classification rate from the
simulation factors, the variance explained was R2=.858. There was a significant three-
way interaction between number of items, number of item categories, and diagnosis-test
correlation (b=-.032, s.e.= .002, t=13.899, p<.001, partial-η2= .049). Across conditions,
classification rates decreased as the number of items and number of categories increased,
and as the diagnosis-test correlation decreased. As the diagnosis-test correlation
increased, the difference in classification rate due to number of items and number of item
categories decreased. In the regression predicting sensitivity from the simulation factors,
the variance explained was R2=.608. There was a significant three-way interaction
between number of items, number of item categories, and diagnosis-test correlation (b=-
.076, s.e. = .001, t=7.506, p<.001, partial-η2= .025). Across conditions, sensitivity
increased as the number of items, number of item categories, and the diagnosis-test
correlation increased. As the diagnosis-test correlation increased, the difference in
sensitivity across number of items and number of item categories decreased. In the
regression predicting specificity from the simulation factors, the variance explained was
R2=.608. There was a significant three-way interaction between number of items, number
of item categories, and diagnosis-test correlation (b=-.059, s.e. = .004, t=-12.761,
p<.001, partial-η2= .025). On average, specificity increased as the diagnosis-test
correlation increased, and as the number of items and number of item categories
decreased. As the diagnosis-test correlation increased, the difference in specificity across
number of items and number of item categories decreased.
Regression Trees. The regression trees grown in a training dataset to predict
classification rate, sensitivity, and specificity from simulation factors are presented in
Figures 4.1, 4.2, and 4.3. For classification rate, the first split was on the diagnosis-test
correlation. If the case was in a condition with a diagnosis-test correlation of .70, the
predicted classification rate was .80. If the diagnosis-test correlation was lower than .70,
the predicted classification rate depended on the number of items and number of item
categories. Conditions with binary items had a higher predicted classification rate than
conditions with five-category items. For sensitivity, the first split was also diagnosis-test
correlation. If the case was in a condition with a diagnosis-test correlation of .70, the
predicted sensitivity was .35. If the diagnosis-test correlation was lower than .70,
predicted sensitivity depended on number of items and number of item categories.
Conditions with more items and more item categories had higher predicted sensitivity
than conditions with fewer items and fewer item categories. For specificity, the first split was also diagnosis-test correlation. If
the case was in a condition with diagnosis-test correlation of .70, the predicted specificity
was .91. If the diagnosis-test correlation was lower than .70, predicted specificity
depended on the number of items, and then on the number of categories. Models in
conditions with 10 binary items had higher specificity than conditions with 30 five-
category items. In a random forest model grown in a training dataset to predict the
classification accuracy of classification and regression trees, the most important predictor
was diagnosis-test correlation, followed by number of items and number of item
categories (see Figure 4.4). Using the random forest model to predict classification rate in
the testing dataset, the MSE was .001, and the variance explained was .820. Also, MSE in
the prediction of sensitivity was .007, and the variance explained was .580. Finally, the
MSE in the prediction of specificity was .001 and the variance explained was .652.
Conclusion. Important predictors for classification rates, sensitivity, and
specificity were the diagnosis-test correlation, number of items, and number of item
categories. However, these results only apply when prevalence is at least 20% and sample
sizes are 500 or 1,000. Compared to the results from the data-generating model,
specificity is inflated and sensitivity is too low.
Random Forest classification accuracy with Bayes classifier. Only conditions
with prevalence of 20% were analyzed. Table 4.2 suggests that classification rate for
random forest ranged between .79 and .83; Table 4.3 shows that sensitivity ranged between
.01 and .35; and Table 4.4 shows that specificity ranged between .93 and .99.
Classification accuracy seems to increase as the diagnosis-test correlation and number of
items increase.
Linear Regression. In the regression predicting classification rate from the
simulation factors, the variance explained was R2=.759. There was a significant three-
way interaction between number of items, number of item categories, and diagnosis-test
correlation (b=-.013, s.e. = .001, t=13.135, p<.001, partial-η2= .017). On average,
classification rates increased as the number of items, number of item categories, and the
diagnosis-test correlation increased. As the number of item categories increased,
classification rate increased for conditions with 30 items and for conditions with at least a
diagnosis-test correlation of .5. On the other hand, classification rate decreased as the
number of item categories increased for conditions with 10 items and a diagnosis-test
correlation of .3. In the regression predicting sensitivity from the simulation factors, the
variance explained was R2=.823. There were two significant two-way interactions: the
interaction between number of items and number of item categories (b=.025, s.e. = .005,
t=4.914, p<.001, partial-η2= .019), and the interaction of number of items and diagnosis-
test correlation (b=-.049, s.e. = .005, t=-9.563, p<.001, partial-η2= .049). On average,
sensitivity increased as the number of items, number of categories, and the diagnosis-test
correlation increased. As the number of items increased, sensitivity increased for
conditions with two item categories, but slightly decreased for conditions with five-
category items. Also, as the number of items increased, sensitivity increased at a faster
rate in conditions with higher diagnosis-test correlation than for conditions with a lower
diagnosis-test correlation. In the regression predicting specificity from the simulation
factors, the variance explained was R2=.509. There was a significant three-way
interaction between number of items, number of item categories, and diagnosis-test
correlation (b=-.022, s.e. = .002, t=-9.742, p<.001, partial-η2= .013). On average,
specificity was not influenced by number of categories; specificity increased as the
number of items increased; and specificity decreased as the diagnosis-test correlation
increased. As the number of item categories increased, specificity increased for
conditions with 30 items, but decreased for conditions with 10 items. On the other hand,
as the number of item categories increased, specificity increased for conditions with a
diagnosis-test correlation of .7, but decreased for conditions with either a diagnosis-test
correlation of .3 or .5.
Regression Trees. The regression tree grown in a training dataset to predict
classification rate is presented in Figure 4.5. For classification rate, the first split was
diagnosis-test correlation. If the case was in a condition with a diagnosis-test correlation
of .70, the predicted classification rate depended on number of items and number of item
categories, where the highest predicted classification rate (.83) was in conditions with 30
five-category items. If the diagnosis-test correlation was lower than .70, the predicted
classification rate also depended on number of items and number of item categories,
where the lowest predicted classification (.78) was in conditions with 10 five-category
items. For sensitivity, the tree only made splits based on diagnosis-test correlation, so the
tree is not presented. For specificity, the first split was based on diagnosis-test correlation
(see Figure 4.6). If the diagnosis-test correlation had a value of .30, then the predicted
specificity depended on sample size, number of item categories, and number of items, but
it was higher than in conditions with diagnosis-test correlation of .50 or .70. In a random
forest model grown in a training dataset to predict the classification rates, the most
important variable was diagnosis-test correlation, followed by number of items (see
Figure 4.7, left pane). Using the random forest model to predict classification rate in the
testing dataset, the MSE was .001, and the variance explained was .742. In the model
predicting sensitivity, the most important variable was diagnosis-test correlation (see
Figure 4.7, center pane). The MSE in the prediction of sensitivity was .007, and the
variance explained was .785. Finally, in the model predicting specificity, the most
important variable was diagnosis-test correlation, followed by number of items (see
Figure 4.7, right pane). The MSE in the prediction of specificity was .001, and the
variance explained was .480.
Conclusion. Important predictors for classification rates, sensitivity, and
specificity were the diagnosis-test correlation, number of items, and number of item
categories. However, these results only apply when prevalence is 20%. Compared to
results from the data-generating model, specificity was inflated and sensitivity was too
low.
Lasso Logistic Regression classification accuracy with Bayes classifier. The
eight conditions analyzed had at least 500 people, 20% prevalence, five-category items,
and a diagnosis-test correlation of .7. Table 4.4A shows that classification rate ranged
between .81 and .82; sensitivity ranged between .13 and .18; and specificity ranged
between .98 and .99. Classification accuracy seems to increase with sample size and
number of items.
Linear Regression. In a regression predicting classification rate from sample size,
number of items, and local dependence, the variance explained was R2=.096. There were
significant main effects of sample size (b=.002, s.e. = .001, t=3.456, p=.001, partial-η2=
.045) and number of items (b=.002, s.e. = .001, t=3.438, p=.001, partial-η2= .052), so
classification rate increased as sample size and number of items increased. In a regression
predicting sensitivity from sample size, number of items, and local dependence, the variance explained was
R2=.061. There was a significant main effect of sample size (b=.012, s.e. = .005, t=2.283,
p=.022, partial-η2= .033) and a non-significant effect of number of items, but with
partial-η2 greater than .01 (b=.009, s.e. = .005, t=1.732, p=.083, partial-η2= .026). In this
case, sensitivity increased as sample size and number of items increased. In a regression
predicting specificity from sample size and local dependence, the variance explained was
R2=.020. There was a nonsignificant main effect of sample size, but the partial-η2 was
greater than .01 (b=-.001, s.e. = .001, t=-.987, p=.324, partial-η2= .012), so specificity
decreased as sample size increased.
Regression Trees. The regression tree grown in the training dataset to predict
classification rate had two splits, but all terminal nodes predicted a classification rate of .82. For
sensitivity, the first split was sample size (see Figure 4.8). If sample size was 500, the
predicted sensitivity was .14. If sample size was 1,000, the predicted sensitivity depended
on the number of items. In conditions with 10 items, the predicted sensitivity was .15,
and in conditions with 30 items, the predicted sensitivity was .18. For specificity, the
regression tree only had a split on sample size, so the regression tree is not presented. In a
random forest model grown in a training dataset to predict the classification rates, the
most important variable was number of items, followed by sample size (see Figure 4.9,
left panel). Using the random forest model to predict classification rate in the testing
dataset, the MSE was .001 and the variance explained was .086. In the model predicting
sensitivity, the most important variables were sample size and number of items (see
Figure 4.9, center panel). The MSE in the prediction of sensitivity was .005, and the
variance explained was .057. Finally, in the model predicting specificity, the most
important variable was sample size, followed by number of items (see Figure 4.9, right
panel). The MSE in the prediction of specificity was .001, and the variance explained was
.018.
Conclusion. Important predictors for classification rates, sensitivity, and
specificity were sample size and number of items. However, these results only apply in
conditions with five-category items, prevalence of 20%, diagnosis-test correlation of .70,
and sample size of 500 or 1,000. Compared to the data-generating model, specificity was
inflated and sensitivity was too low.
Relaxed Lasso Logistic Regression classification accuracy with Bayes
classifier. Only conditions with five-category items and a diagnosis-test correlation of
.70 were analyzed. Table 4.5 shows that classification rates ranged from .81 to .95; Table
4.6 shows that sensitivity ranged from .08 to .39; and Table 4.7 shows that specificity
ranged between .90 and .99. Classification rate and specificity seemed to decrease as
prevalence increased, and sensitivity increased as prevalence and number of items
increased.
Linear Regression. In a regression predicting classification rate from sample size,
number of items, prevalence, and local dependence, the variance explained was R2=.970.
There was a significant interaction between prevalence and number of items (b=.002, s.e.
= .001, t=5.317, p<.001, partial-η2= .020), and a significant main effect of sample size
(b=.006, s.e. = .001, t=5.106, p<.001, partial-η2= .120). On average, classification rates
increased as sample size and number of items increased and as prevalence decreased. The
differences in classification rates across prevalence decreased as the number of items
increased. In a regression predicting sensitivity from sample size, number of items,
prevalence, and local dependence, the variance explained was R2=.700. There was a
significant interaction between sample size and prevalence (b=.043, s.e. = .004,
t=11.833, p<.001, partial-η2= .038), and a significant main effect of number of items
(b=.026, s.e. = .005, t=4.925, p<.001, partial-η2= .104). On average, sensitivity increased
as prevalence and number of items increased, and sensitivity decreased as sample size
increased. As prevalence increased, the differences in sensitivity across sample size
decreased. In a regression predicting specificity from sample size, number of items,
prevalence, and local dependence, the variance explained was R2=.719. There were
significant main effects of sample size (b=.012, s.e. = .001, t=10.610, p<.001, partial-η2=
.110), and prevalence (b=-.029, s.e. = .001, t=-46.416, p<.001, partial-η2= .702), and a
nonsignificant effect of number of items, but with a partial-η2 > .01, (b=-.002, s.e. = .001,
t=-1.779, p=.075, partial-η2= .036). On average, specificity increased as sample size
increased, and specificity decreased as number of items and prevalence increased.
Regression Trees. The regression tree grown on the training dataset to predict
classification rate only made splits based on prevalence, so the regression tree is not
presented. For sensitivity, the first split was on prevalence (see Figure 4.10). If the case
was in a condition with a prevalence of 20%, predicted sensitivity depended on the
number of items, where the condition with more items had the highest predicted
sensitivity (.39). If the case was in a condition with a prevalence of 5%, predicted
sensitivity depended on sample size, where lower sample size had higher predicted
sensitivity. If the case was in a condition with a prevalence of 10%, predicted sensitivity
depended on number of items, where higher number of items led to higher predicted
sensitivity. For specificity, the first split was on prevalence (see Figure 4.11), where the
condition with 5% prevalence had the highest specificity (.99). For conditions with 20%
prevalence, specificity depended on sample size, where conditions with higher sample
size had higher predicted specificity. In a random forest model grown in a training dataset
to predict the classification accuracy of relaxed lasso logistic regression, the most
important variable was prevalence (see Figure 4.12). Using the random forest model to
predict classification rate in the testing dataset, the MSE was .001, and the variance
explained was .952. Also, MSE in the prediction of sensitivity was .005, and the variance
explained was .705. Finally, the MSE in the prediction of specificity was .001 and the
variance explained was .763.
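The meta-analytic random forest step reported above (variable importance, followed by MSE and variance explained in a testing dataset) can be sketched as follows. The simulation factors, their effects, and the train/test split sizes here are hypothetical, not the dissertation's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical simulation factors (columns: sample size, number of items,
# prevalence) and a classification-rate outcome driven mostly by prevalence.
X = rng.choice([-1.0, 0.0, 1.0], size=(800, 3))
y = 0.9 - 0.04 * X[:, 2] + 0.005 * X[:, 0] + rng.normal(0, 0.005, 800)

# Split mirroring the "grown in a training dataset" step.
X_train, X_test = X[:600], X[600:]
y_train, y_test = y[:600], y[600:]

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

pred = rf.predict(X_test)
mse = np.mean((y_test - pred) ** 2)
var_explained = 1 - mse / np.var(y_test)   # pseudo-R^2 on the test set

importances = rf.feature_importances_      # prevalence should dominate here
```

The variance-explained figure is computed on the testing dataset, as in the text, so it can be negative when the model predicts worse than the test-set mean.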
Conclusion. The most important predictor for classification rates, sensitivity, and
specificity was prevalence. However, these results only apply in conditions where the
diagnosis-test correlation is .70 and there are five-category items. Compared to results
from the data-generating model, specificity was inflated and sensitivity was too low.
Logistic Regression classification accuracy with Bayes classifier. Only
conditions with five-category items were analyzed. Table 4.8 shows that classification
rate ranged between .86 and .95; Table 4.9 shows that sensitivity ranged between .01 and
.41; and Table 4.10 shows that specificity ranged between .90 and .99. Classification rate
and specificity increased as sample size increased, and decreased as prevalence increased.
Sensitivity increased as diagnosis-test correlation, prevalence, and number of item
categories increased, and decreased as sample size increased.
Linear Regression. In a regression predicting classification rate from sample size,
number of items, diagnosis-test correlation, prevalence, and local dependence, the
variance explained was R2=.963. There were significant interactions between prevalence
and diagnosis-test correlation (b=.046, s.e. = .001, t=44.603, p<.001, partial-η2= .315),
and between sample size and prevalence (b=-.003, s.e. = .001, t=-2.571, p=.010, partial-
η2= .078). On average, classification rate increased as sample size and the diagnosis-test
correlation increased, and classification rate decreased as prevalence increased. As the
diagnosis-test correlation increased, the differences in classification rates across
prevalence decreased. Also, as sample size increased, the differences in classification
rates across prevalence increased. In a regression predicting sensitivity from sample size,
number of items, diagnosis-test correlation, prevalence, and local dependence, the
variance explained was R2=.854. There was a significant three-way interaction between
prevalence, sample size, and diagnosis-test correlation (b=.166, s.e. = .007, t=24.084,
p<.001, partial-η2= .047). On average, sensitivity increased as prevalence and diagnosis-
test correlation increased, and sensitivity decreased as sample size increased. As the
diagnosis-test correlation increased, the differences in sensitivity across prevalence
increased. Also, as prevalence increased, the differences in sensitivity across sample sizes
decreased. In a regression predicting specificity from sample size, number of items,
diagnosis-test correlation, prevalence, and local dependence, the variance explained was
R2=.759. There was a significant three-way interaction between prevalence, sample size,
and diagnosis-test correlation (b=.166, s.e. = .007, t=24.084, p<.001, partial-η2= .035).
On average, specificity increased as sample size increased, and specificity decreased as
prevalence and diagnosis-test correlation increased. As prevalence increased, the
differences in specificity across sample size decreased. Also, as prevalence increased, the
differences in specificity across diagnosis-test correlation increased.
Regression Trees. The regression tree grown in a training dataset to predict
classification rate is presented in Figure 4.13. For classification rate, the first split was
prevalence. If the case was in a condition with prevalence of 20%, classification rate
depended on the diagnosis-test correlation. Conditions with a diagnosis-test correlation of
.70 had a predicted classification rate of .80. However, in conditions with lower
diagnosis-test correlation the predicted classification rate depended on sample size, where
conditions with higher sample sizes had higher predicted classification rates. On the other
hand, for conditions with prevalence of 5% or 10%, the predicted classification rate
depended on sample size, where higher sample sizes had higher predicted classification
rate. For sensitivity, the first split was diagnosis-test correlation (see Figure 4.14). For
conditions with a diagnosis-test correlation of .70, the predicted sensitivity depended on
prevalence, where conditions with higher prevalence had the highest predicted sensitivity
(.39). On the other hand, conditions with a diagnosis-test correlation of .30 or .50 and high
prevalence had higher predicted sensitivity than the complement. For specificity, the first
split was sample size, and then prevalence (see Figure 4.15). For conditions with sample
size of 500 or 1,000 and prevalence of 5% or 10%, specificity was at least .97. On the
other hand, conditions with a sample size of 250 and prevalence of 20% had the lowest
specificity with .90. In a random forest model grown in a training dataset to predict the
classification rates, the most important variable was prevalence (see Figure 4.16, left
panel). Using a random forest model to predict classification rate in the testing dataset,
the MSE was .001, and the variance explained was .932. In the model predicting
sensitivity, the most important variable was diagnosis-test correlation, followed by
prevalence and sample size (see Figure 4.16, center panel). The MSE in the prediction of
sensitivity was .006, and the variance explained was .790. Finally, in the model
predicting specificity, the most important variable was diagnosis-test correlation (see
Figure 4.16, right panel). The MSE in the prediction of specificity was .001, and the
variance explained was .729.
Conclusion. The most important predictors for classification rates, sensitivity, and
specificity were prevalence, diagnosis-test correlation, and sample size. However, these
results only apply in conditions with five-category items. Compared to results from the
data-generating model, specificity was inflated and sensitivity was too low.
Brief Summary. Previous results suggest that machine learning methods that
minimize prediction error (using a Bayes classifier) have inflated specificity and very low
sensitivity. Therefore, it is not recommended to use these models when researchers are
interested in the prediction of a diagnosis that rarely occurs. Given their performance,
machine learning methods with the Bayes classifier are no longer discussed.
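The mechanism behind this recommendation can be illustrated directly: when the diagnosis is rare, thresholding the posterior probability at .5 (the Bayes classifier) labels almost every case as a non-case, inflating specificity and collapsing sensitivity. The following self-contained Python sketch uses hypothetical normal test scores, not the simulation's data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: a test score that runs higher for diagnosed cases,
# with a rare diagnosis (5% prevalence).
prev = 0.05
n = 20000
diag = rng.random(n) < prev
score = np.where(diag, rng.normal(1.5, 1, n), rng.normal(0.0, 1, n))

def npdf(x, mu):
    """Standard-deviation-1 normal density centered at mu."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# Posterior probability of diagnosis given the score (Bayes rule).
post = (prev * npdf(score, 1.5)
        / (prev * npdf(score, 1.5) + (1 - prev) * npdf(score, 0.0)))

def sens_spec(pred, truth):
    sens = np.mean(pred[truth])        # true positive rate
    spec = np.mean(~pred[~truth])      # true negative rate
    return sens, spec

# Bayes classifier: assign the most probable class (posterior threshold .5).
bayes_sens, bayes_spec = sens_spec(post > 0.5, diag)

# A lower, ROC-style probability threshold trades specificity for sensitivity.
roc_sens, roc_spec = sens_spec(post > prev, diag)
```

Under these assumptions the .5 threshold yields specificity near 1 but sensitivity close to .1, while the lower threshold balances the two rates, which is the pattern the results above describe.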
5. Classification Accuracy of the Machine Learning Models using ROC classifiers
In this section, the random forest model, logistic regression, lasso logistic
regression, and relaxed lasso logistic regression are analyzed. CART with a ROC
classifier was not carried out because CART assigns the same predicted probability to
every case in a node, which limits the number of possible probability thresholds
available for the ROC analysis.
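This limitation of CART can be verified with a small example: because every case in a leaf receives that leaf's probability, the number of distinct predicted probabilities, and hence of candidate ROC thresholds, is bounded by the number of leaves. A sketch using scikit-learn's DecisionTreeClassifier on hypothetical item-response-style data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)

# Hypothetical five-category item responses and a noisy binary outcome.
X = rng.integers(0, 5, size=(2000, 10)).astype(float)
y = (X[:, 0] + X[:, 1] + rng.normal(0, 2, 2000) > 7).astype(int)

tree = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X, y)
proba = tree.predict_proba(X)[:, 1]

# Every case in the same leaf gets the same probability, so the number of
# distinct predicted probabilities is capped by the number of leaves (8 here).
n_unique = len(np.unique(proba))
```

A logistic regression or random forest fit to the same data would instead produce a nearly continuous spread of predicted probabilities, which is what the ROC analysis requires.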
Effect of ROC index. Similar to the psychometric models, the ROC probability
thresholds for class assignment were determined by the Youden index, closest-to-(0,1)
criterion, and the concordance probability. For random forest, there were significant
differences in classification rate (R2=.149-.150), sensitivity (R2 = .122-.124), and
specificity (R2 =.134-.135) as a function of simulation factors, across the three ROC
indices. For logistic regression, there were small differences in classification rate
(R2=.017-.022) and specificity (R2 =.014-.016), and small to medium differences in
sensitivity (R2 = .016-.137) as a function of simulation factors, across the three ROC
indices. For lasso logistic regression, there were small differences in classification rate
(R2=.008-.017), sensitivity (R2 = .003-.006), and specificity (R2 =.004-.005) as a function
of simulation factors, across the three ROC indices. Finally, for the relaxed lasso logistic
regression, there were small differences in classification rate (R2=.006-.014), sensitivity
(R2 = .003-.005), and specificity (R2 =.003-.005) as a function of simulation factors,
across the three ROC indices. In these analyses, the Youden index had the highest
sensitivity across the vast majority of conditions. Results from section 3 suggest that
there is low sensitivity in the machine learning models with Bayes classifier, so the
results for section 4 are based on the Youden index to increase sensitivity.
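The three threshold-selection indices can be written down compactly: the Youden index maximizes sensitivity + specificity − 1, the closest-to-(0,1) criterion minimizes the distance to the ROC curve's ideal corner, and the concordance probability maximizes the product of sensitivity and specificity. In the sketch below the predicted probabilities are hypothetical, not the simulation's output:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical predicted probabilities for cases (10% prevalence) and controls.
truth = rng.random(5000) < 0.10
p = np.clip(rng.normal(np.where(truth, 0.6, 0.4), 0.15), 0, 1)

# Empirical sensitivity and specificity over a grid of probability thresholds.
thresholds = np.linspace(0.01, 0.99, 99)
sens = np.array([np.mean(p[truth] >= t) for t in thresholds])
spec = np.array([np.mean(p[~truth] < t) for t in thresholds])

# The three criteria for picking a threshold from the ROC curve:
youden = thresholds[np.argmax(sens + spec - 1)]            # max(se + sp - 1)
closest = thresholds[np.argmin((1 - sens) ** 2
                               + (1 - spec) ** 2)]         # nearest to (0, 1)
concordance = thresholds[np.argmax(sens * spec)]           # max(se * sp)
```

With symmetric score distributions like these, the three criteria select similar thresholds; they diverge more when the ROC curve is asymmetric, which is consistent with the small between-index differences reported above.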
Random Forest classification accuracy with ROC classifier. Tables 4.11 and
4.12 show that classification rate ranged between .49 and .76; Tables 4.13 and 4.14 show
that sensitivity ranged between .45 and .82; and Tables 4.15 and 4.16 show that specificity
ranged between .46 and .84. Classification accuracy seemed to increase as diagnosis-test
correlation, number of item categories, and number of items increased.
Linear Regression. In a regression predicting classification rate from sample size,
number of items, number of item categories, diagnosis-test correlation, prevalence, and
local dependence, the variance explained was R2=.517. There were several significant
three-way interactions: the interaction between number of items, number of item
categories, and prevalence (b=-.173, s.e. = .010, t=-16.258, p<.001, partial-η2= .047),
the interaction between number of items, number of item categories, and the diagnosis-
test correlation (b=-.108, s.e. = .010, t=-10.121, p<.001, partial-η2= .011), and the
interaction between sample size, number of items, and number of item categories
(b=.221, s.e. = .010, t=20.640, p<.001, partial-η2= .014). On average, classification rates
increased as the diagnosis-test correlation and the sample size increased, classification
rate decreased as prevalence increased, and classification rates did not seem to be
influenced by number of items or number of item categories. However, as the number of items
increased, differences in classification rates across sample size and prevalence decreased,
and the differences in classification rates across diagnosis-test correlations increased.
Also, as the number of item categories increased, the classification rate for conditions
with 10 items decreased, and the classification rate for conditions with 30 items
increased. In a regression predicting sensitivity from sample size, number of items,
diagnosis-test correlation, prevalence, and local dependence, the variance explained was
R2=.466. There was a significant interaction between number of items, number of item
categories, and prevalence (b=.229, s.e. = .015, t=15.140, p<.001, partial-η2= .054), a
significant interaction between sample size, number of items, and number of item
categories (b=-.262, s.e. = .015, t=-17.280, p<.001, partial-η2= .019), and a main effect
of diagnosis-test correlation (b=.185, s.e. = .008, t=24.421, p<.001, partial-η2= .224). On
average, sensitivity increased as the diagnosis-test correlation, sample size, and number
of items increased, and sensitivity decreased as prevalence increased. However, as the
number of items and item categories increased, differences in sensitivity across sample
size and prevalence decreased. In a regression predicting specificity from sample size,
number of items, diagnosis-test correlation, prevalence, and local dependence, the
variance explained was R2=.443. There was a significant interaction between number of
items, number of item categories, and prevalence (b=-.203, s.e. = .013, t=-15.170,
p<.001, partial-η2= .035), a significant interaction between sample size, number of items,
and number of item categories (b=.247, s.e. = .013, t=18.413, p<.001, partial-η2= .013),
and a main effect of diagnosis-test correlation (b=.099, s.e. = .007, t=14.689, p<.001,
partial-η2= .339). On average, specificity increased as the diagnosis-test correlation, and
sample size increased, and specificity decreased as prevalence increased. However, as the
number of items and item categories increased, differences in specificity across sample
size and prevalence decreased.
Regression Tree. The regression tree grown in a training dataset to predict
classification rate only made splits based on the diagnosis-test correlation, so the tree is
not presented. For sensitivity, the first split was diagnosis-test correlation. For conditions
with a diagnosis-test correlation of .70, predicted sensitivity was higher for conditions
with higher number of items, higher item categories, and higher prevalence than the
complement. For conditions with a diagnosis-test correlation of .50 or .30, predicted
sensitivity depended on the number of item categories, where conditions with five-category
items had higher sensitivity than conditions with binary items. For specificity, the tree only made
splits based on the diagnosis-test correlation, so the tree is not presented. Using a random
forest model to predict classification rate in the testing dataset, the most important
variable was diagnosis-test correlation (see Figure 4.18, left panel). In the model
predicting classification rate, the MSE was .007, and the variance explained was .500. In
the model predicting sensitivity, the most important variable was also diagnosis-test
correlation, followed closely by number of items and number of item categories (see
Figure 4.18, center panel). The MSE in the prediction of sensitivity was .015, and the
variance explained was .449. In the model predicting specificity, the most important
variable was also diagnosis-test correlation (see Figure 4.19, right panel). The MSE in
the prediction of specificity was .011, and the variance explained was .426.
Conclusion. The most important predictor for classification rate and specificity was the
diagnosis-test correlation, while the important predictors for sensitivity were the
diagnosis-test correlation, number of items, number of item categories, and prevalence.
All of the classification accuracy indices were close to those of the data-generating
model.
Logistic Regression classification accuracy with ROC classifier. Tables 4.17
and 4.18 show that classification rate ranged between .52 and .81; Tables 4.19 and 4.20
show that sensitivity ranged between .54 and .82; and Tables 4.21 and 4.22 show that
specificity ranged between .51 and .75. Classification accuracy seemed to increase as
diagnosis-test correlation increased, and decreased as prevalence increased.
Linear Regression. In a regression predicting classification rate from simulation
factors, the variance explained was R2=.517. There was a significant main effect of
diagnosis-test correlation (b=.165, s.e. = .004, t=40.118, p<.001, partial-η2= .498), and
sample size (b=.034, s.e. = .004, t=8.140, p<.001, partial-η2= .011). On average,
classification rates increased as the diagnosis-test correlation and the sample size
increased. In a regression predicting sensitivity from simulation factors, the variance
explained was R2=.441. There was a significant three-way interaction between sample
size, prevalence, and diagnosis-test correlation (b=.165, s.e. = .004, t=40.118, p<.001,
partial-η2= .010), and a three-way interaction between sample size, prevalence, and
number of items (b=.034, s.e. = .004, t=8.140, p<.001, partial-η2= .014). On average,
sensitivity increased as the diagnosis-test correlation and the sample size increased, and
decreased as the number of items and prevalence increased. As the number of items
increased, the difference in sensitivity across sample size and prevalence decreased. Also,
as the diagnosis-test correlation increased, the difference in sensitivity across sample size
and prevalence increased. In a regression predicting specificity from simulation factors,
the variance explained was R2=.417. There was a significant main effect of diagnosis-test
correlation (b=.165, s.e. = .005, t=32.831, p<.001, partial-η2= .392), and a
nonsignificant effect of prevalence, but with a partial-η2 > .01 (b=.008, s.e. = .005, t=1.511,
p=.131, partial-η2= .010). On average, specificity increased as the diagnosis-test
correlation increased, and specificity decreased as prevalence increased.
Regression Tree. The regression tree grown in a training dataset to predict
classification rate only made splits based on the diagnosis-test correlation, so the tree is
not presented. For sensitivity, the first split was diagnosis-test correlation. For conditions
with a diagnosis-test correlation of .30, predicted sensitivity was .60. For conditions with
diagnosis-test correlation of .50, predicted sensitivity was .69. For conditions with
diagnosis-test correlation of .70, predicted sensitivity depended on sample size, number
of items, and prevalence, where conditions with high sample sizes had higher sensitivity,
regardless of number of items or prevalence. For specificity, the tree only made splits
based on the diagnosis-test correlation, so the tree is not presented. Using a random forest
model to predict classification accuracy in the testing dataset, the most important variable
was diagnosis-test correlation (see Figure 4.20). In the model predicting classification
rate, the MSE was .004 and the variance explained was .509. The MSE in the prediction
of sensitivity was .008 and the variance explained was .436. The MSE in the prediction of
specificity was .006 and the variance explained was .409.
Conclusion. The most important predictor of classification accuracy was
diagnosis-test correlation. Also, the classification accuracy indices were close to those of
the data-generating model.
Lasso Logistic Regression classification accuracy with ROC classifier. Only
conditions with a diagnosis-test correlation of .70 were analyzed. Table 4.23 shows that
classification rate ranged between .69 and .76; Table 4.24 shows that sensitivity ranged
between .73 and .82; and Table 4.25 shows that specificity ranged between .68 and .75.
Classification accuracy seemed to increase as sample size and number of items increased,
and decreased as prevalence increased.
Linear Regression. In a regression predicting classification rate from sample size,
number of item categories, number of items, prevalence, and local dependence, the
variance explained was R2=.100. There were significant main effects of number of item
categories (b=.011, s.e. = .003, t=3.551, p<.001, partial-η2= .016), sample size (b=.011,
s.e. = .003, t=3.582, p<.001, partial-η2= .037), number of items (b=.009, s.e. = .003,
t=2.868, p=.004, partial-η2= .028), and prevalence (b=-.016, s.e. = .003, t=-5.408,
p<.001, partial-η2= .021). On average, classification rates increased as sample size,
number of items, and number of item categories increased, and classification rates
decreased as prevalence increased. In a regression predicting sensitivity from sample
size, number of item categories, number of items, prevalence, and local dependence, the
variance explained was R2=.143. There were significant main effects of number of item
categories (b=.015, s.e. = .004, t=3.780, p<.001, partial-η2= .012), sample size (b=.022,
s.e. = .004, t=5.752, p<.001, partial-η2= .030), number of items (b=.013, s.e. = .004,
t=3.339, p<.001, partial-η2= .013), and prevalence (b=-.037, s.e. = .004, t=-9.658,
p<.001, partial-η2= .090). On average, sensitivity increased as sample size, number of
items, and number of item categories increased, and sensitivity decreased as prevalence
increased. In a regression predicting specificity from sample size, number of item
categories, number of items, prevalence, and local dependence, the variance explained
was R2=.083. There were significant main effects of number of item categories (b=.015,
s.e. = .004, t=3.780, p<.001, partial-η2= .011), sample size (b=.022, s.e. = .004, t=5.752,
p<.001, partial-η2= .024), number of items (b=.013, s.e. = .004, t=3.339, p<.001, partial-
η2= .020), and prevalence (b=-.037, s.e. = .004, t=-9.658, p<.001, partial-η2= .028). On
average, specificity increased as sample size, number of items, and number of item
categories increased, and specificity decreased as prevalence increased.
Regression Trees. The regression tree grown on the training dataset to predict
classification rate is presented in Figure 4.21. For classification rate, the first split was on
sample size, followed by a split on number of items. Conditions with higher sample size
and higher number of items had a higher classification rate. For sensitivity, the tree only
made splits based on prevalence, so the tree is not presented. For specificity, the first split
was prevalence (see Figure 4.22). If prevalence was 5%, the predicted specificity was .73.
On the other hand, if prevalence was 10% or 20%, specificity depended on sample size
and number of items, where high sample size and high number of items had high
predicted specificity. In a random forest model grown in a training dataset to predict the
classification rate, the most important predictor was sample size, followed closely by
number of items (Figure 4.23, left panel). Using the random forest model to predict
classification rate in a testing dataset, the MSE was .002, and the variance explained was
.101. For sensitivity, the most important variable was prevalence (Figure 4.23, center
panel). The MSE in the prediction of sensitivity was .003, and the variance explained was
.132. For specificity, the most important variables were prevalence and sample size,
followed by number of items and number of item categories (Figure 4.23, right panel).
The MSE in the prediction of specificity was .003 and the variance explained was .081.
Conclusion. Important predictors of classification rates, sensitivity, and
specificity were prevalence, sample size, and number of items. However, these results
generalize only to conditions with a diagnosis-test correlation of .70. Also, the
classification accuracy indices were close to those of the data-generating model.
Relaxed Lasso Logistic Regression classification accuracy with ROC
classifier. Only conditions with a diagnosis-test correlation of .70 were analyzed. Table
4.26 shows that classification rate ranged between .70 and .75; Table 4.27 shows that
sensitivity ranged between .73 and .82; and Table 4.28 shows that specificity ranged
between .69 and .76. Classification accuracy seemed to increase as sample size, number
of items, and number of item categories increased, and classification accuracy decreased
as prevalence increased.
Linear Regression. In a regression predicting classification rate from sample size,
number of item categories, number of items, prevalence, and local dependence, the
variance explained was R2=.116. There were significant main effects of number of item
categories (b=.014, s.e. = .003, t=4.610, p<.001, partial-η2= .024), sample size (b=.012,
s.e. = .003, t=4.004, p<.001, partial-η2= .031), number of items (b=.012, s.e. = .003,
t=3.837, p<.001, partial-η2= .032), and prevalence (b=-.015, s.e. = .003, t=-5.063,
p<.001, partial-η2= .031). On average, classification rates increased as sample size,
number of items, and number of item categories increased, and classification rates
decreased as prevalence increased. In a regression predicting sensitivity from sample
size, number of item categories, number of items, prevalence, and local dependence, the
variance explained was R2=.144. There were significant main effects of number of item
categories (b=.020, s.e. = .004, t=4.917, p<.001, partial-η2= .015), sample size (b=.027,
s.e. = .003, t=7.260, p<.001, partial-η2= .038), number of items (b=.010, s.e. = .004,
t=2.408, p=.016, partial-η2= .012), and prevalence (b=-.035, s.e. = .004, t=-9.306,
p<.001, partial-η2= .082). On average, sensitivity increased as sample size, number of
items, and number of item categories increased, and sensitivity decreased as prevalence
increased. In a regression predicting specificity from sample size, number of item
categories, number of items, prevalence, and local dependence, the variance explained
was R2=.096. There were significant main effects of number of item categories (b=.014,
s.e. = .004, t=3.691, p<.001, partial-η2= .016), sample size (b=.010, s.e. = .004, t=3.031,
p=.002, partial-η2= .019), number of items (b=.012, s.e. = .004, t=3.158, p=.002, partial-
η2= .022), and prevalence (b=-.019, s.e. = .004, t=-5.346, p<.001, partial-η2= .038). On
average, specificity increased as sample size, number of items, and number of item
categories increased, and specificity decreased as prevalence increased.
Regression Trees. The regression tree grown on the training dataset to predict
classification rate is presented in Figure 4.24. For classification rate, the first split was
number of items. If the number of items was 30, then predicted classification rate
depended on prevalence and sample size, where conditions with 5% prevalence had the
highest predicted classification rate (.75). On the other hand, if the number of items was
10, then predicted classification rate depended on number of item categories, where the
predicted classification rate in conditions with five-category items had a higher predicted
classification rate than in conditions with two-category items. For sensitivity, the first
split was prevalence. If prevalence was 20%, the predicted sensitivity was .76. If
prevalence was 5% or 10%, sensitivity depended on sample size, where conditions with
5% prevalence and a sample size of 500 or 1,000 had the highest sensitivity (.81). For
specificity, the first split was prevalence. If prevalence was 5%, then the predicted
specificity was .74. On the other hand, if prevalence was 10% or 20%, then specificity
depended on number of items, where the predicted specificity was higher for conditions
with 30 items than with 10 items. In a random forest model grown in a training dataset to
predict the classification rate, the most important variables were number of items, sample
size, prevalence, and number of item categories (see Figure 4.27, left panel). Using the
random forest model to predict classification rate in the testing dataset, the MSE was
.002, and the variance explained was .110. For sensitivity, the most important variable
was prevalence, followed by sample size (see Figure 4.27, center panel). The MSE in the
prediction of sensitivity was .003, and the variance explained was .137. For specificity,
the most important variable was prevalence, followed by number of items, number of
item categories, and sample size. The MSE in the prediction of specificity was .002, and
the variance explained was .091.
Conclusion. The important variables to predict classification accuracy were
number of items, sample size, prevalence, and number of item categories. However, these
results generalize only to conditions with a diagnosis-test correlation of .70. Also,
the classification accuracy indices were close to those of the data-generating model.
Comparing classification accuracy across machine learning models with
ROC classifier. For comparing lasso logistic regression and the relaxed lasso logistic
regression, only conditions with a diagnosis-test correlation of .70 were analyzed. There
were no practical differences in classification accuracy between lasso logistic regression
and relaxed lasso logistic regression (classification rate R2=.004; sensitivity R2=.004;
specificity R2=.004). Therefore, only estimates from the relaxed lasso are used in further
analysis.
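For reference, the two-stage structure that distinguishes the relaxed lasso from the ordinary lasso can be sketched as follows. This simplified version selects items with an L1-penalized logistic regression and then refits an effectively unpenalized logistic regression on the selected items to undo the lasso's shrinkage; the data, penalty value, and implementation details here are hypothetical and may differ from the dissertation's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Hypothetical item scores; only the first three items carry signal.
X = rng.normal(size=(1000, 10))
logit = 1.2 * X[:, 0] + 1.0 * X[:, 1] + 0.8 * X[:, 2] - 2.0
y = (rng.random(1000) < 1 / (1 + np.exp(-logit))).astype(int)

# Stage 1: lasso (L1-penalized) logistic regression selects items by
# shrinking uninformative coefficients exactly to zero.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0])

# Stage 2 (the "relaxed" step, simplified): refit on the selected items only
# with a very weak penalty (large C), undoing the lasso's shrinkage.
relaxed = LogisticRegression(C=1e6, solver="lbfgs").fit(X[:, selected], y)
```

Because the second stage changes coefficient magnitudes but rarely the selected set, predicted probabilities from the two models are often close, which is consistent with the negligible accuracy differences reported above.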
There were differences between relaxed lasso logistic regression and logistic
regression on classification rates (R2=.122), sensitivity (R2=.348), and specificity
(R2=.111) as a function of simulation factors. In the prediction of differences in
classification rate, there was a significant three-way interaction between sample size,
prevalence, and number of items (b=-.074, s.e. = .007, t=-11.208, p<.001, partial-
η2=.041). On average, logistic regression had higher classification rates in conditions
with low prevalence, small sample sizes, and small number of items, and relaxed lasso
logistic regression had higher classification rates in conditions with high prevalence,
larger sample sizes, and large number of items. As the number of items increased, the
difference in classification rate increased in favor of relaxed lasso logistic regression
faster for conditions with larger sample size than for conditions with smaller sample size.
Also, as the number of items increased, the difference in classification rate favored
relaxed lasso logistic regression for conditions with sample sizes of 500 or 1,000, but the
difference in classification rate favored logistic regression in conditions with sample sizes
of 250. In the prediction of differences in sensitivity, there was a significant three-way
interaction between number of items, sample size, and prevalence (b=.155, s.e. = .010,
t=15.315, p<.001, partial-η2=.085). On average, differences in sensitivity approached
zero as sample size and prevalence increased, and the difference increased in favor of the
relaxed lasso as the number of items increased. As the number of items increased,
differences in sensitivity in favor of the relaxed lasso increased at a faster rate for
conditions with a sample size of 250 and prevalence of 5% than for any other sample size
or prevalence. In the prediction of differences in specificity, there was a significant three-
way interaction between sample size, prevalence, and number of items (b=-.086, s.e. =
.008, t=-10.623, p<.001, partial-η2=.037). On average, the difference in specificity
approached zero as prevalence and sample size increased. Also, specificity was greater
for logistic regression in conditions with 10 items, but specificity was greater for relaxed
lasso logistic regression in conditions with 30 items. As the number of items increased,
the difference in specificity approached zero for conditions with prevalence of 10% or 20%,
but the difference increased in favor of logistic regression in conditions with prevalence
of 5%. Similarly, as the number of items increased, the difference in specificity
approached zero for conditions with a sample size of 500 and 1,000, but the difference in
specificity increased in favor of logistic regression in conditions with sample size of 250.
Also, there were differences between relaxed lasso logistic regression and random
forest on classification rates (R2=.120), sensitivity (R2=.393), and specificity (R2=.111).
In the prediction of differences in classification rate, there was a significant three-way
interaction between number of item categories, prevalence, and number of items (b=-
.074, s.e. = .007, t=-11.208, p<.001, partial-η2=.025). On average, relaxed lasso had
higher classification rates in conditions with five-category items, 10% or 20%
prevalence, or 10 items, while random forest had higher classification rates in conditions
with two-category items, 5% prevalence, or 30 items. As the number of items increased,
the difference in classification rates across prevalence decreased. Also, as the number of
items increased, conditions with two-category items favored the relaxed lasso, while
conditions with five-category items favored random forest. In the prediction of
differences in sensitivity, there was a significant four-way interaction between sample
size, number of item categories, prevalence, and number of items (b=-.216, s.e. = .019,
t=-11.508, p<.001, partial-η2=.010). On average, the differences in sensitivity in favor of
relaxed lasso logistic regression increased as sample size increased and as number of
items, number of item categories, and prevalence decreased. As the number of items
increased, the differences in sensitivity across prevalence, number of item categories, and
number of items decreased. In the prediction of differences in specificity, there was a
significant three-way interaction between number of item categories, prevalence, and
number of items (b=.044, s.e. = .010, t=4.192, p<.001, partial-η2=.019). On average,
relaxed lasso had higher specificity in conditions with five-category items and 10% or
20% prevalence, while random forest had higher specificity in conditions with two-
category items and 5% prevalence. Number of items did not appear to have a main effect on the difference in specificity; however, as the number of items increased, the difference in specificity
across prevalence decreased. Also, as the number of items increased, conditions with
two-category items favored the relaxed lasso, while conditions with five-category items
favored random forest.
All of the conditions were analyzed to study the difference in classification
accuracy between logistic regression and random forest. There were differences in the
classification rate (R2=.184), sensitivity (R2=.351), and specificity (R2=.170). In the
prediction of differences in classification rate, there was a significant three-way
interaction between number of item categories, prevalence, and number of items (b=.164,
s.e. = .013, t=12.839, p<.001, partial-η2=.030) and a three-way interaction between
number of item categories, number of items, and sample size (b=-.228, s.e. = .013, t=-
17.703, p<.001, partial-η2=.013). On average, differences in classification rates in favor
of logistic regression increased as prevalence and number of items increased, and as
sample size and number of item categories decreased. As the number of items increased,
the difference in classification rates across prevalence decreased. Also, as the number of
items increased, conditions with two-category items favored logistic regression, while
conditions with five-category items favored random forest. Also, as the number of items
increased, the differences in classification rates for conditions in sample size of 250 and
500 decreased, and the differences in classification rates for conditions in sample size of
1,000 increased in favor of logistic regression. In the prediction of differences in
sensitivity, there was a significant three-way interaction between number of item
categories, prevalence, and number of items (b=-.197, s.e. = .018, t=-10.780, p<.001,
partial-η2=.030) and a three-way interaction between number of item categories, number
of items, and sample size (b=.287, s.e. = .018, t=15.616, p<.001, partial-η2=.021). On
average, differences in sensitivity in favor of logistic regression increased as sample size
increased, and decreased as number of items, number of item categories, and prevalence
increased. As the number of items increased, the differences in sensitivity favoring
logistic regression across prevalence, number of item categories, and sample size
decreased, to the point of slightly favoring random forest. In the prediction of differences
in specificity, there was a significant three-way interaction between number of item
categories, prevalence, and number of items (b=.193, s.e. = .016, t=12.143, p<.001,
partial-η2=.023) and a three-way interaction between number of item categories, number
of items, and sample size (b=-.254, s.e. = .016, t=-15.902, p<.001, partial-η2=.013). On
average, differences in specificity in favor of logistic regression increased as number of
items and prevalence increased, and decreased as sample size increased. As the number of items increased, differences in specificity across prevalence and number
of item categories decreased. Also, as the number of items increased, the differences in
specificity for conditions in sample size of 250 and 500 decreased, and the differences in
specificity for conditions in sample size of 1,000 increased in favor of logistic regression.
6. Model Comparison across Psychometric and Machine Learning Methods.
Results from section 4 suggest that machine learning models that reduce
prediction error (using a Bayes classifier) have inflated specificity and very low
sensitivity compared to the data-generating model. Given the poor performance, machine
learning models with a Bayes classifier are not compared to data-generating theta.
Results from section 2 suggest that there is not a significant difference between using the
data-generating theta, estimated theta, or a raw summed score for classification. Thus, this
section compares the differences in classification accuracy between the data-generating
theta and the machine learning methods with ROC classifiers.
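The comparisons below rest on three rates computed from a confusion matrix (classification rate, sensitivity, and specificity) and on a cut score chosen from the ROC curve. A minimal sketch of both steps is given below; the scores and diagnoses are invented, and Youden's J is used here as a stand-in criterion for whichever ROC-based rule a given analysis adopts.

```python
def confusion_rates(y_true, y_pred):
    """Return (classification rate, sensitivity, specificity) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return acc, sens, spec

def roc_cut(y_true, scores):
    """Pick the cut score that maximizes Youden's J = sensitivity + specificity - 1,
    searching over the observed scores (one candidate cut per unique score)."""
    best_cut, best_j = None, -1.0
    for c in sorted(set(scores)):
        preds = [1 if s >= c else 0 for s in scores]
        _, sens, spec = confusion_rates(y_true, preds)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut

# toy example: a low-prevalence diagnosis and continuous test scores
y = [0]*8 + [1]*2
s = [.05, .10, .15, .20, .25, .30, .35, .55, .60, .80]
cut = roc_cut(y, s)
```

The same three rates, computed in the testing dataset for two methods and differenced, yield the comparison outcomes analyzed in this section.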
Random Forest with ROC classifier vs. Data-generating model. The vast
majority of conditions had higher classification accuracy for the data-generating model
than the random forest model. Tables 6.1 and 6.2 show that the difference in classification rate between models ranged from -.09 to .01; Tables 6.3 and 6.4 show that the difference in sensitivity ranged from -.30 to .02; and Tables 6.5 and 6.6 show that the difference in specificity ranged from -.11 to .03.
In a regression predicting the difference in classification rate between the data-
generating thetas and random forest with a ROC classifier from simulation factors, the
variance explained was R2=.156. There were three significant interactions: the interaction
between prevalence and number of item categories (b=.139, s.e. = .010, t=14.136,
p<.001, partial-η2= .012), the interaction of number of item categories and number of
items (b=.186, s.e. = .010, t=18.906, p<.001, partial-η2= .038), and the interaction of
prevalence and number of items (b=.176, s.e. = .010, t=17.910, p<.001, partial-η2=
.019). On average, the difference in classification rates increased as the number of items
increased and number of item categories decreased. Also, the difference in classification
rates was larger for a prevalence rate of .10 than prevalence rates of .05 and .20. For
binary items, the difference in classification rates across method was greater for
conditions with 30 items than with 10 items, but the opposite pattern was found for
polytomous items. Also, as prevalence increased, the difference in classification rates
decreased for conditions with 30 items, but increased in conditions with 10 items. In a
regression predicting the difference in sensitivity between the data-generating thetas and
random forest with a ROC classifier from simulation factors, the variance explained was
R2=.270. There was a significant three-way interaction between prevalence, number of
items, and number of item categories (b=.226, s.e. = .020, t=11.405, p<.001, partial-η2=
.032), and a three-way interaction between sample size, number of items, and number of
categories (b=-.230, s.e. = .020, t=-11.516, p<.001, partial-η2= .011). On average,
differences in sensitivity rates increased as sample size increased, and differences in
sensitivity decreased as prevalence, number of items, and number of item categories
increased. For conditions with 10 items, the difference in sensitivity increased as sample
size increased, and the difference in sensitivity decreased as prevalence and number of
categories increased. On the other hand, the difference in sensitivity in conditions with 30
items did not seem to be influenced by predictors. In a regression predicting the
difference in specificity between the data-generating thetas and random forest with a
ROC classifier from simulation factors, the variance explained was R2=.140. There was a
significant three-way interaction between prevalence, number of items, and number of
item categories (b=-.203, s.e. = .018, t=-11.583, p<.001, partial-η2= .020). On average,
the difference in specificity increased as number of item categories increased, and it was
not influenced by the number of items. Also, the difference in specificity was larger for
conditions with a prevalence rate of .10 than prevalence rates of .05 and .20. However,
for conditions with five-category items, differences in specificity decreased as the
number of items and prevalence increased. On the other hand, for conditions with two-
category items, differences in specificity increased as the number of items and prevalence
increased.
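The partial-η2 values reported throughout can be read as the share of residual variability a predictor accounts for once the other terms are in the model: the drop in residual sum of squares when the predictor is added, divided by the reduced model's residual sum of squares. A minimal sketch follows; the design and outcome values are invented for illustration.

```python
def ols_sse(X, y):
    """Residual sum of squares from ordinary least squares, solving the
    normal equations by Gaussian elimination (fine for a few predictors)."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    v = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    for col in range(p):                      # forward elimination with pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    beta = [0.0] * p                          # back substitution
    for r in range(p - 1, -1, -1):
        beta[r] = (v[r] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return sum((y[i] - sum(X[i][c] * beta[c] for c in range(p))) ** 2 for i in range(n))

def partial_eta2(X_full, X_reduced, y):
    """partial-η2 = (SSE_reduced - SSE_full) / SSE_reduced for the dropped term."""
    sse_full, sse_red = ols_sse(X_full, y), ols_sse(X_reduced, y)
    return (sse_red - sse_full) / sse_red

# toy data: intercept plus two binary simulation factors; outcome = accuracy difference
X_full = [[1, x1, x2] for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]]
X_red = [[1, x2] for _, x1, x2 in X_full]     # drop factor 1
y = [0.10, 0.12, 0.20, 0.24, 0.11, 0.23]
pe2 = partial_eta2(X_full, X_red, y)
```

In the analyses above, the same quantity is computed for each effect in the regression of a classification-accuracy difference on the simulation factors.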
Logistic Regression with ROC classifier vs. Data-generating model. The vast
majority of conditions had a higher classification rate for the data-generating model than the logistic regression model. Tables 6.7 and 6.8 show that the difference in classification rate between models ranged from -.09 to .01; Tables 6.9 and 6.10 show that the difference in sensitivity ranged from -.30 to .02; and Tables 6.11 and 6.12 show that the difference in specificity ranged from -.11 to .03.
In the regression predicting the difference in classification rate from simulation
factors, the variance explained was R2=.067. None of the predictors had a partial-η2 > .01.
In the regression predicting the difference in sensitivity from the simulation factors, the
variance explained was R2=.037. None of the predictors had a partial-η2 > .01. Finally, in
the regression predicting the difference in specificity from the simulation factors, the
variance explained was R2=.028. None of the predictors had a partial-η2 > .01.
Lasso Logistic Regression with ROC classifier vs. Data-generating model.
Conditions examined had a diagnosis-test correlation of .70. The vast majority of
conditions had a higher classification rate for the data-generating model than the lasso logistic regression model. Table 6.13 shows that the difference in classification rate between models ranged from -.10 to -.02; Table 6.14 shows that the difference in sensitivity ranged from -.04 to .01; and Table 6.15 shows that the difference in specificity ranged from -.11 to -.02.
In the regression predicting the difference in classification rate from the
diagnosis-test correlation, prevalence, sample size, number of items, number of item
categories, and local dependence, the variance explained was R2=.085. Differences in
classification rate decreased as sample size (b=.022, s.e. = .005, t=4.772, p<.001, partial-
η2= .029), prevalence (b=.034, s.e. = .005, t=7.204, p<.001, partial-η2= .042), and
number of items (b=.008, s.e. = .005, t=1.592, p=.111, partial-η2= .012) increased. In the
regression predicting the difference in sensitivity from the simulation factors, the
variance explained was R2=.013. None of the predictors had a partial-η2 > .01. Finally, in
the regression predicting the difference in specificity from the simulation factors, the
variance explained was R2=.058. Differences in specificity decreased as sample size
(b=.022, s.e. = .005, t=4.772, p<.001, partial-η2= .021) and prevalence (b=.022, s.e. =
.005, t=4.772, p<.001, partial-η2= .026) increased.
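Lasso logistic regression penalizes the absolute size of the coefficients, shrinking some exactly to zero and thereby performing item selection. A minimal sketch of one common fitting approach, proximal gradient descent with a soft-thresholding step, is shown below; the data, penalty, and step size are invented, and a production fit (e.g., coordinate descent as in glmnet) would be more efficient but applies the same shrinkage operator.

```python
import math

def soft_threshold(z, gamma):
    """The lasso's shrinkage operator: move z toward zero by gamma, clipping at zero."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_logistic(X, y, lam=0.1, lr=0.1, iters=2000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).
    Each row of X starts with a 1 for the intercept, which is left unpenalized."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            mu = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                grad[j] += (mu - yi) * xi[j] / n
        beta = [b - lr * g for b, g in zip(beta, grad)]
        # proximal step: soft-threshold everything except the intercept
        beta = [beta[0]] + [soft_threshold(b, lr * lam) for b in beta[1:]]
    return beta

# toy data: column 1 predicts the diagnosis perfectly, column 2 is noise
X = [[1, 0, 0], [1, 0, 1], [1, 0, 0], [1, 0, 1],
     [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
beta_hat = lasso_logistic(X, y)
```

The noise column's coefficient is driven to zero while the predictive column survives with a shrunken, finite coefficient, which is what makes the lasso usable for item selection.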
Relaxed Lasso Logistic Regression with ROC classifier vs. Data-generating
model. Conditions examined had a diagnosis-test correlation of .70. The vast majority of conditions had a higher classification rate for the data-generating model than the relaxed lasso logistic regression model. Table 6.16 shows that the difference in classification rate between models ranged from -.10 to .01; Table 6.17 shows that the difference in sensitivity ranged from -.04 to .01; and Table 6.18 shows that the difference in specificity ranged from -.10 to -.02.
In the regression predicting the difference in classification rate from the
diagnosis-test correlation, prevalence, sample size, number of items, number of item
categories, and local dependence, the variance explained was R2=.082. Differences in
classification rate decreased as sample size (b=.023, s.e. = .005, t=4.964, p<.001, partial-
η2= .025), prevalence (b=.036, s.e. = .005, t=7.583, p<.001, partial-η2= .036), and
number of items (b=.011, s.e. = .005, t=2.160, p=.031, partial-η2= .015) increased. In the regression predicting the difference in
sensitivity from the simulation factors, the variance explained was R2=.055. None of the
predictors had a partial-η2 > .01. Finally, in the regression predicting the difference in
specificity from the simulation factors, the variance explained was R2=.028. Differences
in specificity decreased as sample size (b=.024, s.e. = .006, t=4.103, p<.001, partial-η2=
.018) and prevalence (b=.034, s.e. = .006, t=5.719, p<.001, partial-η2= .022) increased.
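The relaxed lasso's two stages can be sketched directly: the lasso's nonzero coefficients pick the columns, and an unpenalized refit on just those columns removes the shrinkage from the surviving estimates. In the sketch below the first-stage coefficients, data, and optimization settings are all hypothetical.

```python
import math

def refit_unpenalized(X, y, keep, lr=0.1, iters=2000):
    """Relaxed-lasso second stage: plain (unpenalized) logistic regression,
    fit by gradient descent, using only the columns the lasso retained."""
    Xs = [[row[j] for j in keep] for row in X]
    n, p = len(Xs), len(keep)
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(Xs, y):
            mu = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                grad[j] += (mu - yi) * xi[j] / n
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return dict(zip(keep, beta))

# suppose a first-stage lasso returned these (hypothetical) shrunken coefficients,
# with column 0 the intercept and columns 2-3 zeroed out:
lasso_beta = {0: -0.9, 1: 1.4, 2: 0.0, 3: 0.0}
keep = [j for j, b in lasso_beta.items() if b != 0.0 or j == 0]

X = [[1, 0, 1, 0], [1, 0, 0, 1], [1, 1, 1, 1],
     [1, 1, 0, 0], [1, 0, 1, 1], [1, 1, 0, 1]]
y = [0, 0, 1, 1, 0, 1]
relaxed = refit_unpenalized(X, y, keep)
```

Because the refit drops the penalty, the surviving coefficient is larger in magnitude than its shrunken first-stage counterpart, which is the motivation for the relaxed variant.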
7. Scoring Machine Learning Items for Person Parameter Recovery
The CART algorithm and lasso logistic regression select the most important items
to predict a diagnosis. This section investigates the recovery of theta if the items retained by these models were to be scored using the estimated parameters from the IRT
model in section 2. An IRT EAP[θ] score was estimated per model if at least two items
were chosen by the algorithms. On the other hand, the random forest algorithm does not
do variable selection, but ranks predictors in terms of variable importance. In this case,
half of the items with the highest variable importance were used to estimate an IRT
EAP[θ] score.
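Scoring the retained items follows the usual EAP recipe: the posterior mean of theta over a quadrature grid, with the product of item response probabilities as the likelihood and a standard normal prior. A minimal 2PL sketch is given below; the item parameters and response patterns are hypothetical, and for a random forest the same function would simply receive the half of the items with the highest variable importance.

```python
import math

def eap_theta(responses, a, b, n_quad=61):
    """EAP[theta] under a 2PL model, using a normal(0,1) prior evaluated
    on an equally spaced quadrature grid from -4 to 4. The `responses`,
    `a` (slopes), and `b` (difficulties) cover only the retained items."""
    grid = [-4 + 8 * k / (n_quad - 1) for k in range(n_quad)]
    post = []
    for t in grid:
        like = math.exp(-t * t / 2)                 # unnormalized N(0,1) prior
        for u, aj, bj in zip(responses, a, b):
            p = 1 / (1 + math.exp(-aj * (t - bj)))  # 2PL correct-response probability
            like *= p if u == 1 else 1 - p
        post.append(like)
    z = sum(post)
    return sum(t * w for t, w in zip(grid, post)) / z

# hypothetical parameters for four items retained by a selection algorithm
a = [1.2, 0.9, 1.5, 1.1]
b = [-0.5, 0.0, 0.5, 1.0]
low = eap_theta([0, 0, 0, 0], a, b)
high = eap_theta([1, 1, 1, 1], a, b)
```

Theta MSE and the correlation between true and estimated theta, the two recovery outcomes analyzed below, are then computed from these EAP estimates against the generating thetas.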
Theta Recovery by CART. Only conditions with prevalence greater than 5% and
a sample size greater than 250 were analyzed.
Linear Regression. In a regression predicting theta MSE from the simulation factors,
the variance explained was R2=.761. There was a significant three-way interaction
between diagnosis-test correlation, number of items, and number of item categories (b=-
.074, s.e. = .007, t=-9.964, p<.001, partial-η2= .018), an interaction between diagnosis-
test correlation and prevalence (b=-.022, s.e. = .005, t=-4.379, p<.001, partial-η2= .012),
and an interaction between diagnosis-test correlation and sample size (b=-.056, s.e. =
.005, t=-11.251, p<.001, partial-η2= .053). On average, theta MSE decreased as number
of item categories, prevalence, and sample size increased, and theta MSE increased as the
diagnosis-test correlation and the number of items increased. When the diagnosis-test
correlation was .70, theta MSE was higher for conditions with 30 items, prevalence of
.10, or sample size of 500 than for conditions with either 10 items, prevalence of .20, or
sample size of 1,000. Similarly, in a regression predicting the correlation between the true
and estimated theta from the simulation factors, the variance explained was R2=.761.
There was a significant three-way interaction between diagnosis-test correlation, number
of items, and number of item categories (b=.074, s.e. = .009, t=8.478, p<.001, partial-
η2= .014), a two-way interaction between diagnosis-test correlation and prevalence
(b=.029, s.e. = .006, t=4.918, p<.001, partial-η2= .011), and a diagnosis-test correlation
and sample size (b=.071, s.e. = .006, t=12.318, p<.001, partial-η2= .052). On average,
the correlation between true and estimated theta increased as number of item categories,
prevalence, and sample size increased, and as the diagnosis-test correlation and number of
items decreased. When diagnosis-test correlation was .70, the correlation between true
and estimated theta was higher for conditions with either a sample size of 1,000, 30
items, 10% prevalence, or five-category items than for conditions with a sample size of
500, 10 items, 20% prevalence, or two-category items.
Regression Trees. The regression trees grown in a training dataset to predict theta
MSE and the correlation between true and estimated theta are presented in Figures 7.1
and 7.2. For theta MSE, the first split was diagnosis-test correlation. Conditions with a
diagnosis-test correlation of .30 or .50 had a predicted theta MSE of .005. For conditions
with a diagnosis-test correlation of .70, predicted theta MSE depended on number of
items and sample size, where conditions with 30 items had a higher predicted MSE than
conditions with 10 items. Similarly, for the correlation between true and estimated theta,
the first split was diagnosis-test correlation. For conditions with a diagnosis-test correlation of .30 or .50, the predicted correlation between true and estimated theta was .99. For
conditions with a diagnosis-test correlation of .70, the predicted correlation between true
and estimated thetas depended on number of items and sample size, where conditions
with 10 items had a higher correlation between true and estimated thetas than conditions
with 30 items. In a random forest model grown in a training dataset to predict theta MSE
for CART, the most important predictor was diagnosis-test correlation, followed by
number of items (see left panel of Figure 7.3). Using the random forest model to predict
theta MSE in the testing dataset, the MSE was .003, and the variance explained was .768.
Also, in the prediction of the correlation between true and estimated theta, the most
important predictor was diagnosis-test correlation, followed by number of items. The
MSE in the prediction of the correlation between true and estimated theta was .004,
and the variance explained was .754.
Theta Recovery by Lasso Logistic Regression. Only conditions with a
diagnosis-test correlation of .70 were analyzed.
Linear Regression. In a regression predicting theta MSE from sample size,
prevalence, number of item categories, number of items, and local dependence, the
variance explained was R2=.491. There was a significant interaction between prevalence
and number of item categories (b=.014, s.e. = .007, t=1.998, p=.045, partial-η2= .011),
and main effects of sample size (b=-.077, s.e. = .005, t=-15.477, p<.001, partial-η2=
.292) and number of items (b=.080, s.e. = .005, t=14.581, p<.001, partial-η2= .056). On
average, theta MSE increased as the number of items increased, and decreased as the
sample size, prevalence, and number of item categories increased. Also, as prevalence
increased, the differences in theta MSE across number of item categories decreased.
Similarly, in a regression predicting the correlation between true and estimated theta from
sample size, prevalence, number of item categories, number of items, and local
dependence, the variance explained was R2=.498. There was a significant interaction
between prevalence and number of item categories (b=.022, s.e. = .008, t=2.738, p=.006,
partial-η2= .017), and main effects of sample size (b=.102, s.e. = .006,
t=17.909, p<.001, partial-η2= .297) and number of items (b=-.050, s.e. = .006, t=-8.061,
p<.001, partial-η2= .013). On average, the correlation between true and estimated theta
increased as sample size, prevalence, and number of item categories increased, and
decreased as the number of items increased. Also, as prevalence increased, the
differences in the correlation between true and estimated theta across number of items
decreased.
Regression Trees. The regression trees grown in a training dataset to predict theta
MSE and the correlation between true and estimated theta are presented in Figures 7.4
and 7.5. For theta MSE, the first split was on sample size. For conditions with a sample
size of 250, theta MSE depended on the number of item categories and prevalence, where
conditions with five-category items and prevalence of 20% had a lower predicted MSE
than conditions with two-category items or conditions with prevalence of 10%. For
conditions with sample size of 500 or 1,000, theta MSE depended on prevalence, number
of item categories, and sample size, where conditions with high sample size and high
prevalence had a lower predicted MSE. Similarly, for the correlation between true
and estimated theta, the first split was sample size. For conditions with a sample size of
250, the correlation between true and estimated theta depended on the number of item
categories and prevalence, where conditions with five-category items and higher
prevalence led to higher predicted correlation between true and estimated theta. For
conditions with a sample size of 500 or 1,000, correlation between true and estimated
theta depended on prevalence and number of item categories, where conditions with
higher prevalence or conditions with five-category items had higher predicted correlation
between true and estimated theta. In a random forest model grown in a training dataset to
predict theta MSE for lasso logistic regression, the most important predictors were
sample size and prevalence, followed by number of item categories and number of items
(see left panel of Figure 7.6). Using the random forest model to predict theta MSE in the
testing dataset, the MSE was .005, and the variance explained was .476. Also, in the
prediction of the correlation between true and estimated theta, the most important
predictors were sample size and prevalence, followed by number of item categories and number
of items (see right panel of Figure 7.6). The MSE in the prediction of the correlation
between true and estimated theta was .007, and the variance explained was .481.
Theta Recovery by Random Forest. All conditions were analyzed.
Linear Regression. In a regression predicting theta MSE from sample size,
prevalence, diagnosis-test correlation, number of item categories, number of items, and
local dependence, the variance explained was R2=.713. There was a significant
interaction between prevalence and diagnosis-test correlation (b=-.019, s.e. = .002, t=-
10.325, p<.001, partial-η2= .017), and main effects of number of items (b=-.060, s.e. =
.001, t=-46.983, p<.001, partial-η2= .627) and number of item categories (b=-.036, s.e. =
.001, t=-28.146, p<.001, partial-η2= .404). On average, theta MSE increased as the
diagnosis-test correlation increased, and decreased as number of items, prevalence, and
number of item categories increased. Also, as diagnosis-test correlation increased, the
differences in theta MSE across prevalence increased. Similarly, in a regression predicting the correlation between true and estimated theta from the simulation factors, the variance
explained was R2=.771. There was a significant interaction between prevalence and
diagnosis-test correlation (b=.014, s.e. = .007, t=1.998, p=.045, partial-η2= .015), and a
significant interaction between number of items and number of item categories (b=-.077,
s.e. = .005, t=-15.477, p<.001, partial-η2= .034). On average, the correlation between
true and estimated theta increased as the number of items, prevalence, and number of
item categories increased, and decreased as the diagnosis-test correlation increased. As
diagnosis-test correlation increased, the differences in the correlation between true and
estimated theta across prevalence increased. Also, as the number of items increased, the
difference in the correlation between true and estimated theta across number of item
categories increased.
Regression Trees. The regression trees grown in a training dataset to predict theta
MSE and the correlation between true and estimated theta are presented in Figures 7.7
and 7.8. For theta MSE, the first split was on the number of items, followed by number of
item categories. Conditions with higher number of items and higher number of item
categories had a lower predicted MSE than conditions with lower number of items and
lower number of item categories. Similarly, for the correlation between true and
estimated theta, the first split was number of items, followed by number of item
categories. Conditions with higher number of items and higher number of item categories
had a higher predicted correlation between true and estimated thetas than conditions with
lower number of items and lower number of item categories. In a random forest model
grown in a training dataset to predict theta MSE for random forest, the most important
predictor was number of items, followed by number of item categories (see left panel of
Figure 7.9). Using the random forest model to predict theta MSE for random forests in
the testing dataset, the MSE was .001, and the variance explained was .710. Also, in the
prediction of the correlation between true and estimated theta, the most important
predictor was number of items, followed by number of item categories (see right panel of
Figure 7.9). The MSE in the prediction of the correlation between true and estimated
theta was .001, and the variance explained was .767.
Table 1.1
IRT nonconvergence (out of 500) in the 10-item condition
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 1 0 1 1 1 1 0 0 2 0 0 0
250 .5 4 2 3 1 1 0 0 0 0 0 0 1
250 .7 1 1 4 1 2 3 0 2 0 0 0 1
500 .3 0 0 0 0 0 2 0 0 0 0 0 0
500 .5 0 0 0 0 0 0 0 0 0 0 0 0
500 .7 0 0 0 0 1 1 0 0 0 0 0 0
1000 .3 0 0 1 1 0 0 0 2 0 0 1 1
1000 .5 0 0 0 0 0 0 0 1 0 0 0 0
1000 .7 0 0 0 0 0 0 0 1 0 0 1 0
Note: N=training sample size; r=diagnosis-test correlation
Table 1.3
Mean squared error for the theta estimate in the 10-item condition
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.226 0.227 0.227 0.229 0.228 0.229 0.133 0.135 0.133 0.135 0.132 0.135
250 .5 0.227 0.227 0.228 0.230 0.227 0.231 0.133 0.135 0.134 0.136 0.133 0.134
250 .7 0.228 0.229 0.228 0.229 0.227 0.230 0.133 0.135 0.133 0.136 0.133 0.136
500 .3 0.224 0.225 0.225 0.224 0.225 0.227 0.132 0.134 0.133 0.135 0.133 0.134
500 .5 0.223 0.226 0.223 0.225 0.224 0.226 0.133 0.135 0.133 0.135 0.133 0.134
500 .7 0.225 0.224 0.225 0.227 0.223 0.226 0.133 0.135 0.133 0.134 0.132 0.135
1000 .3 0.221 0.224 0.221 0.224 0.223 0.225 0.132 0.135 0.133 0.135 0.133 0.134
1000 .5 0.221 0.222 0.222 0.224 0.222 0.222 0.133 0.134 0.134 0.134 0.133 0.134
1000 .7 0.222 0.223 0.220 0.224 0.222 0.224 0.132 0.136 0.133 0.135 0.133 0.134
Note: N=training sample size; r=diagnosis-test correlation
Table 1.4
Mean squared error for the theta estimate in the 30-item condition.
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.094 0.094 0.094 0.095 0.094 0.095 0.052 0.052 0.054 0.053 0.053 0.053
250 .5 0.094 0.095 0.094 0.095 0.094 0.094 0.052 0.053 0.052 0.053 0.052 0.053
250 .7 0.094 0.095 0.094 0.095 0.094 0.095 0.052 0.053 0.052 0.053 0.052 0.053
500 .3 0.090 0.091 0.091 0.092 0.091 0.091 0.050 0.051 0.050 0.051 0.050 0.051
500 .5 0.090 0.091 0.091 0.091 0.091 0.091 0.050 0.051 0.050 0.051 0.050 0.051
500 .7 0.090 0.091 0.090 0.091 0.090 0.091 0.050 0.051 0.050 0.051 0.050 0.051
1000 .3 0.088 0.090 0.089 0.090 0.089 0.090 0.049 0.050 0.049 0.050 0.050 0.050
1000 .5 0.089 0.089 0.089 0.090 0.089 0.090 0.049 0.050 0.049 0.050 0.049 0.050
1000 .7 0.089 0.090 0.089 0.090 0.089 0.089 0.050 0.050 0.049 0.050 0.049 0.050
Note: N=training sample size; r=diagnosis-test correlation
Table 1.5
Correlation between true theta and theta estimate in the 10-item condition.
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.881 0.882 0.881 0.880 0.881 0.881 0.934 0.933 0.934 0.933 0.934 0.933
250 .5 0.881 0.881 0.881 0.880 0.881 0.880 0.933 0.932 0.933 0.933 0.933 0.933
250 .7 0.881 0.881 0.881 0.881 0.881 0.880 0.933 0.933 0.933 0.933 0.933 0.932
500 .3 0.882 0.882 0.882 0.883 0.881 0.881 0.933 0.932 0.933 0.932 0.932 0.932
500 .5 0.882 0.881 0.883 0.882 0.882 0.881 0.933 0.932 0.932 0.932 0.932 0.932
500 .7 0.882 0.882 0.882 0.881 0.883 0.881 0.933 0.931 0.932 0.932 0.933 0.932
1000 .3 0.883 0.882 0.883 0.882 0.882 0.882 0.932 0.931 0.932 0.931 0.932 0.932
1000 .5 0.883 0.883 0.882 0.882 0.883 0.883 0.931 0.932 0.931 0.931 0.932 0.931
1000 .7 0.883 0.883 0.883 0.882 0.883 0.882 0.932 0.931 0.932 0.932 0.932 0.932
Note: N=training sample size; r=diagnosis-test correlation
Table 1.6
Correlation between true theta and theta estimate in the 30-item condition.
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.954 0.954 0.955 0.954 0.955 0.954 0.976 0.976 0.976 0.976 0.976 0.976
250 .5 0.955 0.954 0.954 0.954 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
250 .7 0.954 0.954 0.955 0.955 0.955 0.954 0.976 0.976 0.976 0.976 0.976 0.976
500 .3 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
500 .5 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
500 .7 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
1000 .3 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
1000 .5 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
1000 .7 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
Note: N=training sample size; r=diagnosis-test correlation
Table 1.7
Mean square error for the slopes in the 2PL model
Number of Items
10 30
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.157 0.180 0.164 0.170 0.159 0.202 0.105 0.114 0.106 0.116 0.104 0.111
250 .5 0.161 0.170 0.155 0.181 0.161 0.176 0.103 0.112 0.106 0.113 0.102 0.116
250 .7 0.165 0.166 0.174 0.196 0.160 0.199 0.105 0.114 0.105 0.116 0.104 0.116
500 .3 0.076 0.091 0.073 0.098 0.070 0.099 0.049 0.061 0.049 0.059 0.049 0.057
500 .5 0.070 0.107 0.068 0.097 0.077 0.091 0.049 0.060 0.048 0.062 0.049 0.059
500 .7 0.069 0.097 0.077 0.102 0.077 0.107 0.049 0.061 0.049 0.058 0.049 0.062
1000 .3 0.036 0.059 0.034 0.055 0.035 0.065 0.024 0.036 0.024 0.037 0.024 0.034
1000 .5 0.034 0.059 0.033 0.072 0.033 0.053 0.024 0.035 0.023 0.036 0.024 0.035
1000 .7 0.034 0.053 0.032 0.059 0.033 0.058 0.024 0.035 0.024 0.035 0.024 0.034
Note: N=training sample size; r=diagnosis-test correlation
Table 1.8
Variance explained for the slopes in the 2PL Model.
Number of Items
10 30
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.439 0.395 0.455 0.440 0.447 0.411 0.523 0.470 0.510 0.475 0.511 0.475
250 .5 0.436 0.439 0.443 0.407 0.454 0.423 0.511 0.485 0.516 0.469 0.516 0.469
250 .7 0.462 0.434 0.449 0.401 0.462 0.427 0.516 0.473 0.521 0.469 0.520 0.477
500 .3 0.610 0.544 0.608 0.526 0.606 0.536 0.680 0.611 0.678 0.612 0.671 0.627
500 .5 0.612 0.535 0.606 0.536 0.601 0.527 0.673 0.607 0.680 0.604 0.684 0.615
500 .7 0.596 0.525 0.602 0.546 0.584 0.534 0.680 0.605 0.670 0.618 0.676 0.620
1000 .3 0.731 0.673 0.735 0.652 0.740 0.650 0.804 0.715 0.808 0.719 0.804 0.726
1000 .5 0.737 0.665 0.745 0.639 0.746 0.647 0.800 0.718 0.808 0.720 0.808 0.724
1000 .7 0.746 0.666 0.751 0.656 0.744 0.660 0.799 0.718 0.808 0.724 0.805 0.727
Note: N=training sample size; r=diagnosis-test correlation
Table 1.9
Mean square error for the threshold parameter in the 2PL Model
Number of Items
10 30
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.045 0.161 0.043 0.047 0.041 0.057 0.034 0.040 0.031 0.047 0.032 0.043
250 .5 0.039 0.046 0.040 0.082 0.039 0.053 0.035 0.044 0.033 0.050 0.032 0.046
250 .7 0.038 0.048 0.040 0.051 0.050 0.052 0.035 0.052 0.031 0.043 0.034 0.044
500 .3 0.020 0.031 0.018 0.036 0.020 0.034 0.016 0.029 0.017 0.033 0.019 0.029
500 .5 0.021 0.048 0.019 0.033 0.020 0.030 0.017 0.030 0.018 0.029 0.016 0.029
500 .7 0.020 0.035 0.019 0.034 0.017 0.037 0.017 0.028 0.016 0.027 0.016 0.030
1000 .3 0.009 0.021 0.011 0.024 0.010 0.025 0.008 0.022 0.008 0.022 0.008 0.021
1000 .5 0.010 0.021 0.010 0.025 0.010 0.023 0.008 0.021 0.008 0.020 0.008 0.022
1000 .7 0.009 0.022 0.008 0.023 0.010 0.024 0.008 0.020 0.008 0.021 0.009 0.020
Note: N=training sample size; r=diagnosis-test correlation
Table 1.10
Variance explained for the threshold parameter in the 2PL Model
Number of Items
10 30
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.976 0.966 0.976 0.965 0.976 0.962 0.976 0.967 0.976 0.962 0.976 0.964
250 .5 0.977 0.968 0.976 0.958 0.976 0.961 0.975 0.966 0.976 0.962 0.977 0.961
250 .7 0.976 0.964 0.975 0.962 0.976 0.963 0.975 0.965 0.976 0.964 0.976 0.964
500 .3 0.987 0.975 0.989 0.973 0.988 0.974 0.988 0.976 0.988 0.974 0.987 0.976
500 .5 0.987 0.975 0.989 0.976 0.988 0.975 0.988 0.975 0.988 0.975 0.988 0.975
500 .7 0.988 0.974 0.988 0.974 0.988 0.976 0.988 0.976 0.988 0.977 0.989 0.975
1000 .3 0.994 0.982 0.994 0.982 0.994 0.981 0.994 0.981 0.994 0.981 0.994 0.981
1000 .5 0.994 0.982 0.994 0.980 0.994 0.980 0.994 0.982 0.994 0.982 0.994 0.981
1000 .7 0.994 0.982 0.994 0.981 0.994 0.980 0.994 0.982 0.994 0.981 0.994 0.982
Note: N=training sample size; r=diagnosis-test correlation
Table 1.11
Mean square error for the slopes in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.053 0.064 0.052 0.067 0.053 0.067 0.041 0.050 0.041 0.053 0.042 0.052
250 .5 0.051 0.067 0.053 0.067 0.052 0.066 0.041 0.050 0.042 0.052 0.041 0.051
250 .7 0.054 0.065 0.054 0.066 0.052 0.067 0.043 0.051 0.042 0.053 0.042 0.052
500 .3 0.025 0.039 0.026 0.040 0.027 0.042 0.021 0.032 0.021 0.032 0.021 0.031
500 .5 0.027 0.041 0.026 0.040 0.026 0.042 0.021 0.033 0.021 0.032 0.021 0.033
500 .7 0.027 0.042 0.026 0.040 0.027 0.039 0.021 0.032 0.021 0.031 0.021 0.033
1000 .3 0.013 0.027 0.013 0.024 0.013 0.024 0.011 0.021 0.011 0.023 0.011 0.022
1000 .5 0.013 0.032 0.013 0.027 0.013 0.026 0.011 0.023 0.011 0.022 0.011 0.020
1000 .7 0.013 0.037 0.013 0.027 0.013 0.029 0.011 0.021 0.011 0.022 0.011 0.021
Note: N=training sample size; r=diagnosis-test correlation
Table 1.12
Variance explained for the slopes in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.657 0.600 0.680 0.579 0.660 0.586 0.724 0.661 0.726 0.645 0.721 0.653
250 .5 0.671 0.608 0.664 0.601 0.676 0.585 0.726 0.663 0.723 0.649 0.723 0.647
250 .7 0.663 0.606 0.662 0.600 0.663 0.587 0.719 0.665 0.726 0.642 0.724 0.650
500 .3 0.782 0.714 0.790 0.706 0.785 0.714 0.831 0.743 0.836 0.747 0.831 0.749
500 .5 0.782 0.702 0.790 0.695 0.791 0.689 0.830 0.741 0.831 0.746 0.833 0.744
500 .7 0.784 0.687 0.781 0.718 0.782 0.702 0.834 0.747 0.836 0.752 0.833 0.741
1000 .3 0.880 0.793 0.875 0.794 0.872 0.806 0.904 0.818 0.904 0.802 0.907 0.810
1000 .5 0.876 0.780 0.873 0.785 0.877 0.781 0.905 0.809 0.904 0.810 0.905 0.821
1000 .7 0.875 0.792 0.873 0.783 0.880 0.776 0.899 0.815 0.903 0.805 0.904 0.809
Note: N=training sample size; r=diagnosis-test correlation
Table 1.13
Mean square error for the first threshold in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.026 0.039 0.028 0.041 0.026 0.042 0.025 0.037 0.027 0.040 0.027 0.038
250 .5 0.027 0.039 0.028 0.043 0.028 0.042 0.024 0.036 0.026 0.040 0.026 0.040
250 .7 0.027 0.052 0.028 0.042 0.029 0.042 0.026 0.036 0.026 0.041 0.025 0.040
500 .3 0.013 0.029 0.014 0.029 0.013 0.029 0.013 0.026 0.013 0.027 0.013 0.025
500 .5 0.014 0.028 0.014 0.027 0.014 0.031 0.013 0.027 0.013 0.026 0.013 0.026
500 .7 0.014 0.028 0.015 0.026 0.014 0.027 0.013 0.027 0.014 0.026 0.013 0.026
1000 .3 0.007 0.021 0.007 0.021 0.007 0.020 0.007 0.018 0.007 0.020 0.007 0.020
1000 .5 0.008 0.020 0.007 0.019 0.007 0.020 0.007 0.019 0.007 0.021 0.007 0.018
1000 .7 0.008 0.021 0.007 0.021 0.007 0.021 0.007 0.019 0.007 0.020 0.007 0.020
Note: N=training sample size; r=diagnosis-test correlation
Table 1.14
Mean square error for the second threshold in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.019 0.032 0.019 0.034 0.018 0.034 0.019 0.030 0.019 0.032 0.019 0.031
250 .5 0.020 0.031 0.020 0.034 0.019 0.032 0.017 0.029 0.019 0.032 0.019 0.033
250 .7 0.018 0.037 0.020 0.033 0.020 0.034 0.018 0.029 0.018 0.033 0.018 0.032
500 .3 0.011 0.025 0.010 0.025 0.011 0.023 0.010 0.023 0.010 0.023 0.010 0.022
500 .5 0.011 0.025 0.010 0.023 0.011 0.027 0.010 0.023 0.010 0.023 0.010 0.023
500 .7 0.011 0.024 0.011 0.022 0.011 0.024 0.010 0.023 0.010 0.022 0.010 0.023
1000 .3 0.005 0.020 0.006 0.019 0.006 0.018 0.005 0.016 0.005 0.018 0.005 0.018
1000 .5 0.006 0.018 0.006 0.018 0.006 0.018 0.006 0.018 0.005 0.019 0.006 0.017
1000 .7 0.006 0.020 0.006 0.019 0.006 0.019 0.005 0.017 0.006 0.018 0.006 0.018
Note: N=training sample size; r=diagnosis-test correlation
Table 1.15
Mean square error for the third threshold in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.023 0.038 0.023 0.040 0.023 0.041 0.023 0.034 0.022 0.036 0.023 0.036
250 .5 0.023 0.037 0.023 0.039 0.023 0.038 0.021 0.034 0.023 0.037 0.023 0.038
250 .7 0.022 0.037 0.024 0.039 0.024 0.039 0.021 0.033 0.021 0.037 0.022 0.036
500 .3 0.014 0.030 0.013 0.029 0.014 0.028 0.013 0.027 0.013 0.026 0.013 0.026
500 .5 0.014 0.030 0.013 0.026 0.013 0.031 0.013 0.027 0.013 0.026 0.013 0.026
500 .7 0.015 0.028 0.014 0.027 0.014 0.028 0.013 0.027 0.013 0.025 0.013 0.026
1000 .3 0.007 0.022 0.008 0.022 0.008 0.021 0.007 0.019 0.007 0.020 0.007 0.020
1000 .5 0.008 0.022 0.008 0.020 0.008 0.021 0.007 0.020 0.007 0.021 0.007 0.019
1000 .7 0.008 0.024 0.008 0.022 0.007 0.022 0.007 0.020 0.007 0.021 0.007 0.020
Note: N=training sample size; r=diagnosis-test correlation
Table 1.16
Mean square error for the fourth threshold in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.041 0.058 0.042 0.061 0.040 0.062 0.038 0.052 0.039 0.054 0.039 0.055
250 .5 0.040 0.057 0.041 0.059 0.039 0.054 0.036 0.052 0.040 0.055 0.040 0.056
250 .7 0.039 0.057 0.042 0.061 0.040 0.062 0.036 0.050 0.037 0.054 0.038 0.053
500 .3 0.024 0.045 0.025 0.042 0.024 0.041 0.023 0.038 0.023 0.038 0.023 0.037
500 .5 0.025 0.044 0.025 0.038 0.024 0.044 0.023 0.038 0.023 0.037 0.023 0.038
500 .7 0.026 0.041 0.026 0.040 0.024 0.040 0.023 0.040 0.023 0.037 0.023 0.038
1000 .3 0.013 0.030 0.014 0.031 0.014 0.029 0.013 0.026 0.013 0.028 0.013 0.028
1000 .5 0.015 0.031 0.014 0.029 0.014 0.028 0.013 0.027 0.013 0.029 0.013 0.026
1000 .7 0.014 0.035 0.014 0.030 0.013 0.030 0.013 0.027 0.013 0.028 0.013 0.027
Note: N=training sample size; r=diagnosis-test correlation
Table 1.17
Correlation between true and estimated first thresholds in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.966 0.951 0.967 0.945 0.966 0.946 0.968 0.950 0.968 0.946 0.968 0.948
250 .5 0.967 0.948 0.967 0.947 0.964 0.941 0.969 0.951 0.968 0.945 0.968 0.945
250 .7 0.967 0.948 0.966 0.945 0.965 0.942 0.968 0.952 0.967 0.947 0.968 0.945
500 .3 0.988 0.969 0.987 0.969 0.988 0.970 0.988 0.971 0.987 0.971 0.988 0.972
500 .5 0.987 0.971 0.987 0.971 0.987 0.968 0.988 0.971 0.988 0.971 0.988 0.971
500 .7 0.987 0.970 0.986 0.972 0.987 0.972 0.987 0.971 0.987 0.971 0.988 0.971
1000 .3 0.994 0.979 0.994 0.978 0.994 0.980 0.994 0.981 0.994 0.981 0.994 0.981
1000 .5 0.994 0.980 0.994 0.981 0.994 0.980 0.994 0.980 0.994 0.980 0.994 0.982
1000 .7 0.994 0.979 0.994 0.980 0.994 0.979 0.994 0.981 0.994 0.980 0.995 0.980
Note: N=training sample size; r=diagnosis-test correlation
Table 1.18
Correlation between true and estimated second thresholds in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.977 0.960 0.977 0.955 0.977 0.957 0.977 0.960 0.977 0.957 0.977 0.957
250 .5 0.977 0.958 0.978 0.957 0.976 0.954 0.978 0.960 0.977 0.956 0.977 0.955
250 .7 0.978 0.960 0.977 0.957 0.976 0.953 0.978 0.962 0.978 0.958 0.977 0.956
500 .3 0.990 0.973 0.990 0.974 0.991 0.975 0.991 0.975 0.991 0.975 0.990 0.976
500 .5 0.990 0.974 0.991 0.975 0.990 0.972 0.991 0.975 0.991 0.975 0.991 0.975
500 .7 0.990 0.975 0.990 0.976 0.990 0.976 0.991 0.975 0.990 0.976 0.991 0.975
1000 .3 0.996 0.981 0.995 0.981 0.996 0.982 0.996 0.983 0.996 0.983 0.996 0.983
1000 .5 0.995 0.981 0.996 0.983 0.996 0.982 0.996 0.982 0.996 0.982 0.996 0.984
1000 .7 0.995 0.981 0.996 0.982 0.996 0.981 0.996 0.983 0.996 0.982 0.996 0.982
Note: N=training sample size; r=diagnosis-test correlation
Table 1.19
Correlation between true and estimated third thresholds in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.970 0.953 0.971 0.948 0.971 0.947 0.971 0.954 0.972 0.951 0.971 0.952
250 .5 0.973 0.950 0.971 0.950 0.970 0.947 0.973 0.954 0.971 0.950 0.972 0.948
250 .7 0.971 0.953 0.972 0.949 0.971 0.946 0.973 0.956 0.974 0.952 0.972 0.951
500 .3 0.986 0.968 0.986 0.970 0.987 0.970 0.988 0.972 0.988 0.972 0.987 0.973
500 .5 0.987 0.969 0.987 0.972 0.987 0.968 0.988 0.971 0.988 0.972 0.988 0.971
500 .7 0.986 0.970 0.986 0.971 0.987 0.973 0.988 0.972 0.988 0.973 0.988 0.972
1000 .3 0.994 0.979 0.993 0.979 0.994 0.978 0.994 0.981 0.994 0.980 0.994 0.981
1000 .5 0.994 0.978 0.994 0.981 0.994 0.980 0.994 0.980 0.994 0.980 0.994 0.981
1000 .7 0.994 0.978 0.994 0.979 0.994 0.979 0.994 0.981 0.994 0.980 0.994 0.980
Note: N=training sample size; r=diagnosis-test correlation
Table 1.20
Correlation between true and estimated fourth thresholds in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.945 0.927 0.948 0.920 0.949 0.919 0.951 0.931 0.949 0.928 0.949 0.927
250 .5 0.950 0.924 0.947 0.922 0.948 0.923 0.953 0.931 0.949 0.926 0.950 0.925
250 .7 0.949 0.925 0.950 0.923 0.947 0.916 0.952 0.933 0.953 0.930 0.950 0.927
500 .3 0.975 0.953 0.975 0.957 0.978 0.957 0.978 0.959 0.978 0.959 0.977 0.961
500 .5 0.976 0.953 0.975 0.957 0.976 0.954 0.978 0.960 0.977 0.960 0.978 0.959
500 .7 0.973 0.953 0.974 0.957 0.976 0.961 0.977 0.959 0.978 0.961 0.978 0.960
1000 .3 0.989 0.971 0.987 0.971 0.989 0.971 0.989 0.975 0.989 0.973 0.989 0.973
1000 .5 0.988 0.969 0.988 0.973 0.989 0.972 0.989 0.974 0.989 0.973 0.988 0.975
1000 .7 0.988 0.969 0.988 0.971 0.989 0.972 0.989 0.974 0.989 0.973 0.989 0.974
Note: N=training sample size; r=diagnosis-test correlation
Table 2.1
Classification rate of data-generating theta in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.621 0.626 0.616 0.607 0.596 0.599 0.634 0.636 0.614 0.608 0.608 0.599
250 .5 0.711 0.712 0.679 0.689 0.675 0.669 0.706 0.710 0.681 0.684 0.669 0.666
250 .7 0.798 0.797 0.773 0.771 0.754 0.751 0.799 0.796 0.771 0.772 0.748 0.749
500 .3 0.621 0.630 0.605 0.614 0.598 0.595 0.628 0.623 0.610 0.610 0.604 0.596
500 .5 0.705 0.706 0.682 0.687 0.667 0.673 0.709 0.710 0.686 0.689 0.671 0.666
500 .7 0.798 0.799 0.766 0.769 0.752 0.751 0.797 0.794 0.767 0.766 0.752 0.750
1000 .3 0.631 0.619 0.606 0.599 0.600 0.604 0.616 0.621 0.607 0.608 0.600 0.596
1000 .5 0.705 0.701 0.686 0.689 0.672 0.670 0.694 0.698 0.686 0.682 0.667 0.670
1000 .7 0.794 0.791 0.769 0.768 0.751 0.749 0.789 0.792 0.765 0.769 0.752 0.750
Note: N=training sample size; r=diagnosis-test correlation
Table 2.2
Classification rate of data-generating theta in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.641 0.626 0.609 0.612 0.604 0.599 0.629 0.629 0.626 0.613 0.599 0.601
250 .5 0.705 0.705 0.688 0.693 0.668 0.667 0.717 0.716 0.681 0.693 0.667 0.665
250 .7 0.796 0.797 0.767 0.769 0.746 0.749 0.794 0.801 0.774 0.772 0.748 0.752
500 .3 0.627 0.624 0.616 0.618 0.598 0.596 0.623 0.624 0.613 0.616 0.598 0.597
500 .5 0.702 0.708 0.681 0.683 0.671 0.671 0.701 0.707 0.685 0.687 0.673 0.666
500 .7 0.790 0.790 0.768 0.772 0.749 0.749 0.796 0.794 0.768 0.771 0.750 0.752
1000 .3 0.630 0.618 0.607 0.609 0.603 0.608 0.625 0.626 0.613 0.608 0.598 0.595
1000 .5 0.701 0.701 0.681 0.686 0.672 0.673 0.709 0.697 0.678 0.683 0.669 0.670
1000 .7 0.788 0.786 0.765 0.766 0.751 0.749 0.790 0.792 0.770 0.768 0.752 0.752
Note: N=training sample size; r=diagnosis-test correlation
Table 2.3
Sensitivity of data-generating theta in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.610 0.605 0.598 0.607 0.600 0.595 0.602 0.598 0.600 0.606 0.584 0.595
250 .5 0.696 0.694 0.703 0.687 0.669 0.681 0.699 0.700 0.698 0.698 0.683 0.687
250 .7 0.797 0.801 0.784 0.788 0.765 0.773 0.802 0.800 0.789 0.784 0.773 0.773
500 .3 0.621 0.608 0.619 0.607 0.608 0.610 0.610 0.613 0.611 0.612 0.594 0.607
500 .5 0.717 0.714 0.706 0.698 0.690 0.682 0.710 0.709 0.699 0.697 0.686 0.691
500 .7 0.811 0.806 0.806 0.802 0.776 0.776 0.807 0.815 0.801 0.802 0.776 0.780
1000 .3 0.613 0.629 0.620 0.631 0.608 0.599 0.632 0.626 0.620 0.618 0.607 0.611
1000 .5 0.718 0.723 0.707 0.705 0.687 0.691 0.733 0.728 0.707 0.711 0.694 0.692
1000 .7 0.819 0.824 0.804 0.803 0.782 0.783 0.826 0.825 0.809 0.805 0.779 0.784
Note: N=training sample size; r=diagnosis-test correlation
Table 2.4
Sensitivity of data-generating theta in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.590 0.604 0.605 0.602 0.592 0.599 0.603 0.603 0.605 0.600 0.599 0.594
250 .5 0.708 0.701 0.691 0.683 0.682 0.684 0.690 0.690 0.700 0.684 0.686 0.689
250 .7 0.798 0.796 0.792 0.788 0.775 0.772 0.801 0.792 0.779 0.784 0.776 0.767
500 .3 0.610 0.616 0.603 0.603 0.603 0.609 0.618 0.618 0.608 0.604 0.604 0.605
500 .5 0.719 0.710 0.708 0.705 0.685 0.684 0.713 0.711 0.699 0.702 0.682 0.694
500 .7 0.820 0.813 0.801 0.795 0.780 0.779 0.813 0.814 0.798 0.795 0.779 0.774
1000 .3 0.615 0.630 0.619 0.617 0.602 0.593 0.620 0.621 0.613 0.617 0.609 0.614
1000 .5 0.724 0.727 0.712 0.707 0.689 0.687 0.711 0.730 0.718 0.710 0.690 0.691
1000 .7 0.827 0.834 0.809 0.809 0.781 0.785 0.826 0.821 0.801 0.804 0.780 0.781
Note: N=training sample size; r=diagnosis-test correlation
Table 2.5
Specificity of data-generating theta in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.622 0.627 0.618 0.607 0.595 0.600 0.636 0.638 0.615 0.608 0.614 0.600
250 .5 0.711 0.713 0.676 0.690 0.677 0.666 0.707 0.711 0.679 0.682 0.665 0.661
250 .7 0.798 0.797 0.772 0.769 0.752 0.745 0.798 0.795 0.769 0.770 0.741 0.743
500 .3 0.621 0.631 0.604 0.615 0.595 0.591 0.629 0.623 0.609 0.610 0.607 0.594
500 .5 0.704 0.705 0.680 0.685 0.661 0.671 0.709 0.710 0.685 0.688 0.667 0.660
500 .7 0.798 0.798 0.761 0.765 0.746 0.744 0.797 0.793 0.763 0.762 0.746 0.743
1000 .3 0.632 0.618 0.604 0.595 0.597 0.605 0.615 0.620 0.606 0.607 0.598 0.593
1000 .5 0.704 0.700 0.684 0.687 0.668 0.665 0.692 0.696 0.684 0.679 0.661 0.664
1000 .7 0.792 0.789 0.765 0.764 0.744 0.741 0.787 0.790 0.760 0.765 0.745 0.742
Note: N=training sample size; r=diagnosis-test correlation
Table 2.6
Specificity of data-generating theta in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.643 0.627 0.609 0.612 0.607 0.599 0.630 0.630 0.628 0.615 0.599 0.603
250 .5 0.705 0.705 0.688 0.694 0.664 0.663 0.719 0.717 0.679 0.694 0.662 0.659
250 .7 0.796 0.797 0.765 0.767 0.739 0.744 0.794 0.802 0.773 0.771 0.741 0.749
500 .3 0.628 0.624 0.617 0.619 0.597 0.592 0.623 0.624 0.613 0.617 0.596 0.595
500 .5 0.701 0.708 0.678 0.681 0.668 0.668 0.701 0.707 0.684 0.686 0.671 0.658
500 .7 0.788 0.789 0.765 0.769 0.742 0.742 0.795 0.793 0.765 0.768 0.743 0.746
1000 .3 0.631 0.617 0.606 0.608 0.603 0.611 0.625 0.627 0.613 0.607 0.596 0.590
1000 .5 0.700 0.700 0.678 0.684 0.668 0.669 0.709 0.695 0.674 0.680 0.664 0.664
1000 .7 0.785 0.783 0.760 0.761 0.743 0.739 0.788 0.791 0.766 0.764 0.745 0.745
Note: N=training sample size; r=diagnosis-test correlation
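The classification rate, sensitivity, and specificity reported in Tables 2.1–2.18 all follow from the confusion matrix produced by applying a cut score to the continuous score (theta or summed score). A minimal sketch with hypothetical scores and diagnoses; in the actual study the cut would come from a ROC analysis, but here it is a fixed value for illustration:

```python
def classification_metrics(scores, diagnoses, cut):
    """Overall classification rate, sensitivity, and specificity when
    cases scoring at or above `cut` are flagged as diagnosed.

    scores: continuous test scores (e.g., theta estimates or summed scores)
    diagnoses: true 0/1 diagnostic status
    """
    tp = fp = tn = fn = 0
    for score, diagnosed in zip(scores, diagnoses):
        flagged = score >= cut
        if flagged and diagnosed:
            tp += 1
        elif flagged and not diagnosed:
            fp += 1
        elif not flagged and not diagnosed:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn)    # proportion of true cases flagged
    specificity = tn / (tn + fp)    # proportion of non-cases cleared
    rate = (tp + tn) / len(scores)  # overall classification rate
    return rate, sensitivity, specificity

# Hypothetical mini test set: six theta estimates, three true cases.
thetas = [2.1, 1.5, -0.3, 0.8, -1.2, 0.1]
truth = [1, 1, 0, 1, 0, 0]
rate, sens, spec = classification_metrics(thetas, truth, cut=0.0)
```

Here the cut of 0.0 flags four cases, one falsely, giving sensitivity 1.0, specificity 2/3, and a classification rate of 5/6.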
Table 2.7
Classification rate of estimated thetas in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.610 0.607 0.608 0.590 0.577 0.582 0.617 0.620 0.613 0.609 0.597 0.590
250 .5 0.679 0.683 0.656 0.660 0.650 0.652 0.701 0.697 0.679 0.668 0.662 0.657
250 .7 0.756 0.750 0.732 0.726 0.718 0.715 0.779 0.782 0.753 0.748 0.731 0.732
500 .3 0.610 0.610 0.595 0.598 0.588 0.585 0.615 0.615 0.607 0.604 0.597 0.591
500 .5 0.683 0.677 0.657 0.661 0.648 0.651 0.699 0.703 0.669 0.673 0.658 0.659
500 .7 0.748 0.743 0.730 0.731 0.720 0.718 0.778 0.774 0.748 0.750 0.736 0.736
1000 .3 0.606 0.598 0.588 0.587 0.589 0.588 0.610 0.615 0.601 0.603 0.597 0.594
1000 .5 0.671 0.676 0.654 0.659 0.652 0.646 0.692 0.691 0.668 0.673 0.660 0.660
1000 .7 0.743 0.744 0.731 0.732 0.717 0.716 0.770 0.769 0.750 0.747 0.733 0.733
Note: N=training sample size; r=diagnosis-test correlation
Table 2.8
Classification rate of estimated thetas in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.638 0.635 0.604 0.608 0.606 0.596 0.624 0.624 0.633 0.607 0.597 0.596
250 .5 0.697 0.703 0.669 0.679 0.664 0.666 0.710 0.708 0.679 0.686 0.665 0.661
250 .7 0.778 0.784 0.754 0.754 0.735 0.739 0.789 0.796 0.767 0.765 0.744 0.743
500 .3 0.621 0.604 0.611 0.609 0.597 0.590 0.618 0.617 0.601 0.610 0.595 0.594
500 .5 0.690 0.701 0.680 0.671 0.661 0.663 0.703 0.703 0.678 0.679 0.670 0.660
500 .7 0.773 0.778 0.755 0.753 0.739 0.737 0.789 0.788 0.761 0.761 0.746 0.744
1000 .3 0.617 0.609 0.603 0.601 0.597 0.599 0.617 0.615 0.605 0.607 0.598 0.596
1000 .5 0.688 0.687 0.672 0.680 0.662 0.668 0.703 0.700 0.675 0.679 0.667 0.664
1000 .7 0.773 0.776 0.750 0.752 0.743 0.739 0.784 0.785 0.762 0.761 0.743 0.746
Note: N=training sample size; r=diagnosis-test correlation
Table 2.9
Sensitivity of estimated thetas in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.595 0.595 0.583 0.602 0.607 0.596 0.602 0.599 0.588 0.588 0.582 0.594
250 .5 0.685 0.678 0.685 0.679 0.660 0.657 0.676 0.688 0.674 0.691 0.666 0.672
250 .7 0.769 0.780 0.759 0.771 0.744 0.749 0.778 0.774 0.769 0.770 0.756 0.754
500 .3 0.607 0.604 0.605 0.601 0.597 0.597 0.609 0.609 0.601 0.604 0.588 0.597
500 .5 0.695 0.697 0.691 0.686 0.672 0.668 0.692 0.686 0.696 0.689 0.677 0.674
500 .7 0.790 0.792 0.776 0.773 0.750 0.750 0.790 0.793 0.782 0.779 0.755 0.756
1000 .3 0.613 0.624 0.617 0.619 0.598 0.598 0.622 0.615 0.613 0.608 0.595 0.598
1000 .5 0.710 0.704 0.701 0.695 0.670 0.682 0.706 0.709 0.701 0.696 0.678 0.680
1000 .7 0.804 0.802 0.780 0.776 0.759 0.758 0.807 0.809 0.785 0.789 0.763 0.763
Note: N=training sample size; r=diagnosis-test correlation
Table 2.10
Sensitivity of estimated thetas in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.581 0.585 0.602 0.595 0.576 0.593 0.604 0.604 0.589 0.604 0.595 0.597
250 .5 0.695 0.688 0.698 0.682 0.670 0.666 0.688 0.689 0.693 0.681 0.676 0.685
250 .7 0.790 0.781 0.783 0.780 0.765 0.761 0.793 0.782 0.774 0.778 0.767 0.763
500 .3 0.608 0.628 0.602 0.604 0.595 0.610 0.617 0.620 0.616 0.606 0.601 0.607
500 .5 0.717 0.701 0.692 0.702 0.685 0.679 0.701 0.709 0.699 0.703 0.678 0.692
500 .7 0.808 0.800 0.789 0.789 0.767 0.769 0.806 0.806 0.795 0.792 0.768 0.772
1000 .3 0.619 0.628 0.616 0.617 0.600 0.599 0.622 0.624 0.616 0.614 0.602 0.607
1000 .5 0.723 0.724 0.709 0.697 0.687 0.675 0.709 0.719 0.713 0.707 0.684 0.689
1000 .7 0.815 0.815 0.802 0.797 0.765 0.771 0.820 0.814 0.797 0.799 0.779 0.774
Note: N=training sample size; r=diagnosis-test correlation
Table 2.11
Specificity of estimated thetas in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.611 0.608 0.611 0.589 0.569 0.578 0.618 0.621 0.616 0.611 0.601 0.589
250 .5 0.678 0.684 0.653 0.658 0.648 0.651 0.703 0.697 0.680 0.665 0.661 0.653
250 .7 0.755 0.748 0.729 0.721 0.711 0.706 0.779 0.782 0.751 0.746 0.725 0.727
500 .3 0.610 0.610 0.594 0.597 0.586 0.582 0.615 0.616 0.607 0.604 0.599 0.590
500 .5 0.682 0.676 0.654 0.658 0.643 0.646 0.699 0.704 0.666 0.671 0.653 0.655
500 .7 0.746 0.740 0.725 0.726 0.712 0.709 0.777 0.773 0.744 0.747 0.731 0.731
1000 .3 0.606 0.597 0.585 0.583 0.587 0.585 0.609 0.615 0.599 0.602 0.597 0.592
1000 .5 0.669 0.675 0.648 0.655 0.647 0.637 0.692 0.690 0.664 0.670 0.655 0.655
1000 .7 0.740 0.741 0.726 0.727 0.706 0.705 0.768 0.767 0.746 0.743 0.726 0.725
Note: N=training sample size; r=diagnosis-test correlation
Table 2.12
Specificity of estimated thetas in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.641 0.638 0.604 0.610 0.613 0.597 0.625 0.625 0.635 0.607 0.597 0.596
250 .5 0.697 0.704 0.666 0.679 0.662 0.666 0.711 0.709 0.678 0.687 0.663 0.655
250 .7 0.778 0.784 0.750 0.751 0.728 0.733 0.789 0.797 0.767 0.764 0.738 0.738
500 .3 0.622 0.603 0.612 0.609 0.597 0.584 0.618 0.617 0.600 0.610 0.593 0.591
500 .5 0.688 0.701 0.679 0.667 0.655 0.659 0.703 0.702 0.676 0.677 0.668 0.652
500 .7 0.771 0.776 0.752 0.749 0.732 0.729 0.788 0.787 0.757 0.757 0.741 0.737
1000 .3 0.617 0.608 0.601 0.599 0.596 0.599 0.617 0.614 0.604 0.606 0.597 0.593
1000 .5 0.686 0.685 0.668 0.678 0.656 0.667 0.702 0.699 0.671 0.676 0.662 0.658
1000 .7 0.770 0.774 0.745 0.747 0.737 0.732 0.782 0.784 0.759 0.756 0.735 0.739
Note: N=training sample size; r=diagnosis-test correlation
Table 2.13
Classification rate of raw summed score in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.594 0.603 0.590 0.574 0.576 0.585 0.618 0.611 0.605 0.597 0.592 0.589
250 .5 0.669 0.669 0.648 0.656 0.647 0.650 0.691 0.692 0.672 0.662 0.656 0.653
250 .7 0.739 0.738 0.727 0.719 0.715 0.710 0.769 0.771 0.746 0.745 0.728 0.729
500 .3 0.599 0.598 0.591 0.595 0.586 0.584 0.615 0.614 0.607 0.599 0.592 0.593
500 .5 0.668 0.665 0.658 0.656 0.647 0.650 0.686 0.693 0.669 0.667 0.659 0.654
500 .7 0.735 0.733 0.724 0.724 0.715 0.712 0.774 0.765 0.742 0.746 0.730 0.728
1000 .3 0.602 0.590 0.584 0.587 0.586 0.584 0.607 0.608 0.602 0.600 0.594 0.590
1000 .5 0.665 0.667 0.651 0.651 0.648 0.648 0.683 0.683 0.665 0.672 0.657 0.654
1000 .7 0.739 0.737 0.725 0.727 0.716 0.714 0.762 0.764 0.745 0.745 0.729 0.730
Note: N=training sample size; r=diagnosis-test correlation
Table 2.14
Classification rate of raw summed score in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.627 0.624 0.600 0.594 0.603 0.591 0.628 0.628 0.627 0.604 0.599 0.591
250 .5 0.686 0.687 0.664 0.669 0.658 0.663 0.706 0.705 0.678 0.685 0.664 0.659
250 .7 0.766 0.772 0.745 0.748 0.734 0.735 0.788 0.790 0.764 0.764 0.742 0.741
500 .3 0.609 0.597 0.606 0.605 0.591 0.590 0.618 0.623 0.603 0.608 0.592 0.593
500 .5 0.680 0.693 0.676 0.667 0.660 0.661 0.703 0.700 0.678 0.679 0.668 0.660
500 .7 0.766 0.769 0.749 0.750 0.737 0.733 0.784 0.782 0.759 0.763 0.744 0.744
1000 .3 0.613 0.606 0.594 0.597 0.594 0.598 0.621 0.613 0.609 0.605 0.595 0.593
1000 .5 0.682 0.680 0.670 0.678 0.660 0.663 0.700 0.694 0.674 0.678 0.664 0.663
1000 .7 0.764 0.767 0.745 0.751 0.740 0.739 0.779 0.783 0.759 0.754 0.742 0.746
Note: N=training sample size; r=diagnosis-test correlation
Table 2.15
Sensitivity of raw summed score in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.611 0.600 0.604 0.619 0.610 0.591 0.602 0.607 0.594 0.600 0.589 0.591
250 .5 0.697 0.698 0.695 0.684 0.666 0.664 0.691 0.689 0.680 0.695 0.674 0.677
250 .7 0.793 0.798 0.767 0.785 0.754 0.760 0.787 0.783 0.774 0.771 0.757 0.755
500 .3 0.618 0.616 0.609 0.604 0.598 0.599 0.606 0.609 0.598 0.608 0.596 0.594
500 .5 0.711 0.712 0.690 0.693 0.674 0.669 0.707 0.695 0.693 0.694 0.673 0.680
500 .7 0.804 0.808 0.785 0.782 0.760 0.760 0.794 0.801 0.786 0.780 0.761 0.764
1000 .3 0.620 0.634 0.621 0.618 0.602 0.604 0.624 0.621 0.610 0.610 0.598 0.603
1000 .5 0.718 0.715 0.703 0.705 0.677 0.679 0.712 0.714 0.701 0.693 0.679 0.687
1000 .7 0.810 0.811 0.788 0.785 0.761 0.761 0.813 0.810 0.788 0.787 0.763 0.765
Note: N=training sample size; r=diagnosis-test correlation
Table 2.16
Sensitivity of raw summed score in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.598 0.598 0.607 0.616 0.581 0.603 0.598 0.598 0.598 0.607 0.591 0.604
250 .5 0.708 0.706 0.703 0.697 0.681 0.673 0.690 0.690 0.694 0.682 0.678 0.690
250 .7 0.809 0.799 0.794 0.788 0.769 0.768 0.792 0.790 0.777 0.778 0.769 0.766
500 .3 0.621 0.633 0.607 0.609 0.605 0.609 0.616 0.614 0.613 0.608 0.605 0.606
500 .5 0.728 0.711 0.698 0.708 0.685 0.683 0.701 0.710 0.698 0.700 0.681 0.691
500 .7 0.816 0.811 0.798 0.795 0.770 0.776 0.810 0.812 0.796 0.787 0.770 0.769
1000 .3 0.623 0.631 0.625 0.620 0.605 0.600 0.615 0.625 0.611 0.617 0.606 0.610
1000 .5 0.728 0.732 0.710 0.700 0.690 0.683 0.712 0.725 0.712 0.706 0.686 0.689
1000 .7 0.825 0.825 0.808 0.799 0.769 0.771 0.824 0.817 0.801 0.805 0.779 0.771
Note: N=training sample size; r=diagnosis-test correlation
Table 2.17
Specificity of raw summed score in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.593 0.604 0.588 0.569 0.568 0.583 0.619 0.612 0.606 0.597 0.593 0.588
250 .5 0.667 0.667 0.643 0.653 0.642 0.647 0.691 0.693 0.671 0.659 0.652 0.646
250 .7 0.736 0.734 0.722 0.711 0.705 0.697 0.768 0.770 0.743 0.742 0.721 0.722
500 .3 0.598 0.597 0.590 0.595 0.583 0.581 0.616 0.614 0.608 0.598 0.590 0.592
500 .5 0.666 0.662 0.655 0.652 0.641 0.646 0.685 0.693 0.667 0.664 0.655 0.647
500 .7 0.732 0.729 0.717 0.718 0.704 0.700 0.773 0.763 0.737 0.743 0.723 0.719
1000 .3 0.601 0.588 0.580 0.584 0.582 0.579 0.606 0.607 0.601 0.598 0.593 0.586
1000 .5 0.663 0.665 0.646 0.646 0.640 0.640 0.682 0.681 0.662 0.669 0.652 0.646
1000 .7 0.735 0.734 0.718 0.720 0.704 0.702 0.760 0.761 0.740 0.740 0.721 0.721
Note: N=training sample size; r=diagnosis-test correlation
Table 2.18
Specificity of raw summed score in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.629 0.626 0.599 0.592 0.609 0.587 0.630 0.630 0.628 0.604 0.601 0.589
250 .5 0.685 0.686 0.660 0.666 0.652 0.660 0.707 0.706 0.677 0.685 0.661 0.651
250 .7 0.764 0.771 0.749 0.744 0.726 0.727 0.788 0.790 0.763 0.763 0.736 0.734
500 .3 0.608 0.595 0.606 0.604 0.587 0.585 0.618 0.623 0.601 0.608 0.589 0.590
500 .5 0.677 0.692 0.673 0.663 0.654 0.656 0.703 0.699 0.676 0.677 0.664 0.653
500 .7 0.764 0.767 0.743 0.746 0.728 0.722 0.783 0.780 0.755 0.760 0.738 0.737
1000 .3 0.612 0.605 0.591 0.595 0.591 0.598 0.622 0.613 0.609 0.603 0.592 0.589
1000 .5 0.680 0.677 0.666 0.675 0.652 0.658 0.699 0.692 0.670 0.674 0.659 0.657
1000 .7 0.761 0.764 0.738 0.745 0.732 0.732 0.776 0.781 0.754 0.748 0.733 0.740
Note: N=training sample size; r=diagnosis-test correlation
Table 3.1
Proportion of the CART models with a Bayes classifier that did not assign cases to the minority class in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.994 0.994 0.988 0.992 0.938 0.944 0.986 0.982 0.972 0.990 0.964 0.956
250 .5 0.982 0.980 0.964 0.964 0.874 0.850 0.980 0.956 0.964 0.962 0.836 0.870
250 .7 0.926 0.942 0.850 0.846 0.554 0.538 0.878 0.878 0.836 0.830 0.606 0.600
500 .3 0.992 0.996 0.508 0.484 0.068 0.080 0.988 0.994 0.002 0 0.010 0.004
500 .5 0.984 0.996 0.434 0.374 0.010 0.010 0.974 0.978 0 0 0 0
500 .7 0.940 0.952 0.796 0.796 0.392 0.346 0.862 0.906 0.778 0.742 0.384 0.380
1000 .3 0.986 0.982 0.448 0.456 0.046 0.050 0.994 0.994 0 0.002 0 0.004
1000 .5 0.984 0.974 0.362 0.342 0.008 0.010 0.960 0.972 0 0 0 0
1000 .7 0.944 0.932 0.688 0.726 0.210 0.242 0.868 0.874 0.628 0.612 0.130 0.180
Note: N=training sample size; r=diagnosis-test correlation
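Tables 3.1–3.8 tally how often a model paired with a Bayes classifier (flag the diagnosis when the predicted probability exceeds .50) degenerates into never predicting the minority class, which is most common when prevalence is low. A minimal sketch of that tally across replications, with hypothetical predicted labels standing in for actual CART, Random Forest, or lasso output:

```python
def never_predicts_minority(predicted, minority_label=1):
    """True when a fitted classifier's test-set predictions never include
    the minority (diagnosed) class -- the degenerate outcome tallied here."""
    return minority_label not in predicted

# Hypothetical predicted labels from four simulation replications.
replications = [
    [0, 0, 0, 0, 0],   # degenerate: majority class only
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],   # degenerate
    [1, 0, 0, 1, 0],
]
# Proportion of replications in which no case was assigned to the minority class.
proportion = sum(never_predicts_minority(p) for p in replications) / len(replications)
```

In this toy example two of the four replications are degenerate, so the tallied proportion is 0.50; each cell in the tables above is this proportion over the condition's replications.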
Table 3.2
Proportion of the CART models with a Bayes classifier that did not assign cases to the minority class in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.992 0.984 0.984 0.982 0.960 0.970 0.988 0.988 0.988 0.976 0.974 0.966
250 .5 0.978 0.964 0.954 0.960 0.900 0.880 0.964 0.962 0.970 0.960 0.884 0.868
250 .7 0.902 0.906 0.856 0.876 0.632 0.624 0.900 0.886 0.874 0.842 0.646 0.670
500 .3 0.992 0.996 0.004 0.002 0 0.006 0.978 0.988 0 0 0 0
500 .5 0.986 0.982 0 0.004 0 0 0.976 0.984 0 0 0 0
500 .7 0.916 0.934 0.778 0.822 0.368 0.388 0.888 0.892 0.746 0.760 0.412 0.418
1000 .3 0.988 0.988 0.004 0 0.004 0.004 0.984 0.990 0 0 0 0
1000 .5 0.982 0.988 0 0 0 0 0.974 0.978 0 0 0 0
1000 .7 0.910 0.912 0.736 0.756 0.178 0.184 0.856 0.876 0.704 0.720 0.212 0.160
Note: N=training sample size; r=diagnosis-test correlation
Table 3.3
Proportion of the Random Forest models with a Bayes classifier that did not assign cases to the minority class in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.704 0.674 0.408 0.396 0.030 0.028 0.038 0.042 0.000 0.004 0.000 0.000
250 .5 0.718 0.740 0.344 0.334 0.020 0.012 0.010 0.014 0.000 0.000 0.000 0.000
250 .7 0.654 0.654 0.276 0.256 0.006 0.002 0.004 0.000 0.000 0.000 0.000 0.000
500 .3 0.852 0.848 0.610 0.560 0.068 0.074 0.072 0.078 0.000 0.000 0.000 0.000
500 .5 0.826 0.818 0.516 0.504 0.028 0.028 0.012 0.026 0.000 0.000 0.000 0.000
500 .7 0.790 0.806 0.312 0.352 0.008 0.006 0.002 0.004 0.000 0.000 0.000 0.000
1000 .3 0.936 0.940 0.728 0.756 0.122 0.170 0.072 0.060 0.000 0.000 0.000 0.000
1000 .5 0.942 0.928 0.636 0.634 0.078 0.070 0.004 0.006 0.000 0.000 0.000 0.000
1000 .7 0.856 0.874 0.412 0.450 0.006 0.010 0.002 0.000 0.000 0.000 0.000 0.000
Note: N=training sample size; r=diagnosis-test correlation
Table 3.4
Proportion of the Random Forest models with a Bayes classifier that did not assign cases to the minority class in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.480 0.450 0.110 0.108 0.000 0.000 0.674 0.674 0.724 0.378 0.012 0.004
250 .5 0.228 0.256 0.014 0.030 0.000 0.000 0.326 0.334 0.082 0.074 0.000 0.000
250 .7 0.076 0.048 0.002 0.000 0.000 0.000 0.060 0.038 0.000 0.002 0.000 0.000
500 .3 0.610 0.626 0.114 0.128 0.000 0.000 0.756 0.710 0.352 0.366 0.006 0.002
500 .5 0.372 0.374 0.012 0.020 0.000 0.000 0.300 0.334 0.036 0.020 0.000 0.000
500 .7 0.090 0.104 0.000 0.000 0.000 0.000 0.026 0.038 0.000 0.000 0.000 0.000
1000 .3 0.710 0.744 0.146 0.158 0.000 0.000 0.730 0.752 0.336 0.368 0.000 0.004
1000 .5 0.396 0.394 0.018 0.018 0.000 0.000 0.248 0.254 0.012 0.008 0.000 0.000
1000 .7 0.064 0.096 0.000 0.000 0.000 0.000 0.002 0.010 0.000 0.000 0.000 0.000
Note: N=training sample size; r=diagnosis-test correlation
Table 3.5
Proportion of the Lasso models with a Bayes classifier that did not assign cases to the minority class in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 1 1 1 1 1 1 1 1 1 1 1 1
250 .5 1 1 1 1 0.998 0.996 1 1 1 1 0.978 0.972
250 .7 0.998 0.998 0.996 0.998 0.896 0.886 0.988 0.994 0.948 0.968 0.674 0.690
500 .3 1 1 1 1 1 1 1 1 1 1 1 1
500 .5 1 1 1 1 1 0.998 1 1 1 1 0.988 0.988
500 .7 1 1 0.998 0.998 0.832 0.790 0.986 0.994 0.968 0.956 0.348 0.360
1000 .3 1 1 1 1 1 1 1 1 1 1 1 1
1000 .5 1 1 1 1 0.998 1 1 1 1 1 0.976 0.984
1000 .7 0.998 1 1 0.998 0.622 0.674 0.998 0.992 0.884 0.906 0.102 0.082
Note: N=training sample size; r=diagnosis-test correlation
Table 3.6
Proportion of the Lasso models with a Bayes classifier that did not assign cases to the minority class in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 1 1 1 1 1 1 1 1 1 1 1 0.998
250 .5 1 1 1 1 0.990 0.998 1 1 0.998 0.998 0.970 0.978
250 .7 0.994 0.992 0.986 0.996 0.796 0.808 0.982 0.970 0.932 0.926 0.546 0.548
500 .3 1 1 1 1 1 1 1 1 1 1 1 1
500 .5 1 1 1 1 0.998 0.998 1 1 1 1 0.984 0.970
500 .7 0.998 1 0.984 0.994 0.596 0.600 0.984 0.984 0.876 0.888 0.232 0.242
1000 .3 1 1 1 1 1 1 1 1 1 1 1 1
1000 .5 1 1 1 1 0.994 0.994 1 1 1 1 0.950 0.958
1000 .7 0.998 0.998 0.980 0.978 0.262 0.300 0.972 0.968 0.744 0.776 0.026 0.018
Note: N=training sample size; r=diagnosis-test correlation
179
Table 3.7
Proportion of the Relaxed Lasso models with a Bayes classifier that did not assign cases to the minority
class in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 1 0.998 1 1 1 0.990 0.992 0.998 0.990 0.998 0.956 0.954
250 .5 0.996 0.994 0.978 0.986 0.874 0.840 0.956 0.966 0.898 0.874 0.534 0.586
250 .7 0.932 0.948 0.784 0.794 0.244 0.270 0.622 0.616 0.360 0.384 0.054 0.046
500 .3 1 1 1 1 1 0.996 0.998 1 1 1 0.960 0.964
500 .5 1 1 0.986 0.990 0.754 0.730 0.968 0.978 0.856 0.848 0.314 0.306
500 .7 0.962 0.962 0.736 0.752 0.088 0.090 0.552 0.564 0.126 0.178 0 0.002
1000 .3 1 1 1 1 1 0.998 1 1 1 1 0.974 0.960
1000 .5 1 1 0.996 0.992 0.670 0.718 0.978 0.984 0.852 0.826 0.140 0.152
1000 .7 0.970 0.972 0.684 0.726 0.030 0.034 0.450 0.422 0.018 0.038 0 0
Note: N=training sample size; r=diagnosis-test correlation
180
Table 3.8
Proportion of the Relaxed Lasso models with a Bayes classifier that did not assign cases to the minority
class in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.998 1 0.998 0.994 0.982 0.988 0.996 0.996 0.992 0.986 0.920 0.930
250 .5 0.972 0.984 0.928 0.946 0.678 0.674 0.908 0.906 0.800 0.754 0.488 0.436
250 .7 0.742 0.782 0.540 0.496 0.092 0.112 0.464 0.380 0.198 0.232 0.016 0.012
500 .3 1 1 1 1 0.986 0.986 1 0.998 0.994 0.996 0.924 0.912
500 .5 0.996 0.990 0.960 0.970 0.492 0.530 0.910 0.932 0.702 0.696 0.126 0.156
500 .7 0.768 0.784 0.332 0.326 0.006 0.002 0.314 0.326 0.034 0.040 0 0
1000 .3 1 1 1 1 0.998 0.996 1 1 1 1 0.912 0.940
1000 .5 1 0.996 0.978 0.978 0.320 0.362 0.928 0.958 0.690 0.684 0.034 0.040
1000 .7 0.768 0.802 0.178 0.226 0 0 0.146 0.172 0 0 0 0
Note: N=training sample size; r=diagnosis-test correlation
181
Table 3.9
Proportion of the Logistic Regression models with a Bayes classifier that did not assign cases to the
minority class in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.630 0.630 0.602 0.582 0.282 0.236 0.350 0.388 0.326 0.312 0.090 0.078
250 .5 0.522 0.582 0.446 0.418 0.094 0.106 0.186 0.216 0.128 0.162 0.010 0.006
250 .7 0.368 0.372 0.214 0.202 0.010 0.006 0.048 0.030 0.008 0.006 0 0
500 .3 0.920 0.946 0.922 0.918 0.628 0.646 0.836 0.812 0.734 0.714 0.234 0.208
500 .5 0.846 0.854 0.756 0.726 0.216 0.202 0.588 0.564 0.290 0.264 0.008 0.012
500 .7 0.682 0.712 0.362 0.370 0.020 0.016 0.144 0.142 0.026 0.014 0 0
1000 .3 0.998 0.996 0.994 0.994 0.914 0.896 0.980 0.994 0.946 0.950 0.480 0.484
1000 .5 0.982 0.988 0.930 0.934 0.326 0.386 0.822 0.846 0.484 0.496 0.004 0.010
1000 .7 0.860 0.862 0.508 0.500 0.010 0.014 0.202 0.236 0.002 0.016 0 0
Note: N=training sample size; r=diagnosis-test correlation
182
Table 3.10
Proportion of the Logistic Regression models with a Bayes classifier that did not assign cases to the
minority class in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0 0 0 0 0 0 0 0 0 0 0 0
250 .5 0 0 0 0 0 0 0 0 0 0 0 0
250 .7 0 0 0 0 0 0 0 0 0 0 0 0
500 .3 0.022 0.028 0.016 0.018 0 0.002 0.012 0 0.002 0.004 0 0
500 .5 0.008 0.004 0.004 0.002 0 0 0 0 0 0 0 0
500 .7 0.002 0 0 0 0 0 0 0 0 0 0 0
1000 .3 0.524 0.544 0.464 0.446 0.048 0.040 0.356 0.328 0.192 0.194 0.002 0.006
1000 .5 0.232 0.220 0.094 0.098 0 0 0.056 0.070 0.008 0 0 0
1000 .7 0.016 0.034 0 0 0 0 0 0.002 0 0 0 0
Note: N=training sample size; r=diagnosis-test correlation
183
Table 3.11
Proportion of the Lasso models with a ROC classifier that did not assign cases to the minority class in
conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.968 0.976 0.960 0.988 0.926 0.920 0.958 0.968 0.956 0.962 0.882 0.890
250 .5 0.812 0.838 0.778 0.764 0.530 0.514 0.792 0.772 0.694 0.672 0.422 0.454
250 .7 0.414 0.452 0.240 0.270 0.058 0.054 0.280 0.288 0.154 0.186 0.020 0.032
500 .3 0.976 0.982 0.970 0.962 0.890 0.858 0.966 0.970 0.960 0.968 0.832 0.802
500 .5 0.754 0.758 0.508 0.500 0.118 0.132 0.666 0.676 0.380 0.374 0.058 0.068
500 .7 0.250 0.202 0.016 0.014 0 0 0.136 0.120 0.012 0.002 0 0
1000 .3 0.972 0.968 0.924 0.938 0.654 0.698 0.954 0.960 0.912 0.908 0.582 0.608
1000 .5 0.516 0.506 0.140 0.114 0.002 0.004 0.410 0.414 0.072 0.078 0.004 0
1000 .7 0.022 0.040 0 0 0 0 0.012 0.010 0 0 0 0
Note: N=training sample size; r=diagnosis-test correlation
184
Table 3.12
Proportion of the Lasso models with a ROC classifier that did not assign cases to the minority class in
conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.962 0.946 0.962 0.960 0.894 0.878 0.960 0.960 0.960 0.962 0.854 0.886
250 .5 0.784 0.798 0.716 0.696 0.458 0.432 0.770 0.734 0.638 0.606 0.386 0.360
250 .7 0.348 0.354 0.204 0.200 0.030 0.050 0.290 0.226 0.116 0.136 0.010 0.006
500 .3 0.966 0.966 0.952 0.934 0.850 0.836 0.962 0.948 0.934 0.956 0.790 0.778
500 .5 0.696 0.712 0.368 0.418 0.066 0.082 0.624 0.664 0.342 0.292 0.028 0.046
500 .7 0.134 0.134 0.018 0.010 0 0 0.102 0.088 0.006 0.002 0 0
1000 .3 0.950 0.946 0.898 0.916 0.592 0.612 0.930 0.936 0.892 0.894 0.546 0.526
1000 .5 0.414 0.410 0.100 0.070 0 0.002 0.314 0.332 0.054 0.054 0.002 0
1000 .7 0.008 0.018 0 0 0 0 0.002 0.004 0 0 0 0
Note: N=training sample size; r=diagnosis-test correlation
185
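The contrast between the Bayes-classifier tables and the ROC-classifier tables above comes down to the cutoff applied to the predicted probabilities: for a binary outcome, the Bayes classifier amounts to a fixed .5 threshold, which a rare minority class seldom reaches, while the ROC classifier selects the cutoff that maximizes Youden's J (sensitivity + specificity − 1). A minimal sketch under assumed score distributions (all parameters hypothetical, not taken from this study):

```python
import random

random.seed(1)

# Simulated risk scores for a low-prevalence diagnosis (hypothetical values)
n, prevalence = 500, 0.05
y = [1 if random.random() < prevalence else 0 for _ in range(n)]
score = [min(0.99, max(0.01, random.gauss(0.25 + 0.20 * yi, 0.10))) for yi in y]

# Bayes classifier: fixed cutoff at .5
bayes_pred = [int(s >= 0.5) for s in score]

def youden_cutoff(y, score):
    """ROC classifier: cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    pos = sum(y)
    neg = len(y) - pos
    best_c, best_j = 0.5, -1.0
    for c in sorted(set(score)):
        tp = sum(1 for yi, s in zip(y, score) if yi == 1 and s >= c)
        fp = sum(1 for yi, s in zip(y, score) if yi == 0 and s >= c)
        j = tp / pos + (neg - fp) / neg - 1
        if j > best_j:
            best_j, best_c = j, c
    return best_c

roc_pred = [int(s >= youden_cutoff(y, score)) for s in score]
# With a rare class, the .5 cutoff typically flags far fewer cases than the ROC cutoff
print(sum(bayes_pred), sum(roc_pred))
```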
Table 4.1
Classification accuracy of CART for models with prevalence of .20 and
sample size greater than N=250
Classification rate   Sensitivity   Specificity
Item Categories
i r   2 5   2 5   2 5
10 .3 0.772 0.713 0.070 0.209 0.948 0.839
.5 0.776 0.734 0.163 0.286 0.929 0.847
.7 0.803 0.800 0.343 0.346 0.917 0.914
30 .3 0.712 0.692 0.206 0.241 0.839 0.804
.5 0.734 0.720 0.295 0.309 0.844 0.823
.7 0.800 0.802 0.366 0.362 0.908 0.912
Note: i=number of items; r=diagnosis-test correlation
186
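The three quantities reported throughout the Chapter 4 tables (classification rate, sensitivity, specificity) are confusion-matrix summaries of a binary classifier. A minimal sketch with hypothetical labels and predictions (1 = diagnosis; values illustrative only):

```python
def diagnostic_accuracy(y_true, y_pred):
    """Classification rate, sensitivity, and specificity (1 = diagnosed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "classification_rate": (tp + tn) / len(y_true),  # overall agreement
        "sensitivity": tp / (tp + fn),  # hit rate among diagnosed cases
        "specificity": tn / (tn + fp),  # correct-rejection rate among non-cases
    }

# Hypothetical data: 3 diagnosed cases out of 8
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
print(diagnostic_accuracy(y_true, y_pred))
# classification_rate 0.75, sensitivity ~0.667, specificity 0.8
```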
Table 4.2
Classification rate of random forest with a Bayes classifier in conditions with prevalence of .20
Number of items
10 30
Number of item categories
2 5 2 5
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3
250 .3 0.790 0.789 0.782 0.783 0.789 0.791 0.795 0.795
250 .5 0.789 0.788 0.790 0.789 0.793 0.794 0.800 0.800
250 .7 0.806 0.806 0.814 0.814 0.820 0.820 0.828 0.828
500 .3 0.795 0.795 0.785 0.785 0.792 0.792 0.797 0.797
500 .5 0.794 0.794 0.792 0.791 0.797 0.797 0.802 0.802
500 .7 0.812 0.811 0.816 0.816 0.822 0.822 0.830 0.830
1000 .3 0.798 0.797 0.788 0.787 0.794 0.794 0.798 0.798
1000 .5 0.798 0.798 0.793 0.793 0.798 0.798 0.803 0.804
1000 .7 0.815 0.814 0.817 0.817 0.823 0.824 0.832 0.831
Note: N=training sample size; r=diagnosis-test correlation
187
Table 4.3
Sensitivity of random forest with a Bayes classifier in conditions with prevalence of .20
Number of items
10 30
Number of item categories
2 5 2 5
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3
250 .3 0.035 0.035 0.069 0.065 0.054 0.049 0.039 0.038
250 .5 0.114 0.117 0.163 0.161 0.156 0.155 0.143 0.143
250 .7 0.306 0.302 0.324 0.317 0.350 0.341 0.347 0.343
500 .3 0.019 0.018 0.061 0.061 0.039 0.038 0.030 0.029
500 .5 0.095 0.099 0.154 0.158 0.153 0.151 0.140 0.137
500 .7 0.299 0.297 0.328 0.325 0.348 0.348 0.344 0.337
1000 .3 0.008 0.009 0.054 0.054 0.033 0.032 0.024 0.023
1000 .5 0.083 0.078 0.152 0.151 0.151 0.146 0.132 0.129
1000 .7 0.296 0.292 0.328 0.326 0.351 0.350 0.343 0.342
Note: N=training sample size; r=diagnosis-test correlation
188
Table 4.4
Specificity of random forest with a Bayes classifier in conditions with prevalence of .20
Number of items
10 30
Number of item categories
2 5 2 5
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3
250 .3 0.978 0.977 0.961 0.963 0.973 0.975 0.984 0.984
250 .5 0.958 0.956 0.946 0.946 0.953 0.954 0.964 0.964
250 .7 0.931 0.932 0.937 0.938 0.937 0.940 0.949 0.950
500 .3 0.989 0.989 0.966 0.966 0.981 0.982 0.988 0.989
500 .5 0.969 0.968 0.951 0.950 0.957 0.958 0.967 0.968
500 .7 0.940 0.940 0.939 0.939 0.940 0.940 0.952 0.953
1000 .3 0.995 0.995 0.971 0.971 0.985 0.985 0.991 0.992
1000 .5 0.976 0.978 0.953 0.954 0.959 0.961 0.971 0.972
1000 .7 0.944 0.945 0.940 0.940 0.942 0.942 0.953 0.954
Note: N=training sample size; r=diagnosis-test correlation
189
Table 4.4A
Classification accuracy of lasso logistic regression with a Bayes classifier for
models with prevalence of .20, diagnosis-test correlation of .7, five-category
items, and sample size greater than N=250
Classification rate   Sensitivity   Specificity
Number of items
N ld   10 30   10 30   10 30
500 0 0.816 0.818 0.139 0.148 0.986 0.985
500 .3 0.815 0.818 0.130 0.145 0.986 0.986
1000 0 0.818 0.823 0.151 0.183 0.985 0.982
1000 .3 0.817 0.822 0.151 0.184 0.985 0.982
Note: N=training sample size; ld=local dependence
Table 4.5
Classification rate of relaxed lasso logistic regression with a Bayes classifier for conditions with
five-category items and a diagnosis-test correlation of .7
Prevalence
0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3
250 10 0.943 0.943 0.897 0.897 0.817 0.817
250 30 0.942 0.942 0.897 0.897 0.820 0.820
500 10 0.948 0.948 0.901 0.901 0.823 0.822
500 30 0.947 0.947 0.901 0.901 0.826 0.826
1000 10 0.950 0.950 0.903 0.903 0.825 0.825
1000 30 0.950 0.950 0.904 0.904 0.830 0.829
Note: N=training sample size; i = number of items
190
Table 4.6
Sensitivity of relaxed lasso logistic regression with a Bayes classifier for conditions with
five-category items and a diagnosis-test correlation of .7
Prevalence
0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3
250 10 0.179 0.179 0.222 0.214 0.353 0.345
250 30 0.206 0.212 0.252 0.255 0.399 0.393
500 10 0.120 0.116 0.179 0.186 0.352 0.348
500 30 0.154 0.152 0.237 0.234 0.394 0.390
1000 10 0.088 0.089 0.168 0.172 0.348 0.346
1000 30 0.122 0.125 0.223 0.222 0.392 0.391
Note: N=training sample size; i = number of items
191
Table 4.7
Specificity of relaxed lasso logistic regression with a Bayes classifier for conditions with
five-category items and a diagnosis-test correlation of .7
Prevalence
0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3
250 10 0.953 0.954 0.958 0.959 0.920 0.920
250 30 0.944 0.947 0.947 0.947 0.901 0.902
500 10 0.994 0.994 0.992 0.992 0.972 0.972
500 30 0.990 0.989 0.984 0.984 0.942 0.942
1000 10 0.999 0.999 0.999 0.999 0.991 0.992
1000 30 0.999 0.999 0.996 0.996 0.961 0.963
Note: N=training sample size; i = number of items
192
Table 4.8
Classification Rate of logistic regression with a Bayes classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.909 0.909 0.868 0.868 0.759 0.759 0.908 0.908 0.906 0.866 0.758 0.758
250 .5 0.902 0.905 0.864 0.864 0.765 0.766 0.900 0.900 0.863 0.863 0.769 0.768
250 .7 0.898 0.898 0.868 0.867 0.795 0.795 0.894 0.896 0.868 0.866 0.800 0.800
500 .3 0.945 0.944 0.894 0.894 0.787 0.786 0.943 0.943 0.893 0.893 0.785 0.784
500 .5 0.942 0.941 0.890 0.891 0.789 0.788 0.939 0.939 0.888 0.887 0.789 0.789
500 .7 0.936 0.937 0.891 0.891 0.815 0.814 0.935 0.934 0.892 0.892 0.818 0.818
1000 .3 0.950 0.949 0.900 0.899 0.797 0.797 0.949 0.949 0.899 0.899 0.795 0.796
1000 .5 0.949 0.949 0.898 0.898 0.798 0.798 0.948 0.948 0.897 0.897 0.799 0.799
1000 .7 0.947 0.947 0.900 0.900 0.823 0.823 0.947 0.946 0.902 0.901 0.827 0.827
Note: N=training sample size; r=diagnosis-test correlation
193
Table 4.9
Sensitivity of logistic regression with a Bayes classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.060 0.059 0.054 0.054 0.115 0.113 0.066 0.066 0.068 0.065 0.123 0.124
250 .5 0.110 0.105 0.117 0.114 0.224 0.222 0.131 0.133 0.136 0.135 0.235 0.236
250 .7 0.220 0.228 0.263 0.269 0.406 0.402 0.272 0.267 0.289 0.292 0.415 0.410
500 .3 0.008 0.008 0.012 0.012 0.046 0.046 0.013 0.012 0.017 0.016 0.063 0.062
500 .5 0.024 0.025 0.048 0.048 0.173 0.170 0.046 0.045 0.072 0.080 0.198 0.193
500 .7 0.129 0.120 0.218 0.214 0.389 0.391 0.177 0.179 0.255 0.251 0.400 0.396
1000 .3 0.001 0.001 0.002 0.002 0.017 0.016 0.002 0.002 0.004 0.003 0.029 0.027
1000 .5 0.005 0.005 0.017 0.019 0.145 0.137 0.015 0.014 0.042 0.041 0.167 0.165
1000 .7 0.068 0.062 0.184 0.177 0.382 0.385 0.130 0.130 0.232 0.231 0.395 0.393
Note: N=training sample size; r=diagnosis-test correlation
194
Table 4.10
Specificity of logistic regression with a Bayes classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.953 0.954 0.958 0.959 0.920 0.920 0.952 0.952 0.951 0.955 0.917 0.917
250 .5 0.944 0.947 0.947 0.947 0.901 0.902 0.940 0.940 0.943 0.944 0.902 0.901
250 .7 0.934 0.933 0.935 0.933 0.892 0.894 0.927 0.929 0.932 0.930 0.897 0.897
500 .3 0.994 0.994 0.992 0.992 0.972 0.972 0.992 0.992 0.990 0.990 0.965 0.965
500 .5 0.990 0.989 0.984 0.984 0.942 0.942 0.986 0.987 0.979 0.977 0.937 0.937
500 .7 0.979 0.979 0.966 0.966 0.921 0.920 0.975 0.974 0.963 0.964 0.923 0.924
1000 .3 0.999 0.999 0.999 0.999 0.991 0.992 0.999 0.999 0.998 0.998 0.987 0.988
1000 .5 0.999 0.999 0.996 0.996 0.961 0.963 0.997 0.997 0.992 0.992 0.957 0.957
1000 .7 0.993 0.993 0.979 0.980 0.934 0.933 0.990 0.989 0.976 0.975 0.935 0.935
Note: N=training sample size; r=diagnosis-test correlation
195
Table 4.11
Classification rate of random forest with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.623 0.616 0.509 0.501 0.496 0.503 0.514 0.519 0.514 0.507 0.524 0.522
250 .5 0.660 0.648 0.570 0.573 0.600 0.599 0.600 0.608 0.607 0.601 0.612 0.610
250 .7 0.726 0.716 0.689 0.679 0.694 0.692 0.716 0.719 0.706 0.710 0.706 0.704
500 .3 0.737 0.720 0.573 0.566 0.504 0.497 0.507 0.505 0.508 0.505 0.526 0.529
500 .5 0.736 0.729 0.607 0.596 0.607 0.602 0.599 0.598 0.607 0.602 0.619 0.615
500 .7 0.773 0.771 0.692 0.694 0.701 0.699 0.718 0.716 0.712 0.711 0.709 0.707
1000 .3 0.813 0.807 0.679 0.672 0.515 0.515 0.492 0.501 0.506 0.501 0.522 0.521
1000 .5 0.801 0.796 0.676 0.662 0.611 0.603 0.598 0.599 0.612 0.611 0.617 0.616
1000 .7 0.818 0.815 0.722 0.716 0.709 0.707 0.719 0.722 0.710 0.714 0.712 0.710
Note: N=training sample size; r=diagnosis-test correlation
196
Table 4.12
Classification rate of random forest with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.512 0.512 0.518 0.515 0.549 0.542 0.563 0.563 0.549 0.552 0.564 0.562
250 .5 0.594 0.597 0.614 0.612 0.625 0.626 0.660 0.667 0.654 0.658 0.652 0.654
250 .7 0.719 0.724 0.720 0.720 0.720 0.721 0.761 0.766 0.749 0.751 0.740 0.738
500 .3 0.500 0.503 0.507 0.513 0.539 0.541 0.544 0.553 0.555 0.555 0.568 0.562
500 .5 0.589 0.584 0.612 0.606 0.632 0.629 0.663 0.666 0.658 0.661 0.655 0.657
500 .7 0.721 0.717 0.723 0.722 0.723 0.723 0.769 0.768 0.753 0.753 0.742 0.742
1000 .3 0.502 0.502 0.509 0.506 0.536 0.534 0.563 0.557 0.562 0.569 0.571 0.572
1000 .5 0.586 0.582 0.608 0.602 0.632 0.632 0.670 0.673 0.662 0.666 0.661 0.661
1000 .7 0.717 0.718 0.722 0.722 0.726 0.724 0.767 0.769 0.756 0.754 0.742 0.744
Note: N=training sample size; r=diagnosis-test correlation
197
Table 4.13
Sensitivity of random forest with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.447 0.448 0.580 0.589 0.638 0.630 0.636 0.631 0.630 0.642 0.621 0.624
250 .5 0.527 0.537 0.650 0.642 0.654 0.647 0.710 0.707 0.688 0.697 0.670 0.672
250 .7 0.632 0.665 0.705 0.729 0.717 0.725 0.795 0.789 0.774 0.766 0.743 0.747
500 .3 0.304 0.324 0.489 0.498 0.621 0.629 0.649 0.654 0.646 0.654 0.629 0.624
500 .5 0.395 0.410 0.590 0.595 0.644 0.650 0.722 0.724 0.698 0.703 0.667 0.674
500 .7 0.517 0.513 0.700 0.693 0.712 0.712 0.801 0.801 0.774 0.774 0.747 0.750
1000 .3 0.205 0.208 0.338 0.353 0.583 0.594 0.673 0.663 0.655 0.660 0.642 0.642
1000 .5 0.273 0.282 0.459 0.481 0.628 0.639 0.727 0.730 0.697 0.699 0.675 0.679
1000 .7 0.393 0.393 0.627 0.632 0.701 0.700 0.805 0.803 0.783 0.777 0.746 0.750
Note: N=training sample size; r=diagnosis-test correlation
198
Table 4.14
Sensitivity of random forest with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.629 0.620 0.624 0.627 0.595 0.605 0.615 0.615 0.628 0.621 0.605 0.609
250 .5 0.721 0.715 0.694 0.691 0.680 0.673 0.708 0.700 0.691 0.689 0.674 0.674
250 .7 0.805 0.799 0.781 0.782 0.758 0.760 0.810 0.808 0.786 0.784 0.767 0.766
500 .3 0.631 0.623 0.638 0.630 0.610 0.609 0.646 0.639 0.629 0.628 0.607 0.615
500 .5 0.717 0.715 0.702 0.701 0.673 0.676 0.708 0.707 0.696 0.696 0.681 0.678
500 .7 0.804 0.801 0.785 0.785 0.761 0.761 0.808 0.809 0.791 0.790 0.768 0.766
1000 .3 0.628 0.626 0.630 0.630 0.619 0.621 0.632 0.641 0.626 0.619 0.610 0.608
1000 .5 0.712 0.715 0.700 0.707 0.682 0.678 0.713 0.713 0.703 0.697 0.676 0.676
1000 .7 0.801 0.802 0.790 0.787 0.761 0.767 0.819 0.816 0.793 0.794 0.772 0.771
Note: N=training sample size; r=diagnosis-test correlation
199
Table 4.15
Specificity of random forest with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.633 0.624 0.501 0.491 0.461 0.471 0.507 0.513 0.501 0.492 0.500 0.497
250 .5 0.667 0.654 0.561 0.565 0.586 0.587 0.594 0.603 0.598 0.590 0.598 0.595
250 .7 0.731 0.718 0.687 0.673 0.689 0.684 0.712 0.715 0.699 0.704 0.696 0.693
500 .3 0.760 0.741 0.583 0.573 0.474 0.464 0.499 0.497 0.493 0.488 0.500 0.505
500 .5 0.754 0.745 0.609 0.596 0.597 0.590 0.593 0.591 0.597 0.590 0.607 0.600
500 .7 0.787 0.785 0.691 0.694 0.699 0.696 0.714 0.712 0.705 0.704 0.700 0.696
1000 .3 0.845 0.838 0.717 0.707 0.498 0.496 0.483 0.493 0.489 0.483 0.492 0.491
1000 .5 0.829 0.823 0.700 0.682 0.607 0.594 0.591 0.592 0.602 0.601 0.602 0.600
1000 .7 0.841 0.837 0.732 0.725 0.711 0.708 0.714 0.718 0.702 0.707 0.704 0.700
Note: N=training sample size; r=diagnosis-test correlation
200
Table 4.16
Specificity of random forest with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.506 0.506 0.506 0.502 0.538 0.527 0.561 0.561 0.545 0.544 0.554 0.550
250 .5 0.588 0.590 0.605 0.604 0.611 0.615 0.658 0.665 0.650 0.654 0.647 0.649
250 .7 0.714 0.721 0.713 0.713 0.711 0.711 0.758 0.763 0.745 0.747 0.733 0.731
500 .3 0.493 0.496 0.492 0.500 0.522 0.524 0.539 0.549 0.547 0.547 0.558 0.549
500 .5 0.582 0.578 0.602 0.595 0.622 0.617 0.661 0.663 0.654 0.657 0.649 0.651
500 .7 0.716 0.712 0.716 0.715 0.714 0.713 0.767 0.766 0.748 0.749 0.735 0.735
1000 .3 0.495 0.495 0.496 0.493 0.516 0.512 0.560 0.552 0.555 0.563 0.561 0.563
1000 .5 0.579 0.575 0.598 0.590 0.619 0.620 0.668 0.671 0.657 0.662 0.657 0.657
1000 .7 0.712 0.714 0.715 0.715 0.717 0.713 0.765 0.766 0.752 0.750 0.735 0.737
Note: N=training sample size; r=diagnosis-test correlation
201
Table 4.17
Classification rate of logistic regression with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.557 0.556 0.566 0.557 0.567 0.558 0.570 0.572 0.576 0.569 0.576 0.579
250 .5 0.643 0.640 0.636 0.633 0.633 0.634 0.642 0.651 0.645 0.642 0.642 0.640
250 .7 0.722 0.723 0.714 0.711 0.708 0.710 0.737 0.739 0.723 0.723 0.721 0.719
500 .3 0.568 0.574 0.574 0.582 0.577 0.574 0.584 0.589 0.580 0.589 0.580 0.583
500 .5 0.655 0.654 0.647 0.643 0.644 0.644 0.667 0.669 0.657 0.656 0.652 0.653
500 .7 0.734 0.732 0.722 0.720 0.712 0.714 0.750 0.748 0.737 0.735 0.728 0.727
1000 .3 0.591 0.587 0.583 0.586 0.578 0.583 0.596 0.599 0.594 0.602 0.588 0.587
1000 .5 0.659 0.655 0.650 0.657 0.647 0.649 0.676 0.675 0.663 0.665 0.657 0.655
1000 .7 0.735 0.739 0.726 0.723 0.717 0.717 0.755 0.760 0.740 0.739 0.731 0.730
Note: N=training sample size; r=diagnosis-test correlation
202
Table 4.18
Classification rate of logistic regression with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.550 0.533 0.526 0.517 0.554 0.545 0.574 0.574 0.579 0.560 0.565 0.563
250 .5 0.647 0.633 0.599 0.602 0.613 0.611 0.670 0.674 0.618 0.612 0.625 0.624
250 .7 0.767 0.769 0.698 0.692 0.696 0.697 0.808 0.808 0.703 0.697 0.702 0.703
500 .3 0.537 0.532 0.555 0.554 0.564 0.566 0.562 0.569 0.571 0.574 0.580 0.570
500 .5 0.623 0.625 0.635 0.629 0.639 0.637 0.643 0.647 0.641 0.639 0.642 0.640
500 .7 0.730 0.728 0.724 0.721 0.720 0.719 0.730 0.728 0.725 0.726 0.725 0.722
1000 .3 0.574 0.565 0.567 0.573 0.581 0.578 0.583 0.577 0.583 0.586 0.583 0.584
1000 .5 0.657 0.657 0.651 0.653 0.652 0.650 0.660 0.663 0.656 0.658 0.654 0.654
1000 .7 0.752 0.752 0.736 0.736 0.731 0.730 0.753 0.754 0.741 0.744 0.735 0.736
Note: N=training sample size; r=diagnosis-test correlation
203
Table 4.19
Sensitivity of logistic regression with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.617 0.612 0.601 0.605 0.590 0.602 0.603 0.601 0.590 0.596 0.582 0.575
250 .5 0.699 0.699 0.682 0.689 0.669 0.667 0.706 0.698 0.687 0.689 0.674 0.677
250 .7 0.788 0.789 0.767 0.774 0.749 0.747 0.794 0.792 0.783 0.784 0.756 0.759
500 .3 0.628 0.620 0.617 0.604 0.599 0.605 0.616 0.610 0.613 0.603 0.603 0.599
500 .5 0.713 0.712 0.698 0.700 0.675 0.675 0.716 0.710 0.701 0.702 0.680 0.679
500 .7 0.802 0.804 0.783 0.784 0.763 0.757 0.812 0.813 0.790 0.794 0.767 0.766
1000 .3 0.627 0.631 0.620 0.616 0.615 0.607 0.629 0.628 0.618 0.606 0.607 0.609
1000 .5 0.725 0.730 0.709 0.701 0.683 0.682 0.723 0.728 0.709 0.707 0.685 0.690
1000 .7 0.816 0.813 0.791 0.791 0.764 0.763 0.825 0.822 0.801 0.802 0.771 0.772
Note: N=training sample size; r=diagnosis-test correlation
204
Table 4.20
Sensitivity of logistic regression with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.600 0.618 0.617 0.625 0.573 0.589 0.545 0.545 0.535 0.563 0.555 0.558
250 .5 0.644 0.667 0.695 0.685 0.658 0.664 0.572 0.561 0.656 0.664 0.643 0.644
250 .7 0.630 0.627 0.759 0.765 0.741 0.742 0.506 0.503 0.736 0.744 0.747 0.739
500 .3 0.632 0.637 0.607 0.607 0.590 0.587 0.591 0.593 0.585 0.578 0.567 0.581
500 .5 0.722 0.714 0.693 0.694 0.667 0.669 0.688 0.683 0.682 0.687 0.670 0.671
500 .7 0.797 0.791 0.779 0.778 0.757 0.759 0.794 0.795 0.783 0.784 0.760 0.763
1000 .3 0.624 0.630 0.622 0.612 0.593 0.598 0.609 0.616 0.601 0.597 0.594 0.590
1000 .5 0.719 0.719 0.706 0.702 0.681 0.682 0.715 0.717 0.704 0.700 0.681 0.682
1000 .7 0.804 0.808 0.797 0.796 0.769 0.772 0.818 0.816 0.802 0.798 0.775 0.773
Note: N=training sample size; r=diagnosis-test correlation
205
Table 4.21
Specificity of logistic regression with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.554 0.553 0.562 0.552 0.562 0.547 0.568 0.570 0.575 0.566 0.575 0.580
250 .5 0.640 0.637 0.631 0.626 0.623 0.626 0.639 0.648 0.640 0.637 0.634 0.631
250 .7 0.719 0.719 0.708 0.704 0.698 0.701 0.734 0.736 0.716 0.717 0.713 0.709
500 .3 0.565 0.571 0.569 0.579 0.571 0.566 0.582 0.588 0.576 0.587 0.574 0.579
500 .5 0.652 0.651 0.642 0.637 0.636 0.636 0.664 0.667 0.652 0.650 0.646 0.646
500 .7 0.730 0.728 0.715 0.713 0.700 0.704 0.747 0.744 0.731 0.728 0.718 0.717
1000 .3 0.589 0.585 0.579 0.583 0.568 0.576 0.594 0.597 0.591 0.601 0.584 0.581
1000 .5 0.656 0.651 0.643 0.652 0.638 0.641 0.673 0.673 0.658 0.660 0.650 0.647
1000 .7 0.731 0.735 0.719 0.716 0.705 0.705 0.751 0.757 0.733 0.732 0.721 0.719
Note: N=training sample size; r=diagnosis-test correlation
206
Table 4.22
Specificity of logistic regression with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.547 0.528 0.516 0.505 0.550 0.534 0.575 0.575 0.582 0.559 0.568 0.564
250 .5 0.647 0.631 0.589 0.593 0.602 0.597 0.675 0.680 0.614 0.607 0.620 0.619
250 .7 0.774 0.776 0.692 0.684 0.685 0.686 0.823 0.824 0.699 0.692 0.690 0.694
500 .3 0.532 0.526 0.549 0.548 0.557 0.560 0.560 0.568 0.569 0.574 0.583 0.567
500 .5 0.618 0.620 0.628 0.622 0.633 0.629 0.640 0.645 0.637 0.633 0.635 0.633
500 .7 0.726 0.725 0.717 0.715 0.711 0.709 0.727 0.725 0.719 0.719 0.716 0.712
1000 .3 0.572 0.562 0.561 0.568 0.578 0.573 0.582 0.575 0.581 0.585 0.580 0.583
1000 .5 0.653 0.654 0.644 0.647 0.645 0.641 0.658 0.660 0.651 0.653 0.648 0.647
1000 .7 0.750 0.749 0.729 0.729 0.722 0.720 0.750 0.750 0.734 0.738 0.725 0.727
Note: N=training sample size; r=diagnosis-test correlation
207
Table 4.23
Classification rate of lasso logistic regression with a ROC classifier in conditions with diagnosis-test
correlation of .70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.716 0.708 0.704 0.694 0.699 0.695 0.727 0.724 0.714 0.710 0.709 0.707
250 30 0.725 0.729 0.715 0.710 0.711 0.710 0.733 0.735 0.724 0.721 0.719 0.717
500 10 0.719 0.715 0.710 0.707 0.708 0.707 0.732 0.729 0.722 0.722 0.721 0.719
500 30 0.734 0.734 0.726 0.727 0.722 0.724 0.744 0.743 0.737 0.731 0.730 0.730
1000 10 0.726 0.728 0.719 0.717 0.714 0.713 0.742 0.744 0.733 0.735 0.724 0.726
1000 30 0.746 0.746 0.738 0.738 0.732 0.731 0.755 0.754 0.745 0.746 0.736 0.736
Note: N=training sample size; i=number of items
208
Table 4.24
Sensitivity of lasso logistic regression with a ROC classifier in conditions with diagnosis-test
correlation of .70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.770 0.773 0.745 0.763 0.733 0.739 0.786 0.789 0.766 0.768 0.750 0.750
250 30 0.784 0.782 0.764 0.773 0.744 0.751 0.798 0.798 0.775 0.777 0.758 0.759
500 10 0.779 0.782 0.766 0.769 0.748 0.750 0.799 0.802 0.782 0.781 0.760 0.759
500 30 0.801 0.794 0.786 0.782 0.763 0.761 0.809 0.808 0.790 0.794 0.768 0.768
1000 10 0.792 0.796 0.778 0.778 0.755 0.754 0.811 0.813 0.791 0.788 0.768 0.764
1000 30 0.812 0.813 0.795 0.794 0.766 0.770 0.824 0.824 0.803 0.799 0.776 0.775
Note: N=training sample size; i=number of items
209
Table 4.25
Specificity of lasso logistic regression with a ROC classifier in conditions with diagnosis-test correlation
of .70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.713 0.705 0.699 0.686 0.691 0.683 0.724 0.721 0.708 0.704 0.698 0.696
250 30 0.722 0.726 0.710 0.703 0.703 0.700 0.729 0.732 0.718 0.715 0.710 0.707
500 10 0.716 0.712 0.704 0.700 0.698 0.696 0.729 0.726 0.715 0.715 0.712 0.710
500 30 0.730 0.731 0.719 0.721 0.712 0.714 0.740 0.740 0.731 0.724 0.721 0.720
1000 10 0.723 0.724 0.713 0.710 0.703 0.703 0.738 0.740 0.726 0.729 0.713 0.717
1000 30 0.743 0.742 0.732 0.732 0.724 0.722 0.752 0.750 0.738 0.740 0.727 0.727
Note: N=training sample size; i=number of items
210
Table 4.26
Classification rate of relaxed lasso with a ROC classifier in conditions with diagnosis-test correlation of
.70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.720 0.713 0.703 0.702 0.705 0.699 0.734 0.730 0.721 0.718 0.715 0.717
250 30 0.732 0.736 0.721 0.721 0.717 0.717 0.740 0.742 0.731 0.728 0.725 0.724
500 10 0.723 0.720 0.713 0.710 0.711 0.711 0.741 0.740 0.729 0.730 0.727 0.724
500 30 0.740 0.741 0.732 0.731 0.727 0.728 0.753 0.750 0.743 0.740 0.734 0.735
1000 10 0.731 0.733 0.724 0.720 0.716 0.714 0.750 0.754 0.739 0.738 0.730 0.729
1000 30 0.753 0.753 0.741 0.742 0.734 0.732 0.764 0.764 0.749 0.749 0.739 0.739
Note: N=training sample size; i=number of items
211
Table 4.27
Sensitivity of relaxed lasso with a ROC classifier in conditions with diagnosis-test correlation of .70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.768 0.770 0.749 0.759 0.732 0.740 0.787 0.790 0.771 0.772 0.754 0.749
250 30 0.777 0.774 0.765 0.765 0.748 0.750 0.799 0.796 0.780 0.781 0.765 0.762
500 10 0.780 0.783 0.771 0.774 0.754 0.752 0.801 0.803 0.788 0.785 0.761 0.764
500 30 0.799 0.794 0.788 0.787 0.765 0.763 0.809 0.813 0.794 0.796 0.771 0.769
1000 10 0.795 0.797 0.782 0.784 0.759 0.761 0.816 0.815 0.794 0.795 0.767 0.769
1000 30 0.813 0.814 0.799 0.797 0.770 0.775 0.825 0.824 0.805 0.802 0.778 0.777
Note: N=training sample size; i=number of items
212
Table 4.28
Specificity of relaxed lasso with a ROC classifier in conditions with diagnosis-test correlation of .70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.717 0.710 0.698 0.695 0.698 0.688 0.731 0.727 0.715 0.712 0.705 0.708
250 30 0.729 0.734 0.716 0.716 0.709 0.709 0.737 0.740 0.726 0.723 0.714 0.715
500 10 0.720 0.717 0.707 0.703 0.700 0.701 0.738 0.736 0.722 0.724 0.719 0.714
500 30 0.737 0.738 0.726 0.725 0.717 0.719 0.750 0.746 0.738 0.733 0.725 0.726
1000 10 0.728 0.730 0.717 0.713 0.705 0.702 0.746 0.750 0.733 0.732 0.721 0.718
1000 30 0.750 0.750 0.734 0.736 0.724 0.721 0.761 0.761 0.743 0.743 0.729 0.729
Note: N=training sample size; i=number of items
213
Table 6.1
Classification rate difference between random forest with a ROC classifier and the data-generating theta in
conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.003 -0.012 -0.108 -0.106 -0.100 -0.096 -0.121 -0.117 -0.100 -0.101 -0.084 -0.076
250 .5 -0.052 -0.065 -0.109 -0.116 -0.075 -0.070 -0.106 -0.103 -0.074 -0.083 -0.056 -0.055
250 .7 -0.072 -0.081 -0.084 -0.093 -0.060 -0.059 -0.083 -0.077 -0.065 -0.062 -0.042 -0.045
500 .3 0.116 0.093 -0.032 -0.049 -0.094 -0.098 -0.121 -0.118 -0.101 -0.105 -0.079 -0.068
500 .5 0.032 0.022 -0.075 -0.091 -0.060 -0.071 -0.110 -0.112 -0.079 -0.088 -0.052 -0.051
500 .7 -0.026 -0.028 -0.074 -0.074 -0.051 -0.051 -0.079 -0.078 -0.055 -0.055 -0.043 -0.044
1000 .3 0.182 0.186 0.073 0.073 -0.084 -0.089 -0.124 -0.120 -0.101 -0.107 -0.078 -0.075
1000 .5 0.097 0.094 -0.010 -0.026 -0.060 -0.067 -0.095 -0.099 -0.074 -0.071 -0.051 -0.054
1000 .7 0.025 0.024 -0.048 -0.053 -0.043 -0.043 -0.070 -0.070 -0.055 -0.055 -0.040 -0.040
Note: N=training sample size; r=diagnosis-test correlation
214
Table 6.2
Classification rate difference between random forest with a ROC classifier and the data-generating theta in
conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.129 -0.114 -0.091 -0.097 -0.055 -0.057 -0.065 -0.065 -0.077 -0.061 -0.034 -0.040
250 .5 -0.111 -0.108 -0.075 -0.080 -0.043 -0.041 -0.057 -0.049 -0.027 -0.035 -0.015 -0.011
250 .7 -0.077 -0.073 -0.047 -0.050 -0.026 -0.029 -0.033 -0.036 -0.025 -0.022 -0.008 -0.015
500 .3 -0.128 -0.122 -0.109 -0.105 -0.059 -0.055 -0.078 -0.071 -0.057 -0.061 -0.030 -0.035
500 .5 -0.113 -0.124 -0.069 -0.077 -0.039 -0.042 -0.038 -0.041 -0.027 -0.027 -0.018 -0.009
500 .7 -0.069 -0.074 -0.046 -0.050 -0.026 -0.026 -0.027 -0.026 -0.016 -0.017 -0.009 -0.010
1000 .3 -0.128 -0.117 -0.098 -0.102 -0.066 -0.074 -0.061 -0.070 -0.051 -0.039 -0.027 -0.023
1000 .5 -0.116 -0.119 -0.073 -0.084 -0.040 -0.041 -0.039 -0.024 -0.016 -0.018 -0.009 -0.009
1000 .7 -0.071 -0.067 -0.043 -0.044 -0.025 -0.025 -0.022 -0.024 -0.014 -0.014 -0.010 -0.009
Note: N=training sample size; r=diagnosis-test correlation
Table 6.3
Sensitivity difference between random forest with ROC classifier and data generating theta in conditions with 10
items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.163 -0.156 -0.017 -0.019 0.037 0.035 0.034 0.032 0.031 0.036 0.037 0.029
250 .5 -0.168 -0.156 -0.053 -0.045 -0.015 -0.034 0.011 0.006 -0.010 -0.001 -0.012 -0.015
250 .7 -0.165 -0.136 -0.078 -0.059 -0.048 -0.047 -0.008 -0.012 -0.015 -0.018 -0.030 -0.025
500 .3 -0.317 -0.288 -0.130 -0.109 0.014 0.019 0.038 0.040 0.035 0.042 0.035 0.017
500 .5 -0.323 -0.303 -0.117 -0.103 -0.046 -0.032 0.012 0.015 -0.002 0.006 -0.020 -0.017
500 .7 -0.293 -0.292 -0.106 -0.109 -0.064 -0.064 -0.006 -0.014 -0.028 -0.028 -0.029 -0.030
1000 .3 -0.409 -0.418 -0.281 -0.278 -0.025 -0.005 0.042 0.037 0.035 0.042 0.035 0.032
1000 .5 -0.447 -0.440 -0.248 -0.224 -0.059 -0.052 -0.006 0.002 -0.010 -0.012 -0.019 -0.013
1000 .7 -0.426 -0.431 -0.177 -0.171 -0.082 -0.084 -0.021 -0.022 -0.027 -0.028 -0.033 -0.034
Note: N=training sample size; r=diagnosis-test correlation
Table 6.4
Sensitivity difference between random forest with ROC classifier and data generating theta in conditions with 30
items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.039 0.016 0.019 0.025 0.003 0.006 0.013 0.013 0.023 0.021 0.006 0.015
250 .5 0.013 0.015 0.003 0.008 -0.002 -0.011 0.018 0.010 -0.008 0.005 -0.013 -0.016
250 .7 0.007 0.003 -0.011 -0.006 -0.017 -0.012 0.009 0.015 0.006 0 -0.009 -0.001
500 .3 0.021 0.007 0.035 0.028 0.007 0 0.027 0.021 0.021 0.025 0.004 0.009
500 .5 -0.002 0.005 -0.006 -0.004 -0.011 -0.007 -0.005 -0.004 -0.003 -0.006 -0.001 -0.017
500 .7 -0.016 -0.012 -0.015 -0.010 -0.020 -0.018 -0.005 -0.005 -0.007 -0.005 -0.011 -0.008
1000 .3 0.014 -0.004 0.011 0.013 0.017 0.028 0.012 0.020 0.013 0.002 0.001 -0.005
1000 .5 -0.012 -0.011 -0.012 0 -0.006 -0.009 0.001 -0.018 -0.015 -0.013 -0.014 -0.014
1000 .7 -0.026 -0.032 -0.019 -0.022 -0.020 -0.018 -0.007 -0.004 -0.008 -0.009 -0.008 -0.011
Note: N=training sample size; r=diagnosis-test correlation
Table 6.5
Specificity difference between random forest with ROC classifier and data generating theta in conditions with 10
items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.011 -0.004 -0.118 -0.116 -0.134 -0.129 -0.129 -0.125 -0.114 -0.116 -0.114 -0.103
250 .5 -0.045 -0.060 -0.115 -0.124 -0.090 -0.079 -0.113 -0.108 -0.081 -0.092 -0.067 -0.066
250 .7 -0.067 -0.079 -0.085 -0.096 -0.063 -0.062 -0.087 -0.080 -0.070 -0.067 -0.045 -0.050
500 .3 0.138 0.113 -0.021 -0.042 -0.121 -0.127 -0.130 -0.126 -0.116 -0.122 -0.107 -0.089
500 .5 0.051 0.039 -0.070 -0.090 -0.064 -0.081 -0.117 -0.119 -0.087 -0.098 -0.060 -0.060
500 .7 -0.012 -0.014 -0.070 -0.070 -0.047 -0.048 -0.083 -0.081 -0.058 -0.058 -0.046 -0.047
1000 .3 0.213 0.217 0.112 0.112 -0.099 -0.110 -0.133 -0.128 -0.117 -0.124 -0.106 -0.102
1000 .5 0.125 0.122 0.017 -0.004 -0.061 -0.071 -0.100 -0.105 -0.081 -0.078 -0.058 -0.065
1000 .7 0.048 0.048 -0.033 -0.039 -0.033 -0.032 -0.073 -0.072 -0.058 -0.058 -0.042 -0.042
Note: N=training sample size; r=diagnosis-test correlation
Table 6.6
Specificity difference between random forest with ROC classifier and data generating theta in conditions with 30
items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.137 -0.121 -0.103 -0.110 -0.069 -0.072 -0.069 -0.069 -0.083 -0.071 -0.044 -0.053
250 .5 -0.117 -0.114 -0.083 -0.090 -0.054 -0.049 -0.061 -0.052 -0.029 -0.039 -0.015 -0.010
250 .7 -0.082 -0.077 -0.051 -0.054 -0.028 -0.033 -0.036 -0.038 -0.028 -0.024 -0.008 -0.018
500 .3 -0.136 -0.128 -0.125 -0.120 -0.075 -0.069 -0.084 -0.076 -0.066 -0.070 -0.038 -0.046
500 .5 -0.119 -0.130 -0.076 -0.086 -0.045 -0.051 -0.040 -0.043 -0.030 -0.029 -0.022 -0.007
500 .7 -0.072 -0.077 -0.049 -0.054 -0.028 -0.029 -0.028 -0.027 -0.017 -0.019 -0.008 -0.011
1000 .3 -0.136 -0.122 -0.110 -0.115 -0.087 -0.099 -0.065 -0.075 -0.058 -0.044 -0.034 -0.027
1000 .5 -0.121 -0.124 -0.080 -0.093 -0.049 -0.049 -0.041 -0.024 -0.017 -0.018 -0.007 -0.007
1000 .7 -0.073 -0.069 -0.045 -0.047 -0.027 -0.027 -0.023 -0.025 -0.014 -0.014 -0.010 -0.008
Note: N=training sample size; r=diagnosis-test correlation
Table 6.7
Classification rate difference between logistic regression with ROC classifier and data generating theta in
conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.064 -0.070 -0.050 -0.050 -0.029 -0.041 -0.064 -0.064 -0.037 -0.039 -0.031 -0.020
250 .5 -0.067 -0.072 -0.043 -0.057 -0.042 -0.035 -0.064 -0.060 -0.036 -0.041 -0.027 -0.026
250 .7 -0.076 -0.074 -0.059 -0.060 -0.046 -0.041 -0.062 -0.057 -0.048 -0.048 -0.026 -0.030
500 .3 -0.053 -0.056 -0.031 -0.032 -0.021 -0.021 -0.044 -0.034 -0.030 -0.021 -0.024 -0.013
500 .5 -0.049 -0.052 -0.035 -0.044 -0.023 -0.029 -0.042 -0.041 -0.029 -0.034 -0.018 -0.013
500 .7 -0.065 -0.067 -0.044 -0.048 -0.040 -0.036 -0.047 -0.046 -0.030 -0.031 -0.024 -0.023
1000 .3 -0.040 -0.032 -0.022 -0.012 -0.022 -0.021 -0.020 -0.022 -0.014 -0.006 -0.011 -0.009
1000 .5 -0.046 -0.046 -0.036 -0.032 -0.025 -0.022 -0.018 -0.023 -0.023 -0.017 -0.010 -0.015
1000 .7 -0.058 -0.052 -0.043 -0.045 -0.035 -0.032 -0.035 -0.032 -0.025 -0.030 -0.021 -0.020
Note: N=training sample size; r=diagnosis-test correlation
Table 6.8
Classification rate difference between logistic regression with ROC classifier and data generating theta in
conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.091 -0.094 -0.082 -0.095 -0.049 -0.054 -0.056 -0.056 -0.048 -0.054 -0.034 -0.038
250 .5 -0.058 -0.072 -0.089 -0.091 -0.055 -0.057 -0.047 -0.042 -0.063 -0.080 -0.043 -0.041
250 .7 -0.030 -0.028 -0.069 -0.077 -0.050 -0.052 0.013 0.007 -0.071 -0.075 -0.047 -0.050
500 .3 -0.091 -0.092 -0.061 -0.063 -0.035 -0.030 -0.061 -0.055 -0.042 -0.042 -0.017 -0.027
500 .5 -0.079 -0.083 -0.046 -0.054 -0.031 -0.034 -0.058 -0.060 -0.044 -0.049 -0.031 -0.025
500 .7 -0.060 -0.062 -0.045 -0.051 -0.029 -0.030 -0.066 -0.065 -0.043 -0.045 -0.026 -0.030
1000 .3 -0.056 -0.053 -0.040 -0.036 -0.021 -0.030 -0.042 -0.050 -0.030 -0.022 -0.015 -0.011
1000 .5 -0.045 -0.044 -0.031 -0.033 -0.020 -0.023 -0.049 -0.034 -0.022 -0.026 -0.015 -0.016
1000 .7 -0.035 -0.034 -0.029 -0.030 -0.020 -0.019 -0.037 -0.039 -0.029 -0.024 -0.017 -0.016
Note: N=training sample size; r=diagnosis-test correlation
Table 6.9
Sensitivity difference between logistic regression with ROC classifier and data generating theta in conditions
with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.007 0.007 0.004 -0.002 -0.010 0.007 0.001 0.003 -0.010 -0.010 -0.002 -0.020
250 .5 0.003 0.005 -0.021 0.001 0.001 -0.014 0.007 -0.003 -0.011 -0.009 -0.009 -0.010
250 .7 -0.009 -0.012 -0.017 -0.014 -0.016 -0.026 -0.008 -0.008 -0.006 0 -0.017 -0.014
500 .3 0.007 0.012 -0.001 -0.002 -0.009 -0.005 0.005 -0.004 0.003 -0.009 0.009 -0.008
500 .5 -0.005 -0.002 -0.008 0.003 -0.015 -0.007 0.006 0.001 0.002 0.005 -0.006 -0.012
500 .7 -0.009 -0.002 -0.022 -0.017 -0.013 -0.019 0.005 -0.002 -0.011 -0.008 -0.009 -0.014
1000 .3 0.014 0.003 0 -0.015 0.008 0.008 -0.003 0.002 -0.001 -0.012 0 -0.002
1000 .5 0.007 0.007 0.002 -0.003 -0.004 -0.009 -0.010 -0.001 0.003 -0.004 -0.009 -0.002
1000 .7 -0.003 -0.010 -0.013 -0.012 -0.018 -0.020 -0.001 -0.004 -0.008 -0.004 -0.008 -0.012
Note: N=training sample size; r=diagnosis-test correlation
Table 6.10
Sensitivity difference between logistic regression with ROC classifier and data generating theta in conditions
with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.010 0.015 0.012 0.023 -0.019 -0.010 -0.057 -0.057 -0.069 -0.037 -0.044 -0.036
250 .5 -0.064 -0.034 0.004 0.003 -0.024 -0.020 -0.118 -0.129 -0.044 -0.020 -0.043 -0.046
250 .7 -0.168 -0.170 -0.033 -0.022 -0.034 -0.030 -0.295 -0.290 -0.044 -0.040 -0.030 -0.028
500 .3 0.021 0.021 0.003 0.004 -0.013 -0.022 -0.027 -0.025 -0.023 -0.026 -0.036 -0.024
500 .5 0.004 0.005 -0.015 -0.011 -0.018 -0.015 -0.025 -0.029 -0.017 -0.014 -0.012 -0.024
500 .7 -0.023 -0.022 -0.021 -0.018 -0.023 -0.020 -0.019 -0.019 -0.015 -0.011 -0.019 -0.011
1000 .3 0.009 -0.001 0.002 -0.005 -0.009 0.005 -0.011 -0.005 -0.011 -0.019 -0.016 -0.024
1000 .5 -0.005 -0.008 -0.006 -0.004 -0.008 -0.005 0.004 -0.013 -0.014 -0.010 -0.010 -0.009
1000 .7 -0.023 -0.026 -0.012 -0.013 -0.012 -0.013 -0.008 -0.005 0.001 -0.006 -0.005 -0.009
Note: N=training sample size; r=diagnosis-test correlation
Table 6.11
Specificity difference between logistic regression with ROC classifier and data generating theta in conditions
with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.067 -0.074 -0.056 -0.055 -0.033 -0.053 -0.068 -0.067 -0.040 -0.042 -0.039 -0.020
250 .5 -0.071 -0.076 -0.046 -0.063 -0.053 -0.041 -0.068 -0.063 -0.038 -0.045 -0.031 -0.030
250 .7 -0.079 -0.078 -0.063 -0.065 -0.054 -0.045 -0.065 -0.059 -0.053 -0.054 -0.029 -0.034
500 .3 -0.056 -0.059 -0.035 -0.036 -0.024 -0.025 -0.047 -0.036 -0.034 -0.023 -0.032 -0.014
500 .5 -0.052 -0.054 -0.038 -0.049 -0.025 -0.035 -0.045 -0.043 -0.033 -0.038 -0.021 -0.014
500 .7 -0.068 -0.070 -0.046 -0.051 -0.046 -0.040 -0.050 -0.048 -0.032 -0.034 -0.028 -0.026
1000 .3 -0.043 -0.034 -0.025 -0.012 -0.029 -0.029 -0.021 -0.023 -0.015 -0.006 -0.014 -0.011
1000 .5 -0.049 -0.049 -0.041 -0.035 -0.030 -0.025 -0.018 -0.024 -0.026 -0.019 -0.011 -0.018
1000 .7 -0.061 -0.054 -0.046 -0.049 -0.039 -0.036 -0.036 -0.034 -0.027 -0.033 -0.024 -0.022
Note: N=training sample size; r=diagnosis-test correlation
Table 6.12
Specificity difference between logistic regression with ROC classifier and data generating theta in conditions
with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.097 -0.100 -0.093 -0.108 -0.057 -0.065 -0.056 -0.056 -0.047 -0.055 -0.031 -0.039
250 .5 -0.058 -0.074 -0.099 -0.101 -0.062 -0.066 -0.044 -0.037 -0.065 -0.087 -0.043 -0.039
250 .7 -0.023 -0.021 -0.073 -0.083 -0.054 -0.058 0.029 0.022 -0.074 -0.079 -0.051 -0.055
500 .3 -0.096 -0.098 -0.068 -0.071 -0.040 -0.032 -0.062 -0.057 -0.044 -0.043 -0.013 -0.028
500 .5 -0.084 -0.088 -0.050 -0.058 -0.035 -0.039 -0.060 -0.062 -0.047 -0.053 -0.036 -0.026
500 .7 -0.062 -0.064 -0.047 -0.054 -0.031 -0.033 -0.068 -0.068 -0.046 -0.049 -0.027 -0.035
1000 .3 -0.059 -0.056 -0.045 -0.039 -0.024 -0.038 -0.043 -0.052 -0.032 -0.022 -0.016 -0.007
1000 .5 -0.047 -0.046 -0.034 -0.036 -0.023 -0.028 -0.051 -0.035 -0.023 -0.027 -0.017 -0.017
1000 .7 -0.036 -0.034 -0.031 -0.032 -0.022 -0.020 -0.038 -0.040 -0.032 -0.026 -0.020 -0.018
Note: N=training sample size; r=diagnosis-test correlation
Table 6.13
Classification rate difference between lasso logistic regression with a ROC classifier and data generating theta in
conditions with diagnosis-test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.090 -0.101 -0.074 -0.084 -0.056 -0.057 -0.079 -0.083 -0.061 -0.066 -0.040 -0.043
250 30 -0.082 -0.080 -0.056 -0.065 -0.036 -0.039 -0.074 -0.072 -0.056 -0.056 -0.030 -0.036
500 10 -0.084 -0.088 -0.056 -0.061 -0.044 -0.043 -0.072 -0.067 -0.045 -0.045 -0.030 -0.031
500 30 -0.059 -0.059 -0.043 -0.045 -0.027 -0.026 -0.054 -0.054 -0.032 -0.039 -0.020 -0.022
1000 10 -0.068 -0.064 -0.050 -0.052 -0.038 -0.036 -0.048 -0.048 -0.032 -0.034 -0.028 -0.024
1000 30 -0.042 -0.040 -0.027 -0.028 -0.019 -0.017 -0.034 -0.039 -0.025 -0.022 -0.016 -0.016
Note: N=training sample size; i=number of items
Table 6.14
Sensitivity difference between lasso logistic regression with a ROC classifier and data generating theta in conditions with
diagnosis-test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.017 -0.014 -0.032 -0.016 -0.031 -0.033 -0.007 0.001 -0.019 -0.012 -0.022 -0.022
250 30 0.001 0.002 -0.025 -0.010 -0.030 -0.022 0.012 0.013 0.003 -0.003 -0.017 -0.008
500 10 -0.026 -0.018 -0.040 -0.033 -0.028 -0.026 -0.002 -0.010 -0.019 -0.021 -0.016 -0.021
500 30 -0.016 -0.017 -0.014 -0.013 -0.017 -0.018 -0.002 -0.003 -0.008 -0.001 -0.011 -0.007
1000 10 -0.025 -0.027 -0.026 -0.025 -0.028 -0.029 -0.014 -0.011 -0.018 -0.018 -0.011 -0.020
1000 30 -0.015 -0.021 -0.014 -0.015 -0.015 -0.016 -0.002 0.004 0.002 -0.004 -0.005 -0.006
Note: N=training sample size; i=number of items
Table 6.15
Specificity difference between lasso logistic regression with a ROC classifier and data generating theta in conditions with
diagnosis-test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.094 -0.106 -0.079 -0.091 -0.062 -0.063 -0.083 -0.088 -0.066 -0.072 -0.044 -0.048
250 30 -0.086 -0.084 -0.059 -0.070 -0.037 -0.043 -0.078 -0.076 -0.062 -0.061 -0.033 -0.043
500 10 -0.087 -0.091 -0.058 -0.064 -0.048 -0.048 -0.076 -0.071 -0.047 -0.047 -0.034 -0.034
500 30 -0.061 -0.061 -0.046 -0.048 -0.030 -0.027 -0.057 -0.057 -0.034 -0.044 -0.022 -0.026
1000 10 -0.070 -0.066 -0.052 -0.055 -0.040 -0.038 -0.049 -0.050 -0.034 -0.036 -0.032 -0.025
1000 30 -0.043 -0.041 -0.028 -0.030 -0.019 -0.018 -0.036 -0.041 -0.028 -0.024 -0.018 -0.018
Note: N=training sample size; i=number of items
Table 6.16
Classification rate difference between relaxed lasso with a ROC classifier and data generating theta in conditions with
diagnosis-test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.086 -0.097 -0.075 -0.076 -0.051 -0.053 -0.072 -0.077 -0.054 -0.058 -0.033 -0.033
250 30 -0.075 -0.073 -0.050 -0.053 -0.030 -0.032 -0.067 -0.065 -0.049 -0.048 -0.024 -0.028
500 10 -0.081 -0.083 -0.053 -0.058 -0.041 -0.040 -0.063 -0.057 -0.038 -0.036 -0.024 -0.026
500 30 -0.053 -0.052 -0.037 -0.041 -0.023 -0.021 -0.045 -0.047 -0.025 -0.031 -0.016 -0.017
1000 10 -0.063 -0.058 -0.046 -0.049 -0.036 -0.035 -0.040 -0.038 -0.026 -0.031 -0.022 -0.022
1000 30 -0.034 -0.032 -0.024 -0.024 -0.017 -0.017 -0.026 -0.029 -0.021 -0.018 -0.013 -0.013
Note: N=training sample size; i=number of items
Table 6.17
Sensitivity difference between relaxed lasso with a ROC classifier and data generating theta in conditions with diagnosis-
test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.020 -0.017 -0.028 -0.020 -0.032 -0.032 -0.006 0.001 -0.015 -0.008 -0.018 -0.022
250 30 -0.006 -0.006 -0.025 -0.018 -0.025 -0.023 0.012 0.011 0.007 0.001 -0.010 -0.004
500 10 -0.024 -0.017 -0.035 -0.028 -0.023 -0.025 0 -0.009 -0.013 -0.017 -0.015 -0.016
500 30 -0.018 -0.017 -0.012 -0.008 -0.015 -0.016 -0.001 0.002 -0.004 0.001 -0.008 -0.005
1000 10 -0.023 -0.026 -0.022 -0.019 -0.024 -0.022 -0.010 -0.010 -0.015 -0.011 -0.012 -0.015
1000 30 -0.015 -0.020 -0.010 -0.012 -0.011 -0.011 -0.001 0.003 0.004 -0.001 -0.003 -0.004
Note: N=training sample size; i=number of items
Table 6.18
Specificity difference between relaxed lasso with a ROC classifier and data generating theta in conditions with diagnosis-
test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.090 -0.101 -0.080 -0.082 -0.055 -0.058 -0.076 -0.081 -0.058 -0.064 -0.037 -0.036
250 30 -0.079 -0.076 -0.053 -0.057 -0.031 -0.035 -0.071 -0.068 -0.055 -0.053 -0.028 -0.035
500 10 -0.084 -0.086 -0.055 -0.062 -0.046 -0.043 -0.066 -0.060 -0.041 -0.038 -0.027 -0.029
500 30 -0.054 -0.054 -0.040 -0.044 -0.024 -0.022 -0.047 -0.050 -0.028 -0.034 -0.018 -0.020
1000 10 -0.065 -0.060 -0.048 -0.052 -0.039 -0.039 -0.041 -0.040 -0.027 -0.033 -0.025 -0.023
1000 30 -0.035 -0.033 -0.025 -0.025 -0.019 -0.018 -0.027 -0.031 -0.024 -0.020 -0.016 -0.015
Note: N=training sample size; i=number of items
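The "ROC classifier" referenced throughout these tables selects a cut score from the training-sample ROC curve. A common operating point is the threshold that maximizes Youden's J (sensitivity + specificity − 1); the sketch below assumes that criterion, which may differ from the dissertation's exact rule, and uses illustrative data:

```python
import numpy as np

def youden_cut(scores, y_true):
    """Choose the cut score maximizing sensitivity + specificity - 1 (Youden's J)."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true, dtype=bool)
    best_cut, best_j = None, -np.inf
    for c in np.unique(scores):          # every observed score is a candidate cut
        pred = scores >= c
        sens = np.sum(pred & y_true) / np.sum(y_true)
        spec = np.sum(~pred & ~y_true) / np.sum(~y_true)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut

# Diagnosed cases tend to score higher; the chosen cut should separate the groups.
scores = [0.1, 0.2, 0.4, 0.45, 0.7, 0.9]
labels = [0,   0,   0,   1,    1,   1]
cut = youden_cut(scores, labels)
```

Here the chosen cut (0.45) cleanly separates the three diagnosed cases from the three undiagnosed ones, giving J = 1.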
Table 7.1
Number of replications in conditions with binary items where CART did not choose at least
two items (where theta score was not estimated)
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 404 447 411 449 408 446 434 452 423 457 410 462
250 .5 332 403 351 404 343 415 356 390 338 408 326 389
250 .7 213 305 249 299 213 294 233 309 183 291 179 276
500 .3 393 443 411 439 20 2 25 1 26 0 28 5
500 .5 296 341 285 358 0 0 0 0 0 0 1 0
500 .7 140 211 143 201 102 140 90 157 59 97 54 97
1000 .3 341 417 342 408 14 3 13 1 20 2 19 2
1000 .5 186 253 183 244 0 0 0 0 0 0 0 0
1000 .7 55 93 48 101 26 45 29 42 7 12 9 9
Note: N=training sample size; r=diagnosis-test correlation
Table 7.2
Number of replications in conditions with polytomous items where CART did not choose at
least two items (where theta score was not estimated)
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 451 464 452 464 454 462 463 473 463 482 455 474
250 .5 432 452 418 448 428 449 434 443 383 427 409 438
250 .7 354 392 344 373 314 391 325 360 286 345 287 362
500 .3 456 470 459 466 1 0 0 0 5 0 2 0
500 .5 362 421 359 426 0 0 0 0 0 0 0 0
500 .7 240 308 239 318 163 216 150 225 112 153 99 158
1000 .3 429 453 418 447 0 0 1 0 0 1 2 1
1000 .5 271 315 275 314 0 0 0 0 0 0 0 0
1000 .7 101 153 120 152 37 82 36 87 16 27 16 25
Note: N=training sample size; r=diagnosis-test correlation
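Tables 7.1 and 7.2 count replications in which the pruned CART solution split on fewer than two distinct items, in which case no theta score could be estimated from the selected subset. The bookkeeping amounts to counting small selection sets; the data structure and item names below are illustrative, not the dissertation's code:

```python
# Each replication records the set of items its pruned tree actually split on.
items_used_per_replication = [
    {"item3"},                      # one item only: theta cannot be estimated
    {"item1", "item7"},             # two items: theta can be estimated
    set(),                          # root-only tree after pruning
    {"item2", "item5", "item9"},
]

# Table entry: number of replications with fewer than two selected items.
n_failed = sum(len(used) < 2 for used in items_used_per_replication)
```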
Table 7.3
Correlation between true and estimated theta from CART in conditions with binary items
Prevalence
.10 .20
Local Dependence
0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30
500 .3 0.999 0.979 0.999 0.979 0.999 0.990 0.999 0.990
500 .5 0.998 0.970 0.997 0.970 0.999 0.986 0.999 0.987
500 .7 0.764 0.622 0.770 0.620 0.793 0.643 0.799 0.649
1000 .3 1 0.996 1 0.996 1 0.999 1 0.999
1000 .5 1 0.992 1 0.992 1 0.998 1 0.998
1000 .7 0.835 0.686 0.843 0.682 0.878 0.742 0.874 0.741
Note: N=training sample size; r=diagnosis-test correlation
Table 7.4
Correlation between true and estimated theta from CART in conditions with polytomous
items
Prevalence
.10 .20
Local Dependence
0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30
500 .3 0.999 0.980 0.999 0.980 1 0.991 1 0.990
500 .5 0.998 0.975 0.998 0.975 1 0.989 1 0.988
500 .7 0.773 0.706 0.783 0.698 0.800 0.727 0.798 0.726
1000 .3 1 0.995 1 0.995 1 0.998 1 0.999
1000 .5 1 0.993 1 0.994 1 0.998 1 0.998
1000 .7 0.838 0.762 0.837 0.754 0.874 0.784 0.877 0.788
Note: N=training sample size; r=diagnosis-test correlation
Table 7.5
Mean squared error of estimated theta from CART in conditions with binary items
Prevalence
.10 .20
Local Dependence
0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30
500 .3 0.001 0.019 0.001 0.019 0 0.009 0 0.009
500 .5 0.002 0.028 0.002 0.027 0.001 0.012 0.001 0.012
500 .7 0.185 0.343 0.178 0.341 0.163 0.324 0.156 0.317
1000 .3 0 0.004 0 0.003 0 0.001 0 0.001
1000 .5 0 0.007 0 0.007 0 0.002 0 0.002
1000 .7 0.129 0.286 0.121 0.287 0.095 0.234 0.098 0.235
Note: N=training sample size; r=diagnosis-test correlation
Table 7.6
Mean squared error of estimated theta from CART in conditions with polytomous items
Prevalence
.10 .20
Local Dependence
0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30
500 .3 0.001 0.019 0.001 0.019 0 0.009 0 0.009
500 .5 0.002 0.024 0.001 0.024 0 0.011 0 0.011
500 .7 0.198 0.282 0.189 0.286 0.174 0.261 0.176 0.260
1000 .3 0 0.004 0 0.004 0 0.001 0 0.001
1000 .5 0 0.006 0 0.006 0 0.002 0 0.002
1000 .7 0.141 0.227 0.141 0.236 0.110 0.207 0.106 0.201
Note: N=training sample size; r=diagnosis-test correlation
Table 7.7
Number of replications in conditions with binary items where lasso did not choose at least
two items (where theta score was not estimated)
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 492 487 492 483 490 489 498 489 482 464 473 457
250 .5 428 413 439 421 410 390 412 382 318 269 314 251
250 .7 247 199 275 204 173 129 179 118 51 26 59 33
500 .3 493 489 494 488 493 484 491 476 464 442 457 433
500 .5 408 377 407 380 302 231 310 241 112 48 104 63
500 .7 168 92 144 89 20 10 23 5 1 1 1 1
1000 .3 490 484 489 480 480 463 481 467 378 337 394 330
1000 .5 288 229 286 230 118 68 101 52 4 0 6 1
1000 .7 26 5 29 13 0 0 2 0 0 0 2 0
Note: N=training sample size; r=diagnosis-test correlation
Table 7.8
Number of replications in conditions with polytomous items where lasso did not choose at least
two items (where theta score was not estimated)
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 484 488 490 488 488 487 488 488 463 452 468 456
250 .5 420 407 425 401 396 349 371 335 258 235 274 208
250 .7 188 174 199 141 114 81 134 88 25 8 28 9
500 .3 487 488 493 483 488 477 490 486 453 423 443 414
500 .5 376 340 389 368 235 203 257 192 65 29 67 41
500 .7 106 69 89 58 13 4 12 3 0 0 0 0
1000 .3 485 472 486 477 467 462 474 459 341 316 354 325
1000 .5 264 179 255 201 61 45 71 41 6 3 3 1
1000 .7 13 3 10 6 0 0 0 0 0 1 0 0
Note: N=training sample size; r=diagnosis-test correlation
Table 7.9
Correlation between true and estimated theta from lasso in conditions with diagnosis-
test correlation of .70
Number of item categories
2 5
Local Dependence
0 .3 0 .3
Number of items
N p 10 30 10 30 10 30 10 30
250 .05 0.702 0.651 0.690 0.659 0.794 0.778 0.797 0.779
250 .10 0.743 0.709 0.741 0.710 0.820 0.813 0.823 0.805
250 .20 0.808 0.780 0.806 0.780 0.873 0.862 0.865 0.861
500 .05 0.743 0.715 0.747 0.711 0.832 0.812 0.831 0.813
500 .10 0.807 0.798 0.812 0.796 0.884 0.877 0.878 0.872
500 .20 0.890 0.869 0.892 0.869 0.936 0.919 0.931 0.920
1000 .05 0.804 0.792 0.809 0.796 0.890 0.879 0.892 0.877
1000 .10 0.882 0.869 0.883 0.866 0.935 0.925 0.934 0.923
1000 .20 0.938 0.915 0.937 0.916 0.964 0.951 0.965 0.951
Note: N=training sample size; p=prevalence
Table 7.10
Mean squared error of the estimated theta from lasso in conditions with diagnosis-test
correlation of .70
Number of item categories
2 5
Local Dependence
0 .3 0 .3
Number of items
N p 10 30 10 30 10 30 10 30
250 .05 0.178 0.209 0.174 0.209 0.178 0.209 0.174 0.209
250 .10 0.156 0.175 0.153 0.184 0.156 0.175 0.153 0.184
250 .20 0.111 0.132 0.117 0.131 0.111 0.132 0.117 0.131
500 .05 0.146 0.179 0.146 0.178 0.146 0.179 0.146 0.178
500 .10 0.102 0.118 0.107 0.122 0.102 0.118 0.107 0.122
500 .20 0.056 0.078 0.061 0.076 0.056 0.078 0.061 0.076
1000 .05 0.096 0.115 0.094 0.117 0.096 0.115 0.094 0.117
1000 .10 0.057 0.072 0.057 0.074 0.057 0.072 0.057 0.074
1000 .20 0.032 0.047 0.031 0.047 0.032 0.047 0.031 0.047
Note: N=training sample size; p=prevalence
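The recovery tables (7.3-7.6, 7.9-7.14) summarize estimated theta with two statistics: its Pearson correlation with the true theta and its mean squared error. Both are one-liners in numpy; the additive noise below is a hypothetical stand-in for scoring error, not the dissertation's generating model:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = rng.normal(size=1000)                       # true latent scores
theta_hat = theta_true + rng.normal(scale=0.3, size=1000)  # noisy estimates

corr = np.corrcoef(theta_true, theta_hat)[0, 1]   # recovery correlation
mse = np.mean((theta_hat - theta_true) ** 2)      # mean squared error
```

With noise SD 0.3, the correlation lands near 0.96 and the MSE near 0.09, the same order as the better-recovered cells in these tables.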
Table 7.11
Mean squared error of the estimated theta from random forest in conditions with binary items
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 0.140 0.080 0.145 0.081 0.141 0.080 0.141 0.080 0.138 0.078 0.136 0.078
250 .5 0.145 0.089 0.147 0.088 0.141 0.087 0.141 0.086 0.133 0.080 0.131 0.079
250 .7 0.150 0.095 0.156 0.095 0.146 0.092 0.146 0.092 0.129 0.080 0.129 0.081
500 .3 0.138 0.079 0.139 0.079 0.136 0.079 0.134 0.080 0.132 0.076 0.130 0.076
500 .5 0.143 0.089 0.146 0.089 0.137 0.087 0.139 0.086 0.126 0.078 0.127 0.078
500 .7 0.151 0.098 0.148 0.098 0.140 0.092 0.141 0.092 0.125 0.080 0.126 0.080
1000 .3 0.136 0.079 0.138 0.079 0.132 0.078 0.133 0.078 0.126 0.074 0.128 0.075
1000 .5 0.142 0.090 0.143 0.090 0.133 0.085 0.135 0.086 0.122 0.077 0.120 0.078
1000 .7 0.148 0.098 0.146 0.098 0.137 0.094 0.135 0.091 0.121 0.080 0.122 0.079
Note: N=training sample size; r=diagnosis-test correlation
Table 7.12
Mean squared error of the estimated theta from random forest in conditions with polytomous items
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 0.104 0.051 0.104 0.049 0.103 0.049 0.103 0.051 0.099 0.050 0.103 0.050
250 .5 0.110 0.054 0.110 0.055 0.108 0.052 0.107 0.052 0.097 0.048 0.098 0.048
250 .7 0.109 0.054 0.111 0.055 0.105 0.051 0.106 0.051 0.093 0.045 0.092 0.045
500 .3 0.105 0.052 0.104 0.053 0.102 0.051 0.101 0.051 0.095 0.049 0.097 0.048
500 .5 0.114 0.058 0.115 0.057 0.107 0.054 0.107 0.054 0.096 0.048 0.094 0.047
500 .7 0.118 0.060 0.118 0.060 0.107 0.053 0.105 0.053 0.090 0.045 0.091 0.044
1000 .3 0.102 0.052 0.103 0.053 0.100 0.051 0.101 0.050 0.093 0.047 0.095 0.047
1000 .5 0.115 0.060 0.114 0.060 0.105 0.055 0.105 0.054 0.092 0.045 0.093 0.046
1000 .7 0.120 0.064 0.122 0.063 0.106 0.054 0.107 0.054 0.087 0.043 0.089 0.043
Note: N=training sample size; r=diagnosis-test correlation
Table 7.13
Correlation between true and estimated theta from random forest in conditions with binary items
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 0.820 0.910 0.814 0.911 0.820 0.912 0.819 0.911 0.823 0.914 0.825 0.914
250 .5 0.813 0.902 0.810 0.901 0.819 0.904 0.817 0.904 0.829 0.911 0.832 0.912
250 .7 0.807 0.894 0.800 0.894 0.812 0.898 0.811 0.897 0.834 0.912 0.834 0.910
500 .3 0.823 0.912 0.820 0.913 0.826 0.914 0.827 0.912 0.831 0.917 0.832 0.916
500 .5 0.818 0.902 0.811 0.901 0.824 0.905 0.822 0.905 0.838 0.914 0.836 0.913
500 .7 0.807 0.892 0.809 0.892 0.820 0.898 0.817 0.898 0.839 0.912 0.837 0.912
1000 .3 0.825 0.913 0.822 0.912 0.831 0.915 0.828 0.914 0.838 0.919 0.835 0.917
1000 .5 0.818 0.902 0.816 0.900 0.829 0.907 0.826 0.905 0.843 0.915 0.845 0.914
1000 .7 0.810 0.892 0.811 0.892 0.824 0.897 0.825 0.899 0.845 0.913 0.843 0.913
Note: N=training sample size; r=diagnosis-test correlation
Table 7.14
Correlation between true and estimated theta from random forest in conditions with polytomous items
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 0.820 0.910 0.814 0.911 0.820 0.912 0.819 0.911 0.823 0.914 0.825 0.914
250 .5 0.813 0.902 0.810 0.901 0.819 0.904 0.817 0.904 0.829 0.911 0.832 0.912
250 .7 0.807 0.894 0.800 0.894 0.812 0.898 0.811 0.897 0.834 0.912 0.834 0.910
500 .3 0.823 0.912 0.820 0.913 0.826 0.914 0.827 0.912 0.831 0.917 0.832 0.916
500 .5 0.818 0.902 0.811 0.901 0.824 0.905 0.822 0.905 0.838 0.914 0.836 0.913
500 .7 0.807 0.892 0.809 0.892 0.820 0.898 0.817 0.898 0.839 0.912 0.837 0.912
1000 .3 0.825 0.913 0.822 0.912 0.831 0.915 0.828 0.914 0.838 0.919 0.835 0.917
1000 .5 0.818 0.902 0.816 0.900 0.829 0.907 0.826 0.905 0.843 0.915 0.845 0.914
1000 .7 0.810 0.892 0.811 0.892 0.824 0.897 0.825 0.899 0.845 0.913 0.843 0.913
Note: N=training sample size; r=diagnosis-test correlation
Table 7.A.
Average number of items chosen by CART in conditions with 10 items
Number of item categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.848 0.786 0.852 0.632 0.808 0.918 0.482 0.432 0.400 0.378 0.360 0.420
250 .5 1.404 1.296 1.350 1.286 1.440 1.504 0.628 0.816 0.740 0.642 1.038 0.864
250 .7 2.490 2.254 2.428 2.420 2.594 2.780 1.328 1.238 1.416 1.386 1.716 1.648
500 .3 1.072 0.922 9.554 9.464 9.456 9.398 0.508 0.448 9.920 9.930 9.886 9.948
500 .5 1.856 1.914 9.844 9.822 9.934 9.944 1.078 1.128 9.848 9.876 9.972 9.966
500 .7 3.360 3.292 3.740 3.900 4.382 4.574 2.042 1.956 2.356 2.520 2.852 2.888
1000 .3 1.916 2.002 9.694 9.732 9.588 9.620 0.722 0.804 9.998 9.976 10 9.960
1000 .5 3.086 3.380 9.962 9.970 9.984 9.988 1.824 1.852 9.996 9.994 10 9.998
1000 .7 4.518 4.700 5.552 5.670 6.626 6.546 3.152 2.890 3.970 4.042 4.962 5.030
Note: N=training sample size; r=diagnosis-test correlation
Table 7.B.
Average number of items chosen by CART in conditions with 30 items
Number of item categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.564 0.544 0.506 0.442 0.492 0.436 0.360 0.360 0.352 0.342 0.238 0.272
250 .5 0.900 0.940 0.858 0.930 0.892 1.012 0.516 0.580 0.528 0.576 0.720 0.670
250 .7 1.672 1.680 1.612 1.624 1.810 1.836 0.928 1.040 0.950 1.124 1.260 1.236
500 .3 0.548 0.572 24.052 24.094 26.832 26.736 0.356 0.380 21.362 21.292 25.132 25.046
500 .5 1.282 1.242 22.338 22.340 25.794 25.972 0.790 0.754 19.934 20.094 24.294 24.208
500 .7 2.456 2.292 2.906 2.798 3.278 3.332 1.506 1.468 2.076 1.980 2.478 2.390
1000 .3 0.826 0.948 28.506 28.678 29.332 29.360 0.610 0.560 27.346 27.448 29.046 29.042
1000 .5 2.142 2.008 27.332 27.392 29.114 29.048 1.542 1.486 26.574 26.584 28.762 28.782
1000 .7 3.776 3.742 5.042 4.626 7.624 7.310 2.678 2.612 3.372 3.224 3.976 4.312
Note: N=training sample size; r=diagnosis-test correlation
Table 7.C.
Average number of items chosen by lasso in conditions with 10 items
Number of item categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.058 0.050 0.072 0.018 0.136 0.186 0.106 0.062 0.090 0.074 0.260 0.210
250 .5 0.478 0.386 0.634 0.642 1.264 1.348 0.544 0.522 0.722 0.860 1.744 1.654
250 .7 1.842 1.644 2.588 2.502 4.128 4.074 2.290 2.246 3.024 2.948 4.536 4.276
500 .3 0.048 0.046 0.056 0.070 0.220 0.288 0.084 0.054 0.078 0.060 0.318 0.400
500 .5 0.650 0.632 1.394 1.364 3.098 3.162 0.870 0.798 1.790 1.700 3.440 3.482
500 .7 2.680 2.868 4.474 4.536 6.036 6.106 3.308 3.408 4.864 4.738 6.334 6.122
1000 .3 0.062 0.082 0.142 0.116 0.868 0.750 0.090 0.108 0.212 0.174 1.022 1
1000 .5 1.462 1.544 3.022 3.200 5.150 5.120 1.786 1.830 3.428 3.488 5.056 5.078
1000 .7 4.538 4.602 6 5.984 7.242 7.234 5.132 5.216 6.402 6.374 7.454 7.458
Note: N=training sample size; r=diagnosis-test correlation
Table 7.D.
Average number of items chosen by lasso in conditions with 30 items
Number of item categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.100 0.112 0.086 0.106 0.314 0.306 0.094 0.094 0.104 0.108 0.364 0.308
250 .5 0.714 0.608 0.970 0.978 2.162 2.168 0.700 0.768 1.174 1.366 2.334 2.524
250 .7 2.838 2.900 4.022 4.218 6.448 6.432 3.162 3.576 4.676 4.476 6.816 6.810
500 .3 0.076 0.090 0.124 0.164 0.412 0.464 0.084 0.118 0.140 0.082 0.524 0.624
500 .5 0.970 1.024 2.420 2.320 5.082 4.972 1.274 1.066 2.610 2.728 5.318 5.114
500 .7 4.792 4.674 7.558 7.556 10.432 10.434 4.908 4.984 7.660 7.442 9.992 9.994
1000 .3 0.136 0.160 0.258 0.240 1.320 1.418 0.186 0.148 0.272 0.296 1.470 1.382
1000 .5 2.548 2.594 5.296 5.344 8.558 8.498 3.074 2.758 5.210 5.190 8.076 7.840
1000 .7 7.986 7.884 11.034 10.888 13.938 13.986 8.060 7.934 10.814 10.638 13.33 13.47
Note: N=training sample size; r=diagnosis-test correlation
Figure 2.1. Variable Importance Measures for data-generating model. Note:
Classification rate is on the left panel, sensitivity is on the center panel, and specificity is
on the right panel.
[Figure: three dot charts of variable importance (IncNodePurity) for predictors cor1, prev1, ss1, nitem1, ncat1, ld1.]
Figure 2.2. Variable importance measures of the raw summed scores. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, prev1, ss1, nitem1, ncat1, ld1.]
Figure 2.3. Variable importance measures of Estimated Theta. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, prev1, ss1, nitem1, ncat1, ld1.]
Figure 4.1. Regression Tree to predict CART Classification Rate
[Tree diagram not reproducible in this transcript; splits on cor1, nitem1, and ncat1; terminal-node classification rates range from 0.6919 to 0.8011.]
Figure 4.2. Regression Tree to predict CART sensitivity
[Tree diagram not reproducible in this transcript; splits on cor1, ncat1, and nitem1; terminal-node sensitivities range from 0.0683 to 0.3541.]
Figure 4.3. Regression Tree to predict CART specificity
[Tree diagram not reproducible in this transcript; splits on cor1, nitem1, and ncat1; terminal-node specificities range from 0.8142 to 0.9389.]
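The sensitivity and specificity summarized in Figures 4.2 and 4.3 are the usual confusion-matrix rates. As a minimal sketch in pure Python (the predicted and true diagnoses below are hypothetical, for illustration only):

```python
def sens_spec(pred, truth):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))
    tn = sum(p == 0 and t == 0 for p, t in zip(pred, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))
    return tp / (tp + fn), tn / (tn + fp)

truth = [1, 1, 1, 0, 0, 0, 0, 0]   # hypothetical true diagnoses
pred  = [1, 0, 0, 0, 0, 0, 1, 0]   # hypothetical model classifications
print(sens_spec(pred, truth))      # sensitivity 1/3, specificity 4/5
```

Note how a classifier can post a high classification rate while sensitivity stays low when prevalence is low, which is the pattern the trees above summarize.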
Figure 4.4. Variable Importance Measures of CART. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, nitem1, ncat1, ss1, ld1.]
Figure 4.5. Regression Tree to predict Random Forest Classification Rate
[Tree diagram not reproducible in this transcript; splits on cor1, nitem1, and ncat1; terminal-node classification rates range from 0.7880 to 0.8303.]
Figure 4.6. Regression Tree to predict Random Forest Specificity
[Tree diagram not reproducible in this transcript; splits on cor1, ss1, ncat1, and nitem1; terminal-node specificities range from 0.9419 to 0.9899.]
Figure 4.7. Variable importance measures of Random Forest with Bayes classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, nitem1, ss1, ncat1, ld1.]
Figure 4.8. Regression Tree to predict Lasso Logistic Regression Sensitivity
[Tree diagram not reproducible in this transcript; splits on ss1 and nitem1; terminal-node sensitivities range from 0.1420 to 0.1849.]
Figure 4.9. Variable Importance Measures of Lasso Logistic Regression with Bayes classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: ss1, nitem1, ld1.]
Figure 4.10. Regression Tree to predict Relaxed Lasso Sensitivity
[Tree diagram not reproducible in this transcript; splits on prev1, ss1, and nitem1; terminal-node sensitivities range from 0.1238 to 0.3951.]
Figure 4.11. Regression Tree to predict Relaxed Lasso Specificity
[Tree diagram not reproducible in this transcript; splits on prev1 and ss1; terminal-node specificities range from 0.9300 to 0.9892.]
Figure 4.12. Variable importance measures of Relaxed Lasso Logistic Regression with Bayes Classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: prev1, ss1, nitem1, ld1.]
Figure 4.13. Regression Tree to predict Logistic Regression Classification Rates
[Tree diagram not reproducible in this transcript; splits on prev1, cor1, and ss1; terminal-node classification rates range from 0.7628 to 0.9434.]
Figure 4.14. Regression Tree to predict Logistic Regression Sensitivity
[Tree diagram not reproducible in this transcript; splits on cor1, prev1, and ss1; terminal-node sensitivities range from 0.02263 to 0.39700.]
Figure 4.15. Regression Tree to predict Logistic Regression Specificity
[Tree diagram not reproducible in this transcript; splits on ss1, prev1, and cor1; terminal-node specificities range from 0.9053 to 0.9922.]
Figure 4.16. Variable importance measures of Logistic Regression with Bayes classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: prev1, cor1, ss1, ncat1, ld1.]
Figure 4.17. Regression Tree to predict Sensitivity of the Random Forest model with a ROC Classifier
[Tree diagram not reproducible in this transcript; splits on cor1, ncat1, nitem1, prev1, and ss1; terminal-node sensitivities range from 0.3029 to 0.7864.]
Figure 4.18. Variable Importance Measures of Random forest with ROC classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, nitem1, ncat1, prev1, ss1, ld1.]
Figure 4.19. Regression Tree to predict Sensitivity for Logistic Regression with a ROC Classifier
[Tree diagram not reproducible in this transcript; splits on cor1, ss1, nitem1, and prev1; terminal-node sensitivities range from 0.5686 to 0.7876.]
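The "ROC classifier" results in these figures rest on choosing a cut score from the ROC curve rather than using a Bayes (0.5-probability) rule. One common selection rule is Youden's J (sensitivity + specificity − 1), maximized over candidate cuts. A minimal pure-Python sketch, with hypothetical predicted scores and true diagnoses (this names one standard rule; the dissertation's exact cut-score criterion may differ):

```python
def youden_cut(scores, labels):
    """Pick the cut score maximizing Youden's J = sensitivity + specificity - 1.
    Cases scoring at or above the cut are classified as positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for c in sorted(set(scores)):
        sens = sum(s >= c and y == 1 for s, y in zip(scores, labels)) / n_pos
        spec = sum(s < c and y == 0 for s, y in zip(scores, labels)) / n_neg
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut

scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]   # hypothetical predicted probabilities
labels = [0,   0,   1,    0,   1,   1]      # hypothetical true diagnoses
print(youden_cut(scores, labels))           # → 0.35
```

Because this rule trades specificity for sensitivity, it explains why the ROC-classifier trees show much higher sensitivity, and lower specificity, than their Bayes-classifier counterparts.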
Figure 4.20. Variable Importance Measures for Logistic Regression with ROC classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, ss1, prev1, ncat1, nitem1, ld1.]
Figure 4.21. Regression Tree to predict classification rates for Lasso Logistic Regression with ROC classifier
[Tree diagram not reproducible in this transcript; splits on ss1 and nitem1; terminal-node classification rates range from 0.7137 to 0.7420.]
Figure 4.22. Regression Tree to predict Specificity for Lasso Logistic Regression with ROC classifier
[Tree diagram not reproducible in this transcript; splits on prev1, ss1, and nitem1; terminal-node specificities range from 0.7027 to 0.7305.]
Figure 4.23. Variable importance measure of Lasso Logistic Regression with ROC classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: ss1, nitem1, prev1, ncat1, ld1.]
Figure 4.24. Regression Tree to predict Classification Rate of Relaxed Lasso with ROC classifier
[Tree diagram not reproducible in this transcript; splits on nitem1, ncat1, prev1, and ss1; terminal-node classification rates range from 0.7149 to 0.7491.]
Figure 4.25. Regression Tree to predict Sensitivity of Relaxed Lasso with ROC Classifier
[Tree diagram not reproducible in this transcript; splits on prev1 and ss1; terminal-node sensitivities range from 0.7605 to 0.8062.]
Figure 4.26. Regression Tree to predict Specificity of Relaxed Lasso with ROC classifier
[Tree diagram not reproducible in this transcript; splits on prev1 and nitem1; terminal-node specificities range from 0.7117 to 0.7382.]
Figure 4.27. Variable Importance Measures of Relaxed Lasso with ROC Classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: nitem1, ss1, prev1, ncat1, ld1.]
Figure 7.1. Regression Tree for Theta MSE of CART
[Tree diagram not reproducible in this transcript; splits on cor1, nitem1, and ss1; terminal-node MSE values range from 0.005109 to 0.301100.]
Figure 7.2. Regression Tree for the correlation between estimated and data-generating theta for CART
[Tree diagram not reproducible in this transcript; splits on cor1, nitem1, ss1, and ncat1; terminal-node correlations range from 0.6783 to 0.9945.]
Figure 7.3. Variable importance measures for CART person parameter recovery. Note: theta MSE is on the left panel and correlation between true and estimated theta is on the right panel.
[Two-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, nitem1, ss1, ncat1, prev1, ld1.]
Figure 7.4. Regression tree for theta MSE in the lasso logistic regression model
[Tree diagram not reproducible in this transcript; splits on ss1, prev1, ncat1, and nitem1; terminal-node MSE values range from 0.06679 to 0.28340.]
Figure 7.5. Regression tree for correlation between estimated and data-generating theta for lasso logistic regression
[Tree diagram not reproducible in this transcript; splits on ss1, ncat1, and prev1; terminal-node correlations range from 0.7029 to 0.9237.]
Figure 7.6. Variable importance measures for lasso logistic regression person parameter recovery. Note: theta MSE is on the left panel and correlation between true and estimated theta is on the right panel.
[Two-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: ss1, prev1, ncat1, nitem1, ld1.]
Figure 7.7. Regression Tree for theta MSE in the random forest model
[Tree diagram not reproducible in this transcript; splits on nitem1 and ncat1; terminal-node MSE values range from 0.05164 to 0.13680.]
Figure 7.8. Regression Tree for the correlation between estimated and data-generating theta for random forest model
[Tree diagram not reproducible in this transcript; splits on nitem1 and ncat1; terminal-node correlations range from 0.8241 to 0.9458.]
Figure 7.9. Variable importance measures for random forest person parameter recovery. Note: theta MSE is on the left panel and correlation between true and estimated theta is on the right panel.
[Two-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: nitem1, ncat1, prev1, cor1, ss1, ld1.]