Psychometric and Machine Learning Approaches to Diagnostic Assessment
by
Oscar Gonzalez
A Dissertation Presented in Partial Fulfillment
of the Requirements for the Degree
Doctor of Philosophy
Approved May 2018 by the
Graduate Supervisory Committee:
Michael C. Edwards, Co-Chair
David P. MacKinnon, Co-Chair
Kevin J. Grimm
Yi Zheng
ARIZONA STATE UNIVERSITY
August 2018
ABSTRACT
The goal of diagnostic assessment is to discriminate between groups. In many
cases, a binary decision is made conditional on a cut score from a continuous scale.
Psychometric methods can improve assessment by modeling a latent variable using item
response theory (IRT), and IRT scores can subsequently be used to determine a cut score
using receiver operating characteristic (ROC) curves. Psychometric methods provide
reliable and interpretable scores, but the prediction of the diagnosis is not the primary
product of the measurement process. In contrast, machine learning methods, such as
regularization or binary recursive partitioning, can build a model from the assessment
items to predict the probability of diagnosis. Machine learning predicts the diagnosis
directly, but does not provide an inferential framework to explain why item responses are
related to the diagnosis. It remains unclear whether psychometric and machine learning
methods have comparable accuracy or if one method is preferable in some situations. In
this study, Monte Carlo simulation methods were used to compare psychometric and
machine learning methods on diagnostic classification accuracy. Results suggest that
classification accuracy of psychometric models depends on the diagnostic-test correlation
and prevalence of diagnosis. Also, machine learning methods that reduce prediction error
have inflated specificity and very low sensitivity compared to the data-generating model,
especially when prevalence is low. Finally, machine learning methods that use ROC
curves to determine probability thresholds have comparable classification accuracy to the
psychometric models as sample size, number of items, and number of item categories
increase. Therefore, results suggest that machine learning models could provide a viable
alternative for classification in diagnostic assessments. Strengths and limitations for each
of the methods are discussed, and future directions are considered.
DEDICATION
This document is dedicated to my parents, who gave me the trust and the confidence to
pursue a college career. This document is also dedicated to my amazing friends Amber,
James, Matt, and Peter, who also finished graduate degrees during 2018. Thank you all!
Para mis padres, que me dieron todo su apoyo y confianza para conseguir una carrera
universitaria. Para mis mejores amigos, Amber, James, Matt y Peter, quienes también
terminaron posgrados en este año. ¡Gracias a todos!
Small victories, be humble, move forward.
ACKNOWLEDGMENTS
I thank my committee members, Dr. David P. MacKinnon, Dr. Michael C. Edwards, Dr.
Yi Zheng, and Dr. Kevin J. Grimm for their help in the development of this document.
Their comments were challenging and made the project better. Dave, I cannot thank you
enough for saving my career after losing Roger. Your tireless approach to learning
inspired me to pursue an academic career. Mike, thank you for your support of my
psychometric interests, professional guidance, and mentorship. Although I am often
indecisive, you have taught me to keep in mind my life priorities and seek out new
opportunities. Kevin, thank you for our endless conversations, both personal and
professional. Your advice shaped my future behavior as a faculty member and my
approach to academics. Yi, thank you for all of your insight about my projects and
professional advice. Thank you for making sure I push the limits of my psychometric
knowledge and consider applications of my work outside of psychology. Finally, my
research for this dissertation (and during most of my graduate studies) was supported by
the National Science Foundation Graduate Research Fellowship under Grant No. DGE-
1311230.
TABLE OF CONTENTS
Page
LIST OF TABLES ........................................................................................................... vii
LIST OF FIGURES ........................................................................................................ xvi
1 INTRODUCTION ...........................................................................................................1
Diagnostic Assessment and Classification Accuracy .............................................3
Psychometric Methods for Diagnostic Assessment ................................................7
Machine Learning Approaches to Diagnostic Assessment ...................................18
Simulation Study ...................................................................................................26
2 METHOD .......................................................................................................................28
Data-Generation ....................................................................................................28
Data Analysis ........................................................................................................30
3 RESULTS .......................................................................................................................36
Estimation of the Psychometric Models ................................................................37
Classification Accuracy of the Psychometric Models ...........................................37
Estimation of the Machine Learning Models ........................................................40
Classification Accuracy of Machine Learning Models with ROC Classifiers .....41
Comparing Psychometric and Machine Learning Models ....................................44
Scoring Machine Learning Items for Person Parameter Recovery .......................48
4 DISCUSSION .................................................................................................................50
Revisiting Study Hypotheses ................................................................................51
Limitations and Future Directions ........................................................................55
Conclusion .............................................................................................................59
REFERENCES .................................................................................................................61
APPENDIX
A. FLOWCHART OF THE DATA-GENERATING PROCEDURE ..................64
B. TABLES ...........................................................................................................66
C. FIGURES .........................................................................................................68
D. R SYNTAX TO GENERATE DATA .............................................................71
E. R SYNTAX TO ANALYZE DATA ................................................................77
F. SUPPLEMENTAL WRITE-UP OF SIMULATION RESULTS .....................85
LIST OF TABLES
Table Page
1. Median of classification accuracy indices by model ..................................................67
1.1. IRT Nonconvergence (out of 500) in the 10-item condition ............................... 135
1.2. IRT Nonconvergence (out of 500) in the 30-item condition ................................136
1.3. Mean Squared Error for the Theta Estimate in the 10-item condition ...................137
1.4. Mean Squared Error for the Theta Estimate in the 30-item condition ..................138
1.5. Correlation between True and Theta Estimate in the 10-item condition ..............139
1.6. Correlation between the True and Theta Estimate in the 30-item condition ........140
1.7. Mean Squared Error for the Slopes in the 2PL Model ..........................................141
1.8. Variance Explained for the Slopes in the 2PL Model ...........................................142
1.9. Mean Squared Error for the Threshold Parameter in the 2PL Model ...................143
1.10. Variance Explained for the Threshold Parameter in the 2PL Model ..................144
1.11. Mean Squared Error for the Slopes in the Graded Response Model ..................145
1.12. Variance Explained for the Slopes in the Graded Response Model ...................146
1.13. Mean Square Error for the First Threshold in the Graded Response Model ......147
1.14. Mean Square Error for the Second Threshold in the Graded Response Model ...148
1.15. Mean Square Error for the Third Threshold in the Graded Response Model ......149
1.16. Mean Square Error for the Fourth Threshold in the Graded Response Model ....150
1.17. Correlation between True and Estimated First Threshold in the Graded Response
Model .................................................................................................................151
1.18. Correlation between True and Estimated Second Threshold in the Graded
Response Model ................................................................................................152
1.19. Correlation between True and Estimated Third Threshold in the Graded Response
Model .................................................................................................................153
1.20. Correlation between True and Estimated Fourth Threshold in the Graded
Response Model ................................................................................................154
2.1. Classification Rate of Data-generating Theta in Conditions with 10 items ......... 155
2.2. Classification Rate of Data-generating Theta in Conditions with 30 items ..........156
2.3. Sensitivity Rate of Data-generating Theta in Conditions with 10 items .............. 157
2.4. Sensitivity Rate of Data-generating Theta in Conditions with 30 items .............. 158
2.5. Specificity of Data-generating Theta in Conditions with 10 items....................... 159
2.6. Specificity of Data-generating Theta in Conditions with 30 items....................... 160
2.7. Classification Rate of Estimated Thetas in Conditions with 10 items .................. 161
2.8. Classification Rate of Estimated Thetas in Conditions with 30 items .................. 162
2.9. Sensitivity of Estimated Thetas in Conditions with 10 items ............................... 163
2.10. Sensitivity of Estimated Thetas in Conditions with 30 items ............................. 164
2.11. Specificity of Estimated Thetas in Conditions with 10 items ............................. 165
2.12. Specificity of Estimated Thetas in Conditions with 30 items ............................. 166
2.13. Classification Rate of Raw Summed Score in Conditions with 10 items .......... 167
2.14. Classification Rate of Raw Summed Score in Conditions with 30 items ...........168
2.15. Sensitivity of Raw Summed Score in Conditions with 10 items ........................ 169
2.16. Sensitivity of Raw Summed Score in Conditions with 30 items ........................ 170
2.17. Specificity of Raw Summed Score in Conditions with 10 items ........................ 171
2.18. Specificity of Raw Summed Score in Conditions with 30 items ........................ 172
3.1. Proportion of the CART Models with a Bayes Classifier that did not Assign Cases
to the Minority Class in Conditions with 10 items ............................................173
3.2. Proportion of the CART Models with a Bayes Classifier that did not Assign Cases
to the Minority Class in Conditions with 30 items .............................................174
3.3. Proportion of the Random Forest Models with a Bayes Classifier that did not
Assign Cases to the Minority Class in Conditions with 10 items ......................175
3.4. Proportion of the Random Forest Models with a Bayes Classifier that did not
Assign Cases to the Minority Class in Conditions with 30 items ......................176
3.5. Proportion of the Lasso Models with a Bayes Classifier that did not Assign Cases
to the Minority Class in Conditions with 10 items .............................................177
3.6. Proportion of the Lasso Models with a Bayes Classifier that did not Assign Cases
to the Minority Class in Conditions with 30 items .............................................178
3.7. Proportion of the Relaxed Lasso Models with a Bayes Classifier that did not
Assign Cases to the Minority Class in Conditions with 10 items ......................179
3.8. Proportion of the Relaxed Lasso Models with a Bayes Classifier that did not
Assign Cases to the Minority Class in Conditions with 30 items ......................180
3.9. Proportion of the Logistic Models with a Bayes Classifier that did not Assign
Cases to the Minority Class in Conditions with 10 items ..................................181
3.10. Proportion of the Logistic Models with a Bayes Classifier that did not Assign
Cases to the Minority Class in Conditions with 30 items ..................................182
3.11. Proportion of the Lasso Models with a ROC Classifier that did not Assign Cases
to the Minority Class in Conditions with 10 items .............................................183
3.12. Proportion of the Lasso Models with a ROC Classifier that did not Assign Cases
to the Minority Class in Conditions with 30 items ........................................... 184
4.1. Classification Accuracy of CART for Models with 20% Prevalence and greater
than N=250 ....................................................................................................... 185
4.2. Classification Rate of Random Forest with a Bayes Classifier in Conditions with
Prevalence of .20 .............................................................................................. 186
4.3. Sensitivity of Random Forest with a Bayes Classifier in Conditions with
Prevalence of .20 ............................................................................................... 187
4.4. Specificity of Random Forest with a Bayes Classifier in Conditions with
Prevalence of .20 ............................................................................................... 188
4.4A. Classification Accuracy of Lasso Logistic Regression with a Bayes Classifier for
Models with Prevalence of .20, Diagnosis-test Correlation of .7, Five-category
Items, and Sample Size greater than N=250 ..................................................... 189
4.5. Classification Rate of Relaxed Lasso Logistic Regression with a Bayes Classifier
for conditions with Five-category Items and a Diagnosis-test Correlation
of .70 ..................................................................................................................189
4.6. Sensitivity of Relaxed Lasso Logistic Regression with a Bayes Classifier for
conditions with Five-category Items and a Diagnosis-test Correlation of .70 ...190
4.7. Specificity of Relaxed Lasso Logistic Regression with a Bayes Classifier for
conditions with Five-category Items and a Diagnosis-test Correlation of .70 ...191
4.8. Classification Rate of Logistic Regression with a Bayes Classifier in Conditions
with 30 items ......................................................................................................192
4.9. Sensitivity of Logistic Regression with a Bayes Classifier in Conditions with 30
items .................................................................................................................. 193
4.10. Specificity of Logistic Regression with a Bayes Classifier in Conditions with 30
items .................................................................................................................. 194
4.11. Classification Rate of Random Forest with a ROC Classifier in Conditions with
10 items ............................................................................................................. 195
4.12. Classification Rate of Random Forest with a ROC Classifier in Conditions with
30 items ............................................................................................................. 196
4.13. Sensitivity of Random Forest with a ROC Classifier in Conditions with
10 items ............................................................................................................ 197
4.14. Sensitivity of Random Forest with a ROC Classifier in Conditions with
30 items ............................................................................................................ 198
4.15. Specificity of Random Forest with a ROC Classifier in Conditions with
10 items ............................................................................................................ 199
4.16. Specificity of Random Forest with a ROC Classifier in Conditions with
30 items ............................................................................................................ 200
4.17. Classification Rate of Logistic Regression with a ROC Classifier in Conditions with
10 items ............................................................................................................. 201
4.18. Classification Rate of Logistic Regression with a ROC Classifier in Conditions
with 30 items ..................................................................................................... 202
4.19. Sensitivity of Logistic Regression with a ROC Classifier in Conditions with 10
items .................................................................................................................. 203
4.20. Sensitivity of Logistic Regression with a ROC Classifier in Conditions with 30
items .................................................................................................................. 204
4.21. Specificity of Logistic Regression with a ROC Classifier in Conditions with 10
items .................................................................................................................. 205
4.22. Specificity of Logistic Regression with a ROC Classifier in Conditions with 30
items .................................................................................................................. 206
4.23. Classification Rate of Lasso Logistic Regression with a ROC Classifier in
Conditions with Diagnosis-test Correlation of .70 ........................................... 207
4.24. Sensitivity of Lasso Logistic Regression with a ROC Classifier in Conditions with
Diagnosis-test Correlation of .70 ...................................................................... 208
4.25. Specificity of Lasso Logistic Regression with a ROC Classifier in Conditions
with Diagnosis-test Correlation of .70 ............................................................. 209
4.26. Classification Rate of Relaxed Lasso with a ROC Classifier in Conditions with
Diagnosis-test Correlation of .70 ...................................................................... 210
4.27. Sensitivity of Relaxed Lasso with a ROC Classifier in Conditions with Diagnosis-
test Correlation of .70 ....................................................................................... 211
4.28. Specificity of Relaxed Lasso with a ROC Classifier in Conditions with Diagnosis-
test Correlation of .70 ....................................................................................... 212
6.1. Classification Rate Differences between Random Forest with ROC Classifier and
Data-generating Theta in Conditions with 10 items .......................................... 213
6.2. Classification Rate Differences between Random Forest with ROC Classifier and
Data-generating Theta in Conditions with 30 items ......................................... 214
6.3. Sensitivity Differences between Random Forest with ROC Classifier and Data-
generating Theta in Conditions with 10 items ................................................... 215
6.4. Sensitivity Differences between Random Forest with ROC Classifier and Data-
generating Theta in Conditions with 30 items ................................................... 216
6.5. Specificity Differences between Random Forest with ROC Classifier and Data-
generating Theta in Conditions with 10 items ................................................... 217
6.6. Specificity Differences Between Random Forest with ROC Classifier and Data-
generating Theta in Conditions with 30 items ................................................... 218
6.7. Classification Rate Differences between Logistic Regression with ROC Classifier
and Data-generating Theta in Conditions with 10 items ................................... 219
6.8. Classification Rate Differences between Logistic Regression with ROC Classifier
and Data-generating Theta in Conditions with 30 items ................................... 220
6.9. Sensitivity Differences between Logistic Regression with ROC Classifier and
Data-generating Theta in Conditions with 10 items .......................................... 221
6.10. Sensitivity Differences between Logistic Regression with ROC Classifier and
Data-generating Theta in Conditions with 30 items .......................................... 222
6.11. Specificity Differences between Logistic Regression with ROC Classifier and
Data-generating Theta in Conditions with 10 items .......................................... 223
6.12. Specificity Differences between Logistic Regression with ROC Classifier and
Data-generating Theta in Conditions with 30 items .......................................... 224
6.13. Classification Rate Difference between Lasso with a ROC Classifier and Data-
generating Theta in Conditions with Diagnostic-test Correlation of .70 ......... 225
6.14. Sensitivity Difference between Lasso with a ROC Classifier and Data-generating
Theta in Conditions with Diagnostic-test Correlation of .70 ........................... 226
6.15. Specificity Difference between Lasso with a ROC Classifier and Data-generating
Theta in Conditions with Diagnostic-test Correlation of .70 ............................227
6.16. Classification Rate Difference between Relaxed Lasso with a ROC Classifier and
Data-generating Theta in Conditions with Diagnostic-test Correlation
of .70 ................................................................................................................. 228
6.17. Sensitivity Difference between Relaxed Lasso with a ROC Classifier and Data-
generating Theta in Conditions with Diagnostic-test Correlation of .70 ......... 229
6.18. Specificity Difference between Relaxed Lasso with a ROC Classifier and Data-
generating Theta in Conditions with Diagnostic-test Correlation of .70 ......... 230
7.1. Number of Replications in Conditions with Binary Items where CART did not
Choose at least Two Items (where Theta Score was not Estimated) ................ 231
7.2. Number of Replications in Conditions with Polytomous Items where CART did not
Choose at least Two Items (where Theta Score was not Estimated) ................ 232
7.3. Correlation between True and Estimated Theta from CART in Conditions with
Binary Items ...................................................................................................... 233
7.4. Correlation between True and Estimated Theta from CART in Conditions with
Polytomous Items ............................................................................................. 234
7.5. Mean Squared Error of Estimated Theta from CART in Conditions with Binary
Items ................................................................................................................. 235
7.6. Mean Squared Error of Estimated Theta from CART in Conditions with Polytomous
Items ................................................................................................................. 236
7.7. Number of Replications in Conditions with Binary Items where Lasso did not
Choose at least Two Items (where Theta Score was not Estimated) ................ 237
7.8. Number of Replications in Conditions with Polytomous Items where Lasso did not
Choose at least Two Items (where Theta Score was not Estimated) ................ 238
7.9. Correlation between True and Estimated Theta from Lasso in Conditions with
Diagnosis-test Correlation of .70 ...................................................................... 239
7.10. Mean Squared Error of Estimated Theta from Lasso in Conditions with
Diagnosis-test Correlation of .70 ...................................................................... 240
7.11. Mean Squared Error of Estimated Theta from Random Forest in Conditions with
Binary Items ..................................................................................................... 241
7.12. Mean Squared Error of Estimated Theta from Random Forest in Conditions with
Polytomous Items ..............................................................................................242
7.13. Correlation between True and Estimated Theta from Random Forest in Conditions
with Binary Items .............................................................................................. 243
7.14. Correlation between True and Estimated Theta from Random Forest in Conditions
with Polytomous Items ...................................................................................... 244
7.A. Average Number of Items Chosen by CART in Conditions with 10 items ........ 245
7.B. Average Number of Items Chosen by CART in Conditions with 30 items ........ 246
7.C. Average Number of Items Chosen by Lasso in Conditions with 10 items ......... 247
7.D. Average Number of Items Chosen by Lasso in Conditions with 30 items ......... 248
LIST OF FIGURES
Figure Page
1. Variance explained and unconditional η² effect sizes for the predictors of
classification accuracy in machine learning models with ROC classifiers .........69
2. Random Forest variable importance measures for machine learning algorithms with
ROC classifiers .....................................................................................................70
2.1. Variable Importance Measures for Data-generating Model ................................ 249
2.2. Variable Importance Measures of the Raw Summed Scores ............................... 250
2.3. Variable Importance Measures of Estimated Theta ............................................. 251
4.1. Regression Tree to predict CART Classification Rate ......................................... 252
4.2. Regression Tree to predict CART sensitivity ...................................................... 253
4.3. Regression Tree to predict CART specificity ...................................................... 254
4.4. Variable Importance Measures of CART ............................................................. 255
4.5. Regression Tree to predict Random Forest Classification Rate .......................... 256
4.6. Regression Tree to predict Random Forest Specificity ....................................... 257
4.7. Variable Importance Measures of Random Forest with Bayes Classifier ........... 258
4.8. Regression Tree to predict Lasso Logistic Regression Sensitivity ...................... 259
4.9. Variable Importance Measures of Lasso Logistic Regression with Bayes
Classifier ........................................................................................................... 260
4.10. Regression Tree to predict Relaxed Lasso Sensitivity ........................................ 261
4.11. Regression Tree to predict Relaxed Lasso Specificity ....................................... 262
4.12. Variable Importance Measures of Relaxed Lasso Logistic Regression with Bayes
Classifier ........................................................................................................... 263
4.13. Regression Tree to predict Logistic Regression Classification Rates ............... 264
4.14. Regression Tree to predict Logistic Regression Sensitivity .............................. 265
4.15. Regression Tree to predict Logistic Regression Specificity ............................... 266
4.16. Variable Importance Measures of Logistic Regression with Bayes Classifier .. 267
4.17. Regression Tree to predict Sensitivity of Random Forest with a ROC
Classifier ........................................................................................................... 268
4.18. Variable Importance Measures of Random Forest with ROC Classifier ........... 269
4.19. Regression Tree to predict Sensitivity for Logistic Regression with a ROC
Classifier ........................................................................................................... 270
4.20. Variable Importance Measures for Logistic Regression with ROC Classifier .. 271
4.21. Regression Tree to predict Classification Rates for Lasso Logistic Regression
with ROC Classifier ........................................................................................ 272
4.22. Regression Tree to predict Specificity for Lasso Logistic Regression with ROC
Classifier ........................................................................................................... 273
4.23. Variable Importance Measures of Lasso Logistic Regression with ROC
Classifier ........................................................................................................... 274
4.24. Regression Tree to predict Classification Rate of Relaxed Lasso with ROC
Classifier ........................................................................................................... 275
4.25. Regression Tree to predict Sensitivity of Relaxed Lasso with ROC Classifier ...276
4.26. Regression Tree to predict Specificity of Relaxed Lasso with ROC Classifier ..277
4.27. Variable Importance Measures of Relaxed Lasso with ROC Classifier ............ 278
7.1. Regression Tree for Theta MSE of CART .......................................................... 279
7.2. Regression Tree for the Correlation between Estimated and Data-generating Theta
for CART ......................................................................................................... 280
7.3. Variable Importance Measures for CART Person Parameter Recovery ............. 281
7.4. Regression Tree for Theta MSE in the Lasso Logistic Regression Model .......... 282
7.5. Regression Tree for the Correlation between Estimated and Data-generating Theta
for Lasso Logistic Regression .......................................................................... 283
7.6. Variable Importance Measures for the Lasso Logistic Regression Person Parameter
Recovery ........................................................................................................... 284
7.7. Regression Tree for Theta MSE in the Random Forest Model ........................... 285
7.8. Regression Tree for the Correlation between Estimated and True Theta for Random
Forest Model ..................................................................................................... 286
7.9. Variable Importance Measures for the Random Forest person parameter
recovery ............................................................................................................ 287
1. Introduction
An important property of psychological assessments and medical instruments is
the ability to accurately screen or diagnose individuals for disorders or illnesses. For
example, a psychiatrist might assess whether a patient has major depressive disorder, a
physician might diagnose a patient with cancer, or a psychologist might evaluate whether
a teenager is at risk of suicide. Other examples include the Child Behavior Checklist
(CBCL), a measure of emotional, behavioral, and social problems in children, used to
screen children for autism spectrum disorders (Achenbach & Rescorla, 2013); the Center
for Epidemiologic Studies Depression (CES-D) Scale, a measure of depressive
symptoms, used to screen for depression in older adults (Lewinsohn, Seeley, Roberts, &
Allen, 1997); and the Parent General Behavior Inventory (P-GBI), a behavior inventory,
used to screen children for pediatric bipolar disorder (Youngstrom, Frazier, Demeter,
Calabrese, & Findling, 2009). In all of these situations, the assessment reduces to making
a binary decision, such as deciding to follow up with a patient (or not), or to make a
diagnosis (or not; Liu, 2012). Therefore, the assessment needs to maximize the
proportion of patients correctly identified by the assessment as having the diagnosis
(known as sensitivity), and the proportion of patients correctly identified by the
assessment as not having the diagnosis (known as specificity). There are approaches in
both psychometrics and machine learning to maximize diagnostic classification accuracy
(Gibbons, Weiss, Franke, & Kupfer, 2016; James et al., 2013; Liu, 2012; Youngstrom,
2013; Zweig & Campbell, 1993). However, it is less clear: (1) which approach to use and
(2) how the approaches compare to one another in the same context. A simulation study
could provide relative advantages and disadvantages of using psychometric and machine
learning methods for diagnostic classification.
The proposed simulation investigates the performance of determining a diagnosis
from an assessment using psychometric and machine learning approaches. The
psychometric methods used in this study aggregate item information (as summed scores
or item response theory scale scores), and then a cut score is determined by using receiver
operating characteristic (ROC) curves. In contrast, machine learning methods treat
diagnostic classification as a prediction problem, where the predictors of the diagnosis
are the responses to the items comprising the assessment. The machine learning methods
then assign each case either to the class that is most probable (Bayes classifier) or to a
class conditional on a probability threshold determined by ROC curves (ROC classifier).
The proposed simulation consists of two studies. The first study compares
classification accuracy using the psychometric and machine learning methods. The main
outcomes of the first part of the study are the difference in classification accuracy,
sensitivity, and specificity across the methods. The goal of the first study is to understand
how classification performance across the methods is influenced by sample size,
prevalence of the diagnosis, the diagnosis-test correlation, and assessment structure. The
second study investigates the variable selection property of machine learning models and
if items selected for prediction could recover the latent variable score on the assessment.
The goal of the second study is to estimate the correlation between the IRT scores
based on the items selected by machine learning methods and the data-generating IRT
scores. In this case, ceiling and floor effects are of interest.
This dissertation has six parts. First, basic diagnostic classification studies and
classification accuracy measures are introduced. Second, psychometric approaches to
classification using IRT and ROC curve analysis are described. Third, machine learning
approaches to classification using Lasso logistic regression, classification and regression
trees, and the random forest algorithm are described. Fourth, study goals and the
simulation methods are discussed. Fifth, Monte Carlo simulation results are presented.
Finally, the strengths and limitations of psychometric and machine learning models are
discussed, and future directions are considered.
1.1 Diagnostic Assessment and Classification Accuracy
In the simplest case, a diagnosis is a binary outcome, and the diagnostic
assessment determines if the patient is likely to be diagnosed or not diagnosed. Typically,
a diagnostic assessment is composed of a series of items, and a cut score has to be
determined so that clinicians or researchers can gauge the likelihood that a person has
the diagnosis. Diagnostic assessments are imperfect, but they correlate with a gold
standard that dictates the diagnosis. Gold standards tend to be costly, extensive, invasive,
or all of these, so administering a gold standard may be inconvenient. Diagnostic
accuracy studies are research studies that investigate how well diagnostic assessments
discriminate between patients with and without a condition (Zhou, Obuchowski, &
McClish, 2011; Zweig & Campbell, 1993). Diagnostic accuracy studies are usually
composed of three parts: (1) a sample of participants who take the diagnostic assessment,
(2) an approach to score the diagnostic assessment, and (3) a gold standard, independent
of the diagnostic assessment, that indicates the true condition (Zhou et al., 2011). The
proposed simulation study mimics the scenario of a diagnostic accuracy study.
For example, consider a diagnostic accuracy study on how the Revised Hamilton
Scale for Depression (HRSD-R) discriminates between clinically depressed and non-
depressed participants (McFall & Treat, 1999). The gold standard latent depression
scores θ_Diag (a clinical interview or a more thorough assessment) are presented on the
horizontal axis of Figure A. A cut score on θ_Diag (dashed vertical line in Figure A) has
been determined to classify participants as depressed (right of the vertical dashed line) or
not depressed (left of the vertical dashed line). Suppose that θ_Diag correlates highly, but
not perfectly, with scores from the cheaper and less invasive HRSD-R. The vertical axis
represents the estimated scores of the HRSD-R, θ_Assmt. Researchers would then
investigate the optimal HRSD-R cut score (dashed horizontal line in Figure A) to
maximize correct classification of participants as depressed or not depressed. The HRSD-
R is not perfect, so it may misclassify participants as depressed or not depressed. For
example, decreasing the HRSD-R cut score (sliding down the horizontal dashed line in
Figure A) would correctly identify most participants who are depressed (high sensitivity),
but would also identify many non-clinically depressed participants as depressed (low
specificity). On the other hand, increasing the HRSD-R cut score (sliding up the
horizontal dashed line in Figure A) would correctly rule out a depression diagnosis for
most non-depressed participants (high specificity), but would miss many participants who
are depressed (low sensitivity). A common strategy is to study the trade-off between
correctly classifying participants as depressed or not (using ROC curves, introduced
later), but determining a cut score is a subjective decision. It is important to note that the
HRSD-R is not meant to replace the gold standard, but to provide an idea of who should
be followed up in primary care settings or in situations where screening needs to be done
for a large number of people.

Figure A. Relationship between the diagnosis and the assessment. The four regions of
the plot correspond to true positives (TP), false positives (FP), true negatives (TN), and
false negatives (FN).
1.1.1 Classification Accuracy
A confusion table describes the relationship between the diagnosis by the gold
standard and predicted diagnosis by the assessment. Below is the general form of a
confusion table (Table A) in terms of diagnosed and not diagnosed, but the binary
variable can take any class or label.
Table A. Confusion table for diagnostic classification

                       Predicted Not Diagnosed   Predicted Diagnosed   Prediction Error
True Not Diagnosed               A                         B               B/(A+B)
True Diagnosed                   C                         D               C/(C+D)
Use Error                    C/(A+C)                   B/(B+D)     Error = (B+C)/(A+B+C+D)
Using Table A and Figure A, D is the number of diagnosed patients that were correctly
predicted by the assessment (true positives; TP). A is the number of non-diagnosed
patients that were correctly predicted by the assessment (true negatives; TN). B is the
number of non-diagnosed patients that were incorrectly predicted by the assessment
(false positives; FP). Finally, C is the number of diagnosed patients that were incorrectly
predicted by the assessment (false negatives; FN). The main diagonal of Table A contains
the number of cases correctly predicted by the assessment, and the off-diagonal of Table
A contains the number of cases incorrectly predicted by the assessment.
Below are some statistics from the confusion table that estimate model accuracy
and that are used in this study. Apparent prevalence (AP) is the proportion of participants
predicted as diagnosed by the assessment, (B + D)/(A + B + C + D). Sensitivity (Se) is the
proportion of diagnosed participants who were correctly predicted as diagnosed by the
assessment, D/(C + D), and specificity (Sp) is the proportion of non-diagnosed
participants who were correctly predicted as non-diagnosed by the assessment, A/(A + B).
The complement of sensitivity is the false negative rate (FNR), which is the proportion of
diagnosed participants who were predicted as non-diagnosed by the assessment, C/(C + D)
or 1 − Se. The complement of specificity is the false positive rate (FPR), which is the
proportion of non-diagnosed participants who were predicted as diagnosed by the
assessment, B/(A + B) or 1 − Sp. With estimates of Se and Sp, one can estimate the true
prevalence in the population, P = (AP + Sp − 1)/(Se + Sp − 1). Se and Sp are independent
of the prevalence. All of these statistics vary as the assessment cut score changes. Finally,
the probability of a correct result is the probability of obtaining a true positive or a true
negative, (A + D)/(A + B + C + D). Its complement, the misclassification rate, is the
probability of obtaining a false positive or a false negative, (B + C)/(A + B + C + D). Both
of these statistics depend on the prevalence P. Two limitations of the probability of a
correct result are that: (1) false positives and false negatives are weighted the same, and
(2) assessments with different sensitivities and specificities can give the same estimate,
preventing direct comparisons.
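The statistics above follow directly from the four cells of Table A. The sketch below (plain Python; the function name and cell ordering are illustrative, not from this study) computes them, including the true-prevalence estimate from AP, Se, and Sp:

```python
def confusion_stats(tn, fp, fn, tp):
    """Accuracy statistics from Table A, with cells A=TN, B=FP, C=FN, D=TP."""
    total = tn + fp + fn + tp
    se = tp / (tp + fn)            # sensitivity, D/(C+D)
    sp = tn / (tn + fp)            # specificity, A/(A+B)
    ap = (fp + tp) / total         # apparent prevalence, (B+D)/total
    correct = (tn + tp) / total    # probability of a correct result
    prev = (ap + sp - 1) / (se + sp - 1)  # estimated true prevalence
    return {"Se": se, "Sp": sp, "FNR": 1 - se, "FPR": 1 - sp,
            "AP": ap, "P": prev, "correct": correct, "error": 1 - correct}
```

For example, with A = 80, B = 20, C = 10, D = 90, sensitivity is .90, specificity is .80, and the probability of a correct result is .85.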
1.1.2 Some Clarifications
In this study, the term theta or θ refers to an IRT score. Also, the term diagnostic
assessment refers to making a binary decision about a diagnosis, and false positives and
false negatives are weighted the same. In other research, diagnostic assessment refers
only to assessments that focus on high specificity (Gibbons et al., 2013). This study also
focuses on methods for binary diagnoses, although there are models for diagnoses with
more categories or severities (Zhou et al., 2011). Finally, this study assumes that the gold
standard is reliable, unbiased, and that it yields the true diagnosis. However, see Zhou et
al. (2011) for cases when there are problems with the gold standard.
1.2. Psychometric Methods for Diagnostic Assessment
Psychometric methods for diagnostic assessment focus on improving the
measurement of the construct targeted in the assessment so that precise latent variable
scores can be estimated and misclassifications decrease. Traditionally, a diagnostic
assessment would use a sum score and conduct ROC curve analysis to find a cut score on the
assessment. A modern psychometric approach for diagnostic assessment would consist
of: 1) fitting an IRT model to the assessment items, 2) estimating IRT scores, and 3)
conducting an ROC curve analysis to study how sensitivity and specificity change as a
function of shifting the assessment cut score. This section introduces IRT and ROC curve
analysis.
1.2.1 Item Response Theory
Item response theory (IRT) refers to a family of mathematical models used to
analyze item responses measuring a latent construct. IRT underlies much of large-scale
test development and scale construction (Thissen & Wainer, 2001). In contrast to
classical test theory, the most important properties of IRT are that item properties and
person scores on the intended construct are on the same scale, and that assessment takers
do not need to take the same items to have comparable scores.
IRT Models. A fundamental concept of IRT is the item characteristic curve (ICC)
or trace line. The trace line describes the probability of endorsing an item, or getting an
item correct, as a function of a participant’s level on the latent construct (θ) and
properties of the items. The two IRT models considered in this study are models for
dichotomous responses and polytomous (ordered-categorical) responses. The two-
parameter logistic model (2PL; Birnbaum, 1968) describes the trace lines for
dichotomous items using a logistic (S-shaped) curve with two item parameters,
P(u_i = 1 | θ, a_i, b_i) = 1 / {1 + exp[−a_i(θ − b_i)]}   (2.1)
where θ is the latent variable score, 𝑎𝑖 is the discrimination parameter for item i, and 𝑏𝑖 is
the severity parameter for item i. The θ is a person parameter because it describes where
the participant stands on the intended construct, and a and b are item parameters because
they describe item properties. The severity parameter b is the item location along the
range of θ where a person has a .50 probability of endorsing the item (see Figure B). The
b parameters typically vary from -2 to 2 and are on the same scale as θ. Items with lower
b values are easier to endorse than items with higher b values. The discrimination
parameter a is proportional to the slope of the curve at the value of the severity
parameter. Items with higher a values can separate participants close to the b value better
than items with lower a values. The a parameters usually vary between 0 and 3. A
negative discrimination value indicates that the item has a different valence than the
majority of the items and it might need to be reverse-coded.
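Eq. 2.1 maps directly to code. A minimal sketch in Python (the function name is illustrative, not from this study):

```python
from math import exp

def p_2pl(theta, a, b):
    """Eq. 2.1: probability of endorsing item i, given theta and the item's
    discrimination (a) and severity (b) parameters."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))
```

At θ = b the probability is exactly .50, and a larger a value makes the curve steeper around b, so the item separates nearby participants better.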
On the other hand, the graded response model (GRM; Samejima, 1969) is used to
describe trace lines for polytomous items,
P(u_i = k | θ, b_ik, a_i) = 1 / {1 + exp[−a_i(θ − b_ik)]} − 1 / {1 + exp[−a_i(θ − b_i,k+1)]} .   (2.2)
The GRM uses cumulative probability differences to obtain the probability of answering
category k on the item. Specifically, the GRM compares the probability of responding in
category k or higher with the probability of responding in category k + 1 or higher. For
example, if an item has four response categories {0, 1, 2, 3}, the comparisons are {0} vs.
{1, 2, 3}, {0, 1} vs. {2, 3}, and {0, 1, 2} vs. {3}. So, there are k − 1 b-parameters for an
item with k categories. By definition, the probability of responding in the lowest category {0} or
higher is 1, and the probability of responding higher than the highest category {3} is 0.
Therefore, taking the difference between the probability of answering up to k and k+1
provides an estimate of the probability of responding k to the item. In this case, a is the
discrimination parameter, and b is the category boundary location and is related to the
level of θ where answering k is most likely (Edelen & Reeve, 2007).
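A sketch of Eq. 2.2 in Python (the 0-based category coding and function names are illustrative): the probability of responding exactly in category k is the difference between two adjacent cumulative curves.

```python
from math import exp

def cum_prob(theta, a, b):
    """Cumulative curve: probability of responding in a given category or higher."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))

def p_grm(theta, a, bs, k):
    """Eq. 2.2: probability of responding exactly in category k (0-based),
    given the K - 1 ordered boundary parameters in bs."""
    upper = 1.0 if k == 0 else cum_prob(theta, a, bs[k - 1])    # P(response >= k)
    lower = 0.0 if k == len(bs) else cum_prob(theta, a, bs[k])  # P(response >= k + 1)
    return upper - lower
```

Because the boundaries are ordered, the category probabilities are positive and sum to one across the K categories.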
Figure B. IRT trace line

The IRT models just described assume unidimensionality, where a single latent
variable explains the relationships among the items. Closely related to unidimensionality,
IRT models also assume local independence, where the joint probability of endorsing two
items is the product of the probabilities of endorsing each of the items (or, that items are
not related to each other conditional on θ). Violations of local independence (or local
dependence, LD) could be due to unmodeled dimensionality or content/location
similarity between items. Violations of unidimensionality have been previously addressed
using either bifactor or multidimensional item response theory (MIRT) models (Reckase,
2009). The impact of LD can be investigated by analyzing whether the model is
essentially unidimensional, using either measures of factor strength or local dependence
statistics. The rest of the study assumes that the assessment is unidimensional and, in
most conditions, that the items are locally independent.
Information from IRT Models. Item information functions (IIFs) are derived
from item parameters to indicate where each item discriminates the most across the range
of θ, or where each item can be most useful in estimating the θ score precisely (see
Figure C). IIFs are additive, so all of the items in the measure can define a test
information function (TIF) that describes where the measure discriminates the most across
the range of θ. For example, an assessment can provide more precise θ scores for
participants close to one standard deviation above the θ mean than in any other range.
Thus, IIFs and TIFs are used in scale development to select items so that assessments
have specific measurement properties.
IRT Calibration. In this study, marginal maximum likelihood (MML) with the
EM algorithm (Bock & Aitkin, 1981) is used to estimate item parameters for
dichotomous and polytomous IRT models. The item parameter estimation step is known
as calibration. The probability of answering response vector x of size I for person n is,
p(x_n | θ_n, β) = ∏_{i=1}^{I} Pr(X_ni = x_ni | θ_n, β_i)   (2.5)
In this case, Xni is a response for person n on item i, β is a vector of item parameters of
the IRT model, and θn is the person parameter. Therefore, the probability of answering
each response vector depends on item parameters and person parameter. The likelihood
function is,
L(β, θ; x) = ∏_{n=1}^{N} p(x_n | θ_n, β)   (2.6)
There are consistency problems when maximizing the likelihood function in Eq.
2.6 because the number of θ parameters increases as the sample size increases. To circumvent
this problem, MML-EM assumes that item parameters are fixed and that persons are
randomly sampled from a population. Therefore, the θ parameters are treated as random
effects and are integrated (marginalized) over during estimation (de Ayala, 2009).
Figure C. IRT trace line and information function

Formally,

p(x | β) = ∫_{−∞}^{∞} p(x | θ, β) g(θ) dθ   (2.7)
where g(θ) is the distribution of the θ’s. The distribution g(θ) is assumed to be normally
distributed. By marginalization, x would only depend on the item parameters. The EM
algorithm is then used to find the maximum of the likelihood function. The EM algorithm
is iterative: it first estimates the expected number of examinees who give response k
to item i (E-step), and then finds the item parameters that maximize the likelihood function
of observing those numbers from the E-step (M-step; Yen & Fitzpatrick, 2006). The process
continues until some convergence criterion is satisfied. Estimation may be complicated
when the likelihood function is complex or when different item parameters lead to similar
model-implied probabilities. In these cases, Bayesian methods could be used for item
calibration (Levy & Mislevy, 2016), but they are not discussed in this study.
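The integral in Eq. 2.7 is approximated numerically in practice. A minimal sketch for 2PL items under a standard normal g(θ), using a simple normalized grid over [−4, 4] (a stand-in for the quadrature schemes real IRT software uses; names are illustrative):

```python
from math import exp, pi, sqrt

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + exp(-a * (theta - b)))

def marginal_prob(x, items, n_quad=81):
    """Approximate Eq. 2.7: the marginal probability of response pattern x,
    integrating the conditional likelihood over g(theta) = N(0, 1)."""
    total, weight_sum = 0.0, 0.0
    for q in range(n_quad):
        theta = -4.0 + 8.0 * q / (n_quad - 1)
        g = exp(-theta * theta / 2.0) / sqrt(2.0 * pi)  # standard normal density
        like = 1.0
        for u, (a, b) in zip(x, items):
            p = p_2pl(theta, a, b)
            like *= p if u == 1 else 1.0 - p
        total += like * g
        weight_sum += g
    return total / weight_sum  # weights normalized to sum to one
```

Because the weights are normalized, the marginal probabilities of all possible response patterns sum to one, and x depends only on the item parameters, as described above.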
IRT Scoring. After items are calibrated, the item parameters are treated as fixed
and response patterns can be scored to estimate θ (van der Linden & Pashley, 2010).
Three IRT scoring methods are maximum likelihood (ML) scoring, maximum a
posteriori (MAP) scoring, and expected a posteriori (EAP) scoring. The ML score estimate
is the mode of the likelihood function with respect to θ (the product of the trace lines; Eq.
2.6) and reflects the range of θ where a participant’s response pattern is most likely
(Thissen & Orlando, 2001). The mode of the likelihood is found by iterative methods,
such as the Newton-Raphson procedure. Items with higher discrimination parameters
have a greater impact on the likelihood, thus a greater impact on the estimated θ score.
Score precision can be quantified by the ML standard error, defined by the second partial
derivative of the loglikelihood (the inverse square root of the information function from
Eq. 2.3 or 2.4), and it describes the spread of the likelihood function. A limitation of ML scoring is that
the maximum for the likelihood function is not defined for all-correct or all-incorrect
response patterns – the participant has a higher or lower θ than what the items measure.
Therefore, other scoring methods need to be considered.
Both expected a posteriori (EAP) and maximum a posteriori (MAP) scoring
overcome the ML scoring limitations by including prior information (a prior distribution)
of θ. The product of the likelihood of each of the participant’s responses and the prior
distribution define a joint likelihood (or posterior distribution) that provides finite θ
estimates. The MAP[θ] estimate is the mode of that joint likelihood, also estimated using
iterative methods.
On the other hand, EAP scoring uses the mean of the joint likelihood as a score
estimate and is defined by the ratio of two integrals,
EAP[θ] = [ ∫_{−∞}^{∞} θ ∏_{i=1}^{nitems} p_i(u_i | θ) φ(θ) dθ ] / [ ∫_{−∞}^{∞} ∏_{i=1}^{nitems} p_i(u_i | θ) φ(θ) dθ ] .   (2.8)
The numerator is the sum of the θ values weighted by the posterior and the denominator is the
sum of those weights (Thissen & Orlando, 2001). Uncertainty in the EAP[θ] estimate is
quantified by the standard deviation of the posterior,
SD[θ] = ( [ ∫_{−∞}^{∞} (θ − EAP[θ])² ∏_{i=1}^{nitems} p_i(u_i | θ) φ(θ) dθ ] / [ ∫_{−∞}^{∞} ∏_{i=1}^{nitems} p_i(u_i | θ) φ(θ) dθ ] )^{1/2} .   (2.9)
The integrals for EAP[θ] and SD[θ] can be approximated by q quadrature points
along the range of θ. EAP [θ] is the most common method to derive θ scores (Thissen &
Steinberg, 2009), so they are used in this study. EAP[θ] estimates are considered
shrunken estimates because they pull θ towards the mean of the prior and have a smaller
variance than the likelihood. Previous research suggests that EAP[θ] scores could prevent
poor score estimation when the likelihood is bimodal and are more robust to IRT model
misspecifications than other scoring methods (Thissen & Wainer, 2001).
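Eqs. 2.8 and 2.9 can be approximated with quadrature points, as noted above. A sketch for 2PL items under a standard normal prior (illustrative, not the exact routine used by IRT software):

```python
from math import exp, pi, sqrt

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + exp(-a * (theta - b)))

def eap_score(x, items, n_quad=81):
    """EAP[theta] (Eq. 2.8) and posterior SD (Eq. 2.9) by quadrature."""
    thetas = [-4.0 + 8.0 * q / (n_quad - 1) for q in range(n_quad)]
    posterior = []
    for theta in thetas:
        w = exp(-theta * theta / 2.0) / sqrt(2.0 * pi)  # prior phi(theta)
        for u, (a, b) in zip(x, items):
            p = p_2pl(theta, a, b)
            w *= p if u == 1 else 1.0 - p               # times the likelihood
        posterior.append(w)
    denom = sum(posterior)
    eap = sum(t * w for t, w in zip(thetas, posterior)) / denom
    sd = sqrt(sum((t - eap) ** 2 * w for t, w in zip(thetas, posterior)) / denom)
    return eap, sd
```

Unlike ML scoring, the estimate is finite even for all-correct or all-incorrect patterns, and it is shrunken toward the prior mean.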
Overall, IRT provides a framework to calibrate item parameters and score item
responses to estimate θ. IRT scores have the property that they can be compared across
participants that have not taken the same items. This property is useful for study 2,
discussed below.
1.2.2 Receiver Operating Characteristic Curves
Receiver operating characteristic (ROC) curve analysis provides a valuable tool to
evaluate classification accuracy of a diagnostic assessment or a statistical model (Egan,
1975; Pepe, 2003; Zweig & Campbell, 1993). ROC analysis, rooted in signal detection
theory, was developed during World War II to investigate electronic signal detection in
radars, but the methodology was quickly adapted to study classification accuracy in the
areas of medicine, psychology, psychophysics, epidemiology, and radiology (McFall &
Treat, 1999; Zou, O’Malley, & Mauri, 2007). ROC curves describe the accuracy of a
diagnostic assessment by obtaining sensitivity and specificity information along cut
scores of the diagnostic assessment. An ROC curve is presented in Figure D, where
sensitivity is plotted on the y-axis and the false positive rate (1 – specificity) on the x-axis
for each cut score (defined by the unique values of the assessment). In other words, a 2x2
confusion table (as in Table A) is computed for each decision point, and sensitivity and
specificity are estimated and plotted. The empirical (non-parametric) ROC curve is then
estimated by connecting the sensitivity and specificity coordinates, describing the trade-
off between those two statistics. The empirical ROC curve goes from the lower left
corner, where the sensitivity and the false positive rate are 0 (maximally conservative
cutoff), to the upper right corner where the sensitivity and false positive rates are 1
(maximally liberal cutoff; McFall & Treat, 1999). The shape and height of the ROC
curve indicate how well the diagnostic assessment discriminates. A 45-degree line (the
chance diagonal) can be used as a reference because it represents an assessment that
makes diagnoses at random (the true positive rate and the false positive rate are the
same). The estimation of the empirical ROC curve does not involve distributional
assumptions, so the curve is jagged, with a staircase shape. Smooth ROC curves can be
estimated by making distributional assumptions, such as a binormal model where both
distributions of scores are assumed to be normal. The relationship between the two
distributions under the binormal model reduces to two numbers: the standardized
difference in their means and the ratio of their standard deviations (Zhou et al.,
2011). This study uses empirical ROC curves, so distributional assumptions are not
discussed (see McFall & Treat, 1999). Overall, ROC curves let researchers choose where
along the curve they want to operate when making decisions based on the assessment.
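The empirical curve just described can be traced by sweeping a cut score over the unique assessment values and recomputing the 2x2 table at each one. A sketch (Python; names and the >= decision rule are illustrative):

```python
def empirical_roc(scores, diagnosis):
    """Empirical ROC coordinates (FPR, Se), one pair per candidate cut score.
    A case is predicted positive when its score is >= the cut."""
    n_pos = sum(diagnosis)
    n_neg = len(diagnosis) - n_pos
    points = [(0.0, 0.0)]                       # maximally conservative cutoff
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, d in zip(scores, diagnosis) if s >= cut and d == 1)
        fp = sum(1 for s, d in zip(scores, diagnosis) if s >= cut and d == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points                               # ends at (1, 1), maximally liberal
```

Connecting the returned coordinates yields the staircase-shaped empirical curve from (0, 0) to (1, 1).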
Figure D. ROC curve, plotting sensitivity (%) against 100 − specificity (%)

Statistics from the ROC curve. A one-number summary statistic for diagnostic
accuracy is the area under the ROC curve (AUC). The AUC indicates the likelihood of
making a correct classification when two cases (one per group) are chosen at random; it
can also be interpreted as the average sensitivity (specificity) across all values of specificity
(sensitivity). An AUC of .50 (which is the AUC of the 45-degree line) indicates that
assessment decisions are made at random, and AUCs between .5 and 1.0 indicate that
the assessment performs better than chance. It is difficult to provide minimal effect sizes
for AUC values because two assessments may have the same AUC but discriminate in
different clinically relevant regions.
of sensitivity or specificity at fixed values of the other, can be estimated for
predetermined areas of interest, but these are not used in this study.
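The probabilistic interpretation of the AUC suggests a direct estimate: compare every diagnosed case with every non-diagnosed case. A sketch (ties counted as one half; names are illustrative):

```python
def auc_pairwise(scores, diagnosis):
    """AUC as the probability that a randomly chosen diagnosed case scores
    higher than a randomly chosen non-diagnosed case (ties count 1/2)."""
    pos = [s for s, d in zip(scores, diagnosis) if d == 1]
    neg = [s for s, d in zip(scores, diagnosis) if d == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise estimate equals the area under the empirical ROC curve computed by the trapezoid rule, which is why the two interpretations coincide.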
There are three different ROC curve indices that can be used to determine a cut
score (Liu, 2012). The closest-to-(0,1) criterion chooses the cut score with the minimum
Euclidean distance between the ROC curve and the ideal (0,1) point of the graph (perfect
sensitivity and perfect specificity). The closest-to-(0,1) criterion is,

Closest-to-(0,1): Min{ √( [1 − Sp]² + [1 − Se]² ) } .   (2.10)
Furthermore, a cut score can be determined by maximizing the Youden index, which is
the sum of specificity and sensitivity minus one,
Youden index: Max{ Se + Sp − 1 } .   (2.11)
Graphically, the Youden index is the point in the ROC curve that is farthest away from
the chance diagonal. Finally, a cut score can be determined by the concordance
probability, which is the product of sensitivity and specificity,
Concordance Probability: Max{ Se × Sp } .   (2.12)
Graphically, the concordance probability is the area of a rectangle under the ROC curve
given a specific cut score. The width of the rectangle is the specificity and the height of
the rectangle is the sensitivity. These three ROC curve statistics are estimated in this
study.
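Given the sensitivity and specificity at each candidate cut score, Eqs. 2.10 through 2.12 each reduce to a one-line optimization. A sketch over parallel lists (names are illustrative):

```python
from math import sqrt

def choose_cuts(cuts, se, sp):
    """Cut scores selected by the closest-to-(0,1) criterion (Eq. 2.10),
    the Youden index (Eq. 2.11), and the concordance probability (Eq. 2.12)."""
    dist = [sqrt((1 - q) ** 2 + (1 - e) ** 2) for e, q in zip(se, sp)]
    youden = [e + q - 1 for e, q in zip(se, sp)]
    concord = [e * q for e, q in zip(se, sp)]
    return {"closest_to_01": cuts[dist.index(min(dist))],
            "youden": cuts[youden.index(max(youden))],
            "concordance": cuts[concord.index(max(concord))]}
```

The three criteria often, but not always, agree; they differ most when the ROC curve is asymmetric.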
The ROC curve has several important properties and makes several assumptions.
The ROC curve is invariant to monotonic transformations of the score scale, so the ROC
curve only depends on the ranks of the assessment results. Furthermore, ROC curves can be
estimated for each new assessment and compared visually, instead of comparing the
sensitivity and specificity of just one threshold per assessment. ROC curves
are also invariant to the prevalence of the condition and changes in the cut scores (that is,
all cut scores are represented). One assumption is that the scores of the diagnostic
assessment can be ordered in magnitude so that sensitivity and specificity can be
determined by shifting the threshold. Furthermore, it is assumed that the ROC curve is
the same for all subsamples of the dataset, and that the researchers have access to the true
diagnosis. Finally, researchers might want to weigh true positives, false positives, true
negatives, and false negatives differently depending on costs and risks of the clinical
decision. There are utility functions to differentially weight these outcomes, but this study
assumes that true positives and true negatives should be equally weighted.
1.2.3 Summary
The psychometric approach to diagnostic classification is to fit an IRT model to
the assessment, estimate IRT θ scores, and use ROC analysis to determine the θ cut score
that maximizes sensitivity and specificity. As in any IRT model, the score precision
depends on meeting model assumptions. This section discussed IRT models, parameter
estimation, and score estimation. Also, this section discussed background on ROC curve
analysis with accuracy statistics to decide cut scores. Next, machine learning approaches
to classification are discussed.
1.3. Machine Learning Approaches to Diagnostic Assessment
Researchers aim to identify important predictors of medical or psychological
phenomena, such as suicidal behavior, depression, or vocabulary development.
Parametric models, such as linear or logistic regression, have been developed to identify
important predictors, but they suffer from several limitations (Strobl, Malley, & Tutz,
2009). Researchers might not be able to use parametric models when the number of
predictors exceeds the number of cases (without using dimension reduction techniques) or
interpret complex models with many predictors or many higher-order interactions when
predicting an outcome. Using a parametric model also presumes that one knows the
correct functional form of the relation modeled (e.g., a linear relation between predictors
and outcome). Finally, current approaches to inferential statistics may find an optimal
solution to the dataset, but the solution may not generalize to other datasets (an
overfitting problem). Machine learning provides a series of exploratory statistical models
that focus on prediction, instead of inference. The aim of machine learning models is to
use the patterns found in one dataset to predict outcomes in an independent sample. Two machine learning
models that overcome the previous limitations are Lasso logistic regression and CART,
along with their extensions.
1.3.1 The Lasso for Logistic Regression
The Lasso for logistic regression is a variable selection method to choose the most
important predictors of a binary response. The section below introduces the Lasso for
OLS regression, discusses logistic regression, combines both Lasso and logistic
regression, and discusses an extension of Lasso logistic regression, the relaxed Lasso.
Lasso. The Lasso is a machine learning extension of ordinary least squares (OLS)
regression used to improve prediction and interpretability of the regression model
(Tibshirani, 1996). Coefficient estimates from OLS regression are unbiased when the
number of cases exceeds the number of predictors, but prediction based on the model
tends to vary when the model is fit to another dataset. In other words, the OLS
coefficients may change drastically if the dataset is changed slightly. Also, OLS
regression might lead to a complex solution when there are many relevant and irrelevant
predictors in the model. The Lasso involves fitting a model with all of the predictors, but
adds a penalty to the OLS loss function so that the regression coefficients are shrunken
towards zero or to exactly zero. Shrinking (or regularizing) the regression coefficients
would induce some bias in the model solution (coefficients may not be optimal for that
specific dataset), but prediction would be less variable when those coefficients are used in
other datasets. Also, the Lasso could be used for variable selection because regression
coefficients might be estimated to be exactly zero, therefore some of the predictors would
drop from the final model. Therefore, the Lasso might provide some benefits to select the
best items from an assessment and give them appropriate weights to predict a diagnosis.
The loss function from the Lasso is similar to the OLS loss function, except that
the Lasso loss function adds a penalty to the size of the coefficients, such as,
Lasso = Σ_{i=1}^{n} ( y_i − β_0 − Σ_{j=1}^{p} β_j x_ij )² + λ Σ_{j=1}^{p} |β_j| = RSS + λ Σ_{j=1}^{p} |β_j| .   (3.1)
The first part of the loss function is the residual sum of squares from the OLS loss
function, and the second part is the shrinkage penalty on the absolute size of
the Lasso coefficients. The impact of the penalty term is controlled by the nonnegative
tuning parameter λ that is determined by k-fold cross-validation so that λ yields the
smallest prediction error (as in the mean squared error) in the left-out fold. If λ is zero,
then the penalty parameter would not have any impact and one would get the OLS
coefficients, but the impact of the penalty increases as λ approaches ∞ and the
coefficients are shrunken or estimated to be zero. As a result, the Lasso coefficients may
be more interpretable for models with many predictors. It is important to note that it is
best to standardize the predictors before using the Lasso because coefficient size is
dependent on the scale of the predictors.
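The study's simulations use the glmnet R package; purely as an illustration, a Python sketch (scikit-learn's LassoCV on synthetic data; all variable names and values are hypothetical) of standardizing the predictors and choosing λ by k-fold cross-validation might look like:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
# Only the first three predictors are truly related to the outcome.
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

# Standardize the predictors first: the penalty depends on coefficient scale.
Xs = StandardScaler().fit_transform(X)

# 5-fold cross-validation chooses the lambda (called alpha in scikit-learn)
# that yields the smallest prediction error in the left-out fold.
lasso = LassoCV(cv=5).fit(Xs, y)
print(lasso.alpha_, np.round(lasso.coef_, 2))
```

Coefficients of the irrelevant predictors are shrunken toward (often exactly to) zero, illustrating the variable-selection property described above.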
Logistic regression. In diagnostic classification, consider predicting the
probability of a binary diagnosis as a function of predictors,
p(Y = 1 | 𝑿) = b_0 + b_1 X_1 + b_2 X_2. (3.2)
Probability is bounded by 0 and 1, so it might not be appropriate to use OLS regression
because a linear function might make predictions outside of that bound. One can turn to
an S-shaped, logistic function bounded by 0 and 1. The logistic function is,
Logistic function: p(Y = 1 | 𝑿) = \frac{e^{b_0 + b_1 X_1 + b_2 X_2}}{1 + e^{b_0 + b_1 X_1 + b_2 X_2}}. (3.3)

After manipulation, there is a linear relationship between the natural logarithm of the
odds, p(Y = 1 | X) / (1 - p(Y = 1 | X)), and the predictors,

\ln\left( \frac{p(Y = 1 | X)}{1 - p(Y = 1 | X)} \right) = b_0 + b_1 X_1 + b_2 X_2. (3.4)
The regression coefficients indicate that for a one-unit change in X_j, there is a b_j-unit
change in the logit (log-odds) of the outcome. The regression
coefficients are estimated using maximum likelihood. Below is the loglikelihood
equation of logistic regression,
l(\beta) = \sum_{i=1}^{N} \left\{ y_i \beta^T x_i - \log\left(1 + e^{\beta^T x_i}\right) \right\}. (3.5)
The estimated coefficients \hat{\beta} can then be used to compute predicted probabilities (via Eq. 3.3),
and in turn predicted categories of the outcome.
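As a hedged sketch (Python's scikit-learn standing in for the glm fit used later in this study; the data and coefficient values are made up), fitting a logistic regression by maximum likelihood and recovering predicted probabilities from the logistic function might look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 2))
# True model: logit p = -1 + 1.2*X1 + 0.8*X2.
eta = -1.0 + 1.2 * X[:, 0] + 0.8 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# A very large C makes the default penalty negligible, approximating plain
# maximum likelihood estimation of the coefficients.
model = LogisticRegression(C=1e6).fit(X, y)

# Predicted probabilities follow the S-shaped logistic function (Eq. 3.3),
# so they stay between 0 and 1 by construction.
probs = model.predict_proba(X)[:, 1]
print(np.round(model.intercept_, 2), np.round(model.coef_, 2))
```

The fitted coefficients approximate the data-generating values, and each coefficient shifts the log-odds by its own size per one-unit change in the predictor.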
Lasso logistic regression. Just as in linear regression, the Lasso penalty can
be applied to logistic regression. Below is the penalized version of the loglikelihood for
logistic regression,

l(\beta)_{L1} = \sum_{i=1}^{N} \left\{ y_i \beta^T x_i - \log\left(1 + e^{\beta^T x_i}\right) \right\} - \lambda \sum_{j=1}^{p} |\beta_j|. (3.6)
By maximizing the previous loglikelihood function, researchers can use the Lasso for
variable selection when they are predicting a diagnosis. Similar to Lasso in OLS
regression, the tuning parameter λ is chosen by k-fold cross-validation. With Lasso
logistic regression, the probability of diagnosis could be predicted by items in the
diagnostic assessment, which in turn would derive a model with the most important
predictors and their weights to predict the diagnosis.
Relaxed Lasso logistic regression. The estimates of the non-zero Lasso
coefficients are biased towards zero, and as sample size increases, they do not approach
their true values. To overcome this bias, an unrestricted logistic regression model could
be fitted again only with the variables selected by the Lasso. This approach is known as
the relaxed Lasso (Meinshausen, 2007), and this Lasso extension is also carried out in
this study.
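The two steps of the relaxed Lasso can be sketched as follows (a Python/scikit-learn approximation of the procedure rather than the study's actual glmnet code; data, seeds, and effect sizes are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

rng = np.random.default_rng(3)
n, p = 400, 10
X = rng.normal(size=(n, p))
eta = -0.5 + 1.5 * X[:, 0] + 1.0 * X[:, 1]   # only two items matter
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# Step 1: Lasso (L1-penalized) logistic regression with the penalty chosen
# by 5-fold cross-validation; coefficients may be set exactly to zero.
lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5).fit(X, y)
keep = np.flatnonzero(lasso.coef_.ravel())

# Step 2 (relaxed Lasso): refit an unrestricted logistic regression using
# only the selected items, so the surviving coefficients are not shrunken.
relaxed = LogisticRegression(C=1e6).fit(X[:, keep], y)
print(keep, np.round(relaxed.coef_.ravel(), 2))
```

The refit removes the shrinkage bias on the selected coefficients while preserving the variable selection performed by the Lasso.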
1.3.2 Classification and Regression Trees
Classification and regression trees (CART) is a machine learning, nonparametric
regression method that can identify important predictors of an outcome by exploring and
partitioning the complete predictor space into regions where values of the outcome are
most similar to each other (Breiman, Friedman, Olshen, & Stone, 1984). Trees are useful
and easy to interpret, they do not depend on a functional form, and they can probe higher-
order predictor interactions. Recently, the CART algorithm has been proposed as a
method to adaptively select items from a longer scale to predict a diagnosis (Gibbons et
al., 2016; McArdle, 2013). In this section, the CART algorithm is described and its
application to predict a diagnosis is discussed.
The CART algorithm. CART uses binary recursive partitioning for prediction.
In general, CART selects the best binary split among all possible predictor values to
partition the dataset into two regions where the outcome is most similar. Then, the
procedure is carried out recursively in each derived region until the CART algorithm
meets a stopping rule. Splits in the trees are interpreted as interactive effects of predictors
on the outcome. Machine learning is rooted in cross-validation, so the algorithm typically
builds the model in a training dataset and then prediction performance is evaluated in a
validation or holdout dataset. The CART algorithm can be used to predict quantitative
outcomes (regression trees) or qualitative outcomes (classification trees). This discussion
will focus on classification because this study deals with predicting a diagnosis.
Classification trees. First, consider the predictor space 𝑋, which is a matrix of p
predictors in the training dataset. Consider every possible binary split along the observed
values of a predictor 𝑋𝑗 (a predictor with u unique values yields u − 1 candidate splits):
each candidate split value 𝑠 creates two regions, one for cases
where 𝑋𝑗 < 𝑠 and another for cases where 𝑋𝑗 ≥ 𝑠. In other words, the dataset is split into
two parts, one that meets the conditional statement based on 𝑋𝑗 and 𝑠 and one that
does not. This procedure is applied to all p predictors in the
𝑋 predictor space, and the best split value on the best predictor among all p predictors is chosen to
split the predictor space 𝑋. In this case, the best split is defined as the one that minimizes
the misclassification rate E in each region. The misclassification rate is the proportion of
cases where the observed and predicted value differ,
E = 1 - \max_k(\hat{p}_{mk}), (3.7)
where \hat{p}_{mk} is the proportion of the observations in the training dataset in the mth region
that belong to the kth category. Once the best split has been decided, the procedure is
carried out again in the defined regions until the algorithm meets a stopping rule, often
based on a maximum number of splits or minimum number of cases in each region. The
predicted value is the mode in the region.
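The node-level misclassification rate in Eq. 3.7 and the search for the best split can be illustrated with a toy example (Python; the data and the `split_error` helper are made up for illustration, not part of the study's code):

```python
import numpy as np

# Toy region: 8 nondiagnosed (0) and 2 diagnosed (1) training cases.
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# Node misclassification rate, Eq. 3.7: E = 1 - max_k(p_mk).
p_mk = np.bincount(y) / y.size        # class proportions in the region
E = 1 - p_mk.max()                    # predicted value is the mode (class 0)

def split_error(x, y, s):
    """Weighted misclassification after splitting into x < s and x >= s."""
    def node_error(v):
        return 0.0 if v.size == 0 else 1 - np.bincount(v).max() / v.size
    left, right = y[x < s], y[x >= s]
    return (left.size * node_error(left) + right.size * node_error(right)) / y.size

# A single predictor; the best cut point separates the two classes exactly.
x = np.array([1, 2, 2, 3, 3, 4, 5, 5, 6, 7])
best_s = min(np.unique(x), key=lambda s: split_error(x, y, s))
print(E, best_s)
```

Here the split at x = 6 drives the weighted misclassification to zero, so it would be chosen over every other candidate cut point.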
Splitting rules. Splits depend on the predictor type. For a continuous predictor,
candidate splits are considered along all unique values of the predictor. Suppose an item
score is a predictor of the diagnosis, so a proposed partition could be defined by
assessment takers who scored less than 3 on the item and assessment takers who scored 3
or higher. On the other hand, splits on unordered categorical predictors are defined in a
one (or some)-versus-rest form over all combinations of categories. Consider a specific
clinician as a predictor of diagnosis. A proposed partition can be defined by patients who
have Johnny as their doctor and those who do not (such as those who have Sally or Jane
as their doctor). There are 2^(k−1) − 1 candidate splits when a predictor has k
categories. Finally, predictors that have already defined a split can be used again to define the
best split in further regions.
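A small sketch (Python; the `candidate_splits` helper is hypothetical) enumerating the 2^(k−1) − 1 candidate splits of an unordered categorical predictor:

```python
from itertools import combinations

def candidate_splits(categories):
    """All 2^(k-1) - 1 two-group partitions of k unordered categories."""
    cats = sorted(categories)
    anchor, rest = cats[0], cats[1:]
    splits = []
    # Fix one category on the left side so mirror splits are not counted twice.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {anchor, *combo}
            if len(left) < len(cats):      # both sides must be nonempty
                splits.append(left)
    return splits

doctors = ["Johnny", "Sally", "Jane"]      # k = 3 clinicians
print(candidate_splits(doctors))           # 2^(3-1) - 1 = 3 candidate splits
```

With three clinicians this yields exactly three one-versus-rest partitions, matching the 2^(k−1) − 1 count in the text.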
Model evaluation. After growing the classification tree in a training dataset, the
tree performance can be evaluated using a holdout dataset. The splits determined by the
CART algorithm during the training phase can be used to predict the outcome in the
holdout dataset. Misclassification rate and variance explained can be estimated from the
predicted values and the observed values in the holdout dataset. The misclassification
rate and variance explained in the holdout dataset are expected to differ from those
computed in the training dataset, and the size of that difference reflects the robustness of
the model across datasets.
CART criticisms and pruning. CART is considered a greedy algorithm because
once it decides on a predictor to split on, that initial decision is not reconsidered in further
regions even though a different predictor could have been a better predictor further down
the model. CART is single-minded because it only looks for regions where the outcome
is most homogeneous. As a result, CART could overfit the training dataset and provide
an unstable solution that may not generalize to other datasets. Therefore, the final CART
model might not be useful in making inferences in other samples. Cost-complexity
pruning is a CART extension designed to overcome these limitations. Cost-complexity pruning
consists of overgrowing a tree in a training dataset (growing a deep tree that learns all of
the idiosyncrasies of the dataset) and then pruning back branches of the tree with the goal
of reducing prediction error in the holdout dataset. Cost-complexity pruning introduces a
penalty parameter that controls the trade-off between tree size and outcome prediction
(James et al., 2013). The size of the tree is determined by k-fold cross-validation, where
the number of regions remaining in the final tree is the size that minimizes the prediction
error in the left-out fold during cross-validation. It is hypothesized that the final model
would be more generalizable to other datasets.
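Cost-complexity pruning can be sketched with scikit-learn (a Python stand-in for the tree R package used in this study; the data and settings are illustrative):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
n = 600
X = rng.integers(0, 5, size=(n, 10)).astype(float)  # ten 5-category items
eta = -1.0 + 0.8 * (X[:, 0] - 2) + 0.6 * (X[:, 1] - 2)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# Overgrow a deep tree, then obtain the sequence of cost-complexity penalties.
deep = DecisionTreeClassifier(random_state=0).fit(X, y)
alphas = deep.cost_complexity_pruning_path(X, y).ccp_alphas

# Choose the penalty by 5-fold cross-validation on classification accuracy.
scores = [cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                          X, y, cv=5).mean() for a in alphas]
best_alpha = float(alphas[int(np.argmax(scores))])
pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print(deep.get_n_leaves(), "->", pruned.get_n_leaves())
```

The deep tree learns the idiosyncrasies of the training data; the cross-validated penalty prunes it back to a size expected to generalize better.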
1.3.3 Random Forest: A CART Extension
The random forest algorithm is a CART extension that can improve tree
stability, prediction, and the generalizability of the results (Breiman, 2001).
Random forest is an ensemble method that grows decision trees using the CART
algorithm (outlined above) in bootstrapped datasets and averages over all of the models
to make a prediction. In this case, the series of trees, or forest, is used for simultaneous
prediction rather than just a single tree. However, when there is a strong predictor of the
outcome, the tree in each bootstrap dataset might look similar. To increase tree diversity
in bootstrapped datasets, a random sample of candidate predictors to split on is chosen
every time the tree makes a split. Random forest adds stability to the CART
solution because it mitigates multicollinearity problems: when two predictors are
strongly correlated, both have opportunities to appear across
the trees. Also, more diverse trees are associated with lower prediction error in testing
datasets. An important limitation of random forests is that the interpretability of the single
tree is lost in exchange for prediction. Although variable importance measures have been
used to identify important predictors in random forests, previous work suggests that there
are methods to approximate ensemble results with a single tree (Gibbons et al., 2013).
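A minimal random forest sketch (Python's scikit-learn rather than the randomForest R package used later in this study; data are synthetic), showing the per-split random subset of predictors and the variable importance measures mentioned above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
n = 500
X = rng.integers(0, 5, size=(n, 10)).astype(float)
eta = -1.0 + 0.9 * (X[:, 0] - 2) + 0.7 * (X[:, 1] - 2)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# 500 trees grown on bootstrap samples; max_features="sqrt" draws a random
# subset of candidate items at every split to increase tree diversity.
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                random_state=0).fit(X, y)

# Variable importance measures flag the items driving the predictions,
# since no single interpretable tree remains in the ensemble.
importance = forest.feature_importances_
print(np.argsort(importance)[::-1][:3])
```

The two data-generating items should dominate the importance ranking, illustrating how importance measures partially compensate for the lost interpretability of a single tree.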
1.3.4 Summary
Overall, the machine learning methods use data-driven approaches to predict a
diagnosis using assessment items. The Lasso logistic regression selects the most
important items to predict the diagnosis, weights the items by the Lasso coefficients, and
yields a predicted probability of diagnosis. On the other hand, CART can grow a tree
where items are predictors of the classification (Gibbons, Weiss, Frank, & Kupfer, 2016;
McArdle, 2013; Yan, Lewis, & Stocking, 2004), so items could be administered
adaptively depending on the participant’s response. Finally, random forest uses a similar
approach to CART to build the models, but the solution is more stable and variable
importance measures are used to understand which are the most important predictors of
the diagnosis. Previous research on the performance of machine learning algorithms to
predict diagnoses (Lu & Petkova, 2014) suggests that Lasso logistic regression might be
a better classifier than CART and random forest, even though the Lasso dropped many
items from the model (estimated many item coefficients to be exactly zero). The biggest difference between this
simulation study and Lu and Petkova’s work is that the data-generating model in this
simulation mimics a psychometric assessment with an underlying latent variable, while
their simulation used ten continuous predictors as the assessment.
1.4 Simulation Study
The goal of the present study is to compare the performance of psychometric and
machine learning algorithms for diagnostic assessment. Psychometric methods for
diagnostic assessment focus on improving the measurement precision of an assessment,
and then determine a diagnosis. Strengths of the psychometric methods are that they may
produce reliable and interpretable scores, along with providing a theoretical framework of
inference for how the diagnosis is related to the construct measured by the assessment.
Potential problems with the psychometric methods might be the sensitivity to violations
of assumptions, influence of sample size, and that prediction of the diagnosis is not direct
but a by-product of the measurement process (Gibbons et al., 2013). On the other hand,
the machine learning methods for diagnostic assessment build a model to predict the
probability of having the diagnosis. A strength of the machine learning assessments is
that items are selected into the model based on their relation to the diagnosis, so
prediction is done directly. Potential problems with the machine learning methods are
that they assume items are perfect predictors, when in fact they could suffer from
measurement error and bias. Also, the machine learning methods do not provide a
framework for inference about how or why the items are related to the diagnosis, so even if
the outcome changes to a closely related construct, the model would not be optimal. In
machine learning terminology, the machine learning methods are considered supervised
learning methods because the predictors are trained to predict an outside outcome (the
diagnosis). For each predictor observation, there is an observed output to predict (James
et al., 2013). On the other hand, early stages of psychometric methods are based on
unsupervised learning (dimensionality assessment) because exploratory factor analysis
builds the model based on item similarities without referencing an outside criterion, so
interpretations of the clusters are theoretical.
In this study, it is hypothesized that classification accuracy would increase as
sample size, number of item categories, number of items, prevalence of the diagnosis,
and the diagnosis-test correlation increase. Furthermore, it is hypothesized that, on
average, machine learning methods would have higher classification accuracy than
psychometric methods in conditions where there are violations of psychometric
assumptions (such as local dependence) and small sample sizes because the theta scores
may not be estimated accurately. However, it is hypothesized that psychometric methods
would have higher classification accuracy than machine learning methods when the
prevalence is low. That is, psychometric methods balance sensitivity and specificity of
the diagnosis, while machine learning methods may just predict that the lowest occurring
class never happens. Within the machine learning methods, it is hypothesized that the
random forest algorithm may be the best performing classifier because of the diversity in
the ensemble used to make the classification decision. Finally, it is also hypothesized
that, as more items are chosen by the machine learning algorithm, the items could recover
the true IRT θ score. In this simulation, it is assumed that the predictors in the machine
learning algorithms have been theoretically vetted as indicators of a latent construct.
2. Method
This section describes the data-generation, simulation factors, models of analysis,
and general procedures of this study. Appendix A also shows a simulation flowchart used
to generate data. Data were generated, analyzed, and summarized in the R (base)
statistical environment, with code presented in Appendices D and E.
2.1 Data-Generation
Data-generating theta and true diagnosis. For each participant, data were
generated by drawing two positively correlated variables, θAssmt and θdiag, from a bivariate
normal distribution. In this case, the variable θdiag was the score on a construct, which was
dichotomized to yield the participant’s true diagnosis. Thresholds for θdiag were derived
from the quantiles of the normal distribution to yield a specific prevalence of diagnosed
participants. Participants with θdiag below the threshold did not have the diagnosis
(diagnosis = 0), and participants with θdiag above the threshold had the diagnosis
(diagnosis = 1). The variable θAssmt represents the participant’s data-generating IRT score
on the assessment, which in turn was used to simulate the item responses.
Item-level responses. IRT item parameters were sampled at each replication.
Binary items only have one threshold, so the b parameters were simulated from the
distribution N(0, 1), and the a parameters were simulated from the distribution N(1.7, .3).
Polytomous items have k-1 thresholds for items with k categories, so the first b parameter
was simulated from the distribution N(-.6, 1), and the remaining b parameters were
sequentially higher than the previous one by adding a random number from the
distribution U(.5, .9). To prevent collapsibility problems, items included in this study had
each of their categories endorsed at least five times. The a parameters for polytomous
items were also simulated from the distribution N(1.7, .3). With θAssmt and the item
parameters, the IRT models in either Eq. 2.1 or Eq. 2.2 were used to obtain the expected
probability of endorsing each category for each item. Finally, the item response was
obtained by comparing the expected probability of the model to a random proportion
from the distribution U(0,1). These item responses were then used to predict the
diagnosis.
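The data-generating steps above can be sketched as follows (a Python approximation of the R data generation; the binary 2PL form assumed here follows the logistic IRT model referenced as Eq. 2.1, and all seeds and values are illustrative):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(6)
n, n_items = 1000, 10
rho, prevalence = 0.5, 0.10   # diagnosis-test correlation and prevalence

# Draw correlated theta_assmt and theta_diag from a bivariate normal.
cov = [[1.0, rho], [rho, 1.0]]
theta_assmt, theta_diag = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Dichotomize theta_diag at the normal quantile yielding the target prevalence.
threshold = NormalDist().inv_cdf(1 - prevalence)
diagnosis = (theta_diag > threshold).astype(int)

# Simulate binary item responses: sample 2PL item parameters, compute the
# endorsement probability, and compare it to a U(0, 1) draw.
a = rng.normal(1.7, 0.3, size=n_items)
b = rng.normal(0.0, 1.0, size=n_items)
p = 1.0 / (1.0 + np.exp(-a * (theta_assmt[:, None] - b)))
items = (rng.uniform(size=(n, n_items)) < p).astype(int)
print(diagnosis.mean(), items.shape)
```

The observed prevalence in the generated data should sit close to the target value, and the item matrix supplies the predictors for the machine learning models.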
Simulation factors. There were six factors varied in this simulation: training
sample size (N=250, 500, 1000), number of items (10, 30), number of item categories (2,
5), diagnosis-test correlation (.30, .50, .70), prevalence of the diagnosis (.05, .10, .20),
and violation of the local independence assumption of IRT (no violation or for 5 items in
30% of the sample). Specifically, this simulation examined surface local dependence,
where participants might respond to a set of items identically because of similar content
or location (Edwards, Houts, & Cai, in press). In other words, the IRT model does not
determine the responses to the item, but the item response is conditional on the response
to another item. In the condition of LD=.30, data were generated so that 30% of the
participants had identical responses to the first five items, regardless of the expected
response to the item given by the θAssmt value.
Overall, there were 216 conditions, with 500 replications per condition.
Conditions were analyzed only if at least 50% of the models across replications converged, or at
least 50% of the models across replications assigned any cases to the minority class.
simulation outcomes for study 1 (classification rate, sensitivity, and specificity) and for
study 2 (person parameter recovery) were evaluated in a testing sample of N=5,000. It is
important to note that the total number of cases simulated per replication was the sum
the testing and training sample sizes. Simulation outcomes were analyzed using linear
regression and exploratory methods, such as binary recursive partitioning and the random
forest algorithm (RFA), to identify which simulation factors were important predictors of
the simulation outcomes. For clarification, binary recursive partitioning and RFA for
simulation outcomes should not be confused with conditions using CART and random
forest to assign cases to diagnoses. Assumptions of linear regression were checked by
looking at Cook’s distance for influential points; q-q plots for normality; residual plots
and histograms for functional form; and plots of standardized residuals as a function of
the outcome value for the assumption of homogeneity of variance. Preliminary checks for
each analysis suggest that the assumptions of regression were largely met.
2.2 Data Analysis
Models of analysis. All of the models used the same training sample for each
replication. For the psychometric models, the training sample was used for item
calibration and to determine the cut score. For the machine learning models, the training
sample was used to develop the predictive models. Most machine learning models were
developed using 10-fold cross-validation, except in conditions with either a small sample
size (N=250) or low prevalence (5%) where both 5-fold cross-validation and stratified
sampling were used to guarantee that every fold had both diagnosed and nondiagnosed
cases. As previously mentioned, the machine learning models could predict a diagnosis
using either a Bayes classifier or a ROC classifier. The Bayes classifier assigns cases to
the most probable class. On the other hand, the ROC classifier estimates a ROC curve
from the predicted probabilities of diagnosis and then determines a probability threshold
to balance sensitivity and specificity with the indices discussed in section 1.2.2. Finally,
classification accuracy and parameter recovery for both psychometric and machine
learning models were evaluated with the testing sample of N=5,000. It was assumed that
true positives and true negatives for the diagnoses were equally important, so they were
equally weighted.
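The contrast between the Bayes classifier and a ROC classifier can be sketched as follows (Python with scikit-learn standing in for the pROC R package; here the ROC threshold uses the Youden index, one of the three indices from section 1.2.2, and the data are synthetic):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(7)
n = 1000
X = rng.normal(size=(n, 3))
eta = -2.2 + 1.0 * X[:, 0] + 0.8 * X[:, 1]   # low-prevalence diagnosis
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

probs = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# Bayes classifier: assign the most probable class (probability > .5).
bayes_pred = (probs > 0.5).astype(int)

# ROC classifier: choose the probability threshold that maximizes the
# Youden index J = sensitivity + specificity - 1, then classify with it.
fpr, tpr, thresholds = roc_curve(y, probs)
youden_threshold = thresholds[np.argmax(tpr - fpr)]
roc_pred = (probs > youden_threshold).astype(int)
# At low prevalence the ROC threshold typically sits below .5, so the ROC
# classifier flags more cases (higher sensitivity, lower specificity).
print(bayes_pred.sum(), roc_pred.sum())
```

This illustrates why the two classifiers can diverge sharply when the diagnosis is rare, which is central to the hypotheses of this study.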
1. Data-generating theta: Use the data-generating theta (from the data-generation
step) in a ROC analysis (using the pROC R package; Robin et al., 2011) to
determine cut scores to predict the true diagnosis either to minimize the closest-
to-(0,1) criterion (Eq. 2.10), maximize the Youden index (Eq. 2.11), or maximize
the concordance probability (Eq. 2.12).
a. Predict the class based on data-generating theta in the testing sample from
the cut scores determined from the training sample.
2. Raw summed score: Estimate a total score by summing all of the items and use
in a ROC analysis (in the pROC R package) to determine cut scores either to
minimize the closest-to-(0,1) criterion (Eq. 2.10), maximize the Youden index
(Eq. 2.11), or maximize the concordance probability (Eq. 2.12).
a. Predict the class based on the raw summed score in the testing sample
from the cut scores determined from the training sample.
3. Estimated theta: Fit a unidimensional IRT model to calibrate the items, and then
estimate an IRT EAP[𝜃] score (using MML-EM in the mirt R package;
Chalmers, 2012). The estimated theta is then used in a ROC analysis (in the pROC
R package) to determine cut scores either to minimize the closest-to-(0,1)
criterion (Eq. 2.10), maximize the Youden index (Eq. 2.11), or maximize the
concordance probability (Eq. 2.12).
a. Estimate theta scores for participants in the testing sample using the item
parameters from the training sample, and then predict the class using the
estimated theta cut score determined from the training sample.
4. Logistic regression. Predict the probability of diagnosis in the training sample
from item responses using logistic regression (Eq. 3.5, using the glm R package;
R Core Team, 2017).
a. Bayes classifier: Predict the probability of diagnosis in the testing dataset
using the logistic regression model from the training sample and assign the
class with probability > .5.
b. ROC classifier: Estimate an ROC curve from the predicted probabilities in
the training sample and determine a probability threshold using the ROC
indices described above. Predict probability of diagnosis in the testing
dataset using the logistic regression model from the training sample and
assign the class depending on the probability threshold.
5. Lasso logistic regression: Predict the probability of diagnosis in a training
sample from item responses using a logistic regression, regularizing the
parameters with the L1 norm (Eq. 3.6, using the glmnet R package; Friedman,
Hastie, & Tibshirani, 2010). Use cross-validation to determine the penalty
parameter that minimizes prediction error in the left-out fold. To generalize to
other samples, a penalty parameter one-standard-error away from the one that
minimized the prediction error was chosen to regularize the parameters.
a. Bayes classifier: Predict the probability of diagnosis in the testing dataset
using the Lasso logistic regression model from the training sample and
assign the class with probability > .5.
b. ROC classifier: Estimate an ROC curve from the predicted probabilities in
the training sample and determine a probability threshold using the ROC
indices described above. Predict probability of diagnosis in the testing
dataset using the Lasso logistic regression model from the training sample
and assign the class depending on the probability threshold.
6. Relaxed Lasso logistic regression. Predict the probability of diagnosis using the
Lasso from Model 5 and save the predictors remaining in the model. Re-run the
analysis using an unrestricted logistic regression model (Model 4) and estimate
the predicted probability of diagnosis.
a. Bayes classifier: Predict the probability of diagnosis in the testing dataset
using the relaxed Lasso logistic regression model from the training sample
and assign the class with probability > .5.
b. ROC classifier: Estimate an ROC curve from the predicted probabilities in
the training sample and determine a probability threshold using the ROC
indices described above. Predict probability of diagnosis in the testing
sample using the relaxed Lasso logistic regression model from the training
sample and assign the class depending on the probability threshold.
7. Classification and regression trees: Predict the probability of diagnosis in the
training sample from the item responses using a classification tree (tree R
package; Ripley, 2016). Overgrow the tree in a training sample (deviance of
.0001) and then prune back using cost-complexity pruning. Determine the size of
the tree by cross-validation to find the tree size that minimizes misclassification in
the left-out fold.
a. Bayes classifier: Predict the diagnosis in a testing sample using the pruned
tree from the training sample and assign the class with probability > .5.
b. CART with a ROC classifier was not carried out because CART yields the
same probability to each case in the node, limiting the number of possible
probability thresholds during the ROC analysis.
8. Random Forest: Predict the probability of diagnosis in the training sample from
the item responses using the average prediction of 500 classification trees grown
in 500 bootstrapped datasets from the training sample (using the
randomForest R package; Liaw & Wiener, 2002). However, at each node a
random subset of the items was considered as candidate predictors to split on.
Random sampling of items increases diversity of the trees, thus improving
prediction.
a. Bayes classifier: Predict the diagnosis in a testing sample using the
random forest model from the training sample and assign the class with
probability > .5.
b. ROC classifier: Estimate an ROC curve from the predicted probabilities in
the training sample and determine a probability threshold using the ROC
indices described above. Predict probability of diagnosis in the testing
sample using the random forest model from the training sample and assign
the class depending on the probability threshold.
Overall, there are eight different models (three psychometric models and five machine
learning models), and four out of the five machine learning models predicted the
diagnosis with two different classifiers. In total there were 12 models of analysis in this
study. The performance of each model is compared relative to the classification accuracy
of data-generating theta (Model 1).
Study 1: Classification accuracy. The goal of study 1 is to investigate the
classification accuracy across the psychometric and machine learning approaches. The
main outcomes of the simulation were classification rates, sensitivity, and specificity
(outlined in section 1.1.1) per model, and also differences in classification accuracy
across models. The outcomes were analyzed using the binary recursive partitioning
approach outlined in Gonzalez, O’Rourke, Wurpts, and Grimm (2017), and also linear
regression, using the six simulation factors as predictors.
Study 2: Recovery of the person parameter. One of the properties of Lasso
logistic regression and the CART algorithm is that they select the most important
predictors of the outcome. The goal of study 2 is to investigate if the items selected by
these two machine learning algorithms could recover the data-generating theta. In theory,
the person parameter should be recovered because of the linking property of IRT scores.
However, it is of interest to investigate if there are ceiling or floor effects when an IRT
score is estimated from the remaining set of items. In this case, the unique items chosen
by the machine learning algorithms were scored using the estimated item parameters
from Model 3, and the scores were compared to the data-generating theta using mean
squared error (MSE) and the correlation between estimated and data-generated theta.
Both MSE and the correlation between estimated and data-generating theta were
analyzed using the binary recursive partitioning approach outlined in Gonzalez et al.
(2017), and using linear regression with the six simulation factors as predictors.
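The two recovery outcomes, MSE and the correlation with the data-generating theta, can be computed as below (a toy Python sketch in which theta_hat merely stands in for the EAP scores estimated from the selected items in the actual study):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5000
theta_true = rng.normal(size=n)
# Stand-in for theta estimated from the subset of selected items: the true
# theta plus estimation error (fewer items would mean a larger error term).
theta_hat = 0.9 * theta_true + rng.normal(0.0, 0.4, size=n)

# The two recovery outcomes used in study 2.
mse = np.mean((theta_hat - theta_true) ** 2)
corr = np.corrcoef(theta_hat, theta_true)[0, 1]
print(round(mse, 3), round(corr, 3))
```

Lower MSE and a correlation approaching one would indicate that the items retained by the machine learning algorithms recover the person parameter well.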
3. Results
This study produced a large number of results, so this section integrates them across
models and classification indices, highlighting the most important findings. A comprehensive write-
up, details, and tables on each of these analyses are presented in Appendix F. The
summary of classification indices in Table 1, along with partial-η2 and variable
importance measures, were used to determine what results to present and which effects to
probe. This section starts by providing results on the estimation and classification
accuracy of the psychometric models. After that, the estimation of the machine learning
models with Bayes and ROC classifiers are described. Then, the classification accuracy
within and across machine learning models with ROC classifier is discussed. Next, the
machine learning models with ROC classifier are compared to the classification accuracy
of data-generating theta. Finally, recovery of the person parameter by CART and Lasso
algorithms is discussed.
3.1. Estimation of the Psychometric Models
IRT parameter estimation. Convergence and parameter recovery information
suggest that there were no apparent estimation problems. There were 129 models (out of
108,000 models; 216 conditions, 500 replications per condition) that did not converge,
mainly from conditions with either binary items, ten items, or a sample size of 250.
Parameter recovery. For theta (person parameter) recovery, MSE decreased as
the number of categories increased. For conditions with five-category items, MSE
decreased as sample size increased. The correlation between the estimated theta and the
data-generating theta increased as the number of item categories and the number of items
increased. For item parameters, the MSE decreased and variance explained increased as
sample size and number of items increased and local dependence decreased.
3.2. Classification Accuracy of the Psychometric Models
Effect of ROC index. For the psychometric models, the cut scores in the ROC
analysis were determined by the Youden index, closest-to-(0,1) criterion, and the
concordance probability. However, the performance of these three indices was not
practically different from each other as a function of simulation factors (R2=.002 - .009).
Therefore, the results for the psychometric models are presented based on the Youden
index.
Comparing classification accuracy across psychometric models. Results
suggest that there were no practical differences in classification accuracy between using
data-generating theta and the estimated theta (classification rate R2 = .003; sensitivity R2
= .001; and specificity R2 = .002); between using data-generating theta and the raw
summed scores (classification rate R2 = .004; sensitivity R2 = .001; and specificity R2 =
.003); and between using the estimated theta and the raw summed scores (classification
rate R2 = .001; sensitivity R2 = .001; and specificity R2 = .001). It is hypothesized that
these models perform similarly because of the high reliability of the raw summed score
and the recovery of the IRT person parameter. Therefore, results are presented only for
data-generating theta and estimated theta.
Classification accuracy with data-generating theta. In this study, classification
rates ranged from .60 to .80; sensitivity ranged from .59 to .83; and specificity ranged
from .59 to .80. Classification accuracy increased as the diagnosis-test correlation
increased.
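The three accuracy indices follow directly from a cut score applied to the continuous scores. A brief illustrative sketch (the scores, diagnoses, and cut below are made up):

```python
def accuracy_indices(scores, diagnoses, cut):
    """Classification rate, sensitivity, and specificity when cases
    scoring at or above the cut are classified as diagnosed (0/1 coding)."""
    tp = fp = tn = fn = 0
    for s, d in zip(scores, diagnoses):
        pred = 1 if s >= cut else 0
        if pred == 1 and d == 1:
            tp += 1          # diagnosed case correctly flagged
        elif pred == 1 and d == 0:
            fp += 1          # non-diagnosed case flagged
        elif pred == 0 and d == 0:
            tn += 1          # non-diagnosed case correctly cleared
        else:
            fn += 1          # diagnosed case missed
    total = tp + fp + tn + fn
    return (tp + tn) / total, tp / (tp + fn), tn / (tn + fp)

# toy example: five theta scores, three diagnosed cases, cut at 0.0
rate, sens, spec = accuracy_indices(
    [1.2, 0.8, 0.3, -0.4, -1.1], [1, 1, 0, 1, 0], 0.0)
```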
Linear Regression. In a regression predicting classification rate of the data-
generating theta from simulation factors, the variance explained was R2=.387.
Classification rates increased as the diagnosis-test correlation increased (b=.177, s.e.
=.005, t=34.429, p<.001, partial-η2=.375) and as the prevalence decreased (b=-.024,
s.e.=.005, t=-4.670, p<.001, partial-η2=.029). In a regression predicting sensitivity from the
simulation factors, the variance explained was R2=.317. Sensitivity increased as the
diagnosis-test correlation increased (b=.185, s.e. =.007, t=26.710, p<.001, partial-
η2=.312). In a regression predicting specificity from the simulation factors, the variance
explained was R2=.277. Specificity increased as the diagnosis-test correlation increased
(b=.176, s.e. =.007, t=27.194, p<.001, partial-η2=.264) and as the prevalence decreased
(b=-.025, s.e.=.006, t=-3.860, p<.001, partial-η2=.022).
Binary recursive partitioning. For the prediction of classification accuracy of the
data-generating theta in the training sample, binary recursive partitioning only made
splits based on the diagnosis-test correlation. In an RFA model grown in a training
sample to predict classification accuracy of the data-generating theta, the most important
variables were the diagnosis-test correlation and prevalence. Using the RFA model to
predict classification rate in the testing sample (a holdout sample), the MSE was .007,
and the variance explained was .385. Also, the MSE in the prediction of sensitivity was
.013, and the variance explained was .315. Finally, MSE in the prediction of specificity
was .011 and the variance explained was .274.
Classification accuracy with estimated theta. In this study, classification rates
ranged from .58 to .80; sensitivity ranged from .58 to .82; and specificity ranged from .59
to .79. Classification accuracy seemed to increase as the diagnosis-test correlation
increased.
Linear Regression. In a regression predicting classification rate of the estimated
theta from the simulation factors, the variance explained by the predictors was R2=.337.
Classification rates increased as the diagnosis-test correlation increased (b=.146, s.e.
=.005, t=26.964, p<.001, partial-η2=.321) and as the prevalence decreased (b=-.032,
s.e.=.005, t=-5.887, p<.001, partial-η2=.021). In a regression predicting sensitivity from
the simulation factors, the variance explained by the predictors was R2=.280. Sensitivity
increased as the diagnosis-test correlation increased (b=.175, s.e. =.007, t=24.010,
p<.001, partial-η2=.272). In a regression predicting specificity from the simulation
factors, the variance explained by the predictors was R2=.236. Specificity increased as the
diagnosis-test correlation increased (b=.144, s.e. =.007, t=21.177, p<.001, partial-η2=.220) and as the prevalence decreased (b=-.039, s.e.=.007, t=-5.793, p<.001, partial-
η2=.017).
Binary recursive partitioning. For the prediction of classification accuracy of the
estimated theta in the training sample, binary recursive partitioning only made splits
based on the diagnosis-test correlation. In an RFA model grown in a training sample to
predict the classification accuracy of the estimated theta, the most important variables
were diagnosis-test correlation and prevalence. Using the RFA model to predict
classification rate in the testing sample (a holdout sample), the MSE was .007, and the
variance explained was .334. Also, MSE in the prediction of sensitivity was .014, and the
variance explained was .281. Finally, the MSE in the prediction of specificity was .012,
and the variance explained was .233.
Brief Summary. Across psychometric models, the best predictors of
classification accuracy were the diagnosis-test correlation and prevalence. Specifically,
classification accuracy increased as the diagnosis-test correlation increased and
prevalence decreased. More details can be found in Appendix F, sections 1 and 2.
3.3. Estimation of the Machine Learning Models
Model building. All of the machine learning models with a Bayes classifier only
assigned cases to the majority class, except in conditions with high diagnosis-test
correlation, high prevalence, high sample size, and 30 five-category items. As shown in
Table 1, these models also had very low sensitivity and inflated specificity compared to
the sensitivity and specificity of data-generating theta. Therefore, machine learning
models with a Bayes classifier are only discussed in appendix F, sections 3 and 4.
On the other hand, both random forest and logistic regression using ROC
classifiers had a high likelihood of assigning cases to the minority class across all
conditions. Lasso logistic regression and relaxed Lasso logistic regression only had
greater than a 50% chance of assigning cases to the minority class in conditions with a
diagnosis-test correlation of .70. Therefore, only these four models are discussed in the
next part of this section.
3.4. Classification Accuracy of Machine Learning Models with ROC Classifiers.
For the machine learning methods examined, Figure 1 shows the variance
explained and the partial eta-squared effect sizes for the predictors of classification
accuracy across methods, and Figure 2 shows the RFA variable importance measures for
each of the predictors. Further details are provided in appendix F, section 5.
Effect of ROC index. For random forest, there were medium differences in
classification rate (R2=.149-.150), sensitivity (R2 = .122-.124), and specificity (R2 =.134-
.135) as a function of simulation factors, across the three ROC indices. For logistic
regression, there were small differences in classification rate (R2=.017-.022) and
specificity (R2 =.014-.016), and small to medium differences in sensitivity (R2 = .016-
.137) as a function of simulation factors, across the three ROC indices. For Lasso logistic
regression, there were small differences in classification rate (R2=.008-.017), sensitivity
(R2 = .003-.006), and specificity (R2 =.004-.005) as a function of simulation factors,
across the three ROC indices. Finally, for the relaxed Lasso logistic regression, there
were small differences in classification rate (R2=.006-.014), sensitivity (R2 = .003-.005),
and specificity (R2 =.003-.005) as a function of simulation factors, across the three ROC
indices. In these analyses, the Youden index had the highest sensitivity across the vast
majority of conditions, so the results presented are based on the Youden index to increase
sensitivity.
Logistic regression classification accuracy with ROC classifier. According to
partial-η2 effect sizes (Figure 1) and RFA variable importance measures (Figure 2), by far
the most important predictor of classification accuracy was diagnosis-test correlation. On
average, classification rate, sensitivity, and specificity increased as the diagnosis-test
correlation increased. Also, classification rates increased as sample size increased. For
sensitivity, there were significant three-way interactions between sample size, prevalence,
and diagnosis-test correlation, and between sample size, prevalence, and number of
items. However, the importance of other predictors beyond diagnosis-test correlation was
not reflected in the variable importance measures. Finally, specificity decreased as
prevalence increased, although the importance of prevalence is not reflected in the
variable importance measures. According to Cohen’s f, variance explained by the
simulation factors was greater than a large effect size.
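The Cohen's f referenced here is computed from variance explained as f = sqrt(R² / (1 − R²)), with .40 the conventional benchmark for a large effect. For example:

```python
import math

def cohens_f(r2):
    """Cohen's f effect size computed from variance explained (R^2)."""
    return math.sqrt(r2 / (1.0 - r2))

# R^2 values in the .30-.40 range already exceed the f = .40 "large" benchmark
f_large = cohens_f(0.30)
```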
Lasso and relaxed Lasso classification accuracy with ROC classifier. There
were no practical differences in classification accuracy between Lasso logistic regression
and relaxed Lasso logistic regression as a function of simulation factors (classification
rate R2=.004; sensitivity R2=.004; specificity R2=.004). Therefore, only estimates from
the relaxed Lasso are discussed in the rest of the document. Predictors of classification
accuracy were the main effects of item categories, sample size, number of items, and
prevalence. According to Cohen’s f, variance explained by the simulation factors was
between a small and a medium effect size. On average, classification accuracy increased
as sample size, number of items, and number of item categories increased, and
classification accuracy decreased as prevalence increased. As mentioned before, these
results apply only to conditions where the diagnosis-test correlation is .70.
Random Forest classification accuracy with ROC classifier. According to
partial-η2 effect sizes and RFA variable importance measures, the most important
predictor of classification accuracy of the random forest model was the diagnosis-test
correlation. Additionally, important predictors of sensitivity were number of items,
prevalence, and number of item categories. On average, classification rates and
specificity increased as the diagnosis-test correlation increased. On average, sensitivity
increased as the diagnosis-test correlation, number of item categories, prevalence, and
number of items increased, and sensitivity decreased as sample size increased. However,
as the number of items and item categories increased, differences in sensitivity across
sample size and prevalence decreased.
Comparing accuracy across machine learning models with ROC classifier.
There were differences between relaxed Lasso logistic regression and logistic regression
on classification rates (R2=.120), sensitivity (R2=.393), and specificity (R2=.111).
Differences in classification accuracy of relaxed Lasso logistic regression and logistic
regression varied as a function of sample size, number of items, and prevalence.
Conditions with 30 items favored relaxed Lasso logistic regression, and conditions with
ten items favored logistic regression. Across number of items, differences in
classification accuracy decreased as prevalence and sample size increased.
Also, there were differences between relaxed Lasso logistic regression and
random forest on classification rates (R2=.120), sensitivity (R2=.393), and specificity
(R2=.111). According to variable importance measures, the differences in classification
accuracy across relaxed Lasso logistic regression and random forest varied as a function
of number of item categories, prevalence and number of items. Across prevalence,
conditions with 30 five-category items had classification rates and specificity that favored
random forest, along with conditions with ten binary items with 5% prevalence.
Conditions with 30 binary items and most conditions with ten binary items had
classification rates and specificity that favored relaxed Lasso logistic regression.
Differences in classification rates decreased as prevalence and number of items increased,
and as the number of item categories decreased. For sensitivity, most conditions favored
relaxed Lasso logistic regression, especially conditions with ten binary items and low
prevalence.
Finally, there were differences in classification rate (R2=.184), sensitivity
(R2=.351), and specificity (R2=.170) between logistic regression and random forest. For
classification rates, most conditions with ten five-category items favored logistic
regression, while conditions with 30 five-category items favored random forest. For five-
category items, differences between methods increased as sample size increased, and
decreased as prevalence increased. Conditions with ten binary items and low prevalence
favored random forest, while conditions with 30 binary items and sample size greater
than 500 favored logistic regression. For sensitivity, conditions with 30 items or
conditions with ten five-category items favored random forest. Conditions with ten binary
items favored logistic regression, where conditions with high prevalence and high sample
size had larger sensitivity for logistic regression. For specificity, most conditions favored
logistic regression, except for conditions with ten binary items and low prevalence.
3.5. Comparing Psychometric and Machine Learning Models
Using a one-number summary for classification accuracy, the four machine
learning models were able to recover true prevalence from estimates of apparent
prevalence, sensitivity, and specificity. Therefore, each classification accuracy index
was compared separately. More information can be found in appendix F, section 6.
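Recovering true prevalence from apparent prevalence, sensitivity, and specificity can be done with the standard Rogan-Gladen correction, which inverts the relation between apparent and true prevalence. A brief sketch (the numeric values are illustrative):

```python
def true_prevalence(apparent_prev, sensitivity, specificity):
    """Rogan-Gladen estimator: invert
    apparent = sens * prev + (1 - spec) * (1 - prev)."""
    return (apparent_prev + specificity - 1.0) / (sensitivity + specificity - 1.0)

# a test with sens = .80 and spec = .90 applied to a 5%-prevalence
# population flags 13.5% of cases; the correction recovers .05
recovered = true_prevalence(0.135, 0.80, 0.90)
```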
Machine learning models vs data-generating theta. The vast majority of
conditions had higher classification accuracy for the data-generating model than for the
logistic regression model. Differences in classification rate between logistic regression and
data-generating theta ranged from -.09 to .01; differences in sensitivity ranged from -
.30 to .02; and differences in specificity ranged from -.11 to .03. Although there were
significant differences in classification accuracy across simulation factors (classification
rate, R2=.067; sensitivity, R2=.037; specificity, R2=.028), none of the predictors had a
partial-η2 > .01.
Also, the vast majority of conditions had a higher classification rate for the data-
generating model than for the relaxed Lasso logistic regression model. Differences in
classification rate between relaxed Lasso and data-generating theta ranged from -.10 to
.01; sensitivity differences ranged from -.04 to .01; and specificity differences ranged
from -.10 to -.02. Differences in classification rate (R2=.082) and specificity (R2=.028)
decreased as sample size and prevalence increased. Classification rate differences also
decreased as number of items increased. None of the predictors of the difference in
sensitivity (R2=.055) had a partial η2 > .01.
Finally, the vast majority of conditions had higher classification rates for the data-
generating model than for the random forest model, except in conditions with ten binary
items, prevalence of .05, and sample size of 500 or 1,000. Differences in classification
rate between the random forest model and data-generating theta ranged from -.09 to
.01; differences in sensitivity ranged from -.30 to .02; and differences in specificity
ranged from -.11 to .03. On average, differences in classification rate (R2=.156)
decreased as the number of items, prevalence, and number of item categories increased.
For both binary and five-category items, there were greater differences in classification
rates across method in conditions with 10 items than in conditions with 30 items.
Differences in sensitivity (R2=.270) increased as sample size increased, and differences in
sensitivity decreased as prevalence, number of items, and number of item categories
increased. For conditions with 10 items, the difference in sensitivity increased as sample
size increased, and the difference in sensitivity decreased as prevalence and number of
categories increased. In most conditions with 30 items or conditions with five-category
items, random forest had higher sensitivity than data-generating theta. On average,
differences in specificity (R2=.140) decreased as prevalence decreased. Similar to
classification rates, random forest had higher specificity than data-generating theta in
conditions with 10 binary items, a prevalence of .05, and a sample size of 500 or 1,000. For conditions
with 30 items, specificity differences decreased as the number of item categories
increased. For conditions with binary items, differences in specificity decreased as the
number of items increased.
Machine learning models vs estimated theta. The vast majority of conditions
had higher classification accuracy for estimated theta than the logistic regression model.
Differences in classification rate between logistic regression and estimated theta ranged
from -.10 to .02; differences in sensitivity ranged from -.28 to .03; and differences in
specificity ranged from -.10 to .03. Although there were significant differences in
classification rate (R2=.040) and specificity (R2=.028) across simulation factors, none of
the predictors had a partial η2 > .01. For sensitivity (R2=.064), differences in sensitivity
increased as the number of items increased in conditions with low prevalence and small
sample size.
Also, the vast majority of conditions had a higher classification rate for estimated
theta than for the relaxed Lasso logistic regression model. Differences in classification
rate between relaxed Lasso and estimated theta ranged from -.06 to .00; sensitivity
differences ranged from -.01 to .03; and specificity differences ranged from -.06 to .00.
Differences in classification rate (R2=.054) and specificity (R2=.037) decreased as sample
size and prevalence increased. Classification rate differences also decreased as number of
items increased. None of the predictors of difference in sensitivity (R2=.010) had a partial
η2 > .01.
Similar to the results for data-generating theta, the vast majority of conditions had
higher classification rates for the estimated theta model than the random forest model,
except in conditions with ten binary items, prevalence of .05, and sample size of 500 or
1,000. Differences in classification rate between the random forest model and estimated
theta ranged from -.12 to .20; differences in sensitivity ranged from -.42 to .05; and
differences in specificity ranged from -.13 to .24. On average, differences in classification
rate (R2=.167) decreased as the diagnosis-test correlation, number of items, prevalence,
and number of item categories increased. For binary items, conditions with low
prevalence and 10 items favored random forest compared to the estimated thetas.
Differences in sensitivity (R2=.256) decreased as prevalence, sample size, number of
items, and number of item categories increased. In conditions with 10 binary items and
low prevalence, differences increased as sample size increased. In most conditions with
30 items or conditions with five-category items, random forest had a slightly higher
sensitivity than the estimated thetas. On average, differences in specificity (R2=.151)
decreased as prevalence, number of items, number of item categories, and sample size
increased. Similar to classification rates, random forest had higher specificity than
estimated theta in conditions with 10 binary items, a prevalence of .05, and a sample size of 500 or
1,000. For conditions with 30 items, specificity differences decreased as the number of
item categories increased. For conditions with binary items, differences in specificity
decreased as the number of items increased.
3.6. Scoring Machine Learning Items for Person Parameter Recovery
The CART algorithm and Lasso logistic regression select the most important
items in the prediction of the diagnosis. This section investigates the recovery of data-
generating theta when the algorithms selected at least two items, and then the items were
scored using the estimated item parameters from the IRT model in section 3.2. On the
other hand, random forest does not do variable selection, so parameter recovery was
studied by scoring half of the items with the highest variable importance. Results should
be interpreted conditional on how many items the algorithm selected. More information
can be found in appendix F, section 7.
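Scoring the retained items with previously estimated item parameters can be sketched as EAP scoring under a 2PL with a standard normal prior. A minimal version for binary items (the item parameters below are illustrative, not the study's estimates):

```python
import math

def eap_score(responses, a, b, n_quad=61):
    """EAP theta for binary responses under a 2PL, scoring only the
    items the algorithm retained (a, b cover that subset)."""
    num = den = 0.0
    for k in range(n_quad):
        theta = -4.0 + 8.0 * k / (n_quad - 1)        # quadrature grid
        w = math.exp(-0.5 * theta * theta)           # N(0,1) prior (unnormalized)
        like = 1.0
        for u, aj, bj in zip(responses, a, b):
            p = 1.0 / (1.0 + math.exp(-aj * (theta - bj)))
            like *= p if u == 1 else 1.0 - p
        num += theta * w * like
        den += w * like
    return num / den

# score three retained items for two response patterns
a_est = [1.5, 1.2, 1.0]     # discriminations (illustrative)
b_est = [-0.5, 0.0, 0.5]    # difficulties (illustrative)
high = eap_score([1, 1, 1], a_est, b_est)
low = eap_score([0, 0, 0], a_est, b_est)
```

Endorsing all retained items pulls the posterior mean above zero, and endorsing none pulls it below, as expected with fewer items yielding coarser estimates.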
Person Parameter Recovery by CART. Only conditions with prevalence greater
than 5% and a sample size greater than 250 were analyzed because those conditions were
likely to assign cases to the minority class. According to variable importance measures,
important predictors of theta MSE and the correlation between true and estimated theta
were the diagnosis-test correlation and number of items. The CART algorithm would
choose almost all of the items in conditions with low diagnosis-test correlation and low
number of items. Therefore, in those conditions theta MSE was close to zero and the
correlation between data-generating and estimated theta was close to 1. Conditions with
a diagnosis-test correlation of .70 would choose between 2-7 items, and the correlation
between data-generating and estimated theta was between .62-.87.
Person Parameter Recovery by Lasso. In contrast to CART, Lasso selected
more items as the diagnosis-test correlation increased. Only conditions with a diagnosis-
test correlation of .70 were analyzed because those conditions were likely to assign cases
to the minority class. According to variable importance measures, important predictors of
theta MSE and the correlation between data-generating and estimated theta were sample
size, prevalence, number of items, and number of item categories. On average, MSE
decreased and the correlation between data-generating and estimated theta increased as
sample size, prevalence, and number of item categories increased, and as the number of
items decreased. The Lasso chose between 2-7 items in the 10-item condition and
between 7-13 items in the 30-item condition, and the correlation between data-
generating and estimated theta was between .70-.96.
Person Parameter Recovery by Random Forest. According to variable
importance measures, important predictors of theta MSE and the correlation between
data-generating and estimated theta were number of items and number of item categories.
On average, theta MSE decreased and the correlation between data-generating and
estimated theta increased as the number of items and number of categories increased. The
correlation between data-generating and estimated theta ranged between .80 and .91
when the 5 (out of 10) most important items were scored, and the correlation ranged
between .86 and .95 when the 15 (out of 30) most important items were scored.
4. Discussion
Diagnostic assessments are important in psychology because they are cheaper and
less invasive than a gold standard that dictates diagnoses. After an assessment has been
administered, it is important to determine whether a case should be classified as diagnosed or
not. This simulation compared methods for diagnostic classification using psychometric
and machine learning approaches. Psychometric methods predict diagnoses using either
an estimated IRT score or raw summed score from assessment items. Then, cut scores for
psychometric models are determined by receiver operating characteristic (ROC) curves.
The machine learning methods used in this simulation were classification and regression
trees, random forest, logistic regression, Lasso logistic regression, and relaxed Lasso
logistic regression. The machine learning methods predict the probability of diagnosis
from item responses, so they assign a diagnosis either by looking at the class that is most
probable to reduce prediction error (Bayes classifier), or using ROC curves to find a
probability threshold to assign diagnoses (ROC classifier).
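The distinction between the two classifiers amounts to where the probability threshold sits: the Bayes classifier fixes it at .5, while the ROC classifier moves it to a value chosen from the ROC curve. A minimal illustration (the fitted probabilities and threshold below are made up):

```python
def bayes_classify(probs):
    """Bayes classifier: predict the most probable class (threshold .5)."""
    return [1 if p >= 0.5 else 0 for p in probs]

def roc_classify(probs, threshold):
    """ROC classifier: threshold chosen from the ROC curve (e.g., Youden)."""
    return [1 if p >= threshold else 0 for p in probs]

# with low prevalence, fitted probabilities rarely exceed .5, so the
# Bayes classifier predicts the majority (non-diagnosed) class for everyone,
# while a ROC threshold can still flag the most probable cases
probs = [0.40, 0.30, 0.08, 0.05, 0.02]
bayes_preds = bayes_classify(probs)
roc_preds = roc_classify(probs, 0.10)
```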
Overall, this study suggests that the important factors for classification accuracy
in psychometric models are the diagnosis-test correlation and prevalence of the diagnosis.
Machine learning models with a Bayes classifier had high specificity and very low
sensitivity across conditions. The low sensitivity across conditions suggests that these
methods should not be used in practice. However, machine learning models with a ROC
classifier had comparable classification accuracy to the psychometric models as the
number of items, number of item categories, and sample size increased. Therefore, results
suggest that machine learning approaches could be a viable alternative to psychometric
models for diagnostic assessments. Next, the main conclusions and limitations from this
study are discussed in reference to the study hypotheses.
4.1 Revisiting Study Hypotheses
Hypothesis 1. It was hypothesized that classification accuracy would increase as
sample size, number of items, number of item categories, prevalence of the diagnosis, and
the diagnosis-test correlation increased. Hypothesis 1 is largely supported. The results
suggest that classification accuracy in psychometric models depends on the diagnosis-test
correlation and prevalence, independent of other simulation factors. For machine learning
models with a ROC classifier, classification accuracy in logistic regression mostly
depended on the diagnosis-test correlation. Classification accuracy of the random forest
algorithm increased as the diagnosis-test correlation, number of items, and number of
item categories increased. In other words, classification accuracy improved in the random
forest algorithm when the items provided more candidate splits. Classification accuracy
of relaxed Lasso logistic regression and Lasso logistic regression was not significantly
different, and models had higher than a 50% chance of assigning cases to the minority
class only in conditions with a diagnosis-test correlation of .70. Classification accuracy of
relaxed Lasso and Lasso logistic regression increased as sample size, number of items,
and number of item categories increased, and as prevalence decreased. Diagnosis-test
correlation was an important predictor for both random forest and logistic regression, so
it is not surprising that the simulation factors explain less variance for both of the Lasso
models than for the random forest and logistic regression models. Finally, machine
learning algorithms using the Bayes classifier were more likely to predict the minority
class as the simulation factors increased, even though classification accuracy was poor
(discussed in Hypothesis 3).
Hypothesis 2. It was hypothesized that, on average, machine learning methods
would have higher classification accuracy than psychometric methods in conditions
where there are violations of psychometric assumptions (such as local dependence) and
small sample sizes because the IRT θ scores might not be estimated accurately. This
hypothesis could only be partially tested because classification accuracy was not
significantly affected in conditions with violations of local independence. Although local
dependence affected item parameter recovery, local dependence was not a significant
predictor of classification accuracy in any model. A future direction would be to
investigate the effect of larger violations of local dependence or investigate other model
misspecifications of the IRT model (further discussed below). In conditions with small
sample sizes, most machine learning models with the Bayes classifier did not assign cases
to the minority class. On the other hand, there were small sample conditions where
machine learning models with ROC classifiers outperformed data-generating theta in
classification accuracy. These conditions also had a prevalence of 5%. For example, there
was a .011 to .012 difference in specificity favoring relaxed Lasso logistic regression in
conditions with 30 five-category items. Also, there were conditions with either 10 binary
items, 5% prevalence, and small sample size or 10 binary items, 5% prevalence, and a
diagnosis-test correlation of .30 and .50 where differences in specificity favored logistic
regression. Finally, conditions with 10 five-category items with low diagnosis-test
correlation had differences in sensitivity that favored the random forest algorithm (largest
differences ranged from .03 to .04). This is also found in conditions with 10 binary items,
but only when there is high prevalence. For conditions with 30 items and a diagnosis-test
correlation of .30, all conditions slightly favored random forest.
There are several future directions for this research. First, the differences in
classification accuracy were small. It would be interesting to investigate if the difference
between classification accuracy for each combination of simulation factors is statistically
significant or just chance. Significance tests could either use an analytical standard error
of the difference in classification indices or use bootstrapped confidence intervals for
classification indices to see if they include zero. A larger number of replications per
condition could also lead to greater precision of the difference in classification accuracy.
Moreover, results suggest that further simulations should focus on comparing these
methods with small sample sizes, where machine learning models could have a
hypothetical advantage. However, there is a possibility that the results with small samples
are not reliable because different procedures were taken to get the models to work. As
mentioned in the Methods section, conditions with either a small sample size or small
prevalence used five-fold cross-validation (instead of ten-fold cross-validation) and
stratified sampling to guarantee that each fold had both diagnosed and non-diagnosed
cases. There is a possibility that using fewer folds might reduce the variability in the
classification accuracy results. An alternative approach would be to use leave-one-out
cross-validation for models with a small sample size or low prevalence. Also, there is a
possibility that machine learning algorithms could over-predict the majority class in
conditions with small sample size or low prevalence, thus leading to higher classification
rates and specificity than in the psychometric methods. Therefore, sensitivity might be a
better classification accuracy index to evaluate models with small sample sizes and low
prevalence. In this case, random forest had higher sensitivity than the data-generating
model in conditions with more pieces of information (more predictors, and each predictor
with many candidate splits). Overall, it would be interesting for other simulations to
evaluate conditions with small sample sizes and low prevalence.
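The stratified sampling described above can be sketched as follows: a simplified fold builder that deals the indices of each class round-robin across folds so every fold receives both diagnosed and non-diagnosed cases (real implementations also shuffle within class):

```python
def stratified_folds(labels, k):
    """Split case indices into k folds, spreading the minority class
    across folds so every fold has diagnosed and non-diagnosed cases."""
    folds = [[] for _ in range(k)]
    for cls in (1, 0):                                   # minority class first
        idx = [i for i, y in enumerate(labels) if y == cls]
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)                     # round-robin deal
    return folds

# 100 cases at 5% prevalence: every one of the 5 folds receives
# exactly one diagnosed case
labels = [1] * 5 + [0] * 95
folds = stratified_folds(labels, 5)
```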
Hypothesis 3. It was hypothesized that psychometric methods would have higher
classification accuracy than machine learning methods when the prevalence is low. That
is, psychometric methods balance sensitivity and specificity of the diagnosis, while the
machine learning methods might simply predict that the rarer class never occurs.
Hypothesis 3 is largely supported. For the machine learning algorithms with a
Bayes classifier, specificity was inflated and sensitivity was too low. This is not
surprising: in conditions with low prevalence (5%), the Bayes classifier would predict
that there should not be a diagnosis, and the algorithm would be correct 95% of the time.
Potential improvements to the machine learning models with Bayes classifiers are
discussed in the section below. When the machine learning models used a ROC classifier,
there was a balance between specificity and sensitivity, so classification accuracy was
within the range of data-generating theta (See Table 1 and Hypothesis 2).
Hypothesis 4. Within the machine learning methods, it was hypothesized that the
random forest algorithm might be the best performing classifier because of the diversity
in the ensemble used to make a joint decision. There is evidence to support this
hypothesis. As previously mentioned, Lasso logistic regression and the relaxed Lasso
logistic regression with ROC classifiers were likely to assign cases to the minority class
only when there was a diagnosis-test correlation of .70. On the other hand, the random
forest algorithm with a ROC classifier worked for all of the conditions. Conditions with
30 five-category items were likely to favor the random forest algorithm over the logistic
regression methods. In other words, the random forest algorithm performs better when
the predictors have more pieces of information.
Hypothesis 5. It was also hypothesized that, as more items are chosen by the
machine learning algorithms, the items would recover the true IRT θ score. There is
evidence to support this hypothesis, although it is obvious that more items should lead
to acceptable recovery. Perhaps the hypothesis should have been that by scoring the
remaining items chosen by the machine learning algorithms, the true IRT θ score would
be recovered and that there would not be ceiling or floor effects. Perhaps the interesting
result is that, for CART, conditions with low diagnosis-test correlation would select more
items to predict the diagnosis, which increases parameter recovery. As diagnosis-test
correlation increases, fewer items are chosen, but parameter recovery is still acceptable.
For the Lasso, recovery of the person parameter with 2-13 items seemed acceptable.
Results from random forest suggest that using half of the items (either 5 or 15) recovers
theta well.
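The idea that a well-chosen subset of items can recover θ can be sketched with a toy EAP scorer under the 2PL (Python rather than the dissertation's R/mirt; the item parameters, item counts, and sample size here are hypothetical):

```python
import math, random

random.seed(7)

# Hypothetical 2PL items: (discrimination a, difficulty b).
items = [(random.uniform(1.2, 2.2), random.uniform(-2, 2)) for _ in range(30)]

def p_correct(theta, a, b):
    # 2PL response probability
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def simulate(theta, item_set):
    return [int(random.random() < p_correct(theta, a, b)) for a, b in item_set]

def eap(responses, item_set):
    # EAP score: posterior mean of theta over a quadrature grid, N(0,1) prior.
    grid = [g / 10 for g in range(-40, 41)]
    post = []
    for t in grid:
        like = math.exp(-t * t / 2)  # prior density (up to a constant)
        for u, (a, b) in zip(responses, item_set):
            p = p_correct(t, a, b)
            like *= p if u else (1 - p)
        post.append(like)
    z = sum(post)
    return sum(t * w for t, w in zip(grid, post)) / z

# Correlation between true theta and EAP scores from only half the items.
thetas = [random.gauss(0, 1) for _ in range(200)]
half = items[:15]
est = [eap(simulate(t, half), half) for t in thetas]

mt, me = sum(thetas) / len(thetas), sum(est) / len(est)
cov = sum((t - mt) * (e - me) for t, e in zip(thetas, est))
var_t = sum((t - mt) ** 2 for t in thetas)
var_e = sum((e - me) ** 2 for e in est)
r = cov / math.sqrt(var_t * var_e)
print(round(r, 2))
```

With 15 reasonably discriminating items, the true-to-estimated correlation is high, consistent with the finding that half of the items recovered theta well.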
4.2 Limitations and Future Directions
This section is broken into three parts. The first part covers ways in which this
simulation study could be improved. The second part discusses direct extensions of this
simulation study. Finally, the third part describes external factors that influence the
simulation and potentially influence the classification accuracy of machine learning and
psychometric models.
Improving the simulation. There were several limitations to the simulation.
One of the goals of the simulation was to study the influence of violating psychometric
model assumptions on classification accuracy, specifically the influence of violating the
local independence assumption. In this study, surface local dependence was introduced
by constraining 30% of the sample to have identical responses to the first five items,
regardless of the item parameters and latent variable score. The violation of local
independence affected item parameter recovery during IRT estimation, but it did not
influence classification accuracy across psychometric or machine learning models.
Therefore, a future direction would be to include a stronger manipulation of local
dependence or examine underlying local dependence by introducing unmodeled
dimensions to the item responses. Furthermore, the classification accuracies of the
estimated IRT scores and the summed scores were not significantly different from each
other. In this case, estimated IRT scores and summed scores might have led to similar
classification accuracy because both were highly correlated and the items were highly
reliable. Therefore, a future direction would be to investigate how classification accuracy
changes as a function of summed score reliability. It is important to note that estimated
theta scores should be preferred over summed scores because, in expectation, they should
approach the data-generating theta and could be used for linking item responses when
cases do not take the same items. In the same vein, the distribution of item parameters
mimicked those found in traditional assessments, but this simulation ensured that in
each replication there were at least five responses per category to avoid having to
collapse sparse response categories. Sparse categories may influence the estimation of
the IRT model and the recovery of the person parameter, so a future direction would be
to use a more realistic approach to simulate item responses. Also, the machine learning
algorithms with a Bayes classifier were not likely to assign cases to the minority class.
A potential future direction to improve the results of the Bayes classifier would be to
introduce misclassification-cost weights to the algorithms to differentially weight false
positives and false negatives so that the majority class is not over-predicted. Another
potential future direction would be to examine the test information function from the
assessment items to investigate whether there is information close to the cut score. If
there is not a lot of precision around the cut score, then there is a possibility that the
machine learning models would not choose items to predict the
diagnosis. Similarly, more reliable predictors, such as a total score or item parcels, could
be included in the machine learning models as additional predictors to increase the
chances of choosing a predictor. In this case, item parcels may be preferred over total
scores to take advantage of the variable selection property of machine learning
algorithms. A potential problem is that CART is biased toward selecting predictors with
many unique values to split on. A potential solution would be to use the conditional
inference trees algorithm to prevent this selection bias (Strobl et al., 2009). Overall,
performance of machine learning algorithms in the presence of both item responses and
item parcels still has to be evaluated.
Direct extensions of the simulation. The simulation in this study could be
extended by either adding other data-generating models or using other algorithms for
diagnostic classification. The data-generating model in this study was a psychometric
model, but the dataset could have been simulated so that it favored the machine learning
approaches. For example, the main predictors of the diagnosis could have been some, but
not all, of the items or a complex interaction among the items. Also, model error could
have been introduced to the data-generating model so that psychometric models were not
substantially favored. Furthermore, other models could have been used to generate the
data so that they would not favor either the IRT or machine learning models, such as a
diagnostic classification model (DCM) or a model with causal indicators. Finally,
alternative models could have been used to predict the diagnosis. For example,
computerized adaptive testing and computerized adaptive diagnosis could have been used
to estimate the latent variable score of the participants while also selecting fewer items
than those found in the assessment. Boosting, a supervised learning algorithm that uses
binary recursive partitioning to fit trees to residual structures, could also have been used
to predict the diagnosis from item responses. K-means clustering, an unsupervised
learning algorithm that uses distance measures to find grouping structures in a dataset,
could have been used to investigate whether it could differentiate between the diagnosed
and non-diagnosed groups.
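As a sketch of the k-means idea, here is a toy two-cluster k-means on a single score (Python; the group means and sample sizes are hypothetical and deliberately well separated, so real item-response data would be messier):

```python
import random

random.seed(3)

# Illustrative data: a total score for non-diagnosed vs. diagnosed groups.
neg = [random.gauss(10, 2) for _ in range(180)]
pos = [random.gauss(20, 2) for _ in range(20)]
data = [(x, 0) for x in neg] + [(x, 1) for x in pos]

# Plain 1-D k-means with k = 2, initialized at the extremes.
c = [min(x for x, _ in data), max(x for x, _ in data)]
for _ in range(25):
    groups = ([], [])
    for x, _ in data:
        groups[abs(x - c[0]) > abs(x - c[1])].append(x)  # nearer centroid
    c = [sum(g) / len(g) for g in groups]

# The higher centroid's cluster should mostly capture the diagnosed group.
hi = int(c[1] > c[0])
agree = sum((abs(x - c[hi]) < abs(x - c[1 - hi])) == bool(y) for x, y in data)
print(round(agree / len(data), 2))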
External factors influencing the simulation. There are additional factors in the
simulation that could have influenced the diagnostic classification of machine learning
and psychometric methods. For example, it would be interesting to study the influence of
missing data on diagnostic classification. Psychometric methods could deal with missing
data by using full information maximum likelihood. On the other hand, missing data
handling varies by machine learning algorithm. Also, it would be interesting to study the
influence of class imbalance corrections on classification accuracy across methods. There
are approaches in machine learning to overcome class imbalance problems, such as
oversampling the minority class, undersampling the majority class, or creating synthetic
cases of the minority class using nearest-neighbor algorithms (SMOTE). It is
hypothesized that using some of the approaches to overcome class imbalance should lead
to higher classification accuracy for the machine learning algorithms with a Bayes
classifier. Finally, the influence of reducing assessment length on the classification
accuracy of psychometric models should be investigated. In this study, many of the
machine learning models chose fewer items to predict the diagnosis than the items
available, while the psychometric models used all of the items. A psychometric approach
to creating static short forms would be to choose a set of items that maximize item
information in a certain range of the latent variable or to maximize item information
close to a cut score. After items are selected, IRT scores could be estimated and
compared to a cut score to determine diagnosis. As previously mentioned, another
approach to reduce the number of administered items is to use computerized adaptive
testing, where administered items are tailored to the participant depending on their
previous responses.
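A minimal sketch of the simplest imbalance correction, random oversampling of the minority class, is shown below (SMOTE additionally interpolates synthetic cases between nearest neighbors; the class counts here are illustrative):

```python
import random

random.seed(11)

# Illustrative imbalanced training labels: 5% diagnosed.
y = [1] * 50 + [0] * 950

# Random oversampling: resample minority cases with replacement until the
# classes are balanced, then train on the expanded index set.
minority = [i for i, lab in enumerate(y) if lab == 1]
majority = [i for i, lab in enumerate(y) if lab == 0]
extra = [random.choice(minority) for _ in range(len(majority) - len(minority))]
balanced = list(range(len(y))) + extra

counts = {0: 0, 1: 0}
for i in balanced:
    counts[y[i]] += 1
print(counts)  # {0: 950, 1: 950}
```

Training a Bayes classifier on the balanced index set raises the prior probability of the minority class, so the 0.5 probability threshold no longer defaults to the majority class.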
4.3 Conclusion
The results of this study suggest that there is a possibility that machine learning
algorithms using a ROC classifier could be used for diagnostic assessment, and that they
could provide a viable alternative to the psychometric models. However, it is important to
consider the implications of taking either a psychometric or a machine learning approach
to diagnostic classification. The psychometric approach focuses on improving the
measurement of the construct in the assessment by introducing a latent variable.
Therefore, the model used for classification could be generalizable because it is grounded
in theory. However, there might be estimation problems when there are violations of
model assumptions. The psychometric approach uses all of the items to estimate the
latent variable, and then a cut score is determined to balance sensitivity and specificity.
So, the prediction of the diagnosis is indirect because the diagnosis is not part of the
measurement process. On the other hand, machine learning builds a model to predict the
probability of diagnosis. Therefore, there is a direct prediction of the diagnosis and the
model is outcome-specific. Consequently, it is difficult to use the framework for
inference, but it might be more robust to violations of psychometric assumptions.
Keeping these goals of diagnostic assessment in mind could help researchers decide what
approach to take, and the results of this simulation could suggest under what
circumstances each model could perform well. Finally, perhaps the most important
finding of this study is that there is a lot of overlap between psychometrics and machine
learning, so understanding the advantages and disadvantages of each approach could
help each of these fields learn from the other.
References
Achenbach, T., & Rescorla, L. (2013). Achenbach system of empirically based
assessment. In Encyclopedia of autism spectrum disorders (pp. 31-39). Springer New
York.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and
regression trees. Wadsworth and Brooks.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s
ability. In F. M. Lord and M. R. Novick, Statistical theories of mental test scores.
Reading, MA: Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item
parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.
Chalmers, R. P., (2012). mirt: A Multidimensional Item Response Theory Package for
the R Environment. Journal of Statistical Software, 48(6), 1-29.
de Ayala, R. J. (2009). The theory and practice of item response theory. New York:
Guilford Press.
Edelen, M. O., & Reeve, B. B. (2007). Applying item response theory (IRT) modeling to
questionnaire development, evaluation, and refinement. Quality of Life Research, 16, 5-
18.
Edwards, M. C., Houts, C. R., & Cai, L. (In Press). A Diagnostic Procedure to Detect
Departures from Local Independence in Item Response Theory Models. Psychological
Methods.
Egan, J. P. (1975). Signal detection theory and ROC analysis. New York: Academic
Press.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized
Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.
Gibbons, R. D., Hooker, G., Finkelman, M. D., Weiss, D. J., Pilkonis, P. A., Frank, E.,
Moore, T., & Kupfer, D. J. (2013). The CAD-MDD: a computerized adaptive diagnostic
screening tool for depression. The Journal of Clinical Psychiatry, 74, 669-674.
Gibbons, R. D., Weiss, D. J., Frank, E., & Kupfer, D. (2016). Computerized adaptive
diagnosis and testing of mental health disorders. Annual Review of Clinical
Psychology, 12, 83-104.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical
learning. New York: Springer.
Levy, R., & Mislevy, R. J. (2016). Bayesian psychometric modeling. CRC Press.
Lewinsohn, P. M., Seeley, J. R., Roberts, R. E., & Allen, N. B. (1997). Center for
Epidemiologic Studies Depression Scale (CES-D) as a screening instrument for
depression among community-residing older adults. Psychology and Aging, 12, 277-287.
Liaw, A. & Wiener, M. (2002). Classification and Regression by randomForest. R News,
2, 18-22.
Liu, X. (2012). Classification accuracy and cut point selection. Statistics in
Medicine, 31(23), 2676-2686.
Lu, F., & Petkova, E. (2014). A comparative study of variable selection methods in the
context of developing psychiatric screening instruments. Statistics in Medicine, 33(3),
401-421.
McArdle, J. (2013b). Adaptive testing of the Number Series Test using standard
approaches and a new decision tree analysis approach. In J. McArdle & G. Ritschard
(Eds.), Contemporary issues in exploratory data mining in the behavioral sciences (pp.
312-344). New York: Routledge.
McFall, R. M., & Treat, T. A. (1999). Quantifying the information value of clinical
assessments with signal detection theory. Annual Review of Psychology, 50(1), 215-241.
Meinshausen, N. (2007). Relaxed lasso. Computational Statistics & Data Analysis, 52,
374–393. doi:10.1016/j.csda.2006.12.019
Reckase, M. (2009). Multidimensional item response theory. New York, NY: Springer.
Ripley, B. (2016). tree: Classification and Regression Trees. R package version 1.0-37.
https://CRAN.R-project.org/package=tree
Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J-C., & Müller, M.
(2011). pROC: an open-source package for R and S+ to analyze and compare ROC
curves. BMC Bioinformatics, 12, p. 77.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded
scores. Psychometrika Monograph No. 17, 34 (4, Pt. 2)
Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning:
Rationale, application, and characteristics of classification and regression trees, bagging,
and random forests. Psychological Methods, 14, 323-348.
Thissen D., Orlando, M. (2001) Item Response Theory for items scored in two categories.
In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 73-140). Mahwah: Lawrence
Erlbaum Associates, Publishers.
Thissen, D., & Steinberg, L. (2009). Item response theory. In Millsap, R. E., & Maydeu-
Olivares, A. (Eds.). The Sage handbook of quantitative methods in psychology (pp. 148-
177).
Thissen, D., & Wainer, H. (Eds.). (2001). Test scoring. Mahwah: Lawrence Erlbaum
Associates, Publishers.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the
Royal Statistical Society. Series B (Methodological), 267-288.
van der Linden, W. J., & Pashley, P. J. (2010). Item selection and ability estimation in
adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive
testing (pp. 3–30). New York: Springer.
Yan, D., Lewis, C., & Stocking, M. (2004). Adaptive testing with regression trees in the
presence of multidimensionality. Journal of Educational and Behavioral Statistics, 29,
293-316.
Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.)
Educational Measurement, 4th Edn., (pp. 111-153). Westport, CT: Greenwood
Publishing Group.
Youngstrom, E. A. (2013). A primer on receiver operating characteristic analysis and
diagnostic efficiency statistics for pediatric psychology: we are ready to ROC. Journal of
Pediatric Psychology, 39, 204-221.
Youngstrom, E. A., Frazier, T. W., Demeter, C., Calabrese, J. R., & Findling, R. L.
(2008). Developing a ten item mania scale from the Parent General Behavior Inventory
for children and adolescents. The Journal of Clinical Psychiatry, 69, 831-839.
Zhou, X. H., McClish, D. K., & Obuchowski, N. A. (2011). Statistical methods in
diagnostic medicine. New York: John Wiley & Sons.
Zou, K. H., O’Malley, A. J., & Mauri, L. (2007). Receiver-operating characteristic
analysis for evaluating diagnostic tests and predictive models. Circulation, 115, 654-657.
Zweig, M. H., & Campbell, G. (1993). Receiver-operating characteristic (ROC) plots: a
fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39, 561-577.
APPENDIX A
FLOWCHART OF THE DATA-GENERATING PROCEDURE
APPENDIX B
TABLES
Table 1
Median of classification accuracy indices by model

                                            Classification Rate   Sensitivity      Specificity
Prevalence                                   .05   .10   .20      .05   .10   .20   .05   .10   .20
Data-generating Theta                        .73   .71   .68      .74   .72   .71   .73   .71   .69
Estimated Theta                              .72   .69   .67      .73   .72   .70   .72   .69   .67
Raw Summed Score                             .71   .69   .67      .74   .72   .70   .71   .68   .67
CART ^                                        -     -    .74       -     -    .26    -     -    .86
RF, Bayes classifier                          -     -    .80       -     -    .14    -     -    .96
Lasso Logistic, Bayes classifier %            -     -    .82       -     -    .16    -     -    .99
Relaxed Lasso Logistic, Bayes classifier #   .95   .90   .82      .13   .21   .37   .99   .98   .94
Logistic Regression, Bayes classifier @      .94   .89   .79      .05   .08   .19   .98   .98   .94
RF, ROC classifier                           .67   .65   .64      .71   .71   .70   .67   .64   .64
Lasso Logistic, ROC classifier ***           .74   .73   .72      .81   .79   .76   .74   .72   .71
Relaxed Lasso, ROC classifier ***            .75   .73   .73      .81   .79   .76   .74   .73   .72
Logistic Regression, ROC classifier          .67   .66   .65      .72   .71   .69   .67   .66   .65

Note: (***) Only for conditions with diagnosis-test correlation of .70. For data-generating thetas,
classification rates were [.80, .77, .75], sensitivity was [.83, .81, .78], and specificity was [.80, .77, .75].
^ Only for conditions with N greater than 250.
% Only for conditions with five-category items, diagnosis-test correlation of .70, and N greater than 250.
# Only for conditions with five-category items and diagnosis-test correlation of .70.
@ Only for conditions with thirty items.
- Condition not analyzed.
APPENDIX C
FIGURES
Figure 1. Variance explained and unconditional η2 effect sizes for the predictors of
classification accuracy in machine learning models using ROC classifiers. Note: Neither
local dependence nor four/five-way interactions had partial-η2 > .010 in any model. Blank
cells had conditional predictors with partial-η2 < .010. Colon ( : ) denotes interactions.
Orange cells are predictors not included in the model. cr = classification rate; se =
sensitivity; sp = specificity; rex = relaxed lasso; log = logistic regression; nitem = number
of items; ncat = number of item categories; cor = diagnosis-test correlation; ss = sample
size; prev = prevalence.
Figure 2. Random Forest Variable Importance Measures for machine learning algorithms
with ROC classifiers. Top row is relaxed lasso logistic regression (only for conditions
with diagnosis-test correlation of .70), middle row is random forest, and bottom row is
logistic regression.
Columns show classification rate, sensitivity, and specificity.
APPENDIX D
R SYNTAX TO GENERATE DATA
library(MASS)
library(mirt)
library(stringr)
#Data
N=c(1000,2000,3000) #sample size
tcor=c(.7,.8,.9) #corr
prev=c(.2,.3) #prevalence
nCat <- c(2,5) #number of categories
ld <-c(0,.3)
rep<- 50 #replications
n <- 50 #Number of items
nn1=str_pad(1:rep,3,pad='0')
#-----------------------------------
#ENVIRONMENT SET
for(ii in 1:length(N)){
for(jj in 1:length(tcor)){
for(kk in 1:length(prev)){
for(ll in 1:length(nCat)){
for(mm in 1:length(ld)){
dir1=paste0('n',N[ii],collapse='_')
dir2=paste0('cor',tcor[jj],collapse='_')
dir3=paste0('prev',prev[kk],collapse='_')
dir4=paste0('cat',nCat[ll],collapse='_')
dir5=paste0('ld',ld[mm],collapse='_')
dir6=paste0("D:/simWorld/Oscar_diss/",dir1,"/",dir2,"/",dir3,"/",dir4,"/",dir5)
dir.create(dir6,recursive=T,showWarnings=FALSE)
#}}}}
setwd(dir6)
#=====================================
#====================================
for(rr in 1:rep){
#TRUE Theta
p1=mvrnorm(N[ii],rep(0,2),Sigma=matrix(c(1,tcor[jj],tcor[jj],1),ncol=2,nrow=2))
colnames(p1)=c('test','diag_true')
p2=data.frame(p1)
cut1=qnorm(1-prev[kk],lower.tail=T)
p2$diag=ifelse(p2$diag_true<cut1,0,1)
#TRUE Theta
theta <- p2$test
if(nCat[ll]==2){
#2PL
a.1<-rnorm(n,1.7,.3)
b.1<-rnorm(n,0,1)
p <- matrix(0,N[ii],n)
u <- matrix(NA,N[ii],n)
rsave=matrix(NA,N[ii],n)
for (i in 1:N[ii]) {
for (j in 1:n) {
#Draw a random number to determine categories
r <- runif(1, 0, 1)
rsave[i,j]=r
p[i, j] <- 1 / (1 + exp(-
a.1[j] * (theta[i] - b.1[j])))
if (r <= p[i, j]) {
u[i,j] <- 1
} else
{u[i,j] <- 0}
}}
colnames(u)=paste0("V",1:n)
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'),row.names=FALSE)
#surface local dependence
if(ld[mm]==0){
p3=cbind(p2,u)
write.csv(p3,file=paste0('rep',nn1[rr],".csv"))
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'),row.names=FALSE)
}
else{
ld_prop=N[ii]*ld[mm]
u[1:ld_prop,1:5]=u[1:ld_prop,1]
p3=cbind(p2,u)
write.csv(p3,file=paste0('rep',nn1[rr],".csv"))
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'))
}
}
###############
##CATEGORICAL##
###############
else{
#GRM
#Fixed item parameters
a.1 <- rlnorm(n, .25, .5)
b.1 <- matrix(0, n, (nCat[ll] - 1))
###Generate threshold parameters for b.1; for the GRM there are 4 thresholds for 5 categories
###Using this method is for b dist as N(0,1) since b.1[,1] mean is -.6
###and b.1[,2] to b.1[,4] are all .2
b.1[, 1] <- rnorm(n, -.6, 1)
for(j in 1:n) {
b.1[j, 2] <- b.1[j,1] + runif(1, .5, .9)
b.1[j, 3] <- b.1[j,2] + runif(1, .5, .9)
b.1[j, 4] <- b.1[j,3] + runif(1, .5, .9)
}
### 5 category item responses
p <- array(0,c(N[ii],n,nCat[ll]))
pstar <- array(1,c(N[ii],n,nCat[ll]))
u <- matrix(0,N[ii],n)
for (i in 1:N[ii]) {
for (j in 1:n) {
#Draw a random number to determine categories
r <- runif(1, 0, 1)
for (k in 2:nCat[ll]) {
pstar[i, j, k] <- 1 / (1 +
exp(-a.1[j] * (theta[i] - b.1[j, (k-1)])))
p[i,j,(k-1)] <- pstar[i, j,
(k-1)] - pstar[i, j, k]
}
p[i, j, nCat[ll]] <- pstar[i, j, 5]
#probability of last category or higher is that category
if (r <= p[i, j, 1]) {
u[i,j] <- 1
} else
if (r <= p[i,j,1] +
p[i,j,2]) {
u[i,j] <- 2
} else
if (r <= p[i,j,1] +
p[i,j,2] + p[i,j,3]) {
u[i,j] <- 3
} else
if (r <= p[i,j,1] + p[i,j,2] + p[i,j,3] + p[i,j,4]) {
u[i,j] <- 4
} else if (r <= 1) {
u[i,j] <- 5
}
}
}
colnames(u)=paste0("V",1:n)
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'),row.names=FALSE)
#surface local dependence
if(ld[mm]==0){
p3=cbind(p2,u)
write.csv(p3,file=paste0('rep',nn1[rr],".csv"))
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'),row.names=FALSE)
}
else{
ld_prop=N[ii]*ld[mm]
u[1:ld_prop,1:5]=u[1:ld_prop,1]
p3=cbind(p2,u)
write.csv(p3,file=paste0('rep',nn1[rr],".csv"))
truepar=cbind(a.1,b.1)
write.csv(truepar,file=paste0('truepar',nn1[rr],'.csv'),row.names=FALSE)
}
}
}
}}}}}
setwd("D:/simWorld/")
APPENDIX E
R SYNTAX TO ANALYZE DATA
library(pROC)
library(mirt)
N=c(1000,2000,3000) #sample size
tcor=c(.7,.8,.9) #corr
prev=c(.2,.3) #prevalence
nCat <- c(2,5) #number of categories
ld <-c(0,.3)
rep<- 1 #replications
n <- 50 #Number of items
counter=0
direct=list(NULL)
#-----------------------------------
#ENVIRONMENT SET
for(ii in 1:length(N)){
for(jj in 1:length(tcor)){
for(kk in 1:length(prev)){
for(ll in 1:length(nCat)){
for(mm in 1:length(ld)){
counter=1+counter
dir1=paste0('n',N[ii],collapse='_')
dir2=paste0('cor',tcor[jj],collapse='_')
dir3=paste0('prev',prev[kk],collapse='_')
dir4=paste0('cat',nCat[ll],collapse='_')
dir5=paste0('ld',ld[mm],collapse='_')
dir6=paste0("D:/simWorld/Oscar_diss/",dir1,"/",dir2,"/",dir3,"/",dir4,"/",dir5)
dir.create(dir6,recursive=T,showWarnings=FALSE)
direct[[counter]]=dir6
}}}}}
direct2=do.call(rbind,direct)
#---------------------------------
##READ DATA##
for(q in 1:3){
#for(q in 4:length(direct2)){
setwd(direct2[q])
listf1=list.files(pattern="rep")
datamat=list(NULL)
for(qq in 1:length(listf1)){
datamat[[qq]]=read.csv(listf1[qq],header=TRUE)
}
listpars=list.files(pattern="truepar")
datapar=list(NULL)
for(qq in 1:length(listpars)){
datapar[[qq]]=read.csv(listpars[qq],header=TRUE)
}
#--------------------------------------
sum1=list(NULL)
counter2=0
for(nn in 1:length(datamat)){
data1=data.frame(nn)
counter2=counter2+1
training=datamat[[nn]][1:(nrow(datamat[[nn]])/2),]
testing=datamat[[nn]][((nrow(datamat[[nn]])/2)+1):nrow(datamat[[nn]]),]
#TRUE Theta
p2=testing
n1=roc(p2$diag~p2$test)
n2=coords(n1,"all")
youden=n2[2,]+n2[3,]-1
cto1=sqrt((1-n2[2,])^2+(1-n2[3,])^2)
concor=n2[2,]*n2[3,]
c1=which(youden==max(youden))
c2=which(cto1==min(cto1))
c3=which(concor==max(concor))
c1=c1[sample(1:length(c1),1)]
c2=c2[sample(1:length(c2),1)]
c3=c3[sample(1:length(c3),1)]
testing$youdenpred=ifelse(testing$test<n2[1,c1],0,1)
testing$cto1pred=ifelse(testing$test<n2[1,c2],0,1)
testing$concorpred=ifelse(testing$test<n2[1,c3],0,1)
data1$tyouden=n2[1,c1]
data1$tcto1=n2[1,c2]
data1$tconcor=n2[1,c3]
#----------------------------------
#ESTIMATED THETA
library(mirt)
u=testing[,paste0("V",1:50)]
#remove constants
u=u[,apply(u, 2, var, na.rm=TRUE) != 0]
l1=mirt(u,1)
data1$conv=extract.mirt(l1,'converged')
if(extract.mirt(l1,'converged')==TRUE){
l2=fscores(l1)
testing$eap_mirt=c(l2)
n3=roc(testing$diag~testing$eap_mirt)
n4=coords(n3,"all")
estyouden=n4[2,]+n4[3,]-1
estcto1=sqrt((1-n4[2,])^2+(1-n4[3,])^2)
estconcor=n4[2,]*n4[3,]
c4=which(estyouden==max(estyouden))
c5=which(estcto1==min(estcto1))
c6=which(estconcor==max(estconcor))
c4=c4[sample(1:length(c4),1)]
c5=c5[sample(1:length(c5),1)]
c6=c6[sample(1:length(c6),1)]
testing$estyoudenpred=ifelse(testing$eap_mirt<n4[1,c4],0,1)
testing$estcto1pred=ifelse(testing$eap_mirt<n4[1,c5],0,1)
testing$estconcorpred=ifelse(testing$eap_mirt<n4[1,c6],0,1)
data1$col_u=ncol(u)
data1$eyouden=n2[1,c4]
data1$ecto1=n2[1,c5]
data1$econcor=n2[1,c6]
} else {
testing$eap_mirt=NA
testing$estyoudenpred=NA
testing$estcto1pred=NA
testing$estconcorpred=NA
data1$col_u=ncol(u)
data1$eyouden=NA
data1$ecto1=NA
data1$econcor=NA}
############################
#item parameters
library(plyr)
par1=coef(l1,IRTpars=TRUE)
par1[[n+1]]<-NULL
par2=lapply(par1,as.data.frame)
par3=rbind.fill(par2)
datapar2<-cbind(datapar[[nn]],par3) #index by nn, the current replication; qq was a leftover from the read loop
write.csv(datapar2,file=paste0("est_par",nn,".csv"))
############################
#----------------------------------
#ESTIMATE SUMSCORE
testing$sumscore=rowSums(u)
n5=roc(testing$diag~testing$sumscore)
n6=coords(n5,"all")
rawyouden=n6[2,]+n6[3,]-1
rawcto1=sqrt((1-n6[2,])^2+(1-n6[3,])^2)
rawconcor=n6[2,]*n6[3,]
c7=which(rawyouden==max(rawyouden))
c8=which(rawcto1==min(rawcto1))
c9=which(rawconcor==max(rawconcor))
c7=c7[sample(1:length(c7),1)]
c8=c8[sample(1:length(c8),1)]
c9=c9[sample(1:length(c9),1)]
testing$rawyoudenpred=ifelse(testing$sumscore<n6[1,c7],0,1)
testing$rawcto1pred=ifelse(testing$sumscore<n6[1,c8],0,1)
testing$rawconcorpred=ifelse(testing$sumscore<n6[1,c9],0,1)
data1$ryouden=n6[1,c7]
data1$rcto1=n6[1,c8]
data1$rconcord=n6[1,c9]
#-----------------------------
#TREE
#Decide on training/testing datasets
library(tree)
u2=training[,paste0("V",1:50)]
tr1=tree(as.factor(training$diag)~.,data=u2,control=tree.control(nrow(u2),mindev=.0001))
#plot(tr1)
#text(tr1,pretty=0)
cvt=cv.tree(tr1,K=10)
#plot(cvt)
cvt2=which(cvt$dev==min(cvt$dev))
cvt2=cvt2[sample(1:length(cvt2),1)]
prtr=prune.tree(tr1,best=cvt$size[cvt2])
#plot(prtr)
#text(prtr,pretty=0)
#prevent single nodes
if(cvt$size[cvt2]>1){
tr2=predict(prtr,newdata=u)
testing$treepred=ifelse(tr2[,2]<.5,0,1)
} else{
testing$treepred=rep(0,nrow(testing))}
ll1=prtr$frame[,1]
est1=unique(as.numeric(gsub("\\D", "", ll1)))
est1=est1[!is.na(est1)]
tree_chose=matrix(0,nrow=1,ncol=50)
colnames(tree_chose)=paste0("treeV",1:50)
tree_chose[est1]=1
data1=cbind(data1,tree_chose)
#--------------------------
#RandomForests
library(randomForest)
rf1=randomForest(y=as.factor(training$diag),x=u2)
testing$rfpred=predict(rf1,newdata=u)
imp1=importance(rf1)
k1=sort(importance(rf1),decreasing=TRUE,index=TRUE)
k2=cbind(k1$ix,1:50)
rf_chose=matrix(0,nrow=1,ncol=50)
colnames(rf_chose)=paste0("rfV",1:50)
rf_chose[k2[,1]]=k2[,2]
data1=cbind(data1,rf_chose)
#---------------------------
#LOGISTIC LASSO
library(glmnet)
#glmmod<-glmnet(x=as.matrix(u2),y=as.factor(training$diag),alpha=1,family='binomial')
cv.glmmod <-
cv.glmnet(x=as.matrix(u2),y=as.factor(training$diag),
alpha=1,family='binomial')
plot(cv.glmmod,main=direct2[[q]])
best.lambda<-cv.glmmod$lambda.1se
lasso_prob <- predict(cv.glmmod,newx =
as.matrix(u2),s=best.lambda,type='response')
k1=ifelse(lasso_prob<.5,0,1)
testing$lassopred=c(k1)
l11=coef(cv.glmmod,s=best.lambda)
las_select=which(l11!=0)-1
las=las_select[-1]
lasso_chose=matrix(0,nrow=1,ncol=50)
colnames(lasso_chose)=paste0("lassoV",1:50)
lasso_chose[las]=1
data1=cbind(data1,lasso_chose)
#####
sum1[[nn]]=data1
write.csv(testing,file=paste0("testing",nn,".csv"))
}
sum2=do.call(rbind,sum1)
write.csv(sum2,file=paste0('summary',q,'.csv')) #q indexes the current condition directory
}
APPENDIX F
SUPPLEMENTAL WRITE-UP OF SIMULATION RESULTS
This section is divided into seven parts. The first part covers the estimation and
parameter recovery of the IRT models. The second part discusses the classification
accuracy of psychometric methods. The third part covers the estimation of the machine
learning models. The fourth part goes over the classification accuracy of the machine
learning models using the Bayes classifier (minimizing prediction error). The fifth part
discusses the classification accuracy of machine learning models using the ROC
classifier. The sixth part compares the psychometric and machine learning methods for
classification. Finally, the seventh part covers the recovery of the data-generating theta by
the items chosen by the machine learning models.
Clarifications. In this section, the term theta refers to the latent variable score
(person parameter) estimated in IRT analysis. The term classification accuracy refers to
classification rates, sensitivity, and specificity, jointly. However, each outcome is
analyzed individually using both linear regression and binary recursive partitioning. In
each analysis, there is a possibility that a model does not converge or that a model does
not assign cases to the minority class. Conditions analyzed were chosen to obtain a
fully-crossed design. At the beginning of the analysis of each model, there is a description
the conditions analyzed for each model, and at the end of the section there are brief
conclusions about the analysis. Unless stated otherwise, regression models included all
possible predictors and predictor interactions for prediction. Only effects with
partial-η2 > .01 were interpreted. The predictors were treated as categorical variables, so the
regression coefficients reported are for the group most different from the reference group.
For example, the predictor of diagnosis-test correlation has three values (.30, .50, .70),
and in this case the reference group in the model is .30 and the regression coefficient
reported is from the group of .70.
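As a reminder of the effect-size criterion, partial-η2 for an effect is SS_effect / (SS_effect + SS_error). A toy one-factor computation in Python (the outcome values and group labels below are made up for illustration):

```python
# Three hypothetical groups of the diagnosis-test correlation factor, each
# with four made-up classification-rate outcomes.
groups = {
    .30: [.60, .62, .61, .59],
    .50: [.66, .68, .67, .65],
    .70: [.74, .76, .75, .73],
}
all_y = [v for vals in groups.values() for v in vals]
grand = sum(all_y) / len(all_y)

# Between-group (effect) and within-group (error) sums of squares.
ss_effect = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())
ss_error = sum((x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v)
partial_eta2 = ss_effect / (ss_effect + ss_error)
print(round(partial_eta2, 3))
```

Effects with partial-η2 below the .01 threshold used throughout this section would be ignored in the interpretation.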
1. Estimation of the Psychometric Methods
IRT Convergence. There were 129 models (out of 108,000 models; 216
conditions, 500 replications per condition) that did not converge. Table 1.1 and Table 1.2
suggest that the models that did not converge came from conditions with
either binary items, ten items, or a sample size of 250.
Theta parameter recovery. Table 1.3 and Table 1.4 show the mean squared
error (MSE) between the data-generating theta and the estimated theta. On average, MSE
decreased as the number of categories increased. For conditions with five-category items,
MSE decreased as sample size increased. Table 1.5 and Table 1.6 suggest that the
correlation between the estimated theta and the data-generating theta increased as the
number of item categories and the number of items increased.
Item parameter recovery. Table 1.7, 1.8, 1.9, and 1.10 show the mean squared
error (MSE) and the variance explained between the true and estimated item parameters
for the 2PL model. MSE seemed to decrease and variance explained seemed to increase as
sample size and number of items increased and local dependence decreased. Table 1.11,
1.12, 1.13, 1.14, 1.15, 1.16, 1.17, 1.18, 1.19, and 1.20 show the MSE and the variance
explained between the true and estimated item parameters for the GRM model. MSE
seemed to decrease and variance explained seemed to increase as sample size and number
of items increased and local dependence decreased.
Brief Summary. Convergence and parameter recovery information suggest that
there were no apparent estimation problems. All of the conditions are used to compare
psychometric models.
2. Classification Accuracy of the Psychometric Models
Effect of ROC index. For the psychometric models, the cut scores in the ROC
analysis were determined by the Youden index, closest-to-(0,1) criterion, and the
concordance probability. However, the results from these three indices were not
practically different from each other: regression models predicting the differences in
classification rate, sensitivity, and specificity from the simulation factors explained
little variance (R2 = .002-.009). Therefore, the results for the psychometric models are
presented based on the Youden index.
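The three cut-score indices can be sketched directly from a set of candidate cut scores and their sensitivity/specificity pairs. The ROC points below are hypothetical, chosen only to illustrate how each index selects a cut score:

```python
import math

# Candidate cut scores with hypothetical sensitivity/specificity values.
# Each tuple: (cut score, sensitivity, specificity)
roc_points = [(-1.0, .95, .40), (-0.5, .88, .55), (0.0, .78, .70),
              (0.5, .62, .84), (1.0, .45, .93)]

# Youden index: maximize sensitivity + specificity - 1
youden = max(roc_points, key=lambda p: p[1] + p[2] - 1)

# Closest-to-(0,1): minimize the distance to perfect classification
# in ROC space, i.e., sqrt((1 - sens)^2 + (1 - spec)^2)
closest = min(roc_points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))

# Concordance probability: maximize sensitivity * specificity
concord = max(roc_points, key=lambda p: p[1] * p[2])

# In this toy example, all three indices select the same cut score,
# consistent with the finding that the indices rarely differ in practice.
print(youden[0], closest[0], concord[0])
```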
Classification accuracy with data-generating theta. Table 2.1 and 2.2 show
that classification rates ranged from .60 to .80; Table 2.3 and 2.4 show that sensitivity
ranged from .59 to .83; and Table 2.5 and 2.6 show that specificity ranged from .59 to
.80. Classification accuracy seemed to increase as the diagnosis-test correlation increased.
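Throughout these results, classification rate, sensitivity, and specificity follow the usual 2x2 confusion-table definitions. A minimal sketch with a hypothetical confusion table:

```python
# Classification accuracy metrics from a hypothetical 2x2 confusion table.
tp, fn = 45, 15   # diagnosed cases: correctly vs. incorrectly classified
tn, fp = 160, 40  # non-diagnosed cases: correctly vs. incorrectly classified

n = tp + fn + tn + fp
classification_rate = (tp + tn) / n   # proportion of all correct decisions
sensitivity = tp / (tp + fn)          # proportion correct among the diagnosed
specificity = tn / (tn + fp)          # proportion correct among the non-diagnosed

print(round(classification_rate, 3), sensitivity, specificity)
```

Note that when prevalence is low, the classification rate is dominated by the non-diagnosed group, which is why sensitivity and specificity are reported separately.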
Linear Regression. In a regression predicting classification rate of the data-
generating theta from number of items, number of item categories, diagnosis-test
correlation, prevalence, sample size, and local dependence, the variance explained by the
predictors was R2=.387. Classification rates increased as the diagnosis-test correlation
increased (b=.177, s.e. =.005, t=34.429, p<.001, partial-η2=.375) and as the prevalence
decreased (b=-.024, s.e.=.005, t=-4.670, p<.001, partial-η2=.029). In a regression predicting
sensitivity from the simulation factors, the variance explained by the predictors was
R2=.317. Sensitivity increased as the diagnosis-test correlation increased (b=.185, s.e.
=.007, t=26.710, p<.001, partial-η2=.312). In a regression predicting specificity from the
simulation factors, the variance explained by the predictors was R2=.277. Specificity
increased as the diagnosis-test correlation increased (b=.176, s.e. =.007, t=27.194,
p<.001, partial-η2=.264) and as the prevalence decreased (b=-.025, s.e.=.006, t=-3.860,
p<.001, partial-η2=.022).
Regression Trees. A regression tree was grown on the training dataset to predict
classification accuracy of the data-generating theta, but the algorithm only made splits
based on the diagnosis-test correlation, so the regression tree is not presented. In a
random forest model grown in a training dataset to predict classification accuracy of the
data-generating theta, the most important variables were the diagnosis-test correlation
and prevalence (see Figure 2.1). Using the random forest model to predict classification
rate in the testing dataset, the MSE was .007, and the variance explained was .385. Also,
the MSE in the prediction of sensitivity was .013, and the variance explained was .315.
Finally, MSE in the prediction of specificity was .011 and the variance explained was
.274.
Conclusion. Important predictors of classification rates, sensitivity, and
specificity were diagnosis-test correlation and prevalence. This model is the baseline for
the following analyses.
Classification accuracy with estimated theta. Table 2.7 and 2.8 show that
classification rates ranged from .58 to .80; Table 2.9 and 2.10 show that sensitivity
ranged from .58 to .82; and Table 2.11 and 2.12 show that specificity ranged from .59 to
.79. Classification accuracy seemed to increase as the diagnosis-test correlation
increased.
Linear Regression. In a regression predicting classification rate of the estimated
theta from number of items, item categories, diagnosis-test correlation, prevalence,
sample size, and local dependence, the variance explained by the predictors was R2=.337.
Classification rates increased as the diagnosis-test correlation increased (b=.146, s.e.
=.005, t=26.964, p<.001, partial-η2=.321) and as the prevalence decreased (b=-.032,
s.e.=.005, t=-5.887, p<.001, partial-η2=.021). In a regression predicting sensitivity from
the simulation factors, the variance explained by the predictors was R2=.280. Sensitivity
increased as the diagnosis-test correlation increased (b=.175, s.e. =.007, t=24.010,
p<.001, partial-η2=.272). In a regression predicting specificity from the simulation
factors, the variance explained by the predictors was R2=.236. Specificity increased as the
diagnosis-test correlation increased (b=.144, s.e. =.007, t=21.177, p<.001, partial-
η2=.220) and as the prevalence decreased (b=-.039, s.e.=.007, t=-5.793, p<.001, partial-
η2=.017).
Regression Trees. A regression tree was grown on the training dataset to predict
classification accuracy of the estimated theta, but the algorithm only made splits based on
the diagnosis-test correlation, so the regression tree is not presented. In a random forest
model grown in a training dataset to predict the classification accuracy of the estimated
IRT theta, the most important predictor was diagnosis-test correlation, followed by
prevalence (see Figure 2.2). Using the random forest model to predict classification rate
in the testing dataset, MSE was .007, and the variance explained was .334. Also, MSE in
the prediction of sensitivity was .014, and the variance explained was .281. Finally, the
MSE in the prediction of specificity was .012, and the variance explained was .233.
Conclusion. Important predictors of classification rates, sensitivity, and
specificity of the estimated theta were diagnosis-test correlation and prevalence.
The estimated theta also had classification accuracy comparable to that of the data-generating theta.
Classification accuracy of the raw summed score. Table 2.13 and 2.14 show
that classification rates ranged from .57 to .79; Table 2.15 and 2.16 show that sensitivity
ranged from .59 to .83; and Table 2.17 and 2.18 show that specificity ranged from .57 to
.79. Classification accuracy seemed to increase as the diagnosis-test correlation
increased.
Linear Regression. In a regression predicting classification rate of the raw
summed score from number of items, item categories, diagnosis-test correlation,
prevalence, sample size, and local dependence, the variance explained by the predictors
was R2=.320. Classification rates increased as the diagnosis-test correlation increased
(b=.143, s.e. =.006, t=26.040, p<.001, partial-η2=.306) and as the prevalence decreased
(b=-0.018, s.e.=.006, t=-3.202, p=.001, partial-η2=.015). In a regression predicting
sensitivity from the simulation factors, the variance explained by the predictors was
R2=.281. Sensitivity increased as the diagnosis-test correlation increased (b=.182, s.e.
=.007, t=24.973, p<.001, partial-η2=.274). In a regression predicting specificity from the
simulation factors, the variance explained by the predictors was R2=.223. Specificity
increased as the diagnosis-test correlation increased (b=.142 s.e. =.007, t=20.423,
p<.001, partial-η2=.208) and as the prevalence decreased (b=-.024, s.e.= .007, t=-3.472,
p<.001, partial-η2=.013).
Regression Tree. A regression tree was grown on the training dataset to predict
classification accuracy of the raw summed scores, but the algorithm only made splits
based on the diagnosis-test correlation, so the regression tree is not presented. In a
random forest model grown in a training dataset to predict the classification accuracy of
the raw summed scores, the most important variables were diagnosis-test correlation and
prevalence (see Figure 2.3). Using the random forest model to predict classification rate
in the testing dataset, the MSE was .008, and the variance explained was .317. Also, MSE
in the prediction of sensitivity was .014, and the variance explained was .281. Finally, the
MSE in the prediction of specificity was .013, and the variance explained was .219.
Conclusion. Important predictors of classification rates, sensitivity, and
specificity of the raw summed score were diagnosis-test correlation and prevalence. The
raw summed score also had classification accuracy comparable to that of the data-generating theta.
Comparing classification accuracy across psychometric models. The
diagnosis-test correlation and prevalence were the only predictors used in this analysis.
Results suggest that there were no practical differences in classification accuracy between
using data-generating theta and the estimated theta as a function of simulation factors
(classification rate R2 = .003; sensitivity R2 = .001; and specificity R2 = .002). Also, there
were no practical differences in classification accuracy between using data-generating
theta and the raw summed scores as a function of simulation factors (classification rate R2
= .004; sensitivity R2 = .001; and specificity R2 = .003). Finally, there were no practical
differences in classification accuracy between using the estimated theta and the raw
summed scores as a function of simulation factors (classification rate R2 = .001;
sensitivity R2 = .001; and specificity R2 = .001).
Brief Summary. Across psychometric models, the best predictors of
classification rates, sensitivity, and specificity were the diagnosis-test correlation and
prevalence. Specifically, classification accuracy increased as the diagnosis-test
correlation increased and prevalence decreased. Also, there was no significant
difference in classification accuracy among the data-generating theta, the estimated theta,
and the raw summed scores. Therefore, models using the data-generating theta are used in
subsequent analyses.
3. Estimation of the Machine Learning Methods
Model building. This section discusses conditions where machine learning
models only assigned participants to the majority class. There were two ways to
determine class assignment in machine learning models. First, cases could be assigned to
the class to which they most likely belong (i.e., a predicted probability greater than
50%), which reduces prediction error. This method is referred to as the Bayes classifier. The
second type of class assignment uses ROC curves to determine a probability threshold for
class assignment to balance sensitivity and specificity (as in the psychometric models).
As previously mentioned, only conditions with at least 50% of models assigning cases to
the minority class were investigated and discussed.
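The two assignment rules can be contrasted in a short sketch. Given predicted probabilities of diagnosis, the Bayes classifier applies a fixed .50 threshold, while the ROC classifier picks the probability threshold that balances sensitivity and specificity (here, by maximizing their sum, as the Youden index does). The probabilities and labels below are hypothetical:

```python
# Contrast of the two class-assignment rules on hypothetical
# predicted probabilities (the minority class is coded 1).
probs  = [.05, .10, .15, .20, .30, .35, .40, .45, .55, .60]
labels = [0,   0,   0,   0,   0,   0,   1,   1,   1,   1]

def assign(probs, threshold):
    return [1 if p >= threshold else 0 for p in probs]

def sens_spec(pred, labels):
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, labels))
    return tp / sum(labels), tn / (len(labels) - sum(labels))

# Bayes classifier: fixed .50 threshold, minimizes prediction error
bayes_pred = assign(probs, .50)

# ROC classifier: choose the threshold maximizing sensitivity + specificity
best = max(sorted(set(probs)),
           key=lambda t: sum(sens_spec(assign(probs, t), labels)))
roc_pred = assign(probs, best)

print(sens_spec(bayes_pred, labels))  # Bayes: misses minority cases
print(sens_spec(roc_pred, labels))    # ROC: balances sensitivity/specificity
```

In this toy example the Bayes rule captures only half of the minority class while the ROC threshold recovers all of it, mirroring the low-sensitivity pattern reported for the Bayes classifier below.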
CART with Bayes classifier. Table 3.1 and 3.2 show that at least 80% of the
models with 5% prevalence did not assign cases to the minority class. Also, at least 54%
of the models in conditions with N=250 did not assign participants to the minority class.
Assignment to the minority class increased as prevalence, number of items, and number
of categories increased. To estimate the full-factorial ANOVA, only conditions with 20%
prevalence and N=500 or N=1000 were analyzed.
Random Forest with Bayes classifier. Table 3.3 and 3.4 show that at least 65% of
the models in conditions with a 5% prevalence and 10 binary items did not assign cases
94
the minority class. Similar patterns were found in conditions with a 5% prevalence and
30 items, but only when r = .30. At least 50% of the models in conditions with 10%
prevalence and 10 binary items did not assign cases to the minority class when sample
size was 500 or 1,000 and the diagnosis-test correlation was .30 or .50. Otherwise, in
conditions with binary items, the number of models that assigned cases to the minority
class increased as sample size, diagnosis-test correlation, number of items, and
prevalence increased. For five-category items, the number of models that predicted the
minority class increased as sample size, diagnosis-test correlation, and prevalence
increased and as the number of items decreased. To estimate the full-factorial ANOVA,
only conditions with prevalence of 20% were analyzed.
Lasso Logistic Regression with Bayes classifier. Table 3.5 and 3.6 suggest that
most models did not assign cases to the minority class. Conditions where at least 50% of
models assigned cases to the minority class had polytomous items, a sample size of 500
or 1,000, prevalence of 20%, and a diagnosis-test correlation of .70. In conditions with
binary items, models assigned cases to the minority class when there was 20%
prevalence, 30 items, diagnosis-test correlation of .70, and a sample size of 1,000. To
estimate the full-factorial ANOVA, the eight conditions analyzed had at least 500 people,
20% prevalence, five-category items, and a diagnosis-test correlation of .7.
Relaxed Lasso Logistic Regression with Bayes classifier. Table 3.7 and 3.8 show
that at least 50% of conditions with a diagnosis-test correlation of .30 did not assign cases
to the minority class. Also, most conditions that did not assign cases to the minority class
had binary items or 10 items. To estimate a full-factorial ANOVA, conditions with five-category items and a diagnosis-test correlation of .70 were analyzed (even though in four
out of 36 conditions at least 50% of the models did not assign cases to the minority class).
Logistic Regression with Bayes classifier. Table 3.9 and 3.10 show that at least
50% of models did not assign cases to the minority class in conditions with 10 binary
items. For conditions with 10 five-category items, at least 50% of the models did not
assign cases to the minority class in conditions with a sample size of 500 or 1,000 and
diagnosis-test correlation of .30 or .50. For conditions with 30 items, most conditions had
at least 50% of models assign cases to the minority class except when there were binary
items, a sample size of 1,000, prevalence of 5%, and diagnosis-test correlation of .30. To
estimate the full-factorial ANOVA, only conditions with 30 items were analyzed (even
though in two out of 108 conditions at least 50% of the models did not assign cases to the
minority class).
Random Forest with ROC classifier. There were no problematic conditions for
models using random forest with a ROC classifier, so all conditions were analyzed.
Lasso Logistic Regression with ROC classifier. Table 3.11 and 3.12 show that at
least 50% of models did not assign cases to the minority class in conditions with a
diagnosis-test correlation of .30. All of the conditions with a diagnosis-test correlation of
.70 had greater than a 50% probability of assigning cases to the minority class. To
estimate the full-factorial ANOVA, only conditions with a diagnosis-test correlation of
.70 were analyzed.
Relaxed Lasso Logistic Regression with ROC classifier. Exact same patterns as
from the Lasso Logistic Regression using a ROC classifier were observed, so no Tables
are presented.
Logistic Regression with ROC classifier. There were no problematic conditions
for models using logistic regression with a ROC classifier, so all conditions were
analyzed.
Brief Summary. In many conditions, the machine learning models did not assign
any cases to the minority class. The conditions retained for analysis generally had a high
diagnosis-test correlation, high prevalence, a large sample size, and 30 five-category
items. Assignment to the minority class increased when models used a ROC classifier
instead of a Bayes classifier.
4. Classification Accuracy of the Machine Learning Models using the Bayes
classifier
CART classification accuracy with Bayes classifier. Only conditions with 20%
prevalence and N=500 or N=1000 were analyzed. Table 4.1 suggests that classification
rate for CART ranges between .70 and .80, sensitivity ranges between .07 and .36, and
specificity ranges between .83 and .95. Classification accuracy seemed to increase as the
diagnosis-test correlation increased, except for specificity in conditions with 10 items,
where specificity decreased as the diagnosis-test correlation increased.
Linear Regression. In the regression predicting classification rate from the
simulation factors, the variance explained was R2=.858. There was a significant three-
way interaction between number of items, number of item categories, and diagnosis-test
correlation (b=-.032, s.e.= .002, t=13.899, p<.001, partial-η2= .049). Across conditions,
classification rates decreased as the number of items and number of categories increased,
and as the diagnosis-test correlation decreased. As the diagnosis-test correlation
increased, the difference in classification rate due to number of items and number of item
categories decreased. In the regression predicting sensitivity from the simulation factors,
the variance explained was R2=.608. There was a significant three-way interaction
between number of items, number of item categories, and diagnosis-test correlation (b=-
.076, s.e. = .001, t=7.506, p<.001, partial-η2= .025). Across conditions, sensitivity
increased as the number of items, number of item categories, and the diagnosis-test
correlation increased. As the diagnosis-test correlation increased, the difference in
sensitivity across number of items and number of item categories decreased. In the
regression predicting specificity from the simulation factors, the variance explained was
R2=.608. There was a significant three-way interaction between number of items, number
of item categories, and diagnosis-test correlation (b=-.059, s.e. = .004, t=-12.761,
p<.001, partial-η2= .025). On average, specificity increased as the diagnosis-test
correlation increased, and as the number of items and number of item categories
decreased. As the diagnosis-test correlation increased, the difference in specificity across
number of items and number of item categories decreased.
Regression Trees. The regression trees grown in a training dataset to predict
classification rate, sensitivity, and specificity from simulation factors are presented in
Figures 4.1, 4.2, and 4.3. For classification rate, the first split was on the diagnosis-test
correlation. If the case was in a condition with a diagnosis-test correlation of .70, the
predicted classification rate was .80. If the diagnosis-test correlation was lower than .70,
the predicted classification rate depended on the number of items and number of item
categories. Conditions with binary items had a higher predicted classification rate than
conditions with five-category items. For sensitivity, the first split was also diagnosis-test
correlation. If the case was in a condition with a diagnosis-test correlation of .70, the
predicted sensitivity was .35. If the diagnosis-test correlation was lower than .70,
predicted sensitivity depended on number of items and number of item categories.
Conditions with more items and more item categories had higher predicted sensitivity
than conditions with fewer items and fewer item categories. For specificity, the first split was also diagnosis-test correlation. If
the case was in a condition with diagnosis-test correlation of .70, the predicted specificity
was .91. If the diagnosis-test correlation was lower than .70, predicted specificity
depended on the number of items, and then on the number of categories. Models in
conditions with 10 binary items had higher specificity than conditions with 30 five-
category items. In a random forest model grown in a training dataset to predict the
classification accuracy of classification and regression trees, the most important predictor
was diagnosis-test correlation, followed by number of items and number of item
categories (see Figure 4.4). Using the random forest model to predict classification rate in
the testing dataset, the MSE was .001, and the variance explained was .820. Also, MSE in
the prediction of sensitivity was .007, and the variance explained was .580. Finally, the
MSE in the prediction of specificity was .001 and the variance explained was .652.
Conclusion. Important predictors for classification rates, sensitivity, and
specificity were the diagnosis-test correlation, number of items, and number of item
categories. However, these results only apply when prevalence is at least 20% and sample
sizes are 500 or 1,000. Compared to the results from the data-generating model,
specificity is inflated and sensitivity is too low.
Random Forest classification accuracy with Bayes classifier. Only conditions
with prevalence of 20% were analyzed. Table 4.2 suggests that classification rate for
random forest ranged between .79 and .83; Table 4.3 shows that sensitivity ranged between
.01 and .35; and Table 4.4 shows that specificity ranged between .93 and .99.
Classification accuracy seems to increase as the diagnosis-test correlation and number of
items increase.
Linear Regression. In the regression predicting classification rate from the
simulation factors, the variance explained was R2=.759. There was a significant three-
way interaction between number of items, number of item categories, and diagnosis-test
correlation (b=-.013, s.e. = .001, t=13.135, p<.001, partial-η2= .017). On average,
classification rates increased as the number of items, number of item categories, and the
diagnosis-test correlation increased. As the number of item categories increased,
classification rate increased for conditions with 30 items and for conditions with at least a
diagnosis-test correlation of .5. On the other hand, classification rate decreased as the
number of item categories increased for conditions with 10 items and a diagnosis-test
correlation of .3. In the regression predicting sensitivity from the simulation factors, the
variance explained was R2=.823. There were two significant two-way interactions: the
interaction between number of items and number of item categories (b=.025, s.e. = .005,
t=4.914, p<.001, partial-η2= .019), and the interaction of number of items and diagnosis-
test correlation (b=-.049, s.e. = .005, t=-9.563, p<.001, partial-η2= .049). On average,
sensitivity increased as the number of items, number of categories, and the diagnosis-test
correlation increased. As the number of items increased, sensitivity increased for
conditions with two item categories, but slightly decreased for conditions with five-
category items. Also, as the number of items increased, sensitivity increased at a faster
rate in conditions with higher diagnosis-test correlation than for conditions with a lower
diagnosis-test correlation. In the regression predicting specificity from the simulation
factors, the variance explained was R2=.509. There was a significant three-way
interaction between number of items, number of item categories, and diagnosis-test
correlation (b=-.022, s.e. = .002, t=-9.742, p<.001, partial-η2= .013). On average,
specificity was not influenced by number of categories; specificity increased as the
number of items increased; and specificity decreased as the diagnosis-test correlation
increased. As the number of item categories increased, specificity increased for
conditions with 30 items, but decreased for conditions with 10 items. On the other hand,
as the number of item categories increased, specificity increased for conditions with a
diagnosis-test correlation of .7, but decreased for conditions with either a diagnosis-test
correlation of .3 or .5.
Regression Trees. The regression tree grown in a training dataset to predict
classification rate is presented in Figure 4.5. For classification rate, the first split was
diagnosis-test correlation. If the case was in a condition with a diagnosis-test correlation
of .70, the predicted classification rate depended on number of items and number of item
categories, where the highest predicted classification rate (.83) was in conditions with 30
five-category items. If the diagnosis-test correlation was lower than .70, the predicted
classification rate also depended on number of items and number of item categories,
where the lowest predicted classification (.78) was in conditions with 10 five-category
items. For sensitivity, the tree only made splits based on diagnosis-test correlation, so the
tree is not presented. For specificity, the first split was based on diagnosis-test correlation
(see Figure 4.6). If the diagnosis-test correlation had a value of .30, then the predicted
specificity depended on sample size, number of item categories, and number of items, but
it was higher than in conditions with diagnosis-test correlation of .50 or .70. In a random
forest model grown in a training dataset to predict the classification rates, the most
important variable was diagnosis-test correlation, followed by number of items (see
Figure 4.7, left pane). Using the random forest model to predict classification rate in the
testing dataset, the MSE was .001, and the variance explained was .742. In the model
predicting sensitivity, the most important variable was diagnosis-test correlation (see
Figure 4.7, center pane). The MSE in the prediction of sensitivity was .007, and the
variance explained was .785. Finally, in the model predicting specificity, the most
important variable was diagnosis-test correlation, followed by number of items (see
Figure 4.7, right pane). The MSE in the prediction of specificity was .001, and the
variance explained was .480.
Conclusion. Important predictors for classification rates, sensitivity, and
specificity were the diagnosis-test correlation, number of items, and number of item
categories. However, these results only apply when prevalence is 20%. Compared to
results from the data-generating model, specificity was inflated and sensitivity was too
low.
Lasso Logistic Regression classification accuracy with Bayes classifier. The
eight conditions analyzed had at least 500 people, 20% prevalence, five-category items,
and a diagnosis-test correlation of .7. Table 4.4A shows that classification rate ranged
between .81 and .82; sensitivity ranged between .13 and .18; and specificity ranged
between .98 and .99. Classification accuracy seems to increase with sample size and
number of items.
Linear Regression. In a regression predicting classification rate from sample size,
number of items, and local dependence, the variance explained was R2=.096. There were
significant main effects of sample size (b=.002, s.e. = .001, t=3.456, p=.001, partial-η2=
.045) and number of items (b=.002, s.e. = .001, t=3.438, p=.001, partial-η2= .052), so
classification rate increased as sample size and number of items increased. In a regression
predicting sensitivity from sample size, number of items, and local dependence, the variance explained was
R2=.061. There was a significant main effect of sample size (b=.012, s.e. = .005, t=2.283,
p=.022, partial-η2= .033) and a non-significant effect of number of items, but with
partial-η2 greater than .01 (b=.009, s.e. = .005, t=1.732, p=.083, partial-η2= .026). In this
case, sensitivity increased as sample size and number of items increased. In a regression
predicting specificity from sample size and local dependence, the variance explained was
R2=.020. There was a nonsignificant main effect of sample size, but the partial-η2 was
greater than .01 (b=-.001, s.e. = .001, t=-.987, p=.324, partial-η2= .012), so specificity
decreased as sample size increased.
Regression Trees. The regression tree grown in the training dataset to predict
classification rate had two splits, but all terminal nodes predicted a classification rate of .82. For
sensitivity, the first split was sample size (see Figure 4.8). If sample size was 500, the
predicted sensitivity was .14. If sample size was 1,000, the predicted sensitivity depended
on the number of items. In conditions with 10 items, the predicted sensitivity was .15,
and in conditions with 30 items, the predicted sensitivity was .18. For specificity, the
regression tree only had a split on sample size, so the regression tree is not presented. In a
random forest model grown in a training dataset to predict the classification rates, the
most important variable was number of items, followed by sample size (see Figure 4.9,
left panel). Using the random forest model to predict classification rate in the testing
dataset, the MSE was .001 and the variance explained was .086. In the model predicting
sensitivity, the most important variables were sample size and number of items (see
Figure 4.9, center panel). The MSE in the prediction of sensitivity was .005, and the
variance explained was .057. Finally, in the model predicting specificity, the most
important variable was sample size, followed by number of items (see Figure 4.9, right
panel). The MSE in the prediction of specificity was .001, and the variance explained was
.018.
Conclusion. Important predictors for classification rates, sensitivity, and
specificity were sample size and number of items. However, these results only apply in
conditions with five-category items, prevalence of 20%, diagnosis-test correlation of .70,
and sample size of 500 or 1,000. Compared to the data-generating model, specificity was
inflated and sensitivity was too low.
Relaxed Lasso Logistic Regression classification accuracy with Bayes
classifier. Only conditions with five-category items and a diagnosis-test correlation of
.70 were analyzed. Table 4.5 shows that classification rates ranged from .81 to .95; Table
4.6 shows that sensitivity ranged from .08 to .39; and Table 4.7 shows that specificity
ranged between .90 and .99. Classification rate and specificity seemed to decrease as
prevalence increased, and sensitivity increased as prevalence and number of items
increased.
Linear Regression. In a regression predicting classification rate from sample size,
number of items, prevalence, and local dependence, the variance explained was R2=.970.
There was a significant interaction between prevalence and number of items (b=.002, s.e.
= .001, t=5.317, p<.001, partial-η2= .020), and a significant main effect of sample size
(b=.006, s.e. = .001, t=5.106, p<.001, partial-η2= .120). On average, classification rates
increased as sample size and number of items increased and as prevalence decreased. The
differences in classification rates across prevalence decreased as the number of items
increased. In a regression predicting sensitivity from sample size, number of items,
prevalence, and local dependence, the variance explained was R2=.700. There was a
significant interaction between sample size and prevalence (b=.043, s.e. = .004,
t=11.833, p<.001, partial-η2= .038), and a significant main effect of number of items
(b=.026, s.e. = .005, t=4.925, p<.001, partial-η2= .104). On average, sensitivity increased
as prevalence and number of items increased, and sensitivity decreased as sample size
increased. As prevalence increased, the differences in sensitivity across sample size
decreased. In a regression predicting specificity from sample size, number of items,
prevalence, and local dependence, the variance explained was R2=.719. There were
significant main effects of sample size (b=.012, s.e. = .001, t=10.610, p<.001, partial-η2=
.110), and prevalence (b=-.029, s.e. = .001, t=-46.416, p<.001, partial-η2= .702), and a
nonsignificant effect of number of items, but with a partial-η2 > .01, (b=-.002, s.e. = .001,
t=-1.779, p=.075, partial-η2= .036). On average, specificity increased as sample size
increased, and specificity decreased as number of items and prevalence increased.
Regression Trees. The regression tree grown on the training dataset to predict
classification rate only made splits based on prevalence, so the regression tree is not
presented. For sensitivity, the first split was on prevalence (see Figure 4.10). If the case
was in a condition with a prevalence of 20%, predicted sensitivity depended on the
number of items, where the condition with more items had the highest predicted
sensitivity (.39). If the case was in a condition with a prevalence of 5%, predicted
sensitivity depended on sample size, where lower sample size had higher predicted
sensitivity. If the case was in a condition with a prevalence of 10%, predicted sensitivity
depended on number of items, where higher number of items led to higher predicted
sensitivity. For specificity, the first split was on prevalence (see Figure 4.11), where the
condition with 5% prevalence had the highest specificity (.99). For conditions with 20%
prevalence, specificity depended on sample size, where conditions with higher sample
size had higher predicted specificity. In a random forest model grown in a training dataset
to predict the classification accuracy of relaxed lasso logistic regression, the most
important variable was prevalence (see Figure 4.12). Using the random forest model to
predict classification rate in the testing dataset, the MSE was .001, and the variance
explained was .952. Also, MSE in the prediction of sensitivity was .005, and the variance
explained was .705. Finally, the MSE in the prediction of specificity was .001 and the
variance explained was .763.
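The meta-analytic random forest step reported above (variable importance, followed by MSE and variance explained in a testing dataset) can be sketched as follows. The simulation factors, their effects, and the train/test split sizes here are hypothetical, not the dissertation's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical simulation factors (columns: sample size, number of items,
# prevalence) and a classification-rate outcome driven mostly by prevalence.
X = rng.choice([-1.0, 0.0, 1.0], size=(800, 3))
y = 0.9 - 0.04 * X[:, 2] + 0.005 * X[:, 0] + rng.normal(0, 0.005, 800)

# Split mirroring the "grown in a training dataset" step.
X_train, X_test = X[:600], X[600:]
y_train, y_test = y[:600], y[600:]

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

pred = rf.predict(X_test)
mse = np.mean((y_test - pred) ** 2)
var_explained = 1 - mse / np.var(y_test)   # pseudo-R^2 on the test set

importances = rf.feature_importances_      # prevalence should dominate here
```

The variance-explained figure is computed on the testing dataset, as in the text, so it can be negative when the model predicts worse than the test-set mean.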
Conclusion. The most important predictor for classification rates, sensitivity, and
specificity was prevalence. However, these results only apply in conditions where the
diagnosis-test correlation is .70 and there are five-category items. Compared to results
from the data-generating model, specificity was inflated and sensitivity was too low.
Logistic Regression classification accuracy with Bayes classifier. Only
conditions with five-category items were analyzed. Table 4.8 shows that classification
rate ranged between .86 and .95; Table 4.9 shows that sensitivity ranged between .01 and
.41; and Table 4.10 shows that specificity ranged between .90 and .99. Classification rate
and specificity increased as sample size increased, and decreased as prevalence increased.
Sensitivity increased as diagnosis-test correlation, prevalence, and number of item
categories increased, and decreased as sample size increased.
Linear Regression. In a regression predicting classification rate from sample size,
number of items, diagnosis-test correlation, prevalence, and local dependence, the
variance explained was R2=.963. There were significant interactions between prevalence
and diagnosis-test correlation (b=.046, s.e. = .001, t=44.603, p<.001, partial-η2= .315),
and between sample size and prevalence (b=-.003, s.e. = .001, t=-2.571, p=.010, partial-
η2= .078). On average, classification rate increased as sample size and the diagnosis-test
correlation increased, and classification rate decreased as prevalence increased. As the
diagnosis-test correlation increased, the differences in classification rates across
prevalence decreased. Also, as sample size increased, the differences in classification
rates across prevalence increased. In a regression predicting sensitivity from sample size,
number of items, diagnosis-test correlation, prevalence, and local dependence, the
variance explained was R2=.854. There was a significant three-way interaction between
prevalence, sample size, and diagnosis-test correlation (b=.166, s.e. = .007, t=24.084,
p<.001, partial-η2= .047). On average, sensitivity increased as prevalence and diagnosis-
test correlation increased, and sensitivity decreased as sample size increased. As the
diagnosis-test correlation increased, the differences in sensitivity across prevalence
increased. Also, as prevalence increased, the differences in sensitivity across sample sizes
decreased. In a regression predicting specificity from sample size, number of items,
diagnosis-test correlation, prevalence, and local dependence, the variance explained was
R2=.759. There was a significant three-way interaction between prevalence, sample size,
and diagnosis-test correlation (b=.166, s.e. = .007, t=24.084, p<.001, partial-η2= .035).
On average, specificity increased as sample size increased, and specificity decreased as
prevalence and diagnosis-test correlation increased. As prevalence increased, the
differences in specificity across sample size decreased. Also, as prevalence increased, the
differences in specificity across diagnosis-test correlation increased.
Regression Trees. The regression tree grown in a training dataset to predict
classification rate is presented in Figure 4.13. For classification rate, the first split was
prevalence. If the case was in a condition with prevalence of 20%, classification rate
depended on the diagnosis-test correlation. Conditions with a diagnosis-test correlation of
.70 had a predicted classification rate of .80. However, in conditions with lower
diagnosis-test correlation the predicted classification rate depended on sample size, where
conditions with higher sample sizes had higher predicted classification rates. On the other
hand, for conditions with prevalence of 5% or 10%, the predicted classification rate
depended on sample size, where higher sample sizes had higher predicted classification
rate. For sensitivity, the first split was diagnosis-test correlation (see Figure 4.14). For
conditions with a diagnosis-test correlation of .70, the predicted sensitivity depended on
prevalence, where conditions with higher prevalence had the highest predicted sensitivity
(.39). On the other hand, conditions with a diagnosis-test correlation of .30 or .50 and high
prevalence had higher predicted sensitivity than the complement. For specificity, the first
split was sample size, and then prevalence (see Figure 4.15). For conditions with sample
size of 500 or 1,000 and prevalence of 5% or 10%, specificity was at least .97. On the
other hand, conditions with a sample size of 250 and prevalence of 20% had the lowest
specificity with .90. In a random forest model grown in a training dataset to predict the
classification rates, the most important variable was prevalence (see Figure 4.16, left
panel). Using a random forest model to predict classification rate in the testing dataset,
the MSE was .001, and the variance explained was .932. In the model predicting
sensitivity, the most important variable was diagnosis-test correlation, followed by
prevalence and sample size (see Figure 4.16, center panel). The MSE in the prediction of
sensitivity was .006, and the variance explained was .790. Finally, in the model
predicting specificity, the most important variable was diagnosis-test correlation (see
Figure 4.16, right panel). The MSE in the prediction of specificity was .001, and the
variance explained was .729.
Conclusion. The most important predictors for classification rates, sensitivity, and
specificity were prevalence, diagnosis-test correlation, and sample size. However, these
results only apply in conditions with five-category items. Compared to results from the
data-generating model, specificity was inflated and sensitivity was too low.
Brief Summary. Previous results suggest that machine learning methods that
minimize prediction error (using a Bayes classifier) have inflated specificity and very low
sensitivity. Therefore, it is not recommended to use these models when researchers are
interested in the prediction of a diagnosis that rarely occurs. Given their performance,
machine learning methods with the Bayes classifier are no longer discussed.
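The mechanism behind this recommendation can be illustrated directly: when the diagnosis is rare, thresholding the posterior probability at .5 (the Bayes classifier) labels almost every case as a non-case, inflating specificity and collapsing sensitivity. The following self-contained Python sketch uses hypothetical normal test scores, not the simulation's data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: a test score that runs higher for diagnosed cases,
# with a rare diagnosis (5% prevalence).
prev = 0.05
n = 20000
diag = rng.random(n) < prev
score = np.where(diag, rng.normal(1.5, 1, n), rng.normal(0.0, 1, n))

def npdf(x, mu):
    """Standard-deviation-1 normal density centered at mu."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# Posterior probability of diagnosis given the score (Bayes rule).
post = (prev * npdf(score, 1.5)
        / (prev * npdf(score, 1.5) + (1 - prev) * npdf(score, 0.0)))

def sens_spec(pred, truth):
    sens = np.mean(pred[truth])        # true positive rate
    spec = np.mean(~pred[~truth])      # true negative rate
    return sens, spec

# Bayes classifier: assign the most probable class (posterior threshold .5).
bayes_sens, bayes_spec = sens_spec(post > 0.5, diag)

# A lower, ROC-style probability threshold trades specificity for sensitivity.
roc_sens, roc_spec = sens_spec(post > prev, diag)
```

Under these assumptions the .5 threshold yields specificity near 1 but sensitivity close to .1, while the lower threshold balances the two rates, which is the pattern the results above describe.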
5. Classification Accuracy of the Machine Learning Models using ROC classifiers
In this section, the random forest model, logistic regression, lasso logistic
regression, and relaxed lasso logistic regression are analyzed. CART with a ROC
classifier was not carried out because CART assigns the same predicted probability to
every case in a node, which limits the number of possible probability thresholds
available for the ROC analysis.
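This limitation of CART can be verified with a small example: because every case in a leaf receives that leaf's probability, the number of distinct predicted probabilities, and hence of candidate ROC thresholds, is bounded by the number of leaves. A sketch using scikit-learn's DecisionTreeClassifier on hypothetical item-response-style data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)

# Hypothetical five-category item responses and a noisy binary outcome.
X = rng.integers(0, 5, size=(2000, 10)).astype(float)
y = (X[:, 0] + X[:, 1] + rng.normal(0, 2, 2000) > 7).astype(int)

tree = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X, y)
proba = tree.predict_proba(X)[:, 1]

# Every case in the same leaf gets the same probability, so the number of
# distinct predicted probabilities is capped by the number of leaves (8 here).
n_unique = len(np.unique(proba))
```

A logistic regression or random forest fit to the same data would instead produce a nearly continuous spread of predicted probabilities, which is what the ROC analysis requires.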
Effect of ROC index. Similar to the psychometric models, the ROC probability
thresholds for class assignment were determined by the Youden index, closest-to-(0,1)
criterion, and the concordance probability. For random forest, there were significant
differences in classification rate (R2=.149-.150), sensitivity (R2 = .122-.124), and
specificity (R2 =.134-.135) as a function of simulation factors, across the three ROC
indices. For logistic regression, there were small differences in classification rate
(R2=.017-.022) and specificity (R2 =.014-.016), and small to medium differences in
sensitivity (R2 = .016-.137) as a function of simulation factors, across the three ROC
indices. For lasso logistic regression, there were small differences in classification rate
(R2=.008-.017), sensitivity (R2 = .003-.006), and specificity (R2 =.004-.005) as a function
of simulation factors, across the three ROC indices. Finally, for the relaxed lasso logistic
regression, there were small differences in classification rate (R2=.006-.014), sensitivity
(R2 = .003-.005), and specificity (R2 =.003-.005) as a function of simulation factors,
across the three ROC indices. In these analyses, the Youden index had the highest
sensitivity across the vast majority of conditions. Results from section 3 suggest that
there is low sensitivity in the machine learning models with Bayes classifier, so the
results for section 4 are based on the Youden index to increase sensitivity.
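The three threshold-selection indices can be written down compactly: the Youden index maximizes sensitivity + specificity − 1, the closest-to-(0,1) criterion minimizes the distance to the ROC curve's ideal corner, and the concordance probability maximizes the product of sensitivity and specificity. In the sketch below the predicted probabilities are hypothetical, not the simulation's output:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical predicted probabilities for cases (10% prevalence) and controls.
truth = rng.random(5000) < 0.10
p = np.clip(rng.normal(np.where(truth, 0.6, 0.4), 0.15), 0, 1)

# Empirical sensitivity and specificity over a grid of probability thresholds.
thresholds = np.linspace(0.01, 0.99, 99)
sens = np.array([np.mean(p[truth] >= t) for t in thresholds])
spec = np.array([np.mean(p[~truth] < t) for t in thresholds])

# The three criteria for picking a threshold from the ROC curve:
youden = thresholds[np.argmax(sens + spec - 1)]            # max(se + sp - 1)
closest = thresholds[np.argmin((1 - sens) ** 2
                               + (1 - spec) ** 2)]         # nearest to (0, 1)
concordance = thresholds[np.argmax(sens * spec)]           # max(se * sp)
```

With symmetric score distributions like these, the three criteria select similar thresholds; they diverge more when the ROC curve is asymmetric, which is consistent with the small between-index differences reported above.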
Random Forest classification accuracy with ROC classifier. Tables 4.11 and
4.12 show that classification rate ranged between .49 and .76; Tables 4.13 and 4.14 show
that sensitivity ranged between .45 and .82; and Tables 4.15 and 4.16 show that specificity
ranged between .46 and .84. Classification accuracy seemed to increase as diagnosis-test
correlation, number of item categories, and number of items increased.
Linear Regression. In a regression predicting classification rate from sample size,
number of items, number of item categories, diagnosis-test correlation, prevalence, and
local dependence, the variance explained was R2=.517. There were several significant
three-way interactions: the interaction between number of items, number of item
categories, and prevalence (b=-.173, s.e. = .010, t=-16.258, p<.001, partial-η2= .047),
the interaction between number of items, number of item categories, and the diagnosis-
test correlation (b=-.108, s.e. = .010, t=-10.121, p<.001, partial-η2= .011), and the
interaction between sample size, number of items, and number of item categories
(b=.221, s.e. = .010, t=20.640, p<.001, partial-η2= .014). On average, classification rates
increased as the diagnosis-test correlation and the sample size increased, classification
rate decreased as prevalence increased, and classification rates did not seem to be
influenced by number of items or number of item categories. However, as the number of items
increased, differences in classification rates across sample size and prevalence decreased,
and the differences in classification rates across diagnosis-test correlations increased.
Also, as the number of item categories increased, the classification rate for conditions
with 10 items decreased, and the classification rate for conditions with 30 items
increased. In a regression predicting sensitivity from sample size, number of items,
diagnosis-test correlation, prevalence, and local dependence, the variance explained was
R2=.466. There was a significant interaction between number of items, number of item
categories, and prevalence (b=.229, s.e. = .015, t=15.140, p<.001, partial-η2= .054), a
significant interaction between sample size, number of items, and number of item
categories (b=-.262, s.e. = .015, t=-17.280, p<.001, partial-η2= .019), and a main effect
of diagnosis-test correlation (b=.185, s.e. = .008, t=24.421, p<.001, partial-η2= .224). On
average, sensitivity increased as the diagnosis-test correlation, sample size, and number
of items increased, and sensitivity decreased as prevalence increased. However, as the
number of items and item categories increased, differences in sensitivity across sample
size and prevalence decreased. In a regression predicting specificity from sample size,
number of items, diagnosis-test correlation, prevalence, and local dependence, the
variance explained was R2=.443. There was a significant interaction between number of
items, number of item categories, and prevalence (b=-.203, s.e. = .013, t=-15.170,
p<.001, partial-η2= .035), a significant interaction between sample size, number of items,
and number of item categories (b=.247, s.e. = .013, t=18.413, p<.001, partial-η2= .013),
and a main effect of diagnosis-test correlation (b=.099, s.e. = .007, t=14.689, p<.001,
partial-η2= .339). On average, specificity increased as the diagnosis-test correlation, and
sample size increased, and specificity decreased as prevalence increased. However, as the
number of items and item categories increased, differences in specificity across sample
size and prevalence decreased.
Regression Tree. The regression tree grown in a training dataset to predict
classification rate only made splits based on the diagnosis-test correlation, so the tree is
not presented. For sensitivity, the first split was diagnosis-test correlation. For conditions
with a diagnosis-test correlation of .70, predicted sensitivity was higher for conditions
with higher number of items, higher item categories, and higher prevalence than the
complement. For conditions with a diagnosis-test correlation of .50 or .30, predicted
sensitivity depended on the number of item categories, where conditions with five-category
items had higher sensitivity than conditions with binary items. For specificity, the tree only made
splits based on the diagnosis-test correlation, so the tree is not presented. Using a random
forest model to predict classification rate in the testing dataset, the most important
variable was diagnosis-test correlation (see Figure 4.18, left panel). In the model
predicting classification rate, the MSE was .007, and the variance explained was .500. In
the model predicting sensitivity, the most important variable was also diagnosis-test
correlation, followed closely by number of items and number of item categories (see
Figure 4.18, center panel). The MSE in the prediction of sensitivity was .015, and the
variance explained was .449. In the model predicting specificity, the most important
variable was also diagnosis-test correlation (see Figure 4.19, right panel). The MSE in
the prediction of specificity was .011, and the variance explained was .426.
Conclusion. The most important predictor for classification rate and specificity was the
diagnosis-test correlation, while the important predictors for sensitivity were the
diagnosis-test correlation, number of items, number of item categories, and prevalence.
All of the classification accuracy indices were close to those of the data-generating
model.
Logistic Regression classification accuracy with ROC classifier. Tables 4.17
and 4.18 show that classification rate ranged between .52 and .81; Tables 4.19 and 4.20
show that sensitivity ranged between .54 and .82; and Tables 4.21 and 4.22 show that
specificity ranged between .51 and .75. Classification accuracy seemed to increase as
diagnosis-test correlation increased, and decreased as prevalence increased.
Linear Regression. In a regression predicting classification rate from simulation
factors, the variance explained was R2=.517. There was a significant main effect of
diagnosis-test correlation (b=.165, s.e. = .004, t=40.118, p<.001, partial-η2= .498), and
sample size (b=.034, s.e. = .004, t=8.140, p<.001, partial-η2= .011). On average,
classification rates increased as the diagnosis-test correlation and the sample size
increased. In a regression predicting sensitivity from simulation factors, the variance
explained was R2=.441. There was a significant three-way interaction between sample
size, prevalence, and diagnosis-test correlation (b=.165, s.e. = .004, t=40.118, p<.001,
partial-η2= .010), and a three-way interaction between sample size, prevalence, and
number of items (b=.034, s.e. = .004, t=8.140, p<.001, partial-η2= .014). On average,
sensitivity increased as the diagnosis-test correlation and the sample size increased, and
decreased as the number of items and prevalence increased. As the number of items
increased, the difference in sensitivity across sample size and prevalence decreased. Also,
as the diagnosis-test correlation increased, the difference in sensitivity across sample size
and prevalence increased. In a regression predicting specificity from simulation factors,
the variance explained was R2=.417. There was a significant main effect of diagnosis-test
correlation (b=.165, s.e. = .005, t=32.831, p<.001, partial-η2= .392), and a
nonsignificant effect of prevalence, but with a partial-η2 > .01 (b=.008, s.e. = .005, t=1.511,
p=.131, partial-η2= .010). On average, specificity increased as the diagnosis-test
correlation increased, and specificity decreased as prevalence increased.
Regression Tree. The regression tree grown in a training dataset to predict
classification rate only made splits based on the diagnosis-test correlation, so the tree is
not presented. For sensitivity, the first split was diagnosis-test correlation. For conditions
with a diagnosis-test correlation of .30, predicted sensitivity was .60. For conditions with
diagnosis-test correlation of .50, predicted sensitivity was .69. For conditions with
diagnosis-test correlation of .70, predicted sensitivity depended on sample size, number
of items, and prevalence, where conditions with high sample sizes had higher sensitivity,
regardless of number of items or prevalence. For specificity, the tree only made splits
based on the diagnosis-test correlation, so the tree is not presented. Using a random forest
model to predict classification accuracy in the testing dataset, the most important variable
was diagnosis-test correlation (see Figure 4.20). In the model predicting classification
rate, the MSE was .004 and the variance explained was .509. The MSE in the prediction
of sensitivity was .008 and the variance explained was .436. The MSE in the prediction of
specificity was .006 and the variance explained was .409.
Conclusion. The most important predictor of classification accuracy was
diagnosis-test correlation. Also, the classification accuracy indices were close to those of
the data-generating model.
Lasso Logistic Regression classification accuracy with ROC classifier. Only
conditions with a diagnosis-test correlation of .70 were analyzed. Table 4.23 shows that
classification rate ranged between .69 and .76; Table 4.24 shows that sensitivity ranged
between .73 and .82; and Table 4.25 shows that specificity ranged between .68 and .75.
Classification accuracy seemed to increase as sample size and number of items increased,
and decreased as prevalence increased.
Linear Regression. In a regression predicting classification rate from sample size,
number of item categories, number of items, prevalence, and local dependence, the
variance explained was R2=.100. There were significant main effects of number of item
categories (b=.011, s.e. = .003, t=3.551, p<.001, partial-η2= .016), sample size (b=.011,
s.e. = .003, t=3.582, p<.001, partial-η2= .037), number of items (b=.009, s.e. = .003,
t=2.868, p=.004, partial-η2= .028), and prevalence (b=-.016, s.e. = .003, t=-5.408,
p<.001, partial-η2= .021). On average, classification rates increased as sample size,
number of items, and number of item categories increased, and classification rates
decreased as prevalence increased. In a regression predicting sensitivity from sample
size, number of item categories, number of items, prevalence, and local dependence, the
variance explained was R2=.143. There were significant main effects of number of item
categories (b=.015, s.e. = .004, t=3.780, p<.001, partial-η2= .012), sample size (b=.022,
s.e. = .004, t=5.752, p<.001, partial-η2= .030), number of items (b=.013, s.e. = .004,
t=3.339, p<.001, partial-η2= .013), and prevalence (b=-.037, s.e. = .004, t=-9.658,
p<.001, partial-η2= .090). On average, sensitivity increased as sample size, number of
items, and number of item categories increased, and sensitivity decreased as prevalence
increased. In a regression predicting specificity from sample size, number of item
categories, number of items, prevalence, and local dependence, the variance explained
was R2=.083. There were significant main effects of number of item categories (b=.015,
s.e. = .004, t=3.780, p<.001, partial-η2= .011), sample size (b=.022, s.e. = .004, t=5.752,
p<.001, partial-η2= .024), number of items (b=.013, s.e. = .004, t=3.339, p<.001, partial-
η2= .020), and prevalence (b=-.037, s.e. = .004, t=-9.658, p<.001, partial-η2= .028). On
average, specificity increased as sample size, number of items, and number of item
categories increased, and specificity decreased as prevalence increased.
Regression Trees. The regression tree grown on the training dataset to predict
classification rate is presented in Figure 4.21. For classification rate, the first split was on
sample size, followed by a split on number of items. Conditions with higher sample size
and higher number of items had a higher classification rate. For sensitivity, the tree only
made splits based on prevalence, so the tree is not presented. For specificity, the first split
was prevalence (see Figure 4.22). If prevalence was 5%, the predicted specificity was .73.
On the other hand, if prevalence was 10% or 20%, specificity depended on sample size
and number of items, where high sample size and high number of items had high
predicted specificity. In a random forest model grown in a training dataset to predict the
classification rate, the most important predictor was sample size, followed closely by
number of items (Figure 4.23, left panel). Using the random forest model to predict
classification rate in a testing dataset, the MSE was .002, and the variance explained was
.101. For sensitivity, the most important variable was prevalence (Figure 4.23, center
panel). The MSE in the prediction of sensitivity was .003, and the variance explained was
.132. For specificity, the most important variables were prevalence and sample size,
followed by number of items and number of item categories (Figure 4.23, right panel).
The MSE in the prediction of specificity was .003 and the variance explained was .081.
Conclusion. Important predictors of classification rates, sensitivity, and
specificity were prevalence, sample size, and number of items. However, these results
generalize only to conditions with a diagnosis-test correlation of .70. Also, the
classification accuracy indices were close to those of the data-generating model.
Relaxed Lasso Logistic Regression classification accuracy with ROC
classifier. Only conditions with a diagnosis-test correlation of .70 were analyzed. Table
4.26 shows that classification rate ranged between .70 and .75; Table 4.27 shows that
sensitivity ranged between .73 and .82; and Table 4.28 shows that specificity ranged
between .69 and .76. Classification accuracy seemed to increase as sample size, number
of items, and number of item categories increased, and classification accuracy decreased
as prevalence increased.
Linear Regression. In a regression predicting classification rate from sample size,
number of item categories, number of items, prevalence, and local dependence, the
variance explained was R2=.116. There were significant main effects of number of item
categories (b=.014, s.e. = .003, t=4.610, p<.001, partial-η2= .024), sample size (b=.012,
s.e. = .003, t=4.004, p<.001, partial-η2= .031), number of items (b=.012, s.e. = .003,
t=3.837, p<.001, partial-η2= .032), and prevalence (b=-.015, s.e. = .003, t=-5.063,
p<.001, partial-η2= .031). On average, classification rates increased as sample size,
number of items, and number of item categories increased, and classification rates
decreased as prevalence increased. In a regression predicting sensitivity from sample
size, number of item categories, number of items, prevalence, and local dependence, the
variance explained was R2=.144. There were significant main effects of number of item
categories (b=.020, s.e. = .004, t=4.917, p<.001, partial-η2= .015), sample size (b=.027,
s.e. = .003, t=7.260, p<.001, partial-η2= .038), number of items (b=.010, s.e. = .004,
t=2.408, p=.016, partial-η2= .012), and prevalence (b=-.035, s.e. = .004, t=-9.306,
p<.001, partial-η2= .082). On average, sensitivity increased as sample size, number of
items, and number of item categories increased, and sensitivity decreased as prevalence
increased. In a regression predicting specificity from sample size, number of item
categories, number of items, prevalence, and local dependence, the variance explained
was R2=.096. There were significant main effects of number of item categories (b=.014,
s.e. = .004, t=3.691, p<.001, partial-η2= .016), sample size (b=.010, s.e. = .004, t=3.031,
p=.002, partial-η2= .019), number of items (b=.012, s.e. = .004, t=3.158, p=.002, partial-
η2= .022), and prevalence (b=-.019, s.e. = .004, t=-5.346, p<.001, partial-η2= .038). On
average, specificity increased as sample size, number of items, and number of item
categories increased, and specificity decreased as prevalence increased.
Regression Trees. The regression tree grown on the training dataset to predict
classification rate is presented in Figure 4.24. For classification rate, the first split was
number of items. If the number of items was 30, then predicted classification rate
depended on prevalence and sample size, where conditions with 5% prevalence had the
highest predicted classification rate (.75). On the other hand, if the number of items was
10, then predicted classification rate depended on number of item categories, where the
predicted classification rate in conditions with five-category items had a higher predicted
classification rate than in conditions with two-category items. For sensitivity, the first
split was prevalence. If prevalence was 20%, the predicted sensitivity was .76. If
prevalence was 5% or 10%, sensitivity depended on sample size, where conditions with
5% prevalence and a sample size of 500 or 1,000 had the highest sensitivity (.81). For
specificity, the first split was prevalence. If prevalence was 5%, then the predicted
specificity was .74. On the other hand, if prevalence was 10% or 20%, then specificity
depended on number of items, where the predicted specificity was higher for conditions
with 30 items than with 10 items. In a random forest model grown in a training dataset to
predict the classification rate, the most important variables were number of items, sample
size, prevalence, and number of item categories (see Figure 4.27, left panel). Using the
random forest model to predict classification rate in the testing dataset, the MSE was
.002, and the variance explained was .110. For sensitivity, the most important variable
was prevalence, followed by sample size (see Figure 4.27, center panel). The MSE in the
prediction of sensitivity was .003, and the variance explained was .137. For specificity,
the most important variable was prevalence, followed by number of items, number of
item categories, and sample size. The MSE in the prediction of specificity was .002, and
the variance explained was .091.
Conclusion. The important variables to predict classification accuracy were
number of items, sample size, prevalence, and number of item categories. However, these
results generalize only to conditions with a diagnosis-test correlation of .70. Also,
the classification accuracy indices were close to those of the data-generating model.
Comparing classification accuracy across machine learning models with
ROC classifier. For comparing lasso logistic regression and the relaxed lasso logistic
regression, only conditions with a diagnosis-test correlation of .70 were analyzed. There
were no practical differences in classification accuracy between lasso logistic regression
and relaxed lasso logistic regression (classification rate R2=.004; sensitivity R2=.004;
specificity R2=.004). Therefore, only estimates from the relaxed lasso are used in further
analysis.
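For reference, the two-stage structure that distinguishes the relaxed lasso from the ordinary lasso can be sketched as follows. This simplified version selects items with an L1-penalized logistic regression and then refits an effectively unpenalized logistic regression on the selected items to undo the lasso's shrinkage; the data, penalty value, and implementation details here are hypothetical and may differ from the dissertation's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Hypothetical item scores; only the first three items carry signal.
X = rng.normal(size=(1000, 10))
logit = 1.2 * X[:, 0] + 1.0 * X[:, 1] + 0.8 * X[:, 2] - 2.0
y = (rng.random(1000) < 1 / (1 + np.exp(-logit))).astype(int)

# Stage 1: lasso (L1-penalized) logistic regression selects items by
# shrinking uninformative coefficients exactly to zero.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0])

# Stage 2 (the "relaxed" step, simplified): refit on the selected items only
# with a very weak penalty (large C), undoing the lasso's shrinkage.
relaxed = LogisticRegression(C=1e6, solver="lbfgs").fit(X[:, selected], y)
```

Because the second stage changes coefficient magnitudes but rarely the selected set, predicted probabilities from the two models are often close, which is consistent with the negligible accuracy differences reported above.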
There were differences between relaxed lasso logistic regression and logistic
regression on classification rates (R2=.122), sensitivity (R2=.348), and specificity
(R2=.111) as a function of simulation factors. In the prediction of differences in
classification rate, there was a significant three-way interaction between sample size,
prevalence, and number of items (b=-.074, s.e. = .007, t=-11.208, p<.001, partial-
η2=.041). On average, logistic regression had higher classification rates in conditions
with low prevalence, small sample sizes, and small number of items, and relaxed lasso
logistic regression had higher classification rates in conditions with high prevalence,
larger sample sizes, and large number of items. As the number of items increased, the
difference in classification rate increased in favor of relaxed lasso logistic regression
faster for conditions with larger sample size than for conditions with smaller sample size.
Also, as the number of items increased, the difference in classification rate favored
relaxed lasso logistic regression for conditions with sample sizes of 500 or 1,000, but the
difference in classification rate favored logistic regression in conditions with sample sizes
of 250. In the prediction of differences in sensitivity, there was a significant three-way
interaction between number of items, sample size, and prevalence (b=.155, s.e. = .010,
t=15.315, p<.001, partial-η2=.085). On average, differences in sensitivity approached
zero as sample size and prevalence increased, and the difference increased in favor of the
relaxed lasso as the number of items increased. As the number of items increased,
differences in sensitivity in favor of the relaxed lasso increased at a faster rate for
conditions with a sample size of 250 and prevalence of 5% than for any other sample size
or prevalence. In the prediction of differences in specificity, there was a significant three-
way interaction between sample size, prevalence, and number of items (b=-.086, s.e. =
.008, t=-10.623, p<.001, partial-η2=.037). On average, the difference in specificity
approached zero as prevalence and sample size increased. Also, specificity was greater
for logistic regression in conditions with 10 items, but specificity was greater for relaxed
lasso logistic regression in conditions with 30 items. As the number of items increased,
the difference in specificity approached zero for conditions with prevalence of 10% or 20%,
but the difference increased in favor of logistic regression in conditions with prevalence
of 5%. Similarly, as the number of items increased, the difference in specificity
approached zero for conditions with a sample size of 500 and 1,000, but the difference in
specificity increased in favor of logistic regression in conditions with sample size of 250.
Also, there were differences between relaxed lasso logistic regression and random
forest on classification rates (R2=.120), sensitivity (R2=.393), and specificity (R2=.111).
In the prediction of differences in classification rate, there was a significant three-way
interaction between number of item categories, prevalence, and number of items (b=-
.074, s.e. = .007, t=-11.208, p<.001, partial-η2=.025). On average, relaxed lasso had
higher classification rates in conditions with five-category items, 10% or 20%
prevalence, or 10 items, while random forest had higher classification rates in conditions
with two-category items, 5% prevalence, or 30 items. As the number of items increased,
the difference in classification rates across prevalence decreased. Also, as the number of
items increased, conditions with two-category items favored the relaxed lasso, while
conditions with five-category items favored random forest. In the prediction of
differences in sensitivity, there was a significant four-way interaction between sample
size, number of item categories, prevalence, and number of items (b=-.216, s.e. = .019,
t=-11.508, p<.001, partial-η2=.010). On average, the differences in sensitivity in favor of
relaxed lasso logistic regression increased as sample size increased and as number of
items, number of item categories, and prevalence decreased. As the number of items
increased, the differences in sensitivity across prevalence, number of item categories, and
number of items decreased. In the prediction of differences in specificity, there was a
significant three-way interaction between number of item categories, prevalence, and
number of items (b=.044, s.e. = .010, t=4.192, p<.001, partial-η2=.019). On average,
relaxed lasso had higher specificity in conditions with five-category items and 10% or
20% prevalence, while random forest had higher specificity in conditions with two-
category items and 5% prevalence. Number of items did not appear to have a main effect on the difference in specificity; however, as the number of items increased, the difference in specificity
across prevalence decreased. Also, as the number of items increased, conditions with
two-category items favored the relaxed lasso, while conditions with five-category items
favored random forest.
All of the conditions were analyzed to study the difference in classification
accuracy between logistic regression and random forest. There were differences in the
classification rate (R2=.184), sensitivity (R2=.351), and specificity (R2=.170). In the
prediction of differences in classification rate, there was a significant three-way
interaction between number of item categories, prevalence, and number of items (b=.164,
s.e. = .013, t=12.839, p<.001, partial-η2=.030) and a three-way interaction between
number of item categories, number of items, and sample size (b=-.228, s.e. = .013, t=-
17.703, p<.001, partial-η2=.013). On average, differences in classification rates in favor
of logistic regression increased as prevalence and number of items increased, and as
sample size and number of item categories decreased. As the number of items increased,
the difference in classification rates across prevalence decreased. Also, as the number of
items increased, conditions with two-category items favored logistic regression, while
conditions with five-category items favored random forest. Also, as the number of items
increased, the differences in classification rates for conditions in sample size of 250 and
500 decreased, and the differences in classification rates for conditions in sample size of
1,000 increased in favor of logistic regression. In the prediction of differences in
sensitivity, there was a significant three-way interaction between number of item
categories, prevalence, and number of items (b=-.197, s.e. = .018, t=-10.780, p<.001,
partial-η2=.030) and a three-way interaction between number of item categories, number
of items, and sample size (b=.287, s.e. = .018, t=15.616, p<.001, partial-η2=.021). On
average, differences in sensitivity in favor of logistic regression increased as sample size
increased, and decreased as number of items, number of item categories, and prevalence
increased. As the number of items increased, the differences in sensitivity favoring
logistic regression across prevalence, number of item categories, and sample size
decreased, to the point of slightly favoring random forest. In the prediction of differences
in specificity, there was a significant three-way interaction between number of item
categories, prevalence, and number of items (b=.193, s.e. = .016, t=12.143, p<.001,
partial-η2=.023) and a three-way interaction between number of item categories, number
of items, and sample size (b=-.254, s.e. = .016, t=-15.902, p<.001, partial-η2=.013). On
average, differences in specificity in favor of logistic regression increased as number of
items and prevalence increased, and decreased as sample size increased. As the number of items increased, differences in specificity across prevalence and number
of item categories decreased. Also, as the number of items increased, the differences in
specificity for conditions in sample size of 250 and 500 decreased, and the differences in
specificity for conditions in sample size of 1,000 increased in favor of logistic regression.
6. Model Comparison across Psychometric and Machine Learning Methods.
Results from section 4 suggest that machine learning models that reduce
prediction error (using a Bayes classifier) have inflated specificity and very low
sensitivity compared to the data-generating model. Given the poor performance, machine
learning models with a Bayes classifier are not compared to data-generating theta.
Results from section 2 suggest that there is not a significant difference between using the
data-generating theta, estimated theta, or a raw summed score for classification. Thus, this
section compares the differences in classification accuracy between the data-generating
theta and the machine learning methods with ROC classifiers.
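The comparisons below rest on three rates computed from a confusion matrix (classification rate, sensitivity, and specificity) and on a cut score chosen from the ROC curve. A minimal sketch of both steps is given below; the scores and diagnoses are invented, and Youden's J is used here as a stand-in criterion for whichever ROC-based rule a given analysis adopts.

```python
def confusion_rates(y_true, y_pred):
    """Return (classification rate, sensitivity, specificity) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return acc, sens, spec

def roc_cut(y_true, scores):
    """Pick the cut score that maximizes Youden's J = sensitivity + specificity - 1,
    searching over the observed scores (one candidate cut per unique score)."""
    best_cut, best_j = None, -1.0
    for c in sorted(set(scores)):
        preds = [1 if s >= c else 0 for s in scores]
        _, sens, spec = confusion_rates(y_true, preds)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut

# toy example: a low-prevalence diagnosis and continuous test scores
y = [0]*8 + [1]*2
s = [.05, .10, .15, .20, .25, .30, .35, .55, .60, .80]
cut = roc_cut(y, s)
```

The same three rates, computed in the testing dataset for two methods and differenced, yield the comparison outcomes analyzed in this section.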
Random Forest with ROC classifier vs. Data-generating model. The vast
majority of conditions had higher classification accuracy for the data-generating model
than the random forest model. Tables 6.1 and 6.2 show that the difference in classification rate between models ranged from -.09 to .01; Tables 6.3 and 6.4 show that the difference in sensitivity ranged from -.30 to .02; and Tables 6.5 and 6.6 show that the difference in specificity ranged from -.11 to .03.
In a regression predicting the difference in classification rate between the data-
generating thetas and random forest with a ROC classifier from simulation factors, the
variance explained was R2=.156. There were three significant interactions: the interaction
between prevalence and number of item categories (b=.139, s.e. = .010, t=14.136,
p<.001, partial-η2= .012), the interaction of number of item categories and number of
items (b=.186, s.e. = .010, t=18.906, p<.001, partial-η2= .038), and the interaction of
prevalence and number of items (b=.176, s.e. = .010, t=17.910, p<.001, partial-η2=
.019). On average, the difference in classification rates increased as the number of items
increased and number of item categories decreased. Also, the difference in classification
rates was larger for a prevalence rate of .10 than prevalence rates of .05 and .20. For
binary items, the difference in classification rates across method was greater for
conditions with 30 items than with 10 items, but the opposite pattern was found for
polytomous items. Also, as prevalence increased, the difference in classification rates
decreased for conditions with 30 items, but increased in conditions with 10 items. In a
regression predicting the difference in sensitivity between the data-generating thetas and
random forest with a ROC classifier from simulation factors, the variance explained was
R2=.270. There was a significant three-way interaction between prevalence, number of
items, and number of item categories (b=.226, s.e. = .020, t=11.405, p<.001, partial-η2=
.032), and a three-way interaction between sample size, number of items, and number of
categories (b=-.230, s.e. = .020, t=-11.516, p<.001, partial-η2= .011). On average,
differences in sensitivity rates increased as sample size increased, and differences in
sensitivity decreased as prevalence, number of items, and number of item categories
increased. For conditions with 10 items, the difference in sensitivity increased as sample
size increased, and the difference in sensitivity decreased as prevalence and number of
categories increased. On the other hand, the difference in sensitivity in conditions with 30
items did not seem to be influenced by predictors. In a regression predicting the
difference in specificity between the data-generating thetas and random forest with a
ROC classifier from simulation factors, the variance explained was R2=.140. There was a
significant three-way interaction between prevalence, number of items, and number of
item categories (b=-.203, s.e. = .018, t=-11.583, p<.001, partial-η2= .020). On average,
the difference in specificity increased as number of item categories increased, and it was
not influenced by the number of items. Also, the difference in specificity was larger for
conditions with a prevalence rate of .10 than prevalence rates of .05 and .20. However,
for conditions with five-category items, differences in specificity decreased as the
number of items and prevalence increased. On the other hand, for conditions with two-
category items, differences in specificity increased as the number of items and prevalence
increased.
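The partial-η2 values reported throughout can be read as the share of residual variability a predictor accounts for once the other terms are in the model: the drop in residual sum of squares when the predictor is added, divided by the reduced model's residual sum of squares. A minimal sketch follows; the design and outcome values are invented for illustration.

```python
def ols_sse(X, y):
    """Residual sum of squares from ordinary least squares, solving the
    normal equations by Gaussian elimination (fine for a few predictors)."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    v = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    for col in range(p):                      # forward elimination with pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    beta = [0.0] * p                          # back substitution
    for r in range(p - 1, -1, -1):
        beta[r] = (v[r] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return sum((y[i] - sum(X[i][c] * beta[c] for c in range(p))) ** 2 for i in range(n))

def partial_eta2(X_full, X_reduced, y):
    """partial-η2 = (SSE_reduced - SSE_full) / SSE_reduced for the dropped term."""
    sse_full, sse_red = ols_sse(X_full, y), ols_sse(X_reduced, y)
    return (sse_red - sse_full) / sse_red

# toy data: intercept plus two binary simulation factors; outcome = accuracy difference
X_full = [[1, x1, x2] for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]]
X_red = [[1, x2] for _, x1, x2 in X_full]     # drop factor 1
y = [0.10, 0.12, 0.20, 0.24, 0.11, 0.23]
pe2 = partial_eta2(X_full, X_red, y)
```

In the analyses above, the same quantity is computed for each effect in the regression of a classification-accuracy difference on the simulation factors.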
Logistic Regression with ROC classifier vs. Data-generating model. The vast
majority of conditions had a higher classification rate for the data-generating model than the logistic regression model. Tables 6.7 and 6.8 show that the difference in classification rate between models ranged from -.09 to .01; Tables 6.9 and 6.10 show that the difference in sensitivity ranged from -.30 to .02; and Tables 6.11 and 6.12 show that the difference in specificity ranged from -.11 to .03.
In the regression predicting the difference in classification rate from simulation
factors, the variance explained was R2=.067. None of the predictors had a partial-η2 > .01.
In the regression predicting the difference in sensitivity from the simulation factors, the
variance explained was R2=.037. None of the predictors had a partial-η2 > .01. Finally, in
the regression predicting the difference in specificity from the simulation factors, the
variance explained was R2=.028. None of the predictors had a partial-η2 > .01.
Lasso Logistic Regression with ROC classifier vs. Data-generating model.
Conditions examined had a diagnosis-test correlation of .70. The vast majority of
conditions had a higher classification rate for the data-generating model than the lasso logistic regression model. Table 6.13 shows that the difference in classification rate between models ranged from -.10 to -.02; Table 6.14 shows that the difference in sensitivity ranged from -.04 to .01; and Table 6.15 shows that the difference in specificity ranged from -.11 to -.02.
In the regression predicting the difference in classification rate from the
diagnosis-test correlation, prevalence, sample size, number of items, number of item
categories, and local dependence, the variance explained was R2=.085. Differences in
classification rate decreased as sample size (b=.022, s.e. = .005, t=4.772, p<.001, partial-
η2= .029), prevalence (b=.034, s.e. = .005, t=7.204, p<.001, partial-η2= .042), and
number of items (b=.008, s.e. = .005, t=1.592, p=.111, partial-η2= .012) increased. In the
regression predicting the difference in sensitivity from the simulation factors, the
variance explained was R2=.013. None of the predictors had a partial-η2 > .01. Finally, in
the regression predicting the difference in specificity from the simulation factors, the
variance explained was R2=.058. Differences in specificity decreased as sample size
(b=.022, s.e. = .005, t=4.772, p<.001, partial-η2= .021) and prevalence (b=.022, s.e. =
.005, t=4.772, p<.001, partial-η2= .026) increased.
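Lasso logistic regression penalizes the absolute size of the coefficients, shrinking some exactly to zero and thereby performing item selection. A minimal sketch of one common fitting approach, proximal gradient descent with a soft-thresholding step, is shown below; the data, penalty, and step size are invented, and a production fit (e.g., coordinate descent as in glmnet) would be more efficient but applies the same shrinkage operator.

```python
import math

def soft_threshold(z, gamma):
    """The lasso's shrinkage operator: move z toward zero by gamma, clipping at zero."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_logistic(X, y, lam=0.1, lr=0.1, iters=2000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).
    Each row of X starts with a 1 for the intercept, which is left unpenalized."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            mu = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                grad[j] += (mu - yi) * xi[j] / n
        beta = [b - lr * g for b, g in zip(beta, grad)]
        # proximal step: soft-threshold everything except the intercept
        beta = [beta[0]] + [soft_threshold(b, lr * lam) for b in beta[1:]]
    return beta

# toy data: column 1 predicts the diagnosis perfectly, column 2 is noise
X = [[1, 0, 0], [1, 0, 1], [1, 0, 0], [1, 0, 1],
     [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
beta_hat = lasso_logistic(X, y)
```

The noise column's coefficient is driven to zero while the predictive column survives with a shrunken, finite coefficient, which is what makes the lasso usable for item selection.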
Relaxed Lasso Logistic Regression with ROC classifier vs. Data-generating
model. Conditions examined had a diagnosis-test correlation of .70. The vast majority of conditions had a higher classification rate for the data-generating model than the relaxed lasso logistic regression model. Table 6.16 shows that the difference in classification rate between models ranged from -.10 to .01; Table 6.17 shows that the difference in sensitivity ranged from -.04 to .01; and Table 6.18 shows that the difference in specificity ranged from -.10 to -.02.
In the regression predicting the difference in classification rate from the
diagnosis-test correlation, prevalence, sample size, number of items, number of item
categories, and local dependence, the variance explained was R2=.082. Differences in
classification rate decreased as sample size (b=.023, s.e. = .005, t=4.964, p<.001, partial-
η2= .025), prevalence (b=.036, s.e. = .005, t=7.583, p<.001, partial-η2= .036), and
number of items (b=.011, s.e. = .005, t=2.160, p=.031, partial-η2= .015) increased. In the regression predicting the difference in
sensitivity from the simulation factors, the variance explained was R2=.055. None of the
predictors had a partial-η2 > .01. Finally, in the regression predicting the difference in
specificity from the simulation factors, the variance explained was R2=.028. Differences
in specificity decreased as sample size (b=.024, s.e. = .006, t=4.103, p<.001, partial-η2=
.018) and prevalence (b=.034, s.e. = .006, t=5.719, p<.001, partial-η2= .022) increased.
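The relaxed lasso's two stages can be sketched directly: the lasso's nonzero coefficients pick the columns, and an unpenalized refit on just those columns removes the shrinkage from the surviving estimates. In the sketch below the first-stage coefficients, data, and optimization settings are all hypothetical.

```python
import math

def refit_unpenalized(X, y, keep, lr=0.1, iters=2000):
    """Relaxed-lasso second stage: plain (unpenalized) logistic regression,
    fit by gradient descent, using only the columns the lasso retained."""
    Xs = [[row[j] for j in keep] for row in X]
    n, p = len(Xs), len(keep)
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(Xs, y):
            mu = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                grad[j] += (mu - yi) * xi[j] / n
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return dict(zip(keep, beta))

# suppose a first-stage lasso returned these (hypothetical) shrunken coefficients,
# with column 0 the intercept and columns 2-3 zeroed out:
lasso_beta = {0: -0.9, 1: 1.4, 2: 0.0, 3: 0.0}
keep = [j for j, b in lasso_beta.items() if b != 0.0 or j == 0]

X = [[1, 0, 1, 0], [1, 0, 0, 1], [1, 1, 1, 1],
     [1, 1, 0, 0], [1, 0, 1, 1], [1, 1, 0, 1]]
y = [0, 0, 1, 1, 0, 1]
relaxed = refit_unpenalized(X, y, keep)
```

Because the refit drops the penalty, the surviving coefficient is larger in magnitude than its shrunken first-stage counterpart, which is the motivation for the relaxed variant.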
7. Scoring Machine Learning Items for Person Parameter Recovery
The CART algorithm and lasso logistic regression select the most important items
to predict a diagnosis. This section investigates the recovery of theta if the items retained by these models were to be scored using the estimated parameters from the IRT
model in section 2. An IRT EAP[θ] score was estimated per model if at least two items
were chosen by the algorithms. On the other hand, the random forest algorithm does not
do variable selection, but ranks predictors in terms of variable importance. In this case,
half of the items with the highest variable importance were used to estimate an IRT
EAP[θ] score.
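Scoring the retained items follows the usual EAP recipe: the posterior mean of theta over a quadrature grid, with the product of item response probabilities as the likelihood and a standard normal prior. A minimal 2PL sketch is given below; the item parameters and response patterns are hypothetical, and for a random forest the same function would simply receive the half of the items with the highest variable importance.

```python
import math

def eap_theta(responses, a, b, n_quad=61):
    """EAP[theta] under a 2PL model, using a normal(0,1) prior evaluated
    on an equally spaced quadrature grid from -4 to 4. The `responses`,
    `a` (slopes), and `b` (difficulties) cover only the retained items."""
    grid = [-4 + 8 * k / (n_quad - 1) for k in range(n_quad)]
    post = []
    for t in grid:
        like = math.exp(-t * t / 2)                 # unnormalized N(0,1) prior
        for u, aj, bj in zip(responses, a, b):
            p = 1 / (1 + math.exp(-aj * (t - bj)))  # 2PL correct-response probability
            like *= p if u == 1 else 1 - p
        post.append(like)
    z = sum(post)
    return sum(t * w for t, w in zip(grid, post)) / z

# hypothetical parameters for four items retained by a selection algorithm
a = [1.2, 0.9, 1.5, 1.1]
b = [-0.5, 0.0, 0.5, 1.0]
low = eap_theta([0, 0, 0, 0], a, b)
high = eap_theta([1, 1, 1, 1], a, b)
```

Theta MSE and the correlation between true and estimated theta, the two recovery outcomes analyzed below, are then computed from these EAP estimates against the generating thetas.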
Theta Recovery by CART. Only conditions with prevalence greater than 5% and
a sample size greater than 250 were analyzed.
Linear Regression. In a regression predicting theta MSE from the simulation factors,
the variance explained was R2=.761. There was a significant three-way interaction
between diagnosis-test correlation, number of items, and number of item categories (b=-
.074, s.e. = .007, t=-9.964, p<.001, partial-η2= .018), an interaction between diagnosis-
test correlation and prevalence (b=-.022, s.e. = .005, t=-4.379, p<.001, partial-η2= .012),
and an interaction between diagnosis-test correlation and sample size (b=-.056, s.e. =
.005, t=-11.251, p<.001, partial-η2= .053). On average, theta MSE decreased as number
of item categories, prevalence, and sample size increased, and theta MSE increased as the
diagnosis-test correlation and the number of items increased. When the diagnosis-test
correlation was .70, theta MSE was higher for conditions with 30 items, prevalence of
.10, or sample size of 500 than for conditions with either 10 items, prevalence of .20, or
sample size of 1,000. Similarly, in a regression predicting the correlation between the true
and estimated theta from the simulation factors, the variance explained was R2=.761.
There was a significant three-way interaction between diagnosis-test correlation, number
of items, and number of item categories (b=.074, s.e. = .009, t=8.478, p<.001, partial-
η2= .014), a two-way interaction between diagnosis-test correlation and prevalence
(b=.029, s.e. = .006, t=4.918, p<.001, partial-η2= .011), and a diagnosis-test correlation
and sample size (b=.071, s.e. = .006, t=12.318, p<.001, partial-η2= .052). On average,
the correlation between true and estimated theta increased as number of item categories,
prevalence, and sample size increased, and as the diagnosis-test correlation and number of
items decreased. When diagnosis-test correlation was .70, the correlation between true
and estimated theta was higher for conditions with either a sample size of 1,000, 30
items, 10% prevalence, or five-category items than for conditions with a sample size of
500, 10 items, 20% prevalence, or two-category items.
Regression Trees. The regression trees grown in a training dataset to predict theta
MSE and the correlation between true and estimated theta are presented in Figures 7.1
and 7.2. For theta MSE, the first split was diagnosis-test correlation. Conditions with a
diagnosis-test correlation of .30 or .50 had a predicted theta MSE of .005. For conditions
with a diagnosis-test correlation of .70, predicted theta MSE depended on number of
items and sample size, where conditions with 30 items had a higher predicted MSE than
conditions with 10 items. Similarly, for the correlation between true and estimated theta,
the first split was diagnosis-test correlation. For conditions with a diagnosis-test correlation of .30 or .50, the predicted correlation between true and estimated theta was .99. For
conditions with a diagnosis-test correlation of .70, the predicted correlation between true
and estimated thetas depended on number of items and sample size, where conditions
with 10 items had a higher correlation between true and estimated thetas than conditions
with 30 items. In a random forest model grown in a training dataset to predict theta MSE
for CART, the most important predictor was diagnosis-test correlation, followed by
number of items (see left panel of Figure 7.3). Using the random forest model to predict
theta MSE in the testing dataset, the MSE was .003, and the variance explained was .768.
Also, in the prediction of the correlation between true and estimated theta, the most
important predictor was diagnosis-test correlation, followed by number of items. The
MSE in the prediction of the correlation between true and estimated theta was .004,
and the variance explained was .754.
Theta Recovery by Lasso Logistic Regression. Only conditions with a
diagnosis-test correlation of .70 were analyzed.
Linear Regression. In a regression predicting theta MSE from sample size,
prevalence, number of item categories, number of items, and local dependence, the
variance explained was R2=.491. There was a significant interaction between prevalence
and number of item categories (b=.014, s.e. = .007, t=1.998, p=.045, partial-η2= .011),
and main effects of sample size (b=-.077, s.e. = .005, t=-15.477, p<.001, partial-η2=
.292) and number of items (b=.080, s.e. = .005, t=14.581, p<.001, partial-η2= .056). On
average, theta MSE increased as the number of items increased, and decreased as the
sample size, prevalence, and number of item categories increased. Also, as prevalence
increased, the differences in theta MSE across number of item categories decreased.
Similarly, in a regression predicting the correlation between true and estimated theta from
sample size, prevalence, number of item categories, number of items, and local
dependence, the variance explained was R2=.498. There was a significant interaction
between prevalence and number of item categories (b=.022, s.e. = .008, t=2.738, p=.006,
partial-η2= .017), and main effects of sample size (b=.102, s.e. = .006,
t=17.909, p<.001, partial-η2= .297) and number of items (b=-.050, s.e. = .006, t=-8.061,
p<.001, partial-η2= .013). On average, the correlation between true and estimated theta
increased as sample size, prevalence, and number of item categories increased, and
decreased as the number of items increased. Also, as prevalence increased, the
differences in the correlation between true and estimated theta across number of items
decreased.
Regression Trees. The regression trees grown in a training dataset to predict theta
MSE and the correlation between true and estimated theta are presented in Figures 7.4
and 7.5. For theta MSE, the first split was on sample size. For conditions with a sample
size of 250, theta MSE depended on the number of item categories and prevalence, where
conditions with five-category items and prevalence of 20% had a lower predicted MSE
than conditions with two-category items or conditions with prevalence of 10%. For
conditions with sample size of 500 or 1,000, theta MSE depended on prevalence, number
of item categories, and sample size, where conditions with high sample size and high
prevalence had a lower predicted MSE. Similarly, for the correlation between true
and estimated theta, the first split was sample size. For conditions with a sample size of
250, the correlation between true and estimated theta depended on the number of item
categories and prevalence, where conditions with five-category items and higher
prevalence led to higher predicted correlation between true and estimated theta. For
conditions with a sample size of 500 or 1,000, correlation between true and estimated
theta depended on prevalence and number of item categories, where conditions with
higher prevalence or conditions with five-category items had higher predicted correlation
between true and estimated theta. In a random forest model grown in a training dataset to
predict theta MSE for lasso logistic regression, the most important predictors were
sample size and prevalence, followed by number of item categories and number of items
(see left panel of Figure 7.6). Using the random forest model to predict theta MSE in the
testing dataset, the MSE was .005, and the variance explained was .476. Also, in the
prediction of the correlation between true and estimated theta, the most important
predictors were sample size and prevalence, followed by number of item categories and number
of items (see right panel of Figure 7.6). The MSE in the prediction of the correlation
between true and estimated theta was .007, and the variance explained was .481.
Theta Recovery by Random Forest. All conditions were analyzed.
Linear Regression. In a regression predicting theta MSE from sample size,
prevalence, diagnosis-test correlation, number of item categories, number of items, and
local dependence, the variance explained was R2=.713. There was a significant
interaction between prevalence and diagnosis-test correlation (b=-.019, s.e. = .002, t=-
10.325, p<.001, partial-η2= .017), and main effects of number of items (b=-.060, s.e. =
.001, t=-46.983, p<.001, partial-η2= .627) and number of item categories (b=-.036, s.e. =
.001, t=-28.146, p<.001, partial-η2= .404). On average, theta MSE increased as the
diagnosis-test correlation increased, and decreased as number of items, prevalence, and
number of item categories increased. Also, as diagnosis-test correlation increased, the
differences in theta MSE across prevalence increased. Similarly, in a regression predicting the correlation between true and estimated theta from the simulation factors, the variance
explained was R2=.771. There was a significant interaction between prevalence and
diagnosis-test correlation (b=.014, s.e. = .007, t=1.998, p=.045, partial-η2= .015), and a
significant interaction between number of items and number of item categories (b=-.077,
s.e. = .005, t=-15.477, p<.001, partial-η2= .034). On average, the correlation between
true and estimated theta increased as the number of items, prevalence, and number of
item categories increased, and decreased as the diagnosis-test correlation increased. As
diagnosis-test correlation increased, the differences in the correlation between true and
estimated theta across prevalence increased. Also, as the number of items increased, the
difference in the correlation between true and estimated theta across number of item
categories increased.
Regression Trees. The regression trees grown in a training dataset to predict theta
MSE and the correlation between true and estimated theta are presented in Figures 7.7
and 7.8. For theta MSE, the first split was on the number of items, followed by number of
item categories. Conditions with higher number of items and higher number of item
categories had a lower predicted MSE than conditions with lower number of items and
lower number of item categories. Similarly, for the correlation between true and
estimated theta, the first split was number of items, followed by number of item
categories. Conditions with higher number of items and higher number of item categories
had a higher predicted correlation between true and estimated thetas than conditions with
lower number of items and lower number of item categories. In a random forest model
grown in a training dataset to predict theta MSE for random forest, the most important
predictor was number of items, followed by number of item categories (see left panel of
Figure 7.9). Using the random forest model to predict theta MSE for random forests in
the testing dataset, the MSE was .001, and the variance explained was .710. Also, in the
prediction of the correlation between true and estimated theta, the most important
predictor was number of items, followed by number of item categories (see right panel of
Figure 7.9). The MSE in the prediction of the correlation between true and estimated
theta was .001, and the variance explained was .767.
Table 1.1
IRT nonconvergence (out of 500) in the 10-item condition
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 1 0 1 1 1 1 0 0 2 0 0 0
250 .5 4 2 3 1 1 0 0 0 0 0 0 1
250 .7 1 1 4 1 2 3 0 2 0 0 0 1
500 .3 0 0 0 0 0 2 0 0 0 0 0 0
500 .5 0 0 0 0 0 0 0 0 0 0 0 0
500 .7 0 0 0 0 1 1 0 0 0 0 0 0
1000 .3 0 0 1 1 0 0 0 2 0 0 1 1
1000 .5 0 0 0 0 0 0 0 1 0 0 0 0
1000 .7 0 0 0 0 0 0 0 1 0 0 1 0
Note: N=training sample size; r=diagnosis-test correlation
Table 1.3
Mean squared error for the theta estimate in the 10-item condition
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.226 0.227 0.227 0.229 0.228 0.229 0.133 0.135 0.133 0.135 0.132 0.135
250 .5 0.227 0.227 0.228 0.230 0.227 0.231 0.133 0.135 0.134 0.136 0.133 0.134
250 .7 0.228 0.229 0.228 0.229 0.227 0.230 0.133 0.135 0.133 0.136 0.133 0.136
500 .3 0.224 0.225 0.225 0.224 0.225 0.227 0.132 0.134 0.133 0.135 0.133 0.134
500 .5 0.223 0.226 0.223 0.225 0.224 0.226 0.133 0.135 0.133 0.135 0.133 0.134
500 .7 0.225 0.224 0.225 0.227 0.223 0.226 0.133 0.135 0.133 0.134 0.132 0.135
1000 .3 0.221 0.224 0.221 0.224 0.223 0.225 0.132 0.135 0.133 0.135 0.133 0.134
1000 .5 0.221 0.222 0.222 0.224 0.222 0.222 0.133 0.134 0.134 0.134 0.133 0.134
1000 .7 0.222 0.223 0.220 0.224 0.222 0.224 0.132 0.136 0.133 0.135 0.133 0.134
Note: N=training sample size; r=diagnosis-test correlation
Table 1.4
Mean squared error for the theta estimate in the 30-item condition.
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.094 0.094 0.094 0.095 0.094 0.095 0.052 0.052 0.054 0.053 0.053 0.053
250 .5 0.094 0.095 0.094 0.095 0.094 0.094 0.052 0.053 0.052 0.053 0.052 0.053
250 .7 0.094 0.095 0.094 0.095 0.094 0.095 0.052 0.053 0.052 0.053 0.052 0.053
500 .3 0.090 0.091 0.091 0.092 0.091 0.091 0.050 0.051 0.050 0.051 0.050 0.051
500 .5 0.090 0.091 0.091 0.091 0.091 0.091 0.050 0.051 0.050 0.051 0.050 0.051
500 .7 0.090 0.091 0.090 0.091 0.090 0.091 0.050 0.051 0.050 0.051 0.050 0.051
1000 .3 0.088 0.090 0.089 0.090 0.089 0.090 0.049 0.050 0.049 0.050 0.050 0.050
1000 .5 0.089 0.089 0.089 0.090 0.089 0.090 0.049 0.050 0.049 0.050 0.049 0.050
1000 .7 0.089 0.090 0.089 0.090 0.089 0.089 0.050 0.050 0.049 0.050 0.049 0.050
Note: N=training sample size; r=diagnosis-test correlation
Table 1.5
Correlation between true theta and theta estimate in the 10-item condition.
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.881 0.882 0.881 0.880 0.881 0.881 0.934 0.933 0.934 0.933 0.934 0.933
250 .5 0.881 0.881 0.881 0.880 0.881 0.880 0.933 0.932 0.933 0.933 0.933 0.933
250 .7 0.881 0.881 0.881 0.881 0.881 0.880 0.933 0.933 0.933 0.933 0.933 0.932
500 .3 0.882 0.882 0.882 0.883 0.881 0.881 0.933 0.932 0.933 0.932 0.932 0.932
500 .5 0.882 0.881 0.883 0.882 0.882 0.881 0.933 0.932 0.932 0.932 0.932 0.932
500 .7 0.882 0.882 0.882 0.881 0.883 0.881 0.933 0.931 0.932 0.932 0.933 0.932
1000 .3 0.883 0.882 0.883 0.882 0.882 0.882 0.932 0.931 0.932 0.931 0.932 0.932
1000 .5 0.883 0.883 0.882 0.882 0.883 0.883 0.931 0.932 0.931 0.931 0.932 0.931
1000 .7 0.883 0.883 0.883 0.882 0.883 0.882 0.932 0.931 0.932 0.932 0.932 0.932
Note: N=training sample size; r=diagnosis-test correlation
Table 1.6
Correlation between true theta and theta estimate in the 30-item condition.
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.954 0.954 0.955 0.954 0.955 0.954 0.976 0.976 0.976 0.976 0.976 0.976
250 .5 0.955 0.954 0.954 0.954 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
250 .7 0.954 0.954 0.955 0.955 0.955 0.954 0.976 0.976 0.976 0.976 0.976 0.976
500 .3 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
500 .5 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
500 .7 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
1000 .3 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
1000 .5 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
1000 .7 0.955 0.955 0.955 0.955 0.955 0.955 0.976 0.976 0.976 0.976 0.976 0.976
Note: N=training sample size; r=diagnosis-test correlation
Table 1.7
Mean square error for the slopes in the 2PL model
Number of Items
10 30
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.157 0.180 0.164 0.170 0.159 0.202 0.105 0.114 0.106 0.116 0.104 0.111
250 .5 0.161 0.170 0.155 0.181 0.161 0.176 0.103 0.112 0.106 0.113 0.102 0.116
250 .7 0.165 0.166 0.174 0.196 0.160 0.199 0.105 0.114 0.105 0.116 0.104 0.116
500 .3 0.076 0.091 0.073 0.098 0.070 0.099 0.049 0.061 0.049 0.059 0.049 0.057
500 .5 0.070 0.107 0.068 0.097 0.077 0.091 0.049 0.060 0.048 0.062 0.049 0.059
500 .7 0.069 0.097 0.077 0.102 0.077 0.107 0.049 0.061 0.049 0.058 0.049 0.062
1000 .3 0.036 0.059 0.034 0.055 0.035 0.065 0.024 0.036 0.024 0.037 0.024 0.034
1000 .5 0.034 0.059 0.033 0.072 0.033 0.053 0.024 0.035 0.023 0.036 0.024 0.035
1000 .7 0.034 0.053 0.032 0.059 0.033 0.058 0.024 0.035 0.024 0.035 0.024 0.034
Note: N=training sample size; r=diagnosis-test correlation
Table 1.8
Variance explained for the slopes in the 2PL Model.
Number of Items
10 30
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.439 0.395 0.455 0.440 0.447 0.411 0.523 0.470 0.510 0.475 0.511 0.475
250 .5 0.436 0.439 0.443 0.407 0.454 0.423 0.511 0.485 0.516 0.469 0.516 0.469
250 .7 0.462 0.434 0.449 0.401 0.462 0.427 0.516 0.473 0.521 0.469 0.520 0.477
500 .3 0.610 0.544 0.608 0.526 0.606 0.536 0.680 0.611 0.678 0.612 0.671 0.627
500 .5 0.612 0.535 0.606 0.536 0.601 0.527 0.673 0.607 0.680 0.604 0.684 0.615
500 .7 0.596 0.525 0.602 0.546 0.584 0.534 0.680 0.605 0.670 0.618 0.676 0.620
1000 .3 0.731 0.673 0.735 0.652 0.740 0.650 0.804 0.715 0.808 0.719 0.804 0.726
1000 .5 0.737 0.665 0.745 0.639 0.746 0.647 0.800 0.718 0.808 0.720 0.808 0.724
1000 .7 0.746 0.666 0.751 0.656 0.744 0.660 0.799 0.718 0.808 0.724 0.805 0.727
Note: N=training sample size; r=diagnosis-test correlation
Table 1.9
Mean square error for the threshold parameter in the 2PL Model
Number of Items
10 30
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.045 0.161 0.043 0.047 0.041 0.057 0.034 0.040 0.031 0.047 0.032 0.043
250 .5 0.039 0.046 0.040 0.082 0.039 0.053 0.035 0.044 0.033 0.050 0.032 0.046
250 .7 0.038 0.048 0.040 0.051 0.050 0.052 0.035 0.052 0.031 0.043 0.034 0.044
500 .3 0.020 0.031 0.018 0.036 0.020 0.034 0.016 0.029 0.017 0.033 0.019 0.029
500 .5 0.021 0.048 0.019 0.033 0.020 0.030 0.017 0.030 0.018 0.029 0.016 0.029
500 .7 0.020 0.035 0.019 0.034 0.017 0.037 0.017 0.028 0.016 0.027 0.016 0.030
1000 .3 0.009 0.021 0.011 0.024 0.010 0.025 0.008 0.022 0.008 0.022 0.008 0.021
1000 .5 0.010 0.021 0.010 0.025 0.010 0.023 0.008 0.021 0.008 0.020 0.008 0.022
1000 .7 0.009 0.022 0.008 0.023 0.010 0.024 0.008 0.020 0.008 0.021 0.009 0.020
Note: N=training sample size; r=diagnosis-test correlation
Table 1.10
Variance explained for the threshold parameter in the 2PL Model
Number of Items
10 30
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.976 0.966 0.976 0.965 0.976 0.962 0.976 0.967 0.976 0.962 0.976 0.964
250 .5 0.977 0.968 0.976 0.958 0.976 0.961 0.975 0.966 0.976 0.962 0.977 0.961
250 .7 0.976 0.964 0.975 0.962 0.976 0.963 0.975 0.965 0.976 0.964 0.976 0.964
500 .3 0.987 0.975 0.989 0.973 0.988 0.974 0.988 0.976 0.988 0.974 0.987 0.976
500 .5 0.987 0.975 0.989 0.976 0.988 0.975 0.988 0.975 0.988 0.975 0.988 0.975
500 .7 0.988 0.974 0.988 0.974 0.988 0.976 0.988 0.976 0.988 0.977 0.989 0.975
1000 .3 0.994 0.982 0.994 0.982 0.994 0.981 0.994 0.981 0.994 0.981 0.994 0.981
1000 .5 0.994 0.982 0.994 0.980 0.994 0.980 0.994 0.982 0.994 0.982 0.994 0.981
1000 .7 0.994 0.982 0.994 0.981 0.994 0.980 0.994 0.982 0.994 0.981 0.994 0.982
Note: N=training sample size; r=diagnosis-test correlation
Table 1.11
Mean square error for the slopes in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.053 0.064 0.052 0.067 0.053 0.067 0.041 0.050 0.041 0.053 0.042 0.052
250 .5 0.051 0.067 0.053 0.067 0.052 0.066 0.041 0.050 0.042 0.052 0.041 0.051
250 .7 0.054 0.065 0.054 0.066 0.052 0.067 0.043 0.051 0.042 0.053 0.042 0.052
500 .3 0.025 0.039 0.026 0.040 0.027 0.042 0.021 0.032 0.021 0.032 0.021 0.031
500 .5 0.027 0.041 0.026 0.040 0.026 0.042 0.021 0.033 0.021 0.032 0.021 0.033
500 .7 0.027 0.042 0.026 0.040 0.027 0.039 0.021 0.032 0.021 0.031 0.021 0.033
1000 .3 0.013 0.027 0.013 0.024 0.013 0.024 0.011 0.021 0.011 0.023 0.011 0.022
1000 .5 0.013 0.032 0.013 0.027 0.013 0.026 0.011 0.023 0.011 0.022 0.011 0.020
1000 .7 0.013 0.037 0.013 0.027 0.013 0.029 0.011 0.021 0.011 0.022 0.011 0.021
Note: N=training sample size; r=diagnosis-test correlation
Table 1.12
Variance explained for the slopes in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.657 0.600 0.680 0.579 0.660 0.586 0.724 0.661 0.726 0.645 0.721 0.653
250 .5 0.671 0.608 0.664 0.601 0.676 0.585 0.726 0.663 0.723 0.649 0.723 0.647
250 .7 0.663 0.606 0.662 0.600 0.663 0.587 0.719 0.665 0.726 0.642 0.724 0.650
500 .3 0.782 0.714 0.790 0.706 0.785 0.714 0.831 0.743 0.836 0.747 0.831 0.749
500 .5 0.782 0.702 0.790 0.695 0.791 0.689 0.830 0.741 0.831 0.746 0.833 0.744
500 .7 0.784 0.687 0.781 0.718 0.782 0.702 0.834 0.747 0.836 0.752 0.833 0.741
1000 .3 0.880 0.793 0.875 0.794 0.872 0.806 0.904 0.818 0.904 0.802 0.907 0.810
1000 .5 0.876 0.780 0.873 0.785 0.877 0.781 0.905 0.809 0.904 0.810 0.905 0.821
1000 .7 0.875 0.792 0.873 0.783 0.880 0.776 0.899 0.815 0.903 0.805 0.904 0.809
Note: N=training sample size; r=diagnosis-test correlation
Table 1.13
Mean square error for the first threshold in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.026 0.039 0.028 0.041 0.026 0.042 0.025 0.037 0.027 0.040 0.027 0.038
250 .5 0.027 0.039 0.028 0.043 0.028 0.042 0.024 0.036 0.026 0.040 0.026 0.040
250 .7 0.027 0.052 0.028 0.042 0.029 0.042 0.026 0.036 0.026 0.041 0.025 0.040
500 .3 0.013 0.029 0.014 0.029 0.013 0.029 0.013 0.026 0.013 0.027 0.013 0.025
500 .5 0.014 0.028 0.014 0.027 0.014 0.031 0.013 0.027 0.013 0.026 0.013 0.026
500 .7 0.014 0.028 0.015 0.026 0.014 0.027 0.013 0.027 0.014 0.026 0.013 0.026
1000 .3 0.007 0.021 0.007 0.021 0.007 0.020 0.007 0.018 0.007 0.020 0.007 0.020
1000 .5 0.008 0.020 0.007 0.019 0.007 0.020 0.007 0.019 0.007 0.021 0.007 0.018
1000 .7 0.008 0.021 0.007 0.021 0.007 0.021 0.007 0.019 0.007 0.020 0.007 0.020
Note: N=training sample size; r=diagnosis-test correlation
Table 1.14
Mean square error for the second threshold in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.019 0.032 0.019 0.034 0.018 0.034 0.019 0.030 0.019 0.032 0.019 0.031
250 .5 0.020 0.031 0.020 0.034 0.019 0.032 0.017 0.029 0.019 0.032 0.019 0.033
250 .7 0.018 0.037 0.020 0.033 0.020 0.034 0.018 0.029 0.018 0.033 0.018 0.032
500 .3 0.011 0.025 0.010 0.025 0.011 0.023 0.010 0.023 0.010 0.023 0.010 0.022
500 .5 0.011 0.025 0.010 0.023 0.011 0.027 0.010 0.023 0.010 0.023 0.010 0.023
500 .7 0.011 0.024 0.011 0.022 0.011 0.024 0.010 0.023 0.010 0.022 0.010 0.023
1000 .3 0.005 0.020 0.006 0.019 0.006 0.018 0.005 0.016 0.005 0.018 0.005 0.018
1000 .5 0.006 0.018 0.006 0.018 0.006 0.018 0.006 0.018 0.005 0.019 0.006 0.017
1000 .7 0.006 0.020 0.006 0.019 0.006 0.019 0.005 0.017 0.006 0.018 0.006 0.018
Note: N=training sample size; r=diagnosis-test correlation
Table 1.15
Mean square error for the third threshold in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.023 0.038 0.023 0.040 0.023 0.041 0.023 0.034 0.022 0.036 0.023 0.036
250 .5 0.023 0.037 0.023 0.039 0.023 0.038 0.021 0.034 0.023 0.037 0.023 0.038
250 .7 0.022 0.037 0.024 0.039 0.024 0.039 0.021 0.033 0.021 0.037 0.022 0.036
500 .3 0.014 0.030 0.013 0.029 0.014 0.028 0.013 0.027 0.013 0.026 0.013 0.026
500 .5 0.014 0.030 0.013 0.026 0.013 0.031 0.013 0.027 0.013 0.026 0.013 0.026
500 .7 0.015 0.028 0.014 0.027 0.014 0.028 0.013 0.027 0.013 0.025 0.013 0.026
1000 .3 0.007 0.022 0.008 0.022 0.008 0.021 0.007 0.019 0.007 0.020 0.007 0.020
1000 .5 0.008 0.022 0.008 0.020 0.008 0.021 0.007 0.020 0.007 0.021 0.007 0.019
1000 .7 0.008 0.024 0.008 0.022 0.007 0.022 0.007 0.020 0.007 0.021 0.007 0.020
Note: N=training sample size; r=diagnosis-test correlation
Table 1.16
Mean square error for the fourth threshold in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.041 0.058 0.042 0.061 0.040 0.062 0.038 0.052 0.039 0.054 0.039 0.055
250 .5 0.040 0.057 0.041 0.059 0.039 0.054 0.036 0.052 0.040 0.055 0.040 0.056
250 .7 0.039 0.057 0.042 0.061 0.040 0.062 0.036 0.050 0.037 0.054 0.038 0.053
500 .3 0.024 0.045 0.025 0.042 0.024 0.041 0.023 0.038 0.023 0.038 0.023 0.037
500 .5 0.025 0.044 0.025 0.038 0.024 0.044 0.023 0.038 0.023 0.037 0.023 0.038
500 .7 0.026 0.041 0.026 0.040 0.024 0.040 0.023 0.040 0.023 0.037 0.023 0.038
1000 .3 0.013 0.030 0.014 0.031 0.014 0.029 0.013 0.026 0.013 0.028 0.013 0.028
1000 .5 0.015 0.031 0.014 0.029 0.014 0.028 0.013 0.027 0.013 0.029 0.013 0.026
1000 .7 0.014 0.035 0.014 0.030 0.013 0.030 0.013 0.027 0.013 0.028 0.013 0.027
Note: N=training sample size; r=diagnosis-test correlation
Table 1.17
Correlation between true and estimated first thresholds in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.966 0.951 0.967 0.945 0.966 0.946 0.968 0.950 0.968 0.946 0.968 0.948
250 .5 0.967 0.948 0.967 0.947 0.964 0.941 0.969 0.951 0.968 0.945 0.968 0.945
250 .7 0.967 0.948 0.966 0.945 0.965 0.942 0.968 0.952 0.967 0.947 0.968 0.945
500 .3 0.988 0.969 0.987 0.969 0.988 0.970 0.988 0.971 0.987 0.971 0.988 0.972
500 .5 0.987 0.971 0.987 0.971 0.987 0.968 0.988 0.971 0.988 0.971 0.988 0.971
500 .7 0.987 0.970 0.986 0.972 0.987 0.972 0.987 0.971 0.987 0.971 0.988 0.971
1000 .3 0.994 0.979 0.994 0.978 0.994 0.980 0.994 0.981 0.994 0.981 0.994 0.981
1000 .5 0.994 0.980 0.994 0.981 0.994 0.980 0.994 0.980 0.994 0.980 0.994 0.982
1000 .7 0.994 0.979 0.994 0.980 0.994 0.979 0.994 0.981 0.994 0.980 0.995 0.980
Note: N=training sample size; r=diagnosis-test correlation
Table 1.18
Correlation between true and estimated second thresholds in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.977 0.960 0.977 0.955 0.977 0.957 0.977 0.960 0.977 0.957 0.977 0.957
250 .5 0.977 0.958 0.978 0.957 0.976 0.954 0.978 0.960 0.977 0.956 0.977 0.955
250 .7 0.978 0.960 0.977 0.957 0.976 0.953 0.978 0.962 0.978 0.958 0.977 0.956
500 .3 0.990 0.973 0.990 0.974 0.991 0.975 0.991 0.975 0.991 0.975 0.990 0.976
500 .5 0.990 0.974 0.991 0.975 0.990 0.972 0.991 0.975 0.991 0.975 0.991 0.975
500 .7 0.990 0.975 0.990 0.976 0.990 0.976 0.991 0.975 0.990 0.976 0.991 0.975
1000 .3 0.996 0.981 0.995 0.981 0.996 0.982 0.996 0.983 0.996 0.983 0.996 0.983
1000 .5 0.995 0.981 0.996 0.983 0.996 0.982 0.996 0.982 0.996 0.982 0.996 0.984
1000 .7 0.995 0.981 0.996 0.982 0.996 0.981 0.996 0.983 0.996 0.982 0.996 0.982
Note: N=training sample size; r=diagnosis-test correlation
Table 1.19
Correlation between true and estimated third thresholds in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.970 0.953 0.971 0.948 0.971 0.947 0.971 0.954 0.972 0.951 0.971 0.952
250 .5 0.973 0.950 0.971 0.950 0.970 0.947 0.973 0.954 0.971 0.950 0.972 0.948
250 .7 0.971 0.953 0.972 0.949 0.971 0.946 0.973 0.956 0.974 0.952 0.972 0.951
500 .3 0.986 0.968 0.986 0.970 0.987 0.970 0.988 0.972 0.988 0.972 0.987 0.973
500 .5 0.987 0.969 0.987 0.972 0.987 0.968 0.988 0.971 0.988 0.972 0.988 0.971
500 .7 0.986 0.970 0.986 0.971 0.987 0.973 0.988 0.972 0.988 0.973 0.988 0.972
1000 .3 0.994 0.979 0.993 0.979 0.994 0.978 0.994 0.981 0.994 0.980 0.994 0.981
1000 .5 0.994 0.978 0.994 0.981 0.994 0.980 0.994 0.980 0.994 0.980 0.994 0.981
1000 .7 0.994 0.978 0.994 0.979 0.994 0.979 0.994 0.981 0.994 0.980 0.994 0.980
Note: N=training sample size; r=diagnosis-test correlation
Table 1.20
Correlation between true and estimated fourth thresholds in the Graded Response Model
Number of Items
10 30
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.945 0.927 0.948 0.920 0.949 0.919 0.951 0.931 0.949 0.928 0.949 0.927
250 .5 0.950 0.924 0.947 0.922 0.948 0.923 0.953 0.931 0.949 0.926 0.950 0.925
250 .7 0.949 0.925 0.950 0.923 0.947 0.916 0.952 0.933 0.953 0.930 0.950 0.927
500 .3 0.975 0.953 0.975 0.957 0.978 0.957 0.978 0.959 0.978 0.959 0.977 0.961
500 .5 0.976 0.953 0.975 0.957 0.976 0.954 0.978 0.960 0.977 0.960 0.978 0.959
500 .7 0.973 0.953 0.974 0.957 0.976 0.961 0.977 0.959 0.978 0.961 0.978 0.960
1000 .3 0.989 0.971 0.987 0.971 0.989 0.971 0.989 0.975 0.989 0.973 0.989 0.973
1000 .5 0.988 0.969 0.988 0.973 0.989 0.972 0.989 0.974 0.989 0.973 0.988 0.975
1000 .7 0.988 0.969 0.988 0.971 0.989 0.972 0.989 0.974 0.989 0.973 0.989 0.974
Note: N=training sample size; r=diagnosis-test correlation
Table 2.1
Classification rate of data-generating theta in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.621 0.626 0.616 0.607 0.596 0.599 0.634 0.636 0.614 0.608 0.608 0.599
250 .5 0.711 0.712 0.679 0.689 0.675 0.669 0.706 0.710 0.681 0.684 0.669 0.666
250 .7 0.798 0.797 0.773 0.771 0.754 0.751 0.799 0.796 0.771 0.772 0.748 0.749
500 .3 0.621 0.630 0.605 0.614 0.598 0.595 0.628 0.623 0.610 0.610 0.604 0.596
500 .5 0.705 0.706 0.682 0.687 0.667 0.673 0.709 0.710 0.686 0.689 0.671 0.666
500 .7 0.798 0.799 0.766 0.769 0.752 0.751 0.797 0.794 0.767 0.766 0.752 0.750
1000 .3 0.631 0.619 0.606 0.599 0.600 0.604 0.616 0.621 0.607 0.608 0.600 0.596
1000 .5 0.705 0.701 0.686 0.689 0.672 0.670 0.694 0.698 0.686 0.682 0.667 0.670
1000 .7 0.794 0.791 0.769 0.768 0.751 0.749 0.789 0.792 0.765 0.769 0.752 0.750
Note: N=training sample size; r=diagnosis-test correlation
Table 2.2
Classification rate of data-generating theta in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.641 0.626 0.609 0.612 0.604 0.599 0.629 0.629 0.626 0.613 0.599 0.601
250 .5 0.705 0.705 0.688 0.693 0.668 0.667 0.717 0.716 0.681 0.693 0.667 0.665
250 .7 0.796 0.797 0.767 0.769 0.746 0.749 0.794 0.801 0.774 0.772 0.748 0.752
500 .3 0.627 0.624 0.616 0.618 0.598 0.596 0.623 0.624 0.613 0.616 0.598 0.597
500 .5 0.702 0.708 0.681 0.683 0.671 0.671 0.701 0.707 0.685 0.687 0.673 0.666
500 .7 0.790 0.790 0.768 0.772 0.749 0.749 0.796 0.794 0.768 0.771 0.750 0.752
1000 .3 0.630 0.618 0.607 0.609 0.603 0.608 0.625 0.626 0.613 0.608 0.598 0.595
1000 .5 0.701 0.701 0.681 0.686 0.672 0.673 0.709 0.697 0.678 0.683 0.669 0.670
1000 .7 0.788 0.786 0.765 0.766 0.751 0.749 0.790 0.792 0.770 0.768 0.752 0.752
Note: N=training sample size; r=diagnosis-test correlation
Table 2.3
Sensitivity of data-generating theta in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.610 0.605 0.598 0.607 0.600 0.595 0.602 0.598 0.600 0.606 0.584 0.595
250 .5 0.696 0.694 0.703 0.687 0.669 0.681 0.699 0.700 0.698 0.698 0.683 0.687
250 .7 0.797 0.801 0.784 0.788 0.765 0.773 0.802 0.800 0.789 0.784 0.773 0.773
500 .3 0.621 0.608 0.619 0.607 0.608 0.610 0.610 0.613 0.611 0.612 0.594 0.607
500 .5 0.717 0.714 0.706 0.698 0.690 0.682 0.710 0.709 0.699 0.697 0.686 0.691
500 .7 0.811 0.806 0.806 0.802 0.776 0.776 0.807 0.815 0.801 0.802 0.776 0.780
1000 .3 0.613 0.629 0.620 0.631 0.608 0.599 0.632 0.626 0.620 0.618 0.607 0.611
1000 .5 0.718 0.723 0.707 0.705 0.687 0.691 0.733 0.728 0.707 0.711 0.694 0.692
1000 .7 0.819 0.824 0.804 0.803 0.782 0.783 0.826 0.825 0.809 0.805 0.779 0.784
Note: N=training sample size; r=diagnosis-test correlation
Table 2.4
Sensitivity of data-generating theta in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.590 0.604 0.605 0.602 0.592 0.599 0.603 0.603 0.605 0.600 0.599 0.594
250 .5 0.708 0.701 0.691 0.683 0.682 0.684 0.690 0.690 0.700 0.684 0.686 0.689
250 .7 0.798 0.796 0.792 0.788 0.775 0.772 0.801 0.792 0.779 0.784 0.776 0.767
500 .3 0.610 0.616 0.603 0.603 0.603 0.609 0.618 0.618 0.608 0.604 0.604 0.605
500 .5 0.719 0.710 0.708 0.705 0.685 0.684 0.713 0.711 0.699 0.702 0.682 0.694
500 .7 0.820 0.813 0.801 0.795 0.780 0.779 0.813 0.814 0.798 0.795 0.779 0.774
1000 .3 0.615 0.630 0.619 0.617 0.602 0.593 0.620 0.621 0.613 0.617 0.609 0.614
1000 .5 0.724 0.727 0.712 0.707 0.689 0.687 0.711 0.730 0.718 0.710 0.690 0.691
1000 .7 0.827 0.834 0.809 0.809 0.781 0.785 0.826 0.821 0.801 0.804 0.780 0.781
Note: N=training sample size; r=diagnosis-test correlation
Table 2.5
Specificity of data-generating theta in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.622 0.627 0.618 0.607 0.595 0.600 0.636 0.638 0.615 0.608 0.614 0.600
250 .5 0.711 0.713 0.676 0.690 0.677 0.666 0.707 0.711 0.679 0.682 0.665 0.661
250 .7 0.798 0.797 0.772 0.769 0.752 0.745 0.798 0.795 0.769 0.770 0.741 0.743
500 .3 0.621 0.631 0.604 0.615 0.595 0.591 0.629 0.623 0.609 0.610 0.607 0.594
500 .5 0.704 0.705 0.680 0.685 0.661 0.671 0.709 0.710 0.685 0.688 0.667 0.660
500 .7 0.798 0.798 0.761 0.765 0.746 0.744 0.797 0.793 0.763 0.762 0.746 0.743
1000 .3 0.632 0.618 0.604 0.595 0.597 0.605 0.615 0.620 0.606 0.607 0.598 0.593
1000 .5 0.704 0.700 0.684 0.687 0.668 0.665 0.692 0.696 0.684 0.679 0.661 0.664
1000 .7 0.792 0.789 0.765 0.764 0.744 0.741 0.787 0.790 0.760 0.765 0.745 0.742
Note: N=training sample size; r=diagnosis-test correlation
Table 2.6
Specificity of data-generating theta in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.643 0.627 0.609 0.612 0.607 0.599 0.630 0.630 0.628 0.615 0.599 0.603
250 .5 0.705 0.705 0.688 0.694 0.664 0.663 0.719 0.717 0.679 0.694 0.662 0.659
250 .7 0.796 0.797 0.765 0.767 0.739 0.744 0.794 0.802 0.773 0.771 0.741 0.749
500 .3 0.628 0.624 0.617 0.619 0.597 0.592 0.623 0.624 0.613 0.617 0.596 0.595
500 .5 0.701 0.708 0.678 0.681 0.668 0.668 0.701 0.707 0.684 0.686 0.671 0.658
500 .7 0.788 0.789 0.765 0.769 0.742 0.742 0.795 0.793 0.765 0.768 0.743 0.746
1000 .3 0.631 0.617 0.606 0.608 0.603 0.611 0.625 0.627 0.613 0.607 0.596 0.590
1000 .5 0.700 0.700 0.678 0.684 0.668 0.669 0.709 0.695 0.674 0.680 0.664 0.664
1000 .7 0.785 0.783 0.760 0.761 0.743 0.739 0.788 0.791 0.766 0.764 0.745 0.745
Note: N=training sample size; r=diagnosis-test correlation
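The classification rate, sensitivity, and specificity reported in Tables 2.1–2.18 all follow from the confusion matrix produced by applying a cut score to the continuous score (theta or summed score). A minimal sketch with hypothetical scores and diagnoses; in the actual study the cut would come from a ROC analysis, but here it is a fixed value for illustration:

```python
def classification_metrics(scores, diagnoses, cut):
    """Overall classification rate, sensitivity, and specificity when
    cases scoring at or above `cut` are flagged as diagnosed.

    scores: continuous test scores (e.g., theta estimates or summed scores)
    diagnoses: true 0/1 diagnostic status
    """
    tp = fp = tn = fn = 0
    for score, diagnosed in zip(scores, diagnoses):
        flagged = score >= cut
        if flagged and diagnosed:
            tp += 1
        elif flagged and not diagnosed:
            fp += 1
        elif not flagged and not diagnosed:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn)    # proportion of true cases flagged
    specificity = tn / (tn + fp)    # proportion of non-cases cleared
    rate = (tp + tn) / len(scores)  # overall classification rate
    return rate, sensitivity, specificity

# Hypothetical mini test set: six theta estimates, three true cases.
thetas = [2.1, 1.5, -0.3, 0.8, -1.2, 0.1]
truth = [1, 1, 0, 1, 0, 0]
rate, sens, spec = classification_metrics(thetas, truth, cut=0.0)
```

Here the cut of 0.0 flags four cases, one falsely, giving sensitivity 1.0, specificity 2/3, and a classification rate of 5/6.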
Table 2.7
Classification rate of estimated thetas in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.610 0.607 0.608 0.590 0.577 0.582 0.617 0.620 0.613 0.609 0.597 0.590
250 .5 0.679 0.683 0.656 0.660 0.650 0.652 0.701 0.697 0.679 0.668 0.662 0.657
250 .7 0.756 0.750 0.732 0.726 0.718 0.715 0.779 0.782 0.753 0.748 0.731 0.732
500 .3 0.610 0.610 0.595 0.598 0.588 0.585 0.615 0.615 0.607 0.604 0.597 0.591
500 .5 0.683 0.677 0.657 0.661 0.648 0.651 0.699 0.703 0.669 0.673 0.658 0.659
500 .7 0.748 0.743 0.730 0.731 0.720 0.718 0.778 0.774 0.748 0.750 0.736 0.736
1000 .3 0.606 0.598 0.588 0.587 0.589 0.588 0.610 0.615 0.601 0.603 0.597 0.594
1000 .5 0.671 0.676 0.654 0.659 0.652 0.646 0.692 0.691 0.668 0.673 0.660 0.660
1000 .7 0.743 0.744 0.731 0.732 0.717 0.716 0.770 0.769 0.750 0.747 0.733 0.733
Note: N=training sample size; r=diagnosis-test correlation
Table 2.8
Classification rate of estimated thetas in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.638 0.635 0.604 0.608 0.606 0.596 0.624 0.624 0.633 0.607 0.597 0.596
250 .5 0.697 0.703 0.669 0.679 0.664 0.666 0.710 0.708 0.679 0.686 0.665 0.661
250 .7 0.778 0.784 0.754 0.754 0.735 0.739 0.789 0.796 0.767 0.765 0.744 0.743
500 .3 0.621 0.604 0.611 0.609 0.597 0.590 0.618 0.617 0.601 0.610 0.595 0.594
500 .5 0.690 0.701 0.680 0.671 0.661 0.663 0.703 0.703 0.678 0.679 0.670 0.660
500 .7 0.773 0.778 0.755 0.753 0.739 0.737 0.789 0.788 0.761 0.761 0.746 0.744
1000 .3 0.617 0.609 0.603 0.601 0.597 0.599 0.617 0.615 0.605 0.607 0.598 0.596
1000 .5 0.688 0.687 0.672 0.680 0.662 0.668 0.703 0.700 0.675 0.679 0.667 0.664
1000 .7 0.773 0.776 0.750 0.752 0.743 0.739 0.784 0.785 0.762 0.761 0.743 0.746
Note: N=training sample size; r=diagnosis-test correlation
Table 2.9
Sensitivity of estimated thetas in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.595 0.595 0.583 0.602 0.607 0.596 0.602 0.599 0.588 0.588 0.582 0.594
250 .5 0.685 0.678 0.685 0.679 0.660 0.657 0.676 0.688 0.674 0.691 0.666 0.672
250 .7 0.769 0.780 0.759 0.771 0.744 0.749 0.778 0.774 0.769 0.770 0.756 0.754
500 .3 0.607 0.604 0.605 0.601 0.597 0.597 0.609 0.609 0.601 0.604 0.588 0.597
500 .5 0.695 0.697 0.691 0.686 0.672 0.668 0.692 0.686 0.696 0.689 0.677 0.674
500 .7 0.790 0.792 0.776 0.773 0.750 0.750 0.790 0.793 0.782 0.779 0.755 0.756
1000 .3 0.613 0.624 0.617 0.619 0.598 0.598 0.622 0.615 0.613 0.608 0.595 0.598
1000 .5 0.710 0.704 0.701 0.695 0.670 0.682 0.706 0.709 0.701 0.696 0.678 0.680
1000 .7 0.804 0.802 0.780 0.776 0.759 0.758 0.807 0.809 0.785 0.789 0.763 0.763
Note: N=training sample size; r=diagnosis-test correlation
Table 2.10
Sensitivity of estimated thetas in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.581 0.585 0.602 0.595 0.576 0.593 0.604 0.604 0.589 0.604 0.595 0.597
250 .5 0.695 0.688 0.698 0.682 0.670 0.666 0.688 0.689 0.693 0.681 0.676 0.685
250 .7 0.790 0.781 0.783 0.780 0.765 0.761 0.793 0.782 0.774 0.778 0.767 0.763
500 .3 0.608 0.628 0.602 0.604 0.595 0.610 0.617 0.620 0.616 0.606 0.601 0.607
500 .5 0.717 0.701 0.692 0.702 0.685 0.679 0.701 0.709 0.699 0.703 0.678 0.692
500 .7 0.808 0.800 0.789 0.789 0.767 0.769 0.806 0.806 0.795 0.792 0.768 0.772
1000 .3 0.619 0.628 0.616 0.617 0.600 0.599 0.622 0.624 0.616 0.614 0.602 0.607
1000 .5 0.723 0.724 0.709 0.697 0.687 0.675 0.709 0.719 0.713 0.707 0.684 0.689
1000 .7 0.815 0.815 0.802 0.797 0.765 0.771 0.820 0.814 0.797 0.799 0.779 0.774
Note: N=training sample size; r=diagnosis-test correlation
Table 2.11
Specificity of estimated thetas in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.611 0.608 0.611 0.589 0.569 0.578 0.618 0.621 0.616 0.611 0.601 0.589
250 .5 0.678 0.684 0.653 0.658 0.648 0.651 0.703 0.697 0.680 0.665 0.661 0.653
250 .7 0.755 0.748 0.729 0.721 0.711 0.706 0.779 0.782 0.751 0.746 0.725 0.727
500 .3 0.610 0.610 0.594 0.597 0.586 0.582 0.615 0.616 0.607 0.604 0.599 0.590
500 .5 0.682 0.676 0.654 0.658 0.643 0.646 0.699 0.704 0.666 0.671 0.653 0.655
500 .7 0.746 0.740 0.725 0.726 0.712 0.709 0.777 0.773 0.744 0.747 0.731 0.731
1000 .3 0.606 0.597 0.585 0.583 0.587 0.585 0.609 0.615 0.599 0.602 0.597 0.592
1000 .5 0.669 0.675 0.648 0.655 0.647 0.637 0.692 0.690 0.664 0.670 0.655 0.655
1000 .7 0.740 0.741 0.726 0.727 0.706 0.705 0.768 0.767 0.746 0.743 0.726 0.725
Note: N=training sample size; r=diagnosis-test correlation
Table 2.12
Specificity of estimated thetas in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.641 0.638 0.604 0.610 0.613 0.597 0.625 0.625 0.635 0.607 0.597 0.596
250 .5 0.697 0.704 0.666 0.679 0.662 0.666 0.711 0.709 0.678 0.687 0.663 0.655
250 .7 0.778 0.784 0.750 0.751 0.728 0.733 0.789 0.797 0.767 0.764 0.738 0.738
500 .3 0.622 0.603 0.612 0.609 0.597 0.584 0.618 0.617 0.600 0.610 0.593 0.591
500 .5 0.688 0.701 0.679 0.667 0.655 0.659 0.703 0.702 0.676 0.677 0.668 0.652
500 .7 0.771 0.776 0.752 0.749 0.732 0.729 0.788 0.787 0.757 0.757 0.741 0.737
1000 .3 0.617 0.608 0.601 0.599 0.596 0.599 0.617 0.614 0.604 0.606 0.597 0.593
1000 .5 0.686 0.685 0.668 0.678 0.656 0.667 0.702 0.699 0.671 0.676 0.662 0.658
1000 .7 0.770 0.774 0.745 0.747 0.737 0.732 0.782 0.784 0.759 0.756 0.735 0.739
Note: N=training sample size; r=diagnosis-test correlation
Table 2.13
Classification rate of raw summed score in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.594 0.603 0.590 0.574 0.576 0.585 0.618 0.611 0.605 0.597 0.592 0.589
250 .5 0.669 0.669 0.648 0.656 0.647 0.650 0.691 0.692 0.672 0.662 0.656 0.653
250 .7 0.739 0.738 0.727 0.719 0.715 0.710 0.769 0.771 0.746 0.745 0.728 0.729
500 .3 0.599 0.598 0.591 0.595 0.586 0.584 0.615 0.614 0.607 0.599 0.592 0.593
500 .5 0.668 0.665 0.658 0.656 0.647 0.650 0.686 0.693 0.669 0.667 0.659 0.654
500 .7 0.735 0.733 0.724 0.724 0.715 0.712 0.774 0.765 0.742 0.746 0.730 0.728
1000 .3 0.602 0.590 0.584 0.587 0.586 0.584 0.607 0.608 0.602 0.600 0.594 0.590
1000 .5 0.665 0.667 0.651 0.651 0.648 0.648 0.683 0.683 0.665 0.672 0.657 0.654
1000 .7 0.739 0.737 0.725 0.727 0.716 0.714 0.762 0.764 0.745 0.745 0.729 0.730
Note: N=training sample size; r=diagnosis-test correlation
Table 2.14
Classification rate of raw summed score in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.627 0.624 0.600 0.594 0.603 0.591 0.628 0.628 0.627 0.604 0.599 0.591
250 .5 0.686 0.687 0.664 0.669 0.658 0.663 0.706 0.705 0.678 0.685 0.664 0.659
250 .7 0.766 0.772 0.745 0.748 0.734 0.735 0.788 0.790 0.764 0.764 0.742 0.741
500 .3 0.609 0.597 0.606 0.605 0.591 0.590 0.618 0.623 0.603 0.608 0.592 0.593
500 .5 0.680 0.693 0.676 0.667 0.660 0.661 0.703 0.700 0.678 0.679 0.668 0.660
500 .7 0.766 0.769 0.749 0.750 0.737 0.733 0.784 0.782 0.759 0.763 0.744 0.744
1000 .3 0.613 0.606 0.594 0.597 0.594 0.598 0.621 0.613 0.609 0.605 0.595 0.593
1000 .5 0.682 0.680 0.670 0.678 0.660 0.663 0.700 0.694 0.674 0.678 0.664 0.663
1000 .7 0.764 0.767 0.745 0.751 0.740 0.739 0.779 0.783 0.759 0.754 0.742 0.746
Note: N=training sample size; r=diagnosis-test correlation
Table 2.15
Sensitivity of raw summed score in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.611 0.600 0.604 0.619 0.610 0.591 0.602 0.607 0.594 0.600 0.589 0.591
250 .5 0.697 0.698 0.695 0.684 0.666 0.664 0.691 0.689 0.680 0.695 0.674 0.677
250 .7 0.793 0.798 0.767 0.785 0.754 0.760 0.787 0.783 0.774 0.771 0.757 0.755
500 .3 0.618 0.616 0.609 0.604 0.598 0.599 0.606 0.609 0.598 0.608 0.596 0.594
500 .5 0.711 0.712 0.690 0.693 0.674 0.669 0.707 0.695 0.693 0.694 0.673 0.680
500 .7 0.804 0.808 0.785 0.782 0.760 0.760 0.794 0.801 0.786 0.780 0.761 0.764
1000 .3 0.620 0.634 0.621 0.618 0.602 0.604 0.624 0.621 0.610 0.610 0.598 0.603
1000 .5 0.718 0.715 0.703 0.705 0.677 0.679 0.712 0.714 0.701 0.693 0.679 0.687
1000 .7 0.810 0.811 0.788 0.785 0.761 0.761 0.813 0.810 0.788 0.787 0.763 0.765
Note: N=training sample size; r=diagnosis-test correlation
Table 2.16
Sensitivity of raw summed score in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.598 0.598 0.607 0.616 0.581 0.603 0.598 0.598 0.598 0.607 0.591 0.604
250 .5 0.708 0.706 0.703 0.697 0.681 0.673 0.690 0.690 0.694 0.682 0.678 0.690
250 .7 0.809 0.799 0.794 0.788 0.769 0.768 0.792 0.790 0.777 0.778 0.769 0.766
500 .3 0.621 0.633 0.607 0.609 0.605 0.609 0.616 0.614 0.613 0.608 0.605 0.606
500 .5 0.728 0.711 0.698 0.708 0.685 0.683 0.701 0.710 0.698 0.700 0.681 0.691
500 .7 0.816 0.811 0.798 0.795 0.770 0.776 0.810 0.812 0.796 0.787 0.770 0.769
1000 .3 0.623 0.631 0.625 0.620 0.605 0.600 0.615 0.625 0.611 0.617 0.606 0.610
1000 .5 0.728 0.732 0.710 0.700 0.690 0.683 0.712 0.725 0.712 0.706 0.686 0.689
1000 .7 0.825 0.825 0.808 0.799 0.769 0.771 0.824 0.817 0.801 0.805 0.779 0.771
Note: N=training sample size; r=diagnosis-test correlation
Table 2.17
Specificity of raw summed score in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.593 0.604 0.588 0.569 0.568 0.583 0.619 0.612 0.606 0.597 0.593 0.588
250 .5 0.667 0.667 0.643 0.653 0.642 0.647 0.691 0.693 0.671 0.659 0.652 0.646
250 .7 0.736 0.734 0.722 0.711 0.705 0.697 0.768 0.770 0.743 0.742 0.721 0.722
500 .3 0.598 0.597 0.590 0.595 0.583 0.581 0.616 0.614 0.608 0.598 0.590 0.592
500 .5 0.666 0.662 0.655 0.652 0.641 0.646 0.685 0.693 0.667 0.664 0.655 0.647
500 .7 0.732 0.729 0.717 0.718 0.704 0.700 0.773 0.763 0.737 0.743 0.723 0.719
1000 .3 0.601 0.588 0.580 0.584 0.582 0.579 0.606 0.607 0.601 0.598 0.593 0.586
1000 .5 0.663 0.665 0.646 0.646 0.640 0.640 0.682 0.681 0.662 0.669 0.652 0.646
1000 .7 0.735 0.734 0.718 0.720 0.704 0.702 0.760 0.761 0.740 0.740 0.721 0.721
Note: N=training sample size; r=diagnosis-test correlation
Table 2.18
Specificity of raw summed score in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.629 0.626 0.599 0.592 0.609 0.587 0.630 0.630 0.628 0.604 0.601 0.589
250 .5 0.685 0.686 0.660 0.666 0.652 0.660 0.707 0.706 0.677 0.685 0.661 0.651
250 .7 0.764 0.771 0.749 0.744 0.726 0.727 0.788 0.790 0.763 0.763 0.736 0.734
500 .3 0.608 0.595 0.606 0.604 0.587 0.585 0.618 0.623 0.601 0.608 0.589 0.590
500 .5 0.677 0.692 0.673 0.663 0.654 0.656 0.703 0.699 0.676 0.677 0.664 0.653
500 .7 0.764 0.767 0.743 0.746 0.728 0.722 0.783 0.780 0.755 0.760 0.738 0.737
1000 .3 0.612 0.605 0.591 0.595 0.591 0.598 0.622 0.613 0.609 0.603 0.592 0.589
1000 .5 0.680 0.677 0.666 0.675 0.652 0.658 0.699 0.692 0.670 0.674 0.659 0.657
1000 .7 0.761 0.764 0.738 0.745 0.732 0.732 0.776 0.781 0.754 0.748 0.733 0.740
Note: N=training sample size; r=diagnosis-test correlation
Table 3.1
Proportion of the CART models with a Bayes classifier that did not assign cases to the minority class in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.994 0.994 0.988 0.992 0.938 0.944 0.986 0.982 0.972 0.990 0.964 0.956
250 .5 0.982 0.980 0.964 0.964 0.874 0.850 0.980 0.956 0.964 0.962 0.836 0.870
250 .7 0.926 0.942 0.850 0.846 0.554 0.538 0.878 0.878 0.836 0.830 0.606 0.600
500 .3 0.992 0.996 0.508 0.484 0.068 0.080 0.988 0.994 0.002 0 0.010 0.004
500 .5 0.984 0.996 0.434 0.374 0.010 0.010 0.974 0.978 0 0 0 0
500 .7 0.940 0.952 0.796 0.796 0.392 0.346 0.862 0.906 0.778 0.742 0.384 0.380
1000 .3 0.986 0.982 0.448 0.456 0.046 0.050 0.994 0.994 0 0.002 0 0.004
1000 .5 0.984 0.974 0.362 0.342 0.008 0.010 0.960 0.972 0 0 0 0
1000 .7 0.944 0.932 0.688 0.726 0.210 0.242 0.868 0.874 0.628 0.612 0.130 0.180
Note: N=training sample size; r=diagnosis-test correlation
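Tables 3.1–3.8 tally how often a model paired with a Bayes classifier (flag the diagnosis when the predicted probability exceeds .50) degenerates into never predicting the minority class, which is most common when prevalence is low. A minimal sketch of that tally across replications, with hypothetical predicted labels standing in for actual CART, Random Forest, or lasso output:

```python
def never_predicts_minority(predicted, minority_label=1):
    """True when a fitted classifier's test-set predictions never include
    the minority (diagnosed) class -- the degenerate outcome tallied here."""
    return minority_label not in predicted

# Hypothetical predicted labels from four simulation replications.
replications = [
    [0, 0, 0, 0, 0],   # degenerate: majority class only
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],   # degenerate
    [1, 0, 0, 1, 0],
]
# Proportion of replications in which no case was assigned to the minority class.
proportion = sum(never_predicts_minority(p) for p in replications) / len(replications)
```

In this toy example two of the four replications are degenerate, so the tallied proportion is 0.50; each cell in the tables above is this proportion over the condition's replications.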
Table 3.2
Proportion of the CART models with a Bayes classifier that did not assign cases to the minority class in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.992 0.984 0.984 0.982 0.960 0.970 0.988 0.988 0.988 0.976 0.974 0.966
250 .5 0.978 0.964 0.954 0.960 0.900 0.880 0.964 0.962 0.970 0.960 0.884 0.868
250 .7 0.902 0.906 0.856 0.876 0.632 0.624 0.900 0.886 0.874 0.842 0.646 0.670
500 .3 0.992 0.996 0.004 0.002 0 0.006 0.978 0.988 0 0 0 0
500 .5 0.986 0.982 0 0.004 0 0 0.976 0.984 0 0 0 0
500 .7 0.916 0.934 0.778 0.822 0.368 0.388 0.888 0.892 0.746 0.760 0.412 0.418
1000 .3 0.988 0.988 0.004 0 0.004 0.004 0.984 0.990 0 0 0 0
1000 .5 0.982 0.988 0 0 0 0 0.974 0.978 0 0 0 0
1000 .7 0.910 0.912 0.736 0.756 0.178 0.184 0.856 0.876 0.704 0.720 0.212 0.160
Note: N=training sample size; r=diagnosis-test correlation
Table 3.3
Proportion of the Random Forest models with a Bayes classifier that did not assign cases to the minority class in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.704 0.674 0.408 0.396 0.030 0.028 0.038 0.042 0.000 0.004 0.000 0.000
250 .5 0.718 0.740 0.344 0.334 0.020 0.012 0.010 0.014 0.000 0.000 0.000 0.000
250 .7 0.654 0.654 0.276 0.256 0.006 0.002 0.004 0.000 0.000 0.000 0.000 0.000
500 .3 0.852 0.848 0.610 0.560 0.068 0.074 0.072 0.078 0.000 0.000 0.000 0.000
500 .5 0.826 0.818 0.516 0.504 0.028 0.028 0.012 0.026 0.000 0.000 0.000 0.000
500 .7 0.790 0.806 0.312 0.352 0.008 0.006 0.002 0.004 0.000 0.000 0.000 0.000
1000 .3 0.936 0.940 0.728 0.756 0.122 0.170 0.072 0.060 0.000 0.000 0.000 0.000
1000 .5 0.942 0.928 0.636 0.634 0.078 0.070 0.004 0.006 0.000 0.000 0.000 0.000
1000 .7 0.856 0.874 0.412 0.450 0.006 0.010 0.002 0.000 0.000 0.000 0.000 0.000
Note: N=training sample size; r=diagnosis-test correlation
Table 3.4
Proportion of the Random Forest models with a Bayes classifier that did not assign cases to the minority class in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.480 0.450 0.110 0.108 0.000 0.000 0.674 0.674 0.724 0.378 0.012 0.004
250 .5 0.228 0.256 0.014 0.030 0.000 0.000 0.326 0.334 0.082 0.074 0.000 0.000
250 .7 0.076 0.048 0.002 0.000 0.000 0.000 0.060 0.038 0.000 0.002 0.000 0.000
500 .3 0.610 0.626 0.114 0.128 0.000 0.000 0.756 0.710 0.352 0.366 0.006 0.002
500 .5 0.372 0.374 0.012 0.020 0.000 0.000 0.300 0.334 0.036 0.020 0.000 0.000
500 .7 0.090 0.104 0.000 0.000 0.000 0.000 0.026 0.038 0.000 0.000 0.000 0.000
1000 .3 0.710 0.744 0.146 0.158 0.000 0.000 0.730 0.752 0.336 0.368 0.000 0.004
1000 .5 0.396 0.394 0.018 0.018 0.000 0.000 0.248 0.254 0.012 0.008 0.000 0.000
1000 .7 0.064 0.096 0.000 0.000 0.000 0.000 0.002 0.010 0.000 0.000 0.000 0.000
Note: N=training sample size; r=diagnosis-test correlation
Table 3.5
Proportion of the Lasso models with a Bayes classifier that did not assign cases to the minority class in conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 1 1 1 1 1 1 1 1 1 1 1 1
250 .5 1 1 1 1 0.998 0.996 1 1 1 1 0.978 0.972
250 .7 0.998 0.998 0.996 0.998 0.896 0.886 0.988 0.994 0.948 0.968 0.674 0.690
500 .3 1 1 1 1 1 1 1 1 1 1 1 1
500 .5 1 1 1 1 1 0.998 1 1 1 1 0.988 0.988
500 .7 1 1 0.998 0.998 0.832 0.790 0.986 0.994 0.968 0.956 0.348 0.360
1000 .3 1 1 1 1 1 1 1 1 1 1 1 1
1000 .5 1 1 1 1 0.998 1 1 1 1 1 0.976 0.984
1000 .7 0.998 1 1 0.998 0.622 0.674 0.998 0.992 0.884 0.906 0.102 0.082
Note: N=training sample size; r=diagnosis-test correlation
Table 3.6
Proportion of the Lasso models with a Bayes classifier that did not assign cases to the minority class in conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 1 1 1 1 1 1 1 1 1 1 1 0.998
250 .5 1 1 1 1 0.990 0.998 1 1 0.998 0.998 0.970 0.978
250 .7 0.994 0.992 0.986 0.996 0.796 0.808 0.982 0.970 0.932 0.926 0.546 0.548
500 .3 1 1 1 1 1 1 1 1 1 1 1 1
500 .5 1 1 1 1 0.998 0.998 1 1 1 1 0.984 0.970
500 .7 0.998 1 0.984 0.994 0.596 0.600 0.984 0.984 0.876 0.888 0.232 0.242
1000 .3 1 1 1 1 1 1 1 1 1 1 1 1
1000 .5 1 1 1 1 0.994 0.994 1 1 1 1 0.950 0.958
1000 .7 0.998 0.998 0.980 0.978 0.262 0.300 0.972 0.968 0.744 0.776 0.026 0.018
Note: N=training sample size; r=diagnosis-test correlation
179
Table 3.7
Proportion of the Relaxed Lasso models with a Bayes classifier that did not assign cases to the minority
class in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 1 0.998 1 1 1 0.990 0.992 0.998 0.990 0.998 0.956 0.954
250 .5 0.996 0.994 0.978 0.986 0.874 0.840 0.956 0.966 0.898 0.874 0.534 0.586
250 .7 0.932 0.948 0.784 0.794 0.244 0.270 0.622 0.616 0.360 0.384 0.054 0.046
500 .3 1 1 1 1 1 0.996 0.998 1 1 1 0.960 0.964
500 .5 1 1 0.986 0.990 0.754 0.730 0.968 0.978 0.856 0.848 0.314 0.306
500 .7 0.962 0.962 0.736 0.752 0.088 0.090 0.552 0.564 0.126 0.178 0 0.002
1000 .3 1 1 1 1 1 0.998 1 1 1 1 0.974 0.960
1000 .5 1 1 0.996 0.992 0.670 0.718 0.978 0.984 0.852 0.826 0.140 0.152
1000 .7 0.970 0.972 0.684 0.726 0.030 0.034 0.450 0.422 0.018 0.038 0 0
Note: N=training sample size; r=diagnosis-test correlation
180
Table 3.8
Proportion of the Relaxed Lasso models with a Bayes classifier that did not assign cases to the minority
class in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.998 1 0.998 0.994 0.982 0.988 0.996 0.996 0.992 0.986 0.920 0.930
250 .5 0.972 0.984 0.928 0.946 0.678 0.674 0.908 0.906 0.800 0.754 0.488 0.436
250 .7 0.742 0.782 0.540 0.496 0.092 0.112 0.464 0.380 0.198 0.232 0.016 0.012
500 .3 1 1 1 1 0.986 0.986 1 0.998 0.994 0.996 0.924 0.912
500 .5 0.996 0.990 0.960 0.970 0.492 0.530 0.910 0.932 0.702 0.696 0.126 0.156
500 .7 0.768 0.784 0.332 0.326 0.006 0.002 0.314 0.326 0.034 0.040 0 0
1000 .3 1 1 1 1 0.998 0.996 1 1 1 1 0.912 0.940
1000 .5 1 0.996 0.978 0.978 0.320 0.362 0.928 0.958 0.690 0.684 0.034 0.040
1000 .7 0.768 0.802 0.178 0.226 0 0 0.146 0.172 0 0 0 0
Note: N=training sample size; r=diagnosis-test correlation
181
Table 3.9
Proportion of the Logistic Regression models with a Bayes classifier that did not assign cases to the
minority class in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.630 0.630 0.602 0.582 0.282 0.236 0.350 0.388 0.326 0.312 0.090 0.078
250 .5 0.522 0.582 0.446 0.418 0.094 0.106 0.186 0.216 0.128 0.162 0.010 0.006
250 .7 0.368 0.372 0.214 0.202 0.010 0.006 0.048 0.030 0.008 0.006 0 0
500 .3 0.920 0.946 0.922 0.918 0.628 0.646 0.836 0.812 0.734 0.714 0.234 0.208
500 .5 0.846 0.854 0.756 0.726 0.216 0.202 0.588 0.564 0.290 0.264 0.008 0.012
500 .7 0.682 0.712 0.362 0.370 0.020 0.016 0.144 0.142 0.026 0.014 0 0
1000 .3 0.998 0.996 0.994 0.994 0.914 0.896 0.980 0.994 0.946 0.950 0.480 0.484
1000 .5 0.982 0.988 0.930 0.934 0.326 0.386 0.822 0.846 0.484 0.496 0.004 0.010
1000 .7 0.860 0.862 0.508 0.500 0.010 0.014 0.202 0.236 0.002 0.016 0 0
Note: N=training sample size; r=diagnosis-test correlation
182
Table 3.10
Proportion of the Logistic Regression models with a Bayes classifier that did not assign cases to the
minority class in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0 0 0 0 0 0 0 0 0 0 0 0
250 .5 0 0 0 0 0 0 0 0 0 0 0 0
250 .7 0 0 0 0 0 0 0 0 0 0 0 0
500 .3 0.022 0.028 0.016 0.018 0 0.002 0.012 0 0.002 0.004 0 0
500 .5 0.008 0.004 0.004 0.002 0 0 0 0 0 0 0 0
500 .7 0.002 0 0 0 0 0 0 0 0 0 0 0
1000 .3 0.524 0.544 0.464 0.446 0.048 0.040 0.356 0.328 0.192 0.194 0.002 0.006
1000 .5 0.232 0.220 0.094 0.098 0 0 0.056 0.070 0.008 0 0 0
1000 .7 0.016 0.034 0 0 0 0 0 0.002 0 0 0 0
Note: N=training sample size; r=diagnosis-test correlation
183
Table 3.11
Proportion of the Lasso models with a ROC classifier that did not assign cases to the minority class in
conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.968 0.976 0.960 0.988 0.926 0.920 0.958 0.968 0.956 0.962 0.882 0.890
250 .5 0.812 0.838 0.778 0.764 0.530 0.514 0.792 0.772 0.694 0.672 0.422 0.454
250 .7 0.414 0.452 0.240 0.270 0.058 0.054 0.280 0.288 0.154 0.186 0.020 0.032
500 .3 0.976 0.982 0.970 0.962 0.890 0.858 0.966 0.970 0.960 0.968 0.832 0.802
500 .5 0.754 0.758 0.508 0.500 0.118 0.132 0.666 0.676 0.380 0.374 0.058 0.068
500 .7 0.250 0.202 0.016 0.014 0 0 0.136 0.120 0.012 0.002 0 0
1000 .3 0.972 0.968 0.924 0.938 0.654 0.698 0.954 0.960 0.912 0.908 0.582 0.608
1000 .5 0.516 0.506 0.140 0.114 0.002 0.004 0.410 0.414 0.072 0.078 0.004 0
1000 .7 0.022 0.040 0 0 0 0 0.012 0.010 0 0 0 0
Note: N=training sample size; r=diagnosis-test correlation
184
Table 3.12
Proportion of the Lasso models with a ROC classifier that did not assign cases to the minority class in
conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.962 0.946 0.962 0.960 0.894 0.878 0.960 0.960 0.960 0.962 0.854 0.886
250 .5 0.784 0.798 0.716 0.696 0.458 0.432 0.770 0.734 0.638 0.606 0.386 0.360
250 .7 0.348 0.354 0.204 0.200 0.030 0.050 0.290 0.226 0.116 0.136 0.010 0.006
500 .3 0.966 0.966 0.952 0.934 0.850 0.836 0.962 0.948 0.934 0.956 0.790 0.778
500 .5 0.696 0.712 0.368 0.418 0.066 0.082 0.624 0.664 0.342 0.292 0.028 0.046
500 .7 0.134 0.134 0.018 0.010 0 0 0.102 0.088 0.006 0.002 0 0
1000 .3 0.950 0.946 0.898 0.916 0.592 0.612 0.930 0.936 0.892 0.894 0.546 0.526
1000 .5 0.414 0.410 0.100 0.070 0 0.002 0.314 0.332 0.054 0.054 0.002 0
1000 .7 0.008 0.018 0 0 0 0 0.002 0.004 0 0 0 0
Note: N=training sample size; r=diagnosis-test correlation
185
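The contrast between the Bayes-classifier tables and the ROC-classifier tables above comes down to the cutoff applied to the predicted probabilities: for a binary outcome, the Bayes classifier amounts to a fixed .5 threshold, which a rare minority class seldom reaches, while the ROC classifier selects the cutoff that maximizes Youden's J (sensitivity + specificity − 1). A minimal sketch under assumed score distributions (all parameters hypothetical, not taken from this study):

```python
import random

random.seed(1)

# Simulated risk scores for a low-prevalence diagnosis (hypothetical values)
n, prevalence = 500, 0.05
y = [1 if random.random() < prevalence else 0 for _ in range(n)]
score = [min(0.99, max(0.01, random.gauss(0.25 + 0.20 * yi, 0.10))) for yi in y]

# Bayes classifier: fixed cutoff at .5
bayes_pred = [int(s >= 0.5) for s in score]

def youden_cutoff(y, score):
    """ROC classifier: cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    pos = sum(y)
    neg = len(y) - pos
    best_c, best_j = 0.5, -1.0
    for c in sorted(set(score)):
        tp = sum(1 for yi, s in zip(y, score) if yi == 1 and s >= c)
        fp = sum(1 for yi, s in zip(y, score) if yi == 0 and s >= c)
        j = tp / pos + (neg - fp) / neg - 1
        if j > best_j:
            best_j, best_c = j, c
    return best_c

roc_pred = [int(s >= youden_cutoff(y, score)) for s in score]
# With a rare class, the .5 cutoff typically flags far fewer cases than the ROC cutoff
print(sum(bayes_pred), sum(roc_pred))
```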
Table 4.1
Classification accuracy of CART for models with prevalence of .20 and
sample size greater than N=250
Classification rate   Sensitivity   Specificity
Item Categories
i r   2 5   2 5   2 5
10 .3 0.772 0.713 0.070 0.209 0.948 0.839
.5 0.776 0.734 0.163 0.286 0.929 0.847
.7 0.803 0.800 0.343 0.346 0.917 0.914
30 .3 0.712 0.692 0.206 0.241 0.839 0.804
.5 0.734 0.720 0.295 0.309 0.844 0.823
.7 0.800 0.802 0.366 0.362 0.908 0.912
Note: i=number of items; r=diagnosis-test correlation
186
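The three quantities reported throughout the Chapter 4 tables (classification rate, sensitivity, specificity) are confusion-matrix summaries of a binary classifier. A minimal sketch with hypothetical labels and predictions (1 = diagnosis; values illustrative only):

```python
def diagnostic_accuracy(y_true, y_pred):
    """Classification rate, sensitivity, and specificity (1 = diagnosed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "classification_rate": (tp + tn) / len(y_true),  # overall agreement
        "sensitivity": tp / (tp + fn),  # hit rate among diagnosed cases
        "specificity": tn / (tn + fp),  # correct-rejection rate among non-cases
    }

# Hypothetical data: 3 diagnosed cases out of 8
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
print(diagnostic_accuracy(y_true, y_pred))
# classification_rate 0.75, sensitivity ~0.667, specificity 0.8
```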
Table 4.2
Classification rate of random forest with a Bayes classifier in conditions with prevalence of .20
Number of items
10 30
Number of item categories
2 5 2 5
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3
250 .3 0.790 0.789 0.782 0.783 0.789 0.791 0.795 0.795
250 .5 0.789 0.788 0.790 0.789 0.793 0.794 0.800 0.800
250 .7 0.806 0.806 0.814 0.814 0.820 0.820 0.828 0.828
500 .3 0.795 0.795 0.785 0.785 0.792 0.792 0.797 0.797
500 .5 0.794 0.794 0.792 0.791 0.797 0.797 0.802 0.802
500 .7 0.812 0.811 0.816 0.816 0.822 0.822 0.830 0.830
1000 .3 0.798 0.797 0.788 0.787 0.794 0.794 0.798 0.798
1000 .5 0.798 0.798 0.793 0.793 0.798 0.798 0.803 0.804
1000 .7 0.815 0.814 0.817 0.817 0.823 0.824 0.832 0.831
Note: N=training sample size; r=diagnosis-test correlation
187
Table 4.3
Sensitivity of random forest with a Bayes classifier in conditions with prevalence of .20
Number of items
10 30
Number of item categories
2 5 2 5
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3
250 .3 0.035 0.035 0.069 0.065 0.054 0.049 0.039 0.038
250 .5 0.114 0.117 0.163 0.161 0.156 0.155 0.143 0.143
250 .7 0.306 0.302 0.324 0.317 0.350 0.341 0.347 0.343
500 .3 0.019 0.018 0.061 0.061 0.039 0.038 0.030 0.029
500 .5 0.095 0.099 0.154 0.158 0.153 0.151 0.140 0.137
500 .7 0.299 0.297 0.328 0.325 0.348 0.348 0.344 0.337
1000 .3 0.008 0.009 0.054 0.054 0.033 0.032 0.024 0.023
1000 .5 0.083 0.078 0.152 0.151 0.151 0.146 0.132 0.129
1000 .7 0.296 0.292 0.328 0.326 0.351 0.350 0.343 0.342
Note: N=training sample size; r=diagnosis-test correlation
188
Table 4.4
Specificity of random forest with a Bayes classifier in conditions with prevalence of .20
Number of items
10 30
Number of item categories
2 5 2 5
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3
250 .3 0.978 0.977 0.961 0.963 0.973 0.975 0.984 0.984
250 .5 0.958 0.956 0.946 0.946 0.953 0.954 0.964 0.964
250 .7 0.931 0.932 0.937 0.938 0.937 0.940 0.949 0.950
500 .3 0.989 0.989 0.966 0.966 0.981 0.982 0.988 0.989
500 .5 0.969 0.968 0.951 0.950 0.957 0.958 0.967 0.968
500 .7 0.940 0.940 0.939 0.939 0.940 0.940 0.952 0.953
1000 .3 0.995 0.995 0.971 0.971 0.985 0.985 0.991 0.992
1000 .5 0.976 0.978 0.953 0.954 0.959 0.961 0.971 0.972
1000 .7 0.944 0.945 0.940 0.940 0.942 0.942 0.953 0.954
Note: N=training sample size; r=diagnosis-test correlation
189
Table 4.4A
Classification accuracy of lasso logistic regression with a Bayes classifier for
models with prevalence of .20, diagnosis-test correlation of .7, five-category
items, and sample size greater than N=250
Classification rate   Sensitivity   Specificity
Number of items
N ld   10 30   10 30   10 30
500 0 0.816 0.818 0.139 0.148 0.986 0.985
500 .3 0.815 0.818 0.130 0.145 0.986 0.986
1000 0 0.818 0.823 0.151 0.183 0.985 0.982
1000 .3 0.817 0.822 0.151 0.184 0.985 0.982
Note: N=training sample size; ld=local dependence
Table 4.5
Classification rate of relaxed lasso logistic regression with a Bayes classifier for conditions with
five-category items and a diagnosis-test correlation of .7
Prevalence
0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3
250 10 0.943 0.943 0.897 0.897 0.817 0.817
250 30 0.942 0.942 0.897 0.897 0.820 0.820
500 10 0.948 0.948 0.901 0.901 0.823 0.822
500 30 0.947 0.947 0.901 0.901 0.826 0.826
1000 10 0.950 0.950 0.903 0.903 0.825 0.825
1000 30 0.950 0.950 0.904 0.904 0.830 0.829
Note: N=training sample size; i = number of items
190
Table 4.6
Sensitivity of relaxed lasso logistic regression with a Bayes classifier for conditions with
five-category items and a diagnosis-test correlation of .7
Prevalence
0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3
250 10 0.179 0.179 0.222 0.214 0.353 0.345
250 30 0.206 0.212 0.252 0.255 0.399 0.393
500 10 0.120 0.116 0.179 0.186 0.352 0.348
500 30 0.154 0.152 0.237 0.234 0.394 0.390
1000 10 0.088 0.089 0.168 0.172 0.348 0.346
1000 30 0.122 0.125 0.223 0.222 0.392 0.391
Note: N=training sample size; i = number of items
191
Table 4.7
Specificity of relaxed lasso logistic regression with a Bayes classifier for conditions with
five-category items and a diagnosis-test correlation of .7
Prevalence
0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3
250 10 0.953 0.954 0.958 0.959 0.920 0.920
250 30 0.944 0.947 0.947 0.947 0.901 0.902
500 10 0.994 0.994 0.992 0.992 0.972 0.972
500 30 0.990 0.989 0.984 0.984 0.942 0.942
1000 10 0.999 0.999 0.999 0.999 0.991 0.992
1000 30 0.999 0.999 0.996 0.996 0.961 0.963
Note: N=training sample size; i = number of items
192
Table 4.8
Classification Rate of logistic regression with a Bayes classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.909 0.909 0.868 0.868 0.759 0.759 0.908 0.908 0.906 0.866 0.758 0.758
250 .5 0.902 0.905 0.864 0.864 0.765 0.766 0.900 0.900 0.863 0.863 0.769 0.768
250 .7 0.898 0.898 0.868 0.867 0.795 0.795 0.894 0.896 0.868 0.866 0.800 0.800
500 .3 0.945 0.944 0.894 0.894 0.787 0.786 0.943 0.943 0.893 0.893 0.785 0.784
500 .5 0.942 0.941 0.890 0.891 0.789 0.788 0.939 0.939 0.888 0.887 0.789 0.789
500 .7 0.936 0.937 0.891 0.891 0.815 0.814 0.935 0.934 0.892 0.892 0.818 0.818
1000 .3 0.950 0.949 0.900 0.899 0.797 0.797 0.949 0.949 0.899 0.899 0.795 0.796
1000 .5 0.949 0.949 0.898 0.898 0.798 0.798 0.948 0.948 0.897 0.897 0.799 0.799
1000 .7 0.947 0.947 0.900 0.900 0.823 0.823 0.947 0.946 0.902 0.901 0.827 0.827
Note: N=training sample size; r=diagnosis-test correlation
193
Table 4.9
Sensitivity of logistic regression with a Bayes classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.060 0.059 0.054 0.054 0.115 0.113 0.066 0.066 0.068 0.065 0.123 0.124
250 .5 0.110 0.105 0.117 0.114 0.224 0.222 0.131 0.133 0.136 0.135 0.235 0.236
250 .7 0.220 0.228 0.263 0.269 0.406 0.402 0.272 0.267 0.289 0.292 0.415 0.410
500 .3 0.008 0.008 0.012 0.012 0.046 0.046 0.013 0.012 0.017 0.016 0.063 0.062
500 .5 0.024 0.025 0.048 0.048 0.173 0.170 0.046 0.045 0.072 0.080 0.198 0.193
500 .7 0.129 0.120 0.218 0.214 0.389 0.391 0.177 0.179 0.255 0.251 0.400 0.396
1000 .3 0.001 0.001 0.002 0.002 0.017 0.016 0.002 0.002 0.004 0.003 0.029 0.027
1000 .5 0.005 0.005 0.017 0.019 0.145 0.137 0.015 0.014 0.042 0.041 0.167 0.165
1000 .7 0.068 0.062 0.184 0.177 0.382 0.385 0.130 0.130 0.232 0.231 0.395 0.393
Note: N=training sample size; r=diagnosis-test correlation
194
Table 4.10
Specificity of logistic regression with a Bayes classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.953 0.954 0.958 0.959 0.920 0.920 0.952 0.952 0.951 0.955 0.917 0.917
250 .5 0.944 0.947 0.947 0.947 0.901 0.902 0.940 0.940 0.943 0.944 0.902 0.901
250 .7 0.934 0.933 0.935 0.933 0.892 0.894 0.927 0.929 0.932 0.930 0.897 0.897
500 .3 0.994 0.994 0.992 0.992 0.972 0.972 0.992 0.992 0.990 0.990 0.965 0.965
500 .5 0.990 0.989 0.984 0.984 0.942 0.942 0.986 0.987 0.979 0.977 0.937 0.937
500 .7 0.979 0.979 0.966 0.966 0.921 0.920 0.975 0.974 0.963 0.964 0.923 0.924
1000 .3 0.999 0.999 0.999 0.999 0.991 0.992 0.999 0.999 0.998 0.998 0.987 0.988
1000 .5 0.999 0.999 0.996 0.996 0.961 0.963 0.997 0.997 0.992 0.992 0.957 0.957
1000 .7 0.993 0.993 0.979 0.980 0.934 0.933 0.990 0.989 0.976 0.975 0.935 0.935
Note: N=training sample size; r=diagnosis-test correlation
195
Table 4.11
Classification rate of random forest with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.623 0.616 0.509 0.501 0.496 0.503 0.514 0.519 0.514 0.507 0.524 0.522
250 .5 0.660 0.648 0.570 0.573 0.600 0.599 0.600 0.608 0.607 0.601 0.612 0.610
250 .7 0.726 0.716 0.689 0.679 0.694 0.692 0.716 0.719 0.706 0.710 0.706 0.704
500 .3 0.737 0.720 0.573 0.566 0.504 0.497 0.507 0.505 0.508 0.505 0.526 0.529
500 .5 0.736 0.729 0.607 0.596 0.607 0.602 0.599 0.598 0.607 0.602 0.619 0.615
500 .7 0.773 0.771 0.692 0.694 0.701 0.699 0.718 0.716 0.712 0.711 0.709 0.707
1000 .3 0.813 0.807 0.679 0.672 0.515 0.515 0.492 0.501 0.506 0.501 0.522 0.521
1000 .5 0.801 0.796 0.676 0.662 0.611 0.603 0.598 0.599 0.612 0.611 0.617 0.616
1000 .7 0.818 0.815 0.722 0.716 0.709 0.707 0.719 0.722 0.710 0.714 0.712 0.710
Note: N=training sample size; r=diagnosis-test correlation
196
Table 4.12
Classification rate of random forest with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.512 0.512 0.518 0.515 0.549 0.542 0.563 0.563 0.549 0.552 0.564 0.562
250 .5 0.594 0.597 0.614 0.612 0.625 0.626 0.660 0.667 0.654 0.658 0.652 0.654
250 .7 0.719 0.724 0.720 0.720 0.720 0.721 0.761 0.766 0.749 0.751 0.740 0.738
500 .3 0.500 0.503 0.507 0.513 0.539 0.541 0.544 0.553 0.555 0.555 0.568 0.562
500 .5 0.589 0.584 0.612 0.606 0.632 0.629 0.663 0.666 0.658 0.661 0.655 0.657
500 .7 0.721 0.717 0.723 0.722 0.723 0.723 0.769 0.768 0.753 0.753 0.742 0.742
1000 .3 0.502 0.502 0.509 0.506 0.536 0.534 0.563 0.557 0.562 0.569 0.571 0.572
1000 .5 0.586 0.582 0.608 0.602 0.632 0.632 0.670 0.673 0.662 0.666 0.661 0.661
1000 .7 0.717 0.718 0.722 0.722 0.726 0.724 0.767 0.769 0.756 0.754 0.742 0.744
Note: N=training sample size; r=diagnosis-test correlation
197
Table 4.13
Sensitivity of random forest with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.447 0.448 0.580 0.589 0.638 0.630 0.636 0.631 0.630 0.642 0.621 0.624
250 .5 0.527 0.537 0.650 0.642 0.654 0.647 0.710 0.707 0.688 0.697 0.670 0.672
250 .7 0.632 0.665 0.705 0.729 0.717 0.725 0.795 0.789 0.774 0.766 0.743 0.747
500 .3 0.304 0.324 0.489 0.498 0.621 0.629 0.649 0.654 0.646 0.654 0.629 0.624
500 .5 0.395 0.410 0.590 0.595 0.644 0.650 0.722 0.724 0.698 0.703 0.667 0.674
500 .7 0.517 0.513 0.700 0.693 0.712 0.712 0.801 0.801 0.774 0.774 0.747 0.750
1000 .3 0.205 0.208 0.338 0.353 0.583 0.594 0.673 0.663 0.655 0.660 0.642 0.642
1000 .5 0.273 0.282 0.459 0.481 0.628 0.639 0.727 0.730 0.697 0.699 0.675 0.679
1000 .7 0.393 0.393 0.627 0.632 0.701 0.700 0.805 0.803 0.783 0.777 0.746 0.750
Note: N=training sample size; r=diagnosis-test correlation
198
Table 4.14
Sensitivity of random forest with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.629 0.620 0.624 0.627 0.595 0.605 0.615 0.615 0.628 0.621 0.605 0.609
250 .5 0.721 0.715 0.694 0.691 0.680 0.673 0.708 0.700 0.691 0.689 0.674 0.674
250 .7 0.805 0.799 0.781 0.782 0.758 0.760 0.810 0.808 0.786 0.784 0.767 0.766
500 .3 0.631 0.623 0.638 0.630 0.610 0.609 0.646 0.639 0.629 0.628 0.607 0.615
500 .5 0.717 0.715 0.702 0.701 0.673 0.676 0.708 0.707 0.696 0.696 0.681 0.678
500 .7 0.804 0.801 0.785 0.785 0.761 0.761 0.808 0.809 0.791 0.790 0.768 0.766
1000 .3 0.628 0.626 0.630 0.630 0.619 0.621 0.632 0.641 0.626 0.619 0.610 0.608
1000 .5 0.712 0.715 0.700 0.707 0.682 0.678 0.713 0.713 0.703 0.697 0.676 0.676
1000 .7 0.801 0.802 0.790 0.787 0.761 0.767 0.819 0.816 0.793 0.794 0.772 0.771
Note: N=training sample size; r=diagnosis-test correlation
199
Table 4.15
Specificity of random forest with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.633 0.624 0.501 0.491 0.461 0.471 0.507 0.513 0.501 0.492 0.500 0.497
250 .5 0.667 0.654 0.561 0.565 0.586 0.587 0.594 0.603 0.598 0.590 0.598 0.595
250 .7 0.731 0.718 0.687 0.673 0.689 0.684 0.712 0.715 0.699 0.704 0.696 0.693
500 .3 0.760 0.741 0.583 0.573 0.474 0.464 0.499 0.497 0.493 0.488 0.500 0.505
500 .5 0.754 0.745 0.609 0.596 0.597 0.590 0.593 0.591 0.597 0.590 0.607 0.600
500 .7 0.787 0.785 0.691 0.694 0.699 0.696 0.714 0.712 0.705 0.704 0.700 0.696
1000 .3 0.845 0.838 0.717 0.707 0.498 0.496 0.483 0.493 0.489 0.483 0.492 0.491
1000 .5 0.829 0.823 0.700 0.682 0.607 0.594 0.591 0.592 0.602 0.601 0.602 0.600
1000 .7 0.841 0.837 0.732 0.725 0.711 0.708 0.714 0.718 0.702 0.707 0.704 0.700
Note: N=training sample size; r=diagnosis-test correlation
200
Table 4.16
Specificity of random forest with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.506 0.506 0.506 0.502 0.538 0.527 0.561 0.561 0.545 0.544 0.554 0.550
250 .5 0.588 0.590 0.605 0.604 0.611 0.615 0.658 0.665 0.650 0.654 0.647 0.649
250 .7 0.714 0.721 0.713 0.713 0.711 0.711 0.758 0.763 0.745 0.747 0.733 0.731
500 .3 0.493 0.496 0.492 0.500 0.522 0.524 0.539 0.549 0.547 0.547 0.558 0.549
500 .5 0.582 0.578 0.602 0.595 0.622 0.617 0.661 0.663 0.654 0.657 0.649 0.651
500 .7 0.716 0.712 0.716 0.715 0.714 0.713 0.767 0.766 0.748 0.749 0.735 0.735
1000 .3 0.495 0.495 0.496 0.493 0.516 0.512 0.560 0.552 0.555 0.563 0.561 0.563
1000 .5 0.579 0.575 0.598 0.590 0.619 0.620 0.668 0.671 0.657 0.662 0.657 0.657
1000 .7 0.712 0.714 0.715 0.715 0.717 0.713 0.765 0.766 0.752 0.750 0.735 0.737
Note: N=training sample size; r=diagnosis-test correlation
201
Table 4.17
Classification rate of logistic regression with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.557 0.556 0.566 0.557 0.567 0.558 0.570 0.572 0.576 0.569 0.576 0.579
250 .5 0.643 0.640 0.636 0.633 0.633 0.634 0.642 0.651 0.645 0.642 0.642 0.640
250 .7 0.722 0.723 0.714 0.711 0.708 0.710 0.737 0.739 0.723 0.723 0.721 0.719
500 .3 0.568 0.574 0.574 0.582 0.577 0.574 0.584 0.589 0.580 0.589 0.580 0.583
500 .5 0.655 0.654 0.647 0.643 0.644 0.644 0.667 0.669 0.657 0.656 0.652 0.653
500 .7 0.734 0.732 0.722 0.720 0.712 0.714 0.750 0.748 0.737 0.735 0.728 0.727
1000 .3 0.591 0.587 0.583 0.586 0.578 0.583 0.596 0.599 0.594 0.602 0.588 0.587
1000 .5 0.659 0.655 0.650 0.657 0.647 0.649 0.676 0.675 0.663 0.665 0.657 0.655
1000 .7 0.735 0.739 0.726 0.723 0.717 0.717 0.755 0.760 0.740 0.739 0.731 0.730
Note: N=training sample size; r=diagnosis-test correlation
202
Table 4.18
Classification rate of logistic regression with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.550 0.533 0.526 0.517 0.554 0.545 0.574 0.574 0.579 0.560 0.565 0.563
250 .5 0.647 0.633 0.599 0.602 0.613 0.611 0.670 0.674 0.618 0.612 0.625 0.624
250 .7 0.767 0.769 0.698 0.692 0.696 0.697 0.808 0.808 0.703 0.697 0.702 0.703
500 .3 0.537 0.532 0.555 0.554 0.564 0.566 0.562 0.569 0.571 0.574 0.580 0.570
500 .5 0.623 0.625 0.635 0.629 0.639 0.637 0.643 0.647 0.641 0.639 0.642 0.640
500 .7 0.730 0.728 0.724 0.721 0.720 0.719 0.730 0.728 0.725 0.726 0.725 0.722
1000 .3 0.574 0.565 0.567 0.573 0.581 0.578 0.583 0.577 0.583 0.586 0.583 0.584
1000 .5 0.657 0.657 0.651 0.653 0.652 0.650 0.660 0.663 0.656 0.658 0.654 0.654
1000 .7 0.752 0.752 0.736 0.736 0.731 0.730 0.753 0.754 0.741 0.744 0.735 0.736
Note: N=training sample size; r=diagnosis-test correlation
203
Table 4.19
Sensitivity of logistic regression with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.617 0.612 0.601 0.605 0.590 0.602 0.603 0.601 0.590 0.596 0.582 0.575
250 .5 0.699 0.699 0.682 0.689 0.669 0.667 0.706 0.698 0.687 0.689 0.674 0.677
250 .7 0.788 0.789 0.767 0.774 0.749 0.747 0.794 0.792 0.783 0.784 0.756 0.759
500 .3 0.628 0.620 0.617 0.604 0.599 0.605 0.616 0.610 0.613 0.603 0.603 0.599
500 .5 0.713 0.712 0.698 0.700 0.675 0.675 0.716 0.710 0.701 0.702 0.680 0.679
500 .7 0.802 0.804 0.783 0.784 0.763 0.757 0.812 0.813 0.790 0.794 0.767 0.766
1000 .3 0.627 0.631 0.620 0.616 0.615 0.607 0.629 0.628 0.618 0.606 0.607 0.609
1000 .5 0.725 0.730 0.709 0.701 0.683 0.682 0.723 0.728 0.709 0.707 0.685 0.690
1000 .7 0.816 0.813 0.791 0.791 0.764 0.763 0.825 0.822 0.801 0.802 0.771 0.772
Note: N=training sample size; r=diagnosis-test correlation
204
Table 4.20
Sensitivity of logistic regression with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.600 0.618 0.617 0.625 0.573 0.589 0.545 0.545 0.535 0.563 0.555 0.558
250 .5 0.644 0.667 0.695 0.685 0.658 0.664 0.572 0.561 0.656 0.664 0.643 0.644
250 .7 0.630 0.627 0.759 0.765 0.741 0.742 0.506 0.503 0.736 0.744 0.747 0.739
500 .3 0.632 0.637 0.607 0.607 0.590 0.587 0.591 0.593 0.585 0.578 0.567 0.581
500 .5 0.722 0.714 0.693 0.694 0.667 0.669 0.688 0.683 0.682 0.687 0.670 0.671
500 .7 0.797 0.791 0.779 0.778 0.757 0.759 0.794 0.795 0.783 0.784 0.760 0.763
1000 .3 0.624 0.630 0.622 0.612 0.593 0.598 0.609 0.616 0.601 0.597 0.594 0.590
1000 .5 0.719 0.719 0.706 0.702 0.681 0.682 0.715 0.717 0.704 0.700 0.681 0.682
1000 .7 0.804 0.808 0.797 0.796 0.769 0.772 0.818 0.816 0.802 0.798 0.775 0.773
Note: N=training sample size; r=diagnosis-test correlation
205
Table 4.21
Specificity of logistic regression with a ROC classifier in conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.554 0.553 0.562 0.552 0.562 0.547 0.568 0.570 0.575 0.566 0.575 0.580
250 .5 0.640 0.637 0.631 0.626 0.623 0.626 0.639 0.648 0.640 0.637 0.634 0.631
250 .7 0.719 0.719 0.708 0.704 0.698 0.701 0.734 0.736 0.716 0.717 0.713 0.709
500 .3 0.565 0.571 0.569 0.579 0.571 0.566 0.582 0.588 0.576 0.587 0.574 0.579
500 .5 0.652 0.651 0.642 0.637 0.636 0.636 0.664 0.667 0.652 0.650 0.646 0.646
500 .7 0.730 0.728 0.715 0.713 0.700 0.704 0.747 0.744 0.731 0.728 0.718 0.717
1000 .3 0.589 0.585 0.579 0.583 0.568 0.576 0.594 0.597 0.591 0.601 0.584 0.581
1000 .5 0.656 0.651 0.643 0.652 0.638 0.641 0.673 0.673 0.658 0.660 0.650 0.647
1000 .7 0.731 0.735 0.719 0.716 0.705 0.705 0.751 0.757 0.733 0.732 0.721 0.719
Note: N=training sample size; r=diagnosis-test correlation
206
Table 4.22
Specificity of logistic regression with a ROC classifier in conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.547 0.528 0.516 0.505 0.550 0.534 0.575 0.575 0.582 0.559 0.568 0.564
250 .5 0.647 0.631 0.589 0.593 0.602 0.597 0.675 0.680 0.614 0.607 0.620 0.619
250 .7 0.774 0.776 0.692 0.684 0.685 0.686 0.823 0.824 0.699 0.692 0.690 0.694
500 .3 0.532 0.526 0.549 0.548 0.557 0.560 0.560 0.568 0.569 0.574 0.583 0.567
500 .5 0.618 0.620 0.628 0.622 0.633 0.629 0.640 0.645 0.637 0.633 0.635 0.633
500 .7 0.726 0.725 0.717 0.715 0.711 0.709 0.727 0.725 0.719 0.719 0.716 0.712
1000 .3 0.572 0.562 0.561 0.568 0.578 0.573 0.582 0.575 0.581 0.585 0.580 0.583
1000 .5 0.653 0.654 0.644 0.647 0.645 0.641 0.658 0.660 0.651 0.653 0.648 0.647
1000 .7 0.750 0.749 0.729 0.729 0.722 0.720 0.750 0.750 0.734 0.738 0.725 0.727
Note: N=training sample size; r=diagnosis-test correlation
207
Table 4.23
Classification rate of lasso logistic regression with a ROC classifier in conditions with diagnosis-test
correlation of .70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.716 0.708 0.704 0.694 0.699 0.695 0.727 0.724 0.714 0.710 0.709 0.707
250 30 0.725 0.729 0.715 0.710 0.711 0.710 0.733 0.735 0.724 0.721 0.719 0.717
500 10 0.719 0.715 0.710 0.707 0.708 0.707 0.732 0.729 0.722 0.722 0.721 0.719
500 30 0.734 0.734 0.726 0.727 0.722 0.724 0.744 0.743 0.737 0.731 0.730 0.730
1000 10 0.726 0.728 0.719 0.717 0.714 0.713 0.742 0.744 0.733 0.735 0.724 0.726
1000 30 0.746 0.746 0.738 0.738 0.732 0.731 0.755 0.754 0.745 0.746 0.736 0.736
Note: N=training sample size; i=number of items
208
Table 4.24
Sensitivity of lasso logistic regression with a ROC classifier in conditions with diagnosis-test
correlation of .70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.770 0.773 0.745 0.763 0.733 0.739 0.786 0.789 0.766 0.768 0.750 0.750
250 30 0.784 0.782 0.764 0.773 0.744 0.751 0.798 0.798 0.775 0.777 0.758 0.759
500 10 0.779 0.782 0.766 0.769 0.748 0.750 0.799 0.802 0.782 0.781 0.760 0.759
500 30 0.801 0.794 0.786 0.782 0.763 0.761 0.809 0.808 0.790 0.794 0.768 0.768
1000 10 0.792 0.796 0.778 0.778 0.755 0.754 0.811 0.813 0.791 0.788 0.768 0.764
1000 30 0.812 0.813 0.795 0.794 0.766 0.770 0.824 0.824 0.803 0.799 0.776 0.775
Note: N=training sample size; i=number of items
209
Table 4.25
Specificity of lasso logistic regression with a ROC classifier in conditions with diagnosis-test correlation
of .70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.713 0.705 0.699 0.686 0.691 0.683 0.724 0.721 0.708 0.704 0.698 0.696
250 30 0.722 0.726 0.710 0.703 0.703 0.700 0.729 0.732 0.718 0.715 0.710 0.707
500 10 0.716 0.712 0.704 0.700 0.698 0.696 0.729 0.726 0.715 0.715 0.712 0.710
500 30 0.730 0.731 0.719 0.721 0.712 0.714 0.740 0.740 0.731 0.724 0.721 0.720
1000 10 0.723 0.724 0.713 0.710 0.703 0.703 0.738 0.740 0.726 0.729 0.713 0.717
1000 30 0.743 0.742 0.732 0.732 0.724 0.722 0.752 0.750 0.738 0.740 0.727 0.727
Note: N=training sample size; i=number of items
210
Table 4.26
Classification rate of relaxed lasso with a ROC classifier in conditions with diagnosis-test correlation of
.70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.720 0.713 0.703 0.702 0.705 0.699 0.734 0.730 0.721 0.718 0.715 0.717
250 30 0.732 0.736 0.721 0.721 0.717 0.717 0.740 0.742 0.731 0.728 0.725 0.724
500 10 0.723 0.720 0.713 0.710 0.711 0.711 0.741 0.740 0.729 0.730 0.727 0.724
500 30 0.740 0.741 0.732 0.731 0.727 0.728 0.753 0.750 0.743 0.740 0.734 0.735
1000 10 0.731 0.733 0.724 0.720 0.716 0.714 0.750 0.754 0.739 0.738 0.730 0.729
1000 30 0.753 0.753 0.741 0.742 0.734 0.732 0.764 0.764 0.749 0.749 0.739 0.739
Note: N=training sample size; i=number of items
211
Table 4.27
Sensitivity of relaxed lasso with a ROC classifier in conditions with diagnosis-test correlation of .70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.768 0.770 0.749 0.759 0.732 0.740 0.787 0.790 0.771 0.772 0.754 0.749
250 30 0.777 0.774 0.765 0.765 0.748 0.750 0.799 0.796 0.780 0.781 0.765 0.762
500 10 0.780 0.783 0.771 0.774 0.754 0.752 0.801 0.803 0.788 0.785 0.761 0.764
500 30 0.799 0.794 0.788 0.787 0.765 0.763 0.809 0.813 0.794 0.796 0.771 0.769
1000 10 0.795 0.797 0.782 0.784 0.759 0.761 0.816 0.815 0.794 0.795 0.767 0.769
1000 30 0.813 0.814 0.799 0.797 0.770 0.775 0.825 0.824 0.805 0.802 0.778 0.777
Note: N=training sample size; i=number of items
212
Table 4.28
Specificity of relaxed lasso with a ROC classifier in conditions with diagnosis-test correlation of .70
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 0.717 0.710 0.698 0.695 0.698 0.688 0.731 0.727 0.715 0.712 0.705 0.708
250 30 0.729 0.734 0.716 0.716 0.709 0.709 0.737 0.740 0.726 0.723 0.714 0.715
500 10 0.720 0.717 0.707 0.703 0.700 0.701 0.738 0.736 0.722 0.724 0.719 0.714
500 30 0.737 0.738 0.726 0.725 0.717 0.719 0.750 0.746 0.738 0.733 0.725 0.726
1000 10 0.728 0.730 0.717 0.713 0.705 0.702 0.746 0.750 0.733 0.732 0.721 0.718
1000 30 0.750 0.750 0.734 0.736 0.724 0.721 0.761 0.761 0.743 0.743 0.729 0.729
Note: N=training sample size; i=number of items
213
Table 6.1
Classification rate difference between random forest with a ROC classifier and the data-generating theta in
conditions with 10 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.003 -0.012 -0.108 -0.106 -0.100 -0.096 -0.121 -0.117 -0.100 -0.101 -0.084 -0.076
250 .5 -0.052 -0.065 -0.109 -0.116 -0.075 -0.070 -0.106 -0.103 -0.074 -0.083 -0.056 -0.055
250 .7 -0.072 -0.081 -0.084 -0.093 -0.060 -0.059 -0.083 -0.077 -0.065 -0.062 -0.042 -0.045
500 .3 0.116 0.093 -0.032 -0.049 -0.094 -0.098 -0.121 -0.118 -0.101 -0.105 -0.079 -0.068
500 .5 0.032 0.022 -0.075 -0.091 -0.060 -0.071 -0.110 -0.112 -0.079 -0.088 -0.052 -0.051
500 .7 -0.026 -0.028 -0.074 -0.074 -0.051 -0.051 -0.079 -0.078 -0.055 -0.055 -0.043 -0.044
1000 .3 0.182 0.186 0.073 0.073 -0.084 -0.089 -0.124 -0.120 -0.101 -0.107 -0.078 -0.075
1000 .5 0.097 0.094 -0.010 -0.026 -0.060 -0.067 -0.095 -0.099 -0.074 -0.071 -0.051 -0.054
1000 .7 0.025 0.024 -0.048 -0.053 -0.043 -0.043 -0.070 -0.070 -0.055 -0.055 -0.040 -0.040
Note: N=training sample size; r=diagnosis-test correlation
214
Table 6.2
Classification rate difference between random forest with a ROC classifier and the data-generating theta in
conditions with 30 items
Item Categories
2 5
Prevalence
0.05 .10 .20 0.05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.129 -0.114 -0.091 -0.097 -0.055 -0.057 -0.065 -0.065 -0.077 -0.061 -0.034 -0.040
250 .5 -0.111 -0.108 -0.075 -0.080 -0.043 -0.041 -0.057 -0.049 -0.027 -0.035 -0.015 -0.011
250 .7 -0.077 -0.073 -0.047 -0.050 -0.026 -0.029 -0.033 -0.036 -0.025 -0.022 -0.008 -0.015
500 .3 -0.128 -0.122 -0.109 -0.105 -0.059 -0.055 -0.078 -0.071 -0.057 -0.061 -0.030 -0.035
500 .5 -0.113 -0.124 -0.069 -0.077 -0.039 -0.042 -0.038 -0.041 -0.027 -0.027 -0.018 -0.009
500 .7 -0.069 -0.074 -0.046 -0.050 -0.026 -0.026 -0.027 -0.026 -0.016 -0.017 -0.009 -0.010
1000 .3 -0.128 -0.117 -0.098 -0.102 -0.066 -0.074 -0.061 -0.070 -0.051 -0.039 -0.027 -0.023
1000 .5 -0.116 -0.119 -0.073 -0.084 -0.040 -0.041 -0.039 -0.024 -0.016 -0.018 -0.009 -0.009
1000 .7 -0.071 -0.067 -0.043 -0.044 -0.025 -0.025 -0.022 -0.024 -0.014 -0.014 -0.010 -0.009
Note: N=training sample size; r=diagnosis-test correlation
Table 6.3
Sensitivity difference between random forest with ROC classifier and data generating theta in conditions with 10
items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.163 -0.156 -0.017 -0.019 0.037 0.035 0.034 0.032 0.031 0.036 0.037 0.029
250 .5 -0.168 -0.156 -0.053 -0.045 -0.015 -0.034 0.011 0.006 -0.010 -0.001 -0.012 -0.015
250 .7 -0.165 -0.136 -0.078 -0.059 -0.048 -0.047 -0.008 -0.012 -0.015 -0.018 -0.030 -0.025
500 .3 -0.317 -0.288 -0.130 -0.109 0.014 0.019 0.038 0.040 0.035 0.042 0.035 0.017
500 .5 -0.323 -0.303 -0.117 -0.103 -0.046 -0.032 0.012 0.015 -0.002 0.006 -0.020 -0.017
500 .7 -0.293 -0.292 -0.106 -0.109 -0.064 -0.064 -0.006 -0.014 -0.028 -0.028 -0.029 -0.030
1000 .3 -0.409 -0.418 -0.281 -0.278 -0.025 -0.005 0.042 0.037 0.035 0.042 0.035 0.032
1000 .5 -0.447 -0.440 -0.248 -0.224 -0.059 -0.052 -0.006 0.002 -0.010 -0.012 -0.019 -0.013
1000 .7 -0.426 -0.431 -0.177 -0.171 -0.082 -0.084 -0.021 -0.022 -0.027 -0.028 -0.033 -0.034
Note: N=training sample size; r=diagnosis-test correlation
Table 6.4
Sensitivity difference between random forest with ROC classifier and data generating theta in conditions with 30
items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.039 0.016 0.019 0.025 0.003 0.006 0.013 0.013 0.023 0.021 0.006 0.015
250 .5 0.013 0.015 0.003 0.008 -0.002 -0.011 0.018 0.010 -0.008 0.005 -0.013 -0.016
250 .7 0.007 0.003 -0.011 -0.006 -0.017 -0.012 0.009 0.015 0.006 0 -0.009 -0.001
500 .3 0.021 0.007 0.035 0.028 0.007 0 0.027 0.021 0.021 0.025 0.004 0.009
500 .5 -0.002 0.005 -0.006 -0.004 -0.011 -0.007 -0.005 -0.004 -0.003 -0.006 -0.001 -0.017
500 .7 -0.016 -0.012 -0.015 -0.010 -0.020 -0.018 -0.005 -0.005 -0.007 -0.005 -0.011 -0.008
1000 .3 0.014 -0.004 0.011 0.013 0.017 0.028 0.012 0.020 0.013 0.002 0.001 -0.005
1000 .5 -0.012 -0.011 -0.012 0 -0.006 -0.009 0.001 -0.018 -0.015 -0.013 -0.014 -0.014
1000 .7 -0.026 -0.032 -0.019 -0.022 -0.020 -0.018 -0.007 -0.004 -0.008 -0.009 -0.008 -0.011
Note: N=training sample size; r=diagnosis-test correlation
Table 6.5
Specificity difference between random forest with ROC classifier and data generating theta in conditions with 10
items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.011 -0.004 -0.118 -0.116 -0.134 -0.129 -0.129 -0.125 -0.114 -0.116 -0.114 -0.103
250 .5 -0.045 -0.060 -0.115 -0.124 -0.090 -0.079 -0.113 -0.108 -0.081 -0.092 -0.067 -0.066
250 .7 -0.067 -0.079 -0.085 -0.096 -0.063 -0.062 -0.087 -0.080 -0.070 -0.067 -0.045 -0.050
500 .3 0.138 0.113 -0.021 -0.042 -0.121 -0.127 -0.130 -0.126 -0.116 -0.122 -0.107 -0.089
500 .5 0.051 0.039 -0.070 -0.090 -0.064 -0.081 -0.117 -0.119 -0.087 -0.098 -0.060 -0.060
500 .7 -0.012 -0.014 -0.070 -0.070 -0.047 -0.048 -0.083 -0.081 -0.058 -0.058 -0.046 -0.047
1000 .3 0.213 0.217 0.112 0.112 -0.099 -0.110 -0.133 -0.128 -0.117 -0.124 -0.106 -0.102
1000 .5 0.125 0.122 0.017 -0.004 -0.061 -0.071 -0.100 -0.105 -0.081 -0.078 -0.058 -0.065
1000 .7 0.048 0.048 -0.033 -0.039 -0.033 -0.032 -0.073 -0.072 -0.058 -0.058 -0.042 -0.042
Note: N=training sample size; r=diagnosis-test correlation
Table 6.6
Specificity difference between random forest with ROC classifier and data generating theta in conditions with 30
items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.137 -0.121 -0.103 -0.110 -0.069 -0.072 -0.069 -0.069 -0.083 -0.071 -0.044 -0.053
250 .5 -0.117 -0.114 -0.083 -0.090 -0.054 -0.049 -0.061 -0.052 -0.029 -0.039 -0.015 -0.010
250 .7 -0.082 -0.077 -0.051 -0.054 -0.028 -0.033 -0.036 -0.038 -0.028 -0.024 -0.008 -0.018
500 .3 -0.136 -0.128 -0.125 -0.120 -0.075 -0.069 -0.084 -0.076 -0.066 -0.070 -0.038 -0.046
500 .5 -0.119 -0.130 -0.076 -0.086 -0.045 -0.051 -0.040 -0.043 -0.030 -0.029 -0.022 -0.007
500 .7 -0.072 -0.077 -0.049 -0.054 -0.028 -0.029 -0.028 -0.027 -0.017 -0.019 -0.008 -0.011
1000 .3 -0.136 -0.122 -0.110 -0.115 -0.087 -0.099 -0.065 -0.075 -0.058 -0.044 -0.034 -0.027
1000 .5 -0.121 -0.124 -0.080 -0.093 -0.049 -0.049 -0.041 -0.024 -0.017 -0.018 -0.007 -0.007
1000 .7 -0.073 -0.069 -0.045 -0.047 -0.027 -0.027 -0.023 -0.025 -0.014 -0.014 -0.010 -0.008
Note: N=training sample size; r=diagnosis-test correlation
Table 6.7
Classification rate difference between logistic regression with ROC classifier and data generating theta in
conditions with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.064 -0.070 -0.050 -0.050 -0.029 -0.041 -0.064 -0.064 -0.037 -0.039 -0.031 -0.020
250 .5 -0.067 -0.072 -0.043 -0.057 -0.042 -0.035 -0.064 -0.060 -0.036 -0.041 -0.027 -0.026
250 .7 -0.076 -0.074 -0.059 -0.060 -0.046 -0.041 -0.062 -0.057 -0.048 -0.048 -0.026 -0.030
500 .3 -0.053 -0.056 -0.031 -0.032 -0.021 -0.021 -0.044 -0.034 -0.030 -0.021 -0.024 -0.013
500 .5 -0.049 -0.052 -0.035 -0.044 -0.023 -0.029 -0.042 -0.041 -0.029 -0.034 -0.018 -0.013
500 .7 -0.065 -0.067 -0.044 -0.048 -0.040 -0.036 -0.047 -0.046 -0.030 -0.031 -0.024 -0.023
1000 .3 -0.040 -0.032 -0.022 -0.012 -0.022 -0.021 -0.020 -0.022 -0.014 -0.006 -0.011 -0.009
1000 .5 -0.046 -0.046 -0.036 -0.032 -0.025 -0.022 -0.018 -0.023 -0.023 -0.017 -0.010 -0.015
1000 .7 -0.058 -0.052 -0.043 -0.045 -0.035 -0.032 -0.035 -0.032 -0.025 -0.030 -0.021 -0.020
Note: N=training sample size; r=diagnosis-test correlation
Table 6.8
Classification rate difference between logistic regression with ROC classifier and data generating theta in
conditions with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.091 -0.094 -0.082 -0.095 -0.049 -0.054 -0.056 -0.056 -0.048 -0.054 -0.034 -0.038
250 .5 -0.058 -0.072 -0.089 -0.091 -0.055 -0.057 -0.047 -0.042 -0.063 -0.080 -0.043 -0.041
250 .7 -0.030 -0.028 -0.069 -0.077 -0.050 -0.052 0.013 0.007 -0.071 -0.075 -0.047 -0.050
500 .3 -0.091 -0.092 -0.061 -0.063 -0.035 -0.030 -0.061 -0.055 -0.042 -0.042 -0.017 -0.027
500 .5 -0.079 -0.083 -0.046 -0.054 -0.031 -0.034 -0.058 -0.060 -0.044 -0.049 -0.031 -0.025
500 .7 -0.060 -0.062 -0.045 -0.051 -0.029 -0.030 -0.066 -0.065 -0.043 -0.045 -0.026 -0.030
1000 .3 -0.056 -0.053 -0.040 -0.036 -0.021 -0.030 -0.042 -0.050 -0.030 -0.022 -0.015 -0.011
1000 .5 -0.045 -0.044 -0.031 -0.033 -0.020 -0.023 -0.049 -0.034 -0.022 -0.026 -0.015 -0.016
1000 .7 -0.035 -0.034 -0.029 -0.030 -0.020 -0.019 -0.037 -0.039 -0.029 -0.024 -0.017 -0.016
Note: N=training sample size; r=diagnosis-test correlation
Table 6.9
Sensitivity difference between logistic regression with ROC classifier and data generating theta in conditions
with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.007 0.007 0.004 -0.002 -0.010 0.007 0.001 0.003 -0.010 -0.010 -0.002 -0.020
250 .5 0.003 0.005 -0.021 0.001 0.001 -0.014 0.007 -0.003 -0.011 -0.009 -0.009 -0.010
250 .7 -0.009 -0.012 -0.017 -0.014 -0.016 -0.026 -0.008 -0.008 -0.006 0 -0.017 -0.014
500 .3 0.007 0.012 -0.001 -0.002 -0.009 -0.005 0.005 -0.004 0.003 -0.009 0.009 -0.008
500 .5 -0.005 -0.002 -0.008 0.003 -0.015 -0.007 0.006 0.001 0.002 0.005 -0.006 -0.012
500 .7 -0.009 -0.002 -0.022 -0.017 -0.013 -0.019 0.005 -0.002 -0.011 -0.008 -0.009 -0.014
1000 .3 0.014 0.003 0 -0.015 0.008 0.008 -0.003 0.002 -0.001 -0.012 0 -0.002
1000 .5 0.007 0.007 0.002 -0.003 -0.004 -0.009 -0.010 -0.001 0.003 -0.004 -0.009 -0.002
1000 .7 -0.003 -0.010 -0.013 -0.012 -0.018 -0.020 -0.001 -0.004 -0.008 -0.004 -0.008 -0.012
Note: N=training sample size; r=diagnosis-test correlation
Table 6.10
Sensitivity difference between logistic regression with ROC classifier and data generating theta in conditions
with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.010 0.015 0.012 0.023 -0.019 -0.010 -0.057 -0.057 -0.069 -0.037 -0.044 -0.036
250 .5 -0.064 -0.034 0.004 0.003 -0.024 -0.020 -0.118 -0.129 -0.044 -0.020 -0.043 -0.046
250 .7 -0.168 -0.170 -0.033 -0.022 -0.034 -0.030 -0.295 -0.290 -0.044 -0.040 -0.030 -0.028
500 .3 0.021 0.021 0.003 0.004 -0.013 -0.022 -0.027 -0.025 -0.023 -0.026 -0.036 -0.024
500 .5 0.004 0.005 -0.015 -0.011 -0.018 -0.015 -0.025 -0.029 -0.017 -0.014 -0.012 -0.024
500 .7 -0.023 -0.022 -0.021 -0.018 -0.023 -0.020 -0.019 -0.019 -0.015 -0.011 -0.019 -0.011
1000 .3 0.009 -0.001 0.002 -0.005 -0.009 0.005 -0.011 -0.005 -0.011 -0.019 -0.016 -0.024
1000 .5 -0.005 -0.008 -0.006 -0.004 -0.008 -0.005 0.004 -0.013 -0.014 -0.010 -0.010 -0.009
1000 .7 -0.023 -0.026 -0.012 -0.013 -0.012 -0.013 -0.008 -0.005 0.001 -0.006 -0.005 -0.009
Note: N=training sample size; r=diagnosis-test correlation
Table 6.11
Specificity difference between logistic regression with ROC classifier and data generating theta in conditions
with 10 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.067 -0.074 -0.056 -0.055 -0.033 -0.053 -0.068 -0.067 -0.040 -0.042 -0.039 -0.020
250 .5 -0.071 -0.076 -0.046 -0.063 -0.053 -0.041 -0.068 -0.063 -0.038 -0.045 -0.031 -0.030
250 .7 -0.079 -0.078 -0.063 -0.065 -0.054 -0.045 -0.065 -0.059 -0.053 -0.054 -0.029 -0.034
500 .3 -0.056 -0.059 -0.035 -0.036 -0.024 -0.025 -0.047 -0.036 -0.034 -0.023 -0.032 -0.014
500 .5 -0.052 -0.054 -0.038 -0.049 -0.025 -0.035 -0.045 -0.043 -0.033 -0.038 -0.021 -0.014
500 .7 -0.068 -0.070 -0.046 -0.051 -0.046 -0.040 -0.050 -0.048 -0.032 -0.034 -0.028 -0.026
1000 .3 -0.043 -0.034 -0.025 -0.012 -0.029 -0.029 -0.021 -0.023 -0.015 -0.006 -0.014 -0.011
1000 .5 -0.049 -0.049 -0.041 -0.035 -0.030 -0.025 -0.018 -0.024 -0.026 -0.019 -0.011 -0.018
1000 .7 -0.061 -0.054 -0.046 -0.049 -0.039 -0.036 -0.036 -0.034 -0.027 -0.033 -0.024 -0.022
Note: N=training sample size; r=diagnosis-test correlation
Table 6.12
Specificity difference between logistic regression with ROC classifier and data generating theta in conditions
with 30 items
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 -0.097 -0.100 -0.093 -0.108 -0.057 -0.065 -0.056 -0.056 -0.047 -0.055 -0.031 -0.039
250 .5 -0.058 -0.074 -0.099 -0.101 -0.062 -0.066 -0.044 -0.037 -0.065 -0.087 -0.043 -0.039
250 .7 -0.023 -0.021 -0.073 -0.083 -0.054 -0.058 0.029 0.022 -0.074 -0.079 -0.051 -0.055
500 .3 -0.096 -0.098 -0.068 -0.071 -0.040 -0.032 -0.062 -0.057 -0.044 -0.043 -0.013 -0.028
500 .5 -0.084 -0.088 -0.050 -0.058 -0.035 -0.039 -0.060 -0.062 -0.047 -0.053 -0.036 -0.026
500 .7 -0.062 -0.064 -0.047 -0.054 -0.031 -0.033 -0.068 -0.068 -0.046 -0.049 -0.027 -0.035
1000 .3 -0.059 -0.056 -0.045 -0.039 -0.024 -0.038 -0.043 -0.052 -0.032 -0.022 -0.016 -0.007
1000 .5 -0.047 -0.046 -0.034 -0.036 -0.023 -0.028 -0.051 -0.035 -0.023 -0.027 -0.017 -0.017
1000 .7 -0.036 -0.034 -0.031 -0.032 -0.022 -0.020 -0.038 -0.040 -0.032 -0.026 -0.020 -0.018
Note: N=training sample size; r=diagnosis-test correlation
Table 6.13
Classification rate difference between lasso logistic regression with a ROC classifier and data generating theta in
conditions with diagnosis-test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.090 -0.101 -0.074 -0.084 -0.056 -0.057 -0.079 -0.083 -0.061 -0.066 -0.040 -0.043
250 30 -0.082 -0.080 -0.056 -0.065 -0.036 -0.039 -0.074 -0.072 -0.056 -0.056 -0.030 -0.036
500 10 -0.084 -0.088 -0.056 -0.061 -0.044 -0.043 -0.072 -0.067 -0.045 -0.045 -0.030 -0.031
500 30 -0.059 -0.059 -0.043 -0.045 -0.027 -0.026 -0.054 -0.054 -0.032 -0.039 -0.020 -0.022
1000 10 -0.068 -0.064 -0.050 -0.052 -0.038 -0.036 -0.048 -0.048 -0.032 -0.034 -0.028 -0.024
1000 30 -0.042 -0.040 -0.027 -0.028 -0.019 -0.017 -0.034 -0.039 -0.025 -0.022 -0.016 -0.016
Note: N=training sample size; i=number of items
Table 6.14
Sensitivity difference between lasso logistic regression with a ROC classifier and data generating theta in conditions with
diagnosis-test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.017 -0.014 -0.032 -0.016 -0.031 -0.033 -0.007 0.001 -0.019 -0.012 -0.022 -0.022
250 30 0.001 0.002 -0.025 -0.010 -0.030 -0.022 0.012 0.013 0.003 -0.003 -0.017 -0.008
500 10 -0.026 -0.018 -0.040 -0.033 -0.028 -0.026 -0.002 -0.010 -0.019 -0.021 -0.016 -0.021
500 30 -0.016 -0.017 -0.014 -0.013 -0.017 -0.018 -0.002 -0.003 -0.008 -0.001 -0.011 -0.007
1000 10 -0.025 -0.027 -0.026 -0.025 -0.028 -0.029 -0.014 -0.011 -0.018 -0.018 -0.011 -0.020
1000 30 -0.015 -0.021 -0.014 -0.015 -0.015 -0.016 -0.002 0.004 0.002 -0.004 -0.005 -0.006
Note: N=training sample size; i=number of items
Table 6.15
Specificity difference between lasso logistic regression with a ROC classifier and data generating theta in conditions with
diagnosis-test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.094 -0.106 -0.079 -0.091 -0.062 -0.063 -0.083 -0.088 -0.066 -0.072 -0.044 -0.048
250 30 -0.086 -0.084 -0.059 -0.070 -0.037 -0.043 -0.078 -0.076 -0.062 -0.061 -0.033 -0.043
500 10 -0.087 -0.091 -0.058 -0.064 -0.048 -0.048 -0.076 -0.071 -0.047 -0.047 -0.034 -0.034
500 30 -0.061 -0.061 -0.046 -0.048 -0.030 -0.027 -0.057 -0.057 -0.034 -0.044 -0.022 -0.026
1000 10 -0.070 -0.066 -0.052 -0.055 -0.040 -0.038 -0.049 -0.050 -0.034 -0.036 -0.032 -0.025
1000 30 -0.043 -0.041 -0.028 -0.030 -0.019 -0.018 -0.036 -0.041 -0.028 -0.024 -0.018 -0.018
Note: N=training sample size; i=number of items
Table 6.16
Classification rate difference between relaxed lasso with a ROC classifier and data generating theta in conditions with
diagnosis-test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.086 -0.097 -0.075 -0.076 -0.051 -0.053 -0.072 -0.077 -0.054 -0.058 -0.033 -0.033
250 30 -0.075 -0.073 -0.050 -0.053 -0.030 -0.032 -0.067 -0.065 -0.049 -0.048 -0.024 -0.028
500 10 -0.081 -0.083 -0.053 -0.058 -0.041 -0.040 -0.063 -0.057 -0.038 -0.036 -0.024 -0.026
500 30 -0.053 -0.052 -0.037 -0.041 -0.023 -0.021 -0.045 -0.047 -0.025 -0.031 -0.016 -0.017
1000 10 -0.063 -0.058 -0.046 -0.049 -0.036 -0.035 -0.040 -0.038 -0.026 -0.031 -0.022 -0.022
1000 30 -0.034 -0.032 -0.024 -0.024 -0.017 -0.017 -0.026 -0.029 -0.021 -0.018 -0.013 -0.013
Note: N=training sample size; i=number of items
Table 6.17
Sensitivity difference between relaxed lasso with a ROC classifier and data generating theta in conditions with diagnosis-
test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.020 -0.017 -0.028 -0.020 -0.032 -0.032 -0.006 0.001 -0.015 -0.008 -0.018 -0.022
250 30 -0.006 -0.006 -0.025 -0.018 -0.025 -0.023 0.012 0.011 0.007 0.001 -0.010 -0.004
500 10 -0.024 -0.017 -0.035 -0.028 -0.023 -0.025 0 -0.009 -0.013 -0.017 -0.015 -0.016
500 30 -0.018 -0.017 -0.012 -0.008 -0.015 -0.016 -0.001 0.002 -0.004 0.001 -0.008 -0.005
1000 10 -0.023 -0.026 -0.022 -0.019 -0.024 -0.022 -0.010 -0.010 -0.015 -0.011 -0.012 -0.015
1000 30 -0.015 -0.020 -0.010 -0.012 -0.011 -0.011 -0.001 0.003 0.004 -0.001 -0.003 -0.004
Note: N=training sample size; i=number of items
Table 6.18
Specificity difference between relaxed lasso with a ROC classifier and data generating theta in conditions with diagnosis-
test correlation of .70.
Item Categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N i 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 10 -0.090 -0.101 -0.080 -0.082 -0.055 -0.058 -0.076 -0.081 -0.058 -0.064 -0.037 -0.036
250 30 -0.079 -0.076 -0.053 -0.057 -0.031 -0.035 -0.071 -0.068 -0.055 -0.053 -0.028 -0.035
500 10 -0.084 -0.086 -0.055 -0.062 -0.046 -0.043 -0.066 -0.060 -0.041 -0.038 -0.027 -0.029
500 30 -0.054 -0.054 -0.040 -0.044 -0.024 -0.022 -0.047 -0.050 -0.028 -0.034 -0.018 -0.020
1000 10 -0.065 -0.060 -0.048 -0.052 -0.039 -0.039 -0.041 -0.040 -0.027 -0.033 -0.025 -0.023
1000 30 -0.035 -0.033 -0.025 -0.025 -0.019 -0.018 -0.027 -0.031 -0.024 -0.020 -0.016 -0.015
Note: N=training sample size; i=number of items
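The "ROC classifier" referenced throughout these tables selects a cut score from the training-sample ROC curve. A common operating point is the threshold that maximizes Youden's J (sensitivity + specificity − 1); the sketch below assumes that criterion, which may differ from the dissertation's exact rule, and uses illustrative data:

```python
import numpy as np

def youden_cut(scores, y_true):
    """Choose the cut score maximizing sensitivity + specificity - 1 (Youden's J)."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true, dtype=bool)
    best_cut, best_j = None, -np.inf
    for c in np.unique(scores):          # every observed score is a candidate cut
        pred = scores >= c
        sens = np.sum(pred & y_true) / np.sum(y_true)
        spec = np.sum(~pred & ~y_true) / np.sum(~y_true)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut

# Diagnosed cases tend to score higher; the chosen cut should separate the groups.
scores = [0.1, 0.2, 0.4, 0.45, 0.7, 0.9]
labels = [0,   0,   0,   1,    1,   1]
cut = youden_cut(scores, labels)
```

Here the chosen cut (0.45) cleanly separates the three diagnosed cases from the three undiagnosed ones, giving J = 1.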
Table 7.1
Number of replications in conditions with binary items where CART did not choose at least
two items (where theta score was not estimated)
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 404 447 411 449 408 446 434 452 423 457 410 462
250 .5 332 403 351 404 343 415 356 390 338 408 326 389
250 .7 213 305 249 299 213 294 233 309 183 291 179 276
500 .3 393 443 411 439 20 2 25 1 26 0 28 5
500 .5 296 341 285 358 0 0 0 0 0 0 1 0
500 .7 140 211 143 201 102 140 90 157 59 97 54 97
1000 .3 341 417 342 408 14 3 13 1 20 2 19 2
1000 .5 186 253 183 244 0 0 0 0 0 0 0 0
1000 .7 55 93 48 101 26 45 29 42 7 12 9 9
Note: N=training sample size; r=diagnosis-test correlation
Table 7.2
Number of replications in conditions with polytomous items where CART did not choose at
least two items (where theta score was not estimated)
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 451 464 452 464 454 462 463 473 463 482 455 474
250 .5 432 452 418 448 428 449 434 443 383 427 409 438
250 .7 354 392 344 373 314 391 325 360 286 345 287 362
500 .3 456 470 459 466 1 0 0 0 5 0 2 0
500 .5 362 421 359 426 0 0 0 0 0 0 0 0
500 .7 240 308 239 318 163 216 150 225 112 153 99 158
1000 .3 429 453 418 447 0 0 1 0 0 1 2 1
1000 .5 271 315 275 314 0 0 0 0 0 0 0 0
1000 .7 101 153 120 152 37 82 36 87 16 27 16 25
Note: N=training sample size; r=diagnosis-test correlation
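Tables 7.1 and 7.2 count replications in which the pruned CART solution split on fewer than two distinct items, in which case no theta score could be estimated from the selected subset. The bookkeeping amounts to counting small selection sets; the data structure and item names below are illustrative, not the dissertation's code:

```python
# Each replication records the set of items its pruned tree actually split on.
items_used_per_replication = [
    {"item3"},                      # one item only: theta cannot be estimated
    {"item1", "item7"},             # two items: theta can be estimated
    set(),                          # root-only tree after pruning
    {"item2", "item5", "item9"},
]

# Table entry: number of replications with fewer than two selected items.
n_failed = sum(len(used) < 2 for used in items_used_per_replication)
```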
Table 7.3
Correlation between true and estimated theta from CART in conditions with binary items
Prevalence
.10 .20
Local Dependence
0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30
500 .3 0.999 0.979 0.999 0.979 0.999 0.990 0.999 0.990
500 .5 0.998 0.970 0.997 0.970 0.999 0.986 0.999 0.987
500 .7 0.764 0.622 0.770 0.620 0.793 0.643 0.799 0.649
1000 .3 1 0.996 1 0.996 1 0.999 1 0.999
1000 .5 1 0.992 1 0.992 1 0.998 1 0.998
1000 .7 0.835 0.686 0.843 0.682 0.878 0.742 0.874 0.741
Note: N=training sample size; r=diagnosis-test correlation
Table 7.4
Correlation between true and estimated theta from CART in conditions with polytomous
items
Prevalence
.10 .20
Local Dependence
0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30
500 .3 0.999 0.980 0.999 0.980 1 0.991 1 0.990
500 .5 0.998 0.975 0.998 0.975 1 0.989 1 0.988
500 .7 0.773 0.706 0.783 0.698 0.800 0.727 0.798 0.726
1000 .3 1 0.995 1 0.995 1 0.998 1 0.999
1000 .5 1 0.993 1 0.994 1 0.998 1 0.998
1000 .7 0.838 0.762 0.837 0.754 0.874 0.784 0.877 0.788
Note: N=training sample size; r=diagnosis-test correlation
Table 7.5
Mean squared error of estimated theta from CART in conditions with binary items
Prevalence
.10 .20
Local Dependence
0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30
500 .3 0.001 0.019 0.001 0.019 0 0.009 0 0.009
500 .5 0.002 0.028 0.002 0.027 0.001 0.012 0.001 0.012
500 .7 0.185 0.343 0.178 0.341 0.163 0.324 0.156 0.317
1000 .3 0 0.004 0 0.003 0 0.001 0 0.001
1000 .5 0 0.007 0 0.007 0 0.002 0 0.002
1000 .7 0.129 0.286 0.121 0.287 0.095 0.234 0.098 0.235
Note: N=training sample size; r=diagnosis-test correlation
Table 7.6
Mean squared error of estimated theta from CART in conditions with polytomous items
Prevalence
.10 .20
Local Dependence
0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30
500 .3 0.001 0.019 0.001 0.019 0 0.009 0 0.009
500 .5 0.002 0.024 0.001 0.024 0 0.011 0 0.011
500 .7 0.198 0.282 0.189 0.286 0.174 0.261 0.176 0.260
1000 .3 0 0.004 0 0.004 0 0.001 0 0.001
1000 .5 0 0.006 0 0.006 0 0.002 0 0.002
1000 .7 0.141 0.227 0.141 0.236 0.110 0.207 0.106 0.201
Note: N=training sample size; r=diagnosis-test correlation
Table 7.7
Number of replications in conditions with binary items where lasso did not choose at least
two items (where theta score was not estimated)
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 492 487 492 483 490 489 498 489 482 464 473 457
250 .5 428 413 439 421 410 390 412 382 318 269 314 251
250 .7 247 199 275 204 173 129 179 118 51 26 59 33
500 .3 493 489 494 488 493 484 491 476 464 442 457 433
500 .5 408 377 407 380 302 231 310 241 112 48 104 63
500 .7 168 92 144 89 20 10 23 5 1 1 1 1
1000 .3 490 484 489 480 480 463 481 467 378 337 394 330
1000 .5 288 229 286 230 118 68 101 52 4 0 6 1
1000 .7 26 5 29 13 0 0 2 0 0 0 2 0
Note: N=training sample size; r=diagnosis-test correlation
Table 7.8
Number of replications in conditions with polytomous items where lasso did not choose at least
two items (where theta score was not estimated)
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 484 488 490 488 488 487 488 488 463 452 468 456
250 .5 420 407 425 401 396 349 371 335 258 235 274 208
250 .7 188 174 199 141 114 81 134 88 25 8 28 9
500 .3 487 488 493 483 488 477 490 486 453 423 443 414
500 .5 376 340 389 368 235 203 257 192 65 29 67 41
500 .7 106 69 89 58 13 4 12 3 0 0 0 0
1000 .3 485 472 486 477 467 462 474 459 341 316 354 325
1000 .5 264 179 255 201 61 45 71 41 6 3 3 1
1000 .7 13 3 10 6 0 0 0 0 0 1 0 0
Note: N=training sample size; r=diagnosis-test correlation
Table 7.9
Correlation between true and estimated theta from lasso in conditions with diagnosis-
test correlation of .70
Number of item categories
2 5
Local Dependence
0 .3 0 .3
Number of items
N p 10 30 10 30 10 30 10 30
250 .05 0.702 0.651 0.690 0.659 0.794 0.778 0.797 0.779
250 .10 0.743 0.709 0.741 0.710 0.820 0.813 0.823 0.805
250 .20 0.808 0.780 0.806 0.780 0.873 0.862 0.865 0.861
500 .05 0.743 0.715 0.747 0.711 0.832 0.812 0.831 0.813
500 .10 0.807 0.798 0.812 0.796 0.884 0.877 0.878 0.872
500 .20 0.890 0.869 0.892 0.869 0.936 0.919 0.931 0.920
1000 .05 0.804 0.792 0.809 0.796 0.890 0.879 0.892 0.877
1000 .10 0.882 0.869 0.883 0.866 0.935 0.925 0.934 0.923
1000 .20 0.938 0.915 0.937 0.916 0.964 0.951 0.965 0.951
Note: N=training sample size; p=prevalence
Table 7.10
Mean squared error of the estimated theta from lasso in conditions with diagnosis-test
correlation of .70
Number of item categories
2 5
Local Dependence
0 .3 0 .3
Number of items
N p 10 30 10 30 10 30 10 30
250 .05 0.178 0.209 0.174 0.209 0.178 0.209 0.174 0.209
250 .10 0.156 0.175 0.153 0.184 0.156 0.175 0.153 0.184
250 .20 0.111 0.132 0.117 0.131 0.111 0.132 0.117 0.131
500 .05 0.146 0.179 0.146 0.178 0.146 0.179 0.146 0.178
500 .10 0.102 0.118 0.107 0.122 0.102 0.118 0.107 0.122
500 .20 0.056 0.078 0.061 0.076 0.056 0.078 0.061 0.076
1000 .05 0.096 0.115 0.094 0.117 0.096 0.115 0.094 0.117
1000 .10 0.057 0.072 0.057 0.074 0.057 0.072 0.057 0.074
1000 .20 0.032 0.047 0.031 0.047 0.032 0.047 0.031 0.047
Note: N=training sample size; p=prevalence
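The recovery tables (7.3-7.6, 7.9-7.14) summarize estimated theta with two statistics: its Pearson correlation with the true theta and its mean squared error. Both are one-liners in numpy; the additive noise below is a hypothetical stand-in for scoring error, not the dissertation's generating model:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = rng.normal(size=1000)                       # true latent scores
theta_hat = theta_true + rng.normal(scale=0.3, size=1000)  # noisy estimates

corr = np.corrcoef(theta_true, theta_hat)[0, 1]   # recovery correlation
mse = np.mean((theta_hat - theta_true) ** 2)      # mean squared error
```

With noise SD 0.3, the correlation lands near 0.96 and the MSE near 0.09, the same order as the better-recovered cells in these tables.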
Table 7.11
Mean squared error of the estimated theta from random forest in conditions with binary items
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 0.140 0.080 0.145 0.081 0.141 0.080 0.141 0.080 0.138 0.078 0.136 0.078
250 .5 0.145 0.089 0.147 0.088 0.141 0.087 0.141 0.086 0.133 0.080 0.131 0.079
250 .7 0.150 0.095 0.156 0.095 0.146 0.092 0.146 0.092 0.129 0.080 0.129 0.081
500 .3 0.138 0.079 0.139 0.079 0.136 0.079 0.134 0.080 0.132 0.076 0.130 0.076
500 .5 0.143 0.089 0.146 0.089 0.137 0.087 0.139 0.086 0.126 0.078 0.127 0.078
500 .7 0.151 0.098 0.148 0.098 0.140 0.092 0.141 0.092 0.125 0.080 0.126 0.080
1000 .3 0.136 0.079 0.138 0.079 0.132 0.078 0.133 0.078 0.126 0.074 0.128 0.075
1000 .5 0.142 0.090 0.143 0.090 0.133 0.085 0.135 0.086 0.122 0.077 0.120 0.078
1000 .7 0.148 0.098 0.146 0.098 0.137 0.094 0.135 0.091 0.121 0.080 0.122 0.079
Note: N=training sample size; r=diagnosis-test correlation
Table 7.12
Mean squared error of the estimated theta from random forest in conditions with polytomous items
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 0.104 0.051 0.104 0.049 0.103 0.049 0.103 0.051 0.099 0.050 0.103 0.050
250 .5 0.110 0.054 0.110 0.055 0.108 0.052 0.107 0.052 0.097 0.048 0.098 0.048
250 .7 0.109 0.054 0.111 0.055 0.105 0.051 0.106 0.051 0.093 0.045 0.092 0.045
500 .3 0.105 0.052 0.104 0.053 0.102 0.051 0.101 0.051 0.095 0.049 0.097 0.048
500 .5 0.114 0.058 0.115 0.057 0.107 0.054 0.107 0.054 0.096 0.048 0.094 0.047
500 .7 0.118 0.060 0.118 0.060 0.107 0.053 0.105 0.053 0.090 0.045 0.091 0.044
1000 .3 0.102 0.052 0.103 0.053 0.100 0.051 0.101 0.050 0.093 0.047 0.095 0.047
1000 .5 0.115 0.060 0.114 0.060 0.105 0.055 0.105 0.054 0.092 0.045 0.093 0.046
1000 .7 0.120 0.064 0.122 0.063 0.106 0.054 0.107 0.054 0.087 0.043 0.089 0.043
Note: N=training sample size; r=diagnosis-test correlation
Table 7.13
Correlation between true and estimated theta from random forest in conditions with binary items
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 0.820 0.910 0.814 0.911 0.820 0.912 0.819 0.911 0.823 0.914 0.825 0.914
250 .5 0.813 0.902 0.810 0.901 0.819 0.904 0.817 0.904 0.829 0.911 0.832 0.912
250 .7 0.807 0.894 0.800 0.894 0.812 0.898 0.811 0.897 0.834 0.912 0.834 0.910
500 .3 0.823 0.912 0.820 0.913 0.826 0.914 0.827 0.912 0.831 0.917 0.832 0.916
500 .5 0.818 0.902 0.811 0.901 0.824 0.905 0.822 0.905 0.838 0.914 0.836 0.913
500 .7 0.807 0.892 0.809 0.892 0.820 0.898 0.817 0.898 0.839 0.912 0.837 0.912
1000 .3 0.825 0.913 0.822 0.912 0.831 0.915 0.828 0.914 0.838 0.919 0.835 0.917
1000 .5 0.818 0.902 0.816 0.900 0.829 0.907 0.826 0.905 0.843 0.915 0.845 0.914
1000 .7 0.810 0.892 0.811 0.892 0.824 0.897 0.825 0.899 0.845 0.913 0.843 0.913
Note: N=training sample size; r=diagnosis-test correlation
Table 7.14
Correlation between true and estimated theta from random forest in conditions with polytomous items
Prevalence
.05 .10 .20
Local Dependence
0 .3 0 .3 0 .3
Number of items
N r 10 30 10 30 10 30 10 30 10 30 10 30
250 .3 0.820 0.910 0.814 0.911 0.820 0.912 0.819 0.911 0.823 0.914 0.825 0.914
250 .5 0.813 0.902 0.810 0.901 0.819 0.904 0.817 0.904 0.829 0.911 0.832 0.912
250 .7 0.807 0.894 0.800 0.894 0.812 0.898 0.811 0.897 0.834 0.912 0.834 0.910
500 .3 0.823 0.912 0.820 0.913 0.826 0.914 0.827 0.912 0.831 0.917 0.832 0.916
500 .5 0.818 0.902 0.811 0.901 0.824 0.905 0.822 0.905 0.838 0.914 0.836 0.913
500 .7 0.807 0.892 0.809 0.892 0.820 0.898 0.817 0.898 0.839 0.912 0.837 0.912
1000 .3 0.825 0.913 0.822 0.912 0.831 0.915 0.828 0.914 0.838 0.919 0.835 0.917
1000 .5 0.818 0.902 0.816 0.900 0.829 0.907 0.826 0.905 0.843 0.915 0.845 0.914
1000 .7 0.810 0.892 0.811 0.892 0.824 0.897 0.825 0.899 0.845 0.913 0.843 0.913
Note: N=training sample size; r=diagnosis-test correlation
Table 7.A.
Average number of items chosen by CART in conditions with 10 items
Number of item categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.848 0.786 0.852 0.632 0.808 0.918 0.482 0.432 0.400 0.378 0.360 0.420
250 .5 1.404 1.296 1.350 1.286 1.440 1.504 0.628 0.816 0.740 0.642 1.038 0.864
250 .7 2.490 2.254 2.428 2.420 2.594 2.780 1.328 1.238 1.416 1.386 1.716 1.648
500 .3 1.072 0.922 9.554 9.464 9.456 9.398 0.508 0.448 9.920 9.930 9.886 9.948
500 .5 1.856 1.914 9.844 9.822 9.934 9.944 1.078 1.128 9.848 9.876 9.972 9.966
500 .7 3.360 3.292 3.740 3.900 4.382 4.574 2.042 1.956 2.356 2.520 2.852 2.888
1000 .3 1.916 2.002 9.694 9.732 9.588 9.620 0.722 0.804 9.998 9.976 10 9.960
1000 .5 3.086 3.380 9.962 9.970 9.984 9.988 1.824 1.852 9.996 9.994 10 9.998
1000 .7 4.518 4.700 5.552 5.670 6.626 6.546 3.152 2.890 3.970 4.042 4.962 5.030
Note: N=training sample size; r=diagnosis-test correlation
Table 7.B.
Average number of items chosen by CART in conditions with 30 items
Number of item categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.564 0.544 0.506 0.442 0.492 0.436 0.360 0.360 0.352 0.342 0.238 0.272
250 .5 0.900 0.940 0.858 0.930 0.892 1.012 0.516 0.580 0.528 0.576 0.720 0.670
250 .7 1.672 1.680 1.612 1.624 1.810 1.836 0.928 1.040 0.950 1.124 1.260 1.236
500 .3 0.548 0.572 24.052 24.094 26.832 26.736 0.356 0.380 21.362 21.292 25.132 25.046
500 .5 1.282 1.242 22.338 22.340 25.794 25.972 0.790 0.754 19.934 20.094 24.294 24.208
500 .7 2.456 2.292 2.906 2.798 3.278 3.332 1.506 1.468 2.076 1.980 2.478 2.390
1000 .3 0.826 0.948 28.506 28.678 29.332 29.360 0.610 0.560 27.346 27.448 29.046 29.042
1000 .5 2.142 2.008 27.332 27.392 29.114 29.048 1.542 1.486 26.574 26.584 28.762 28.782
1000 .7 3.776 3.742 5.042 4.626 7.624 7.310 2.678 2.612 3.372 3.224 3.976 4.312
Note: N=training sample size; r=diagnosis-test correlation
Table 7.C.
Average number of items chosen by lasso in conditions with 10 items
Number of item categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.058 0.050 0.072 0.018 0.136 0.186 0.106 0.062 0.090 0.074 0.260 0.210
250 .5 0.478 0.386 0.634 0.642 1.264 1.348 0.544 0.522 0.722 0.860 1.744 1.654
250 .7 1.842 1.644 2.588 2.502 4.128 4.074 2.290 2.246 3.024 2.948 4.536 4.276
500 .3 0.048 0.046 0.056 0.070 0.220 0.288 0.084 0.054 0.078 0.060 0.318 0.400
500 .5 0.650 0.632 1.394 1.364 3.098 3.162 0.870 0.798 1.790 1.700 3.440 3.482
500 .7 2.680 2.868 4.474 4.536 6.036 6.106 3.308 3.408 4.864 4.738 6.334 6.122
1000 .3 0.062 0.082 0.142 0.116 0.868 0.750 0.090 0.108 0.212 0.174 1.022 1
1000 .5 1.462 1.544 3.022 3.200 5.150 5.120 1.786 1.830 3.428 3.488 5.056 5.078
1000 .7 4.538 4.602 6 5.984 7.242 7.234 5.132 5.216 6.402 6.374 7.454 7.458
Note: N=training sample size; r=diagnosis-test correlation
Table 7.D.
Average number of items chosen by lasso in conditions with 30 items
Number of item categories
2 5
Prevalence
.05 .10 .20 .05 .10 .20
Local Dependence
N r 0 .3 0 .3 0 .3 0 .3 0 .3 0 .3
250 .3 0.100 0.112 0.086 0.106 0.314 0.306 0.094 0.094 0.104 0.108 0.364 0.308
250 .5 0.714 0.608 0.970 0.978 2.162 2.168 0.700 0.768 1.174 1.366 2.334 2.524
250 .7 2.838 2.900 4.022 4.218 6.448 6.432 3.162 3.576 4.676 4.476 6.816 6.810
500 .3 0.076 0.090 0.124 0.164 0.412 0.464 0.084 0.118 0.140 0.082 0.524 0.624
500 .5 0.970 1.024 2.420 2.320 5.082 4.972 1.274 1.066 2.610 2.728 5.318 5.114
500 .7 4.792 4.674 7.558 7.556 10.432 10.434 4.908 4.984 7.660 7.442 9.992 9.994
1000 .3 0.136 0.160 0.258 0.240 1.320 1.418 0.186 0.148 0.272 0.296 1.470 1.382
1000 .5 2.548 2.594 5.296 5.344 8.558 8.498 3.074 2.758 5.210 5.190 8.076 7.840
1000 .7 7.986 7.884 11.034 10.888 13.938 13.986 8.060 7.934 10.814 10.638 13.33 13.47
Note: N=training sample size; r=diagnosis-test correlation
Figure 2.1. Variable Importance Measures for data-generating model. Note:
Classification rate is on the left panel, sensitivity is on the center panel, and specificity is
on the right panel.
[Figure: three dot charts of variable importance (IncNodePurity) for predictors cor1, prev1, ss1, nitem1, ncat1, ld1.]
Figure 2.2. Variable importance measures of the raw summed scores. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, prev1, ss1, nitem1, ncat1, ld1.]
Figure 2.3. Variable importance measures of Estimated Theta. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, prev1, ss1, nitem1, ncat1, ld1.]
Figure 4.1. Regression Tree to predict CART Classification Rate
[Tree diagram not reproducible in this transcript; splits on cor1, nitem1, and ncat1; terminal-node classification rates range from 0.6919 to 0.8011.]
Figure 4.2. Regression Tree to predict CART sensitivity
[Tree diagram not reproducible in this transcript; splits on cor1, ncat1, and nitem1; terminal-node sensitivities range from 0.0683 to 0.3541.]
Figure 4.3. Regression Tree to predict CART specificity
[Tree diagram not reproducible in this transcript; splits on cor1, nitem1, and ncat1; terminal-node specificities range from 0.8142 to 0.9389.]
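The sensitivity and specificity summarized in Figures 4.2 and 4.3 are the usual confusion-matrix rates. As a minimal sketch in pure Python (the predicted and true diagnoses below are hypothetical, for illustration only):

```python
def sens_spec(pred, truth):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))
    tn = sum(p == 0 and t == 0 for p, t in zip(pred, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))
    return tp / (tp + fn), tn / (tn + fp)

truth = [1, 1, 1, 0, 0, 0, 0, 0]   # hypothetical true diagnoses
pred  = [1, 0, 0, 0, 0, 0, 1, 0]   # hypothetical model classifications
print(sens_spec(pred, truth))      # sensitivity 1/3, specificity 4/5
```

Note how a classifier can post a high classification rate while sensitivity stays low when prevalence is low, which is the pattern the trees above summarize.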
Figure 4.4. Variable Importance Measures of CART. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, nitem1, ncat1, ss1, ld1.]
Figure 4.5. Regression Tree to predict Random Forest Classification Rate
[Tree diagram not reproducible in this transcript; splits on cor1, nitem1, and ncat1; terminal-node classification rates range from 0.7880 to 0.8303.]
Figure 4.6. Regression Tree to predict Random Forest Specificity
[Tree diagram not reproducible in this transcript; splits on cor1, ss1, ncat1, and nitem1; terminal-node specificities range from 0.9419 to 0.9899.]
Figure 4.7. Variable importance measures of Random Forest with Bayes classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, nitem1, ss1, ncat1, ld1.]
Figure 4.8. Regression Tree to predict Lasso Logistic Regression Sensitivity
[Tree diagram not reproducible in this transcript; splits on ss1 and nitem1; terminal-node sensitivities range from 0.1420 to 0.1849.]
Figure 4.9. Variable Importance Measures of Lasso Logistic Regression with Bayes classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: ss1, nitem1, ld1.]
Figure 4.10. Regression Tree to predict Relaxed Lasso Sensitivity
[Tree diagram not reproducible in this transcript; splits on prev1, ss1, and nitem1; terminal-node sensitivities range from 0.1238 to 0.3951.]
Figure 4.11. Regression Tree to predict Relaxed Lasso Specificity
[Tree diagram not reproducible in this transcript; splits on prev1 and ss1; terminal-node specificities range from 0.9300 to 0.9892.]
Figure 4.12. Variable importance measures of Relaxed Lasso Logistic Regression with Bayes Classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: prev1, ss1, nitem1, ld1.]
Figure 4.13. Regression Tree to predict Logistic Regression Classification Rates
[Tree diagram not reproducible in this transcript; splits on prev1, cor1, and ss1; terminal-node classification rates range from 0.7628 to 0.9434.]
Figure 4.14. Regression Tree to predict Logistic Regression Sensitivity
[Tree diagram not reproducible in this transcript; splits on cor1, prev1, and ss1; terminal-node sensitivities range from 0.02263 to 0.39700.]
Figure 4.15. Regression Tree to predict Logistic Regression Specificity
[Tree diagram not reproducible in this transcript; splits on ss1, prev1, and cor1; terminal-node specificities range from 0.9053 to 0.9922.]
Figure 4.16. Variable importance measures of Logistic Regression with Bayes classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: prev1, cor1, ss1, ncat1, ld1.]
Figure 4.17. Regression Tree to predict Sensitivity of the Random Forest model with a ROC Classifier
[Tree diagram not reproducible in this transcript; splits on cor1, ncat1, nitem1, prev1, and ss1; terminal-node sensitivities range from 0.3029 to 0.7864.]
Figure 4.18. Variable Importance Measures of Random forest with ROC classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, nitem1, ncat1, prev1, ss1, ld1.]
Figure 4.19. Regression Tree to predict Sensitivity for Logistic Regression with a ROC Classifier
[Tree diagram not reproducible in this transcript; splits on cor1, ss1, nitem1, and prev1; terminal-node sensitivities range from 0.5686 to 0.7876.]
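The "ROC classifier" results in these figures rest on choosing a cut score from the ROC curve rather than using a Bayes (0.5-probability) rule. One common selection rule is Youden's J (sensitivity + specificity − 1), maximized over candidate cuts. A minimal pure-Python sketch, with hypothetical predicted scores and true diagnoses (this names one standard rule; the dissertation's exact cut-score criterion may differ):

```python
def youden_cut(scores, labels):
    """Pick the cut score maximizing Youden's J = sensitivity + specificity - 1.
    Cases scoring at or above the cut are classified as positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for c in sorted(set(scores)):
        sens = sum(s >= c and y == 1 for s, y in zip(scores, labels)) / n_pos
        spec = sum(s < c and y == 0 for s, y in zip(scores, labels)) / n_neg
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut

scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]   # hypothetical predicted probabilities
labels = [0,   0,   1,    0,   1,   1]      # hypothetical true diagnoses
print(youden_cut(scores, labels))           # → 0.35
```

Because this rule trades specificity for sensitivity, it explains why the ROC-classifier trees show much higher sensitivity, and lower specificity, than their Bayes-classifier counterparts.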
Figure 4.20. Variable Importance Measures for Logistic Regression with ROC classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, ss1, prev1, ncat1, nitem1, ld1.]
Figure 4.21. Regression Tree to predict classification rates for Lasso Logistic Regression with ROC classifier
[Tree diagram not reproducible in this transcript; splits on ss1 and nitem1; terminal-node classification rates range from 0.7137 to 0.7420.]
Figure 4.22. Regression Tree to predict Specificity for Lasso Logistic Regression with ROC classifier
[Tree diagram not reproducible in this transcript; splits on prev1, ss1, and nitem1; terminal-node specificities range from 0.7027 to 0.7305.]
Figure 4.23. Variable importance measure of Lasso Logistic Regression with ROC classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: ss1, nitem1, prev1, ncat1, ld1.]
Figure 4.24. Regression Tree to predict Classification Rate of Relaxed Lasso with ROC classifier
[Tree diagram not reproducible in this transcript; splits on nitem1, ncat1, prev1, and ss1; terminal-node classification rates range from 0.7149 to 0.7491.]
Figure 4.25. Regression Tree to predict Sensitivity of Relaxed Lasso with ROC Classifier
[Tree diagram not reproducible in this transcript; splits on prev1 and ss1; terminal-node sensitivities range from 0.7605 to 0.8062.]
Figure 4.26. Regression Tree to predict Specificity of Relaxed Lasso with ROC classifier
[Tree diagram not reproducible in this transcript; splits on prev1 and nitem1; terminal-node specificities range from 0.7117 to 0.7382.]
Figure 4.27. Variable Importance Measures of Relaxed Lasso with ROC Classifier. Note: Classification rate is on the left panel, sensitivity is on the center panel, and specificity is on the right panel.
[Three-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: nitem1, ss1, prev1, ncat1, ld1.]
Figure 7.1. Regression Tree for Theta MSE of CART
[Tree diagram not reproducible in this transcript; splits on cor1, nitem1, and ss1; terminal-node MSE values range from 0.005109 to 0.301100.]
Figure 7.2. Regression Tree for the correlation between estimated and data-generating theta for CART
[Tree diagram not reproducible in this transcript; splits on cor1, nitem1, ss1, and ncat1; terminal-node correlations range from 0.6783 to 0.9945.]
Figure 7.3. Variable importance measures for CART person parameter recovery. Note: theta MSE is on the left panel and correlation between true and estimated theta is on the right panel.
[Two-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: cor1, nitem1, ss1, ncat1, prev1, ld1.]
Figure 7.4. Regression tree for theta MSE in the lasso logistic regression model
[Tree diagram not reproducible in this transcript; splits on ss1, prev1, ncat1, and nitem1; terminal-node MSE values range from 0.06679 to 0.28340.]
Figure 7.5. Regression tree for correlation between estimated and data-generating theta for lasso logistic regression
[Tree diagram not reproducible in this transcript; splits on ss1, ncat1, and prev1; terminal-node correlations range from 0.7029 to 0.9237.]
Figure 7.6. Variable importance measures for lasso logistic regression person parameter recovery. Note: theta MSE is on the left panel and correlation between true and estimated theta is on the right panel.
[Two-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: ss1, prev1, ncat1, nitem1, ld1.]
Figure 7.7. Regression Tree for theta MSE in the random forest model
[Tree diagram not reproducible in this transcript; splits on nitem1 and ncat1; terminal-node MSE values range from 0.05164 to 0.13680.]
Figure 7.8. Regression Tree for the correlation between estimated and data-generating theta for random forest model
[Tree diagram not reproducible in this transcript; splits on nitem1 and ncat1; terminal-node correlations range from 0.8241 to 0.9458.]
Figure 7.9. Variable importance measures for random forest person parameter recovery. Note: theta MSE is on the left panel and correlation between true and estimated theta is on the right panel.
[Two-panel dot plot not reproducible in this transcript; x-axis: IncNodePurity; factors plotted in each panel: nitem1, ncat1, prev1, cor1, ss1, ld1.]