The Pennsylvania State University
The Graduate School
College of Education
VALIDITY AND DIAGNOSTIC ACCURACY OF SCORES FROM THE AUTISM
DIAGNOSTIC OBSERVATION SCHEDULE-GENERIC
A Dissertation in
School Psychology
by
Melissa A. Reid
© 2012 Melissa A. Reid
Submitted in Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy
August 2012
The dissertation of Melissa A. Reid was reviewed and approved* by the following:
James C. DiPerna
Associate Professor of Education
Professor in Charge of the Program of School Psychology
Dissertation Adviser
Richard Hazler
Professor of Education
Robert Stevens
Professor of Education
Beverly J. Vandiver
Associate Professor of Education
*Signatures are on file in the Graduate School
Abstract
The purpose of this study was to examine the internal structure, relationships with other
variables, and diagnostic accuracy of scores on the Autism Diagnostic Observation Schedule-
Generic (ADOS-G; Lord et al., 1999) for the purpose of diagnostic decision-making. Participants
were 462 children enrolled in a public school district in the southern U.S. who were referred for
a school-based psychoeducational evaluation. Four hypotheses were tested with mixed results.
The first prediction was that items included in the Original Scoring Algorithm (OSA) would
reflect a uni-dimensional construct, and items included in the Revised Scoring Algorithm (RSA)
would reflect two constructs across modules. Exploratory factor analysis confirmed the one-
factor structure of the OSA across modules. However, a two-factor structure was not retained for
the Module 2 or Module 3 RSA. Second, it was predicted that total scores on the ADOS-G,
across modules and scoring algorithms, would demonstrate moderate to strong relations with
scores from other measures of autistic behavior, and weak relations with measures of emotional
functioning. Weak relationships were consistently measured between participants’ scores on the
ADOS-G across modules and algorithms and other measures of autistic and emotional
functioning. Third, it was predicted that scores obtained from application of the RSA would
result in greater diagnostic accuracy than those obtained from the OSA. Receiver Operating
Characteristic (ROC) curve analysis was conducted to determine the sensitivity and specificity of ADOS-G
scores. Consistent with hypotheses, the RSA typically resulted in greater diagnostic accuracy,
and a better balance between sensitivity and specificity than did the OSA. Finally, the fourth
hypothesis, which predicted that the diagnostic accuracy of the ADOS-G would be lower with an
independent criterion relative to an interdependent criterion, was not consistently supported. In
general, results of the current study confirm the structural validity and overall diagnostic
accuracy of the ADOS-G, but also highlight some of the limitations of the instrument. Despite its
limitations, it was concluded that the strengths of the ADOS-G provide support for its continued
use in school-based psychoeducational evaluations for the diagnosis of students with Autism
Spectrum Disorders.
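The accuracy indices named above (sensitivity and specificity, together with the positive and negative predictive values and hit rates reported in the results tables) all follow from a two-by-two cross-classification of instrument classifications against criterion diagnoses. The Python sketch below is illustrative only: the function name and the counts are hypothetical and are not drawn from the study data.

```python
def diagnostic_accuracy(tp, fn, tn, fp):
    """Compute common accuracy indices from two-by-two classification counts.

    tp: cases with the criterion diagnosis that the instrument classified as positive
    fn: cases with the criterion diagnosis that the instrument classified as negative
    tn: cases without the diagnosis that the instrument classified as negative
    fp: cases without the diagnosis that the instrument classified as positive
    """
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # proportion of true cases detected
        "specificity": tn / (tn + fp),   # proportion of non-cases correctly excluded
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "hit_rate": (tp + tn) / total,   # overall proportion classified correctly
    }


# Hypothetical counts chosen for illustration, not taken from the dissertation.
result = diagnostic_accuracy(tp=40, fn=10, tn=45, fp=5)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Raising a cut-score generally trades sensitivity for specificity, which is why the two indices are reported together for each scoring algorithm.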
Table of Contents
List of Tables
List of Appendices
Acknowledgements
Chapter 1. Introduction and Literature Review
  Definition of Autism Spectrum Disorders
  Common Characteristics of Autism Spectrum Disorders
  Assessment and Diagnosis of Autism Spectrum Disorders
  Autism Diagnostic Observation Schedule
    Development and Evolution of the ADOS
    Autism Diagnostic Observation Schedule-Generic
  Rationale for Present Study
  Purpose and Hypotheses
Chapter 2. Method
  Participants
  Measures
    Autism Diagnostic Observation Schedule-Generic
    Gilliam Autism Rating Scale, Second Edition
    Behavior Assessment System for Children, Second Edition
  Procedure
Chapter 3. Results
  Preliminary Analyses and Testing of Assumptions
    ADOS-G Item Analysis
    Total, Scale, and Subscale Score Analysis
  Hypothesis 1: Factor Structure of the Original and Revised Scoring Algorithms
    Module 1-Original Scoring Algorithm
    Module 1-Revised Scoring Algorithm
    Module 2-Original Scoring Algorithm
    Module 2-Revised Scoring Algorithm
    Module 3-Original Scoring Algorithm
    Module 3-Revised Scoring Algorithm
  Hypothesis 2: Relationships between Scores on the ADOS-G and Other Measures
    Module 1
    Module 2
    Module 3
  Hypothesis 3: Comparisons of Diagnostic Accuracy Indicators Across Scoring Algorithms
    Original and Revised Scoring Algorithm Comparisons
    Updated Scoring Algorithms and Optimal Cut-Score Comparisons
  Hypothesis 4: Diagnostic Accuracy of Independent Clinical Diagnoses
Chapter 4. Discussion
  Structural Validity Evidence
    Module 1
    Module 2
    Module 3
  Convergent and Discriminant Validity Evidence
  Evidence of Diagnostic Accuracy
    Module 1
    Module 2
    Module 3
  Independent Clinical Diagnoses
  Summary of Evidence by Module and Scoring Algorithm
    Module 1
    Module 2
    Module 3
  Clinical Implications
  Limitations
  Future Research
Conclusions
References
Footnotes
List of Tables
Table 1. Sensitivity and Specificity of Original and Revised Scoring Algorithms by Research Study
Table 2. Demographic Characteristics of Total Sample (N = 462) and Independent Clinical Diagnosis Subsample (N = 100)
Table 3. Item Means, Standard Deviations, Skew and Kurtosis Values on Module 1 from the ADOS-G (N = 82)
Table 4. Item Means, Standard Deviations, Skew and Kurtosis Values on Module 2 from the ADOS-G (N = 118)
Table 5. Item Means, Standard Deviations, Skew and Kurtosis Values on Module 3 from the ADOS-G (N = 262)
Table 6. Participants’ Means, Standard Deviations, Score Range, Skew, and Kurtosis Values on the ADOS-G, GARS-2, and Selected Subscales from the BASC-2
Table 7. Structure Coefficients and Communalities for the ADOS-G Module 1 (Original Scoring Algorithm) Items (N = 82)
Table 8. Pattern Coefficients, Structure Coefficients, and Communalities for the ADOS-G Module 1 (Revised Scoring Algorithm) Items (N = 66)
Table 9. Structure Coefficients and Communalities for the ADOS-G Module 1 (Revised Scoring Algorithm) Items (N = 66)
Table 10. Structure Coefficients and Communalities for the ADOS-G Module 2 (Original Scoring Algorithm) Items (N = 118)
Table 11. Pattern Coefficients, Structure Coefficients, and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items (N = 73)
Table 12. Structure Coefficients and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items (N = 73)
Table 13. Structure Coefficients and Communalities for the ADOS-G Module 3 (Original Scoring Algorithm) Items (N = 262)
Table 14. Structure Coefficients and Communalities for the ADOS-G Module 3 (Revised Scoring Algorithm) Items (N = 261)
Table 15. Pearson Correlations between Participants’ Total Scores on the ADOS-G Original and Revised Scoring Algorithms for Module 3 and Parent and Teacher Ratings on the GARS-2
Table 16. Pearson Correlations between Participants’ Total Scores on the ADOS-G Original, Revised, and Updated Scoring Algorithms and Parent and Teacher Ratings on the BASC-2
Table 17. AUC Values, Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores on the Original and Revised Scoring Algorithm
Table 18. AUC Values and Optimal Cut-Scores for the ADOS-G Updated and Retained Scoring Algorithms
Table 19. Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores on the Updated and Retained Scoring Algorithms
Table 20. Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores on the Original and Revised Scoring Algorithm Compared to Clinical Diagnoses Made With and Without the ADOS-G (N = 100)
Table 21. Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores from the Updated and Retained Original Scoring Algorithms Compared to Clinical Diagnoses Made With and Without the ADOS-G (N = 100)
List of Appendices
Appendix A: DSM-IV-TR Diagnostic Criteria for Autism Spectrum Disorders
Appendix B
Table B1: Activities on the Autism Diagnostic Observation Schedule and their Purpose by Module (Lord, Rutter, DiLavore, & Risi, 1999)
Table B2: Items Rated on the Autism Diagnostic Observation Schedule by Subdomain and Module (Lord, Rutter, DiLavore, & Risi, 1999)
Table B3: Items Included in the Revised Scoring Algorithm on the Autism Diagnostic Observation Schedule-Generic by Developmental Cell
Appendix C
Table C1: Correlation Matrix of Items Included in the ADOS-G Module 1, Original Scoring Algorithm (N = 82)
Table C2: Correlation Matrix of Items Included in the ADOS-G Module 1, Revised Scoring Algorithm (N = 66)
Table C3: Correlation Matrix of Items Included in the ADOS-G Module 2, Original Scoring Algorithm (N = 118)
Table C4: Correlation Matrix of Items Included in the ADOS-G Module 2, Revised Scoring Algorithm (N = 73)
Table C5: Correlation Matrix of Items Included in the ADOS-G Module 3, Original Scoring Algorithm (N = 261)
Table C6: Correlation Matrix of Items Included in the ADOS-G Module 3, Revised Scoring Algorithm (N = 262)
Appendix D
Table D1: Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Original Scoring Algorithm One-Factor Solutions
Table D2: Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Revised Scoring Algorithm One-Factor Solutions
Table D3: Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Revised Scoring Algorithm Two-Factor Solutions
Appendix E
Table E1: Structure Coefficients and Communalities for the ADOS-G Module 1 (Original Scoring Algorithm) Items with Deletion of Item A-5 (N = 82)
Table E2: Structure Coefficients and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items with Deletion of Item D-2 (N = 73)
Table E3: Structure Coefficients and Communalities for the ADOS-G Module 3 (Original Scoring Algorithm) Items with Deletion of Item A-4 (N = 262)
Table E4: Structure Coefficients and Communalities for the ADOS-G Module 3 (Revised Scoring Algorithm) Items with Deletion of Items D-1 and D-2 (N = 261)
Appendix F
Table F1: Cut Scores Used for ADOS-G Classification Determinations by Module and Scoring Algorithm
Table F2: Sensitivity and Specificity Values of Scores on the Original Scoring Algorithm from the Current Sample and Lord et al.’s (1999) Original Sample
Table F3: Sensitivity and Specificity Values of Scores on the Revised Scoring Algorithm from the Current Sample and Previous Studies
Appendix G. Curriculum Vitae
Acknowledgements
There are many people who have assisted me throughout the process of completing my
graduate education and my dissertation that deserve thanks for their efforts. First, I want to thank
Dr. James DiPerna, my adviser and dissertation chair, for all of his guidance, encouragement,
and faith over the last eight years. I sincerely thank you, Jim, for not giving up on me, even when I
had given up on myself. I truly appreciate all you have done and know that I would not be
writing acknowledgements to a completed dissertation without you. I would also like to thank
the other members of my doctoral committee, Drs. Richard Hazler, Robert Stevens, and Beverly
Vandiver, for their feedback over the years and contributions to my dissertation. To my
wonderful graduate school cohort, especially Miranda Freberg, Anne McGinnis, and Erin Meyer,
I never would have survived graduate school without you ladies! Thank you for your
collaboration and friendship over the years.
Thank you to the administrative staff of the Lewisville Independent School District,
Department of Special Education for allowing me to use district data to complete my
dissertation. I’d also like to thank my colleagues in Psychological Services who assisted me with
data collection and evaluation review. Special thanks to Robin Chaney, Jennifer Key, Jill
Littleton, Jessica Martin, Amorette Miller, Linda Pedersen, Shannon Spence, and Kimberly
Ward for providing me with endless support and friendship while I was attempting to “kill Earl”.
Thank you, Linda, for asking me about my dissertation progress each week in supervision,
despite the inevitable outcome, and for always holding me accountable for working on it.
Jennifer, thank you for reminding me that I would have never forgiven myself if I didn’t finish
what I started. You both played a special role in helping me get to the place that I am at today.
To my other family and friends who have provided me with love and support throughout
this long journey, your contributions have been greatly appreciated. My greatest thanks are to my
mother, Patricia Reid, to whom this work is dedicated. I owe all that I am and all that I have
achieved to you, and I wish that you were here to share in my greatest accomplishment. I hope
you are looking down on me with pride.
Chapter 1. Introduction and Literature Review
The Autism Diagnostic Observation Schedule-Generic (ADOS-G; Lord, Rutter,
DiLavore & Risi, 1999) is one of the most widely utilized diagnostic instruments in the direct
assessment of the social, communicative, and sensorimotor symptoms of Autism Spectrum
Disorders (ASD) in both clinical and educational settings. Despite its popularity and widespread
use, little independent research regarding the psychometric properties of ADOS-G scores has
been conducted to date. Thus, the purpose of this study was to examine the internal structure,
relationships with other variables, and diagnostic accuracy of ADOS-G scores for the purpose of
diagnostic decision making.
The following literature review begins with a brief overview of Autism Spectrum
Disorders (ASDs) and information on current assessment and diagnostic practices used in the
diagnosis of ASDs. The next section synthesizes existing research regarding the psychometric
properties of ADOS-G scores. This chapter then concludes with the rationale, purpose, and
primary hypotheses for the study.
Relevant studies were identified by searching the PsycINFO and PsycARTICLES
databases with “ADOS” as the primary search term. This search yielded 191 studies that
included the ADOS as a key study descriptor. The search was narrowed by selecting only studies
that were published in a peer-reviewed journal, resulting in 166 possible articles for inclusion in
the synthesis. Abstracts were reviewed to identify research studies that examined reliability
and/or validity evidence (e.g., stability of measurement across examiners and/or time and
internal consistency of assessment items; evidence of test structure and diagnostic accuracy) for
ADOS scores as a study objective. If study outcomes were not clearly identified within the
abstract, full text was reviewed for clarification. Based on the abstract review, the vast majority
of the research studies featured the ADOS-G as a diagnostic measure of ASDs rather than
examining the instrument or its technical adequacy as a study outcome. As a result, only 17
studies were identified that met the criteria for inclusion in the synthesis.
Definition of Autism Spectrum Disorders
Autism is a general term often used to describe a group of disorders formally called
Pervasive Developmental Disorders (PDDs) and commonly referred to as Autism Spectrum
Disorders (ASDs). ASDs can be defined as “cognitive and neuro-behavioral disorders, including,
but not limited to, three core-defining features: impairments in socialization, impairments in
verbal and nonverbal communication, and restricted and repetitive patterns of behaviors”
(Filipek et al., 1999, p. 439). In a recent report published by the Centers for Disease Control and
Prevention (CDC; 2009), it was noted that ASDs affect approximately 1 in 110 children in the
United States. Symptoms of ASDs, which often include deficits in the use and understanding of
verbal and nonverbal communication, literal and repetitive patterns of thought, and sensory
processing deficits (Autism, n.d.), are typically present from birth or very early in development.
However, diagnosis often does not take place prior to the age of 2 years (Lord et al., 2006).
First reported by Kanner in 1943 as a “syndrome of autistic disturbances”, ASDs were
initially identified in case histories of children between the ages of 2 and 8 years that shared
“unique and previously unreported patterns of behavior, including social remoteness,
obsessiveness, stereotypy, and echolalia” (Filipek et al., 1999, p. 442). Although included in the
first and second editions of the Diagnostic and Statistical Manual of Mental Disorders (DSM;
American Psychiatric Association, 1952; 1968), ASDs were characterized as “psychotic
reactions in children, manifesting primarily autism” and were classified as “schizophrenic
reaction or schizophrenia, childhood type” (American Psychiatric Association, 1968, p. 28).
However, following the publication of the Diagnostic and Statistical Manual of Mental
Disorders, Third Edition (DSM-III; American Psychiatric Association, 1980), ASDs were
reclassified and reconceptualized. The term Pervasive Developmental Disorder (PDD) was first
introduced in the DSM-III, as was the differentiation between ASD and childhood schizophrenia
and other forms of psychoses (Filipek et al., 1999). The terms Autistic Disorder and Pervasive
Developmental Disorder Not Otherwise Specified (PDD-NOS) were introduced in the
Diagnostic and Statistical Manual of Mental Disorders, Third Edition-Revision (DSM-III-R;
American Psychiatric Association, 1987).
According to the current Diagnostic and Statistical Manual of Mental Disorders, Fourth
Edition-Text Revision (DSM-IV-TR; American Psychiatric Association, 2000), there are five
distinct ASDs or PDDs: Autistic Disorder, Asperger’s Disorder, Rett’s Disorder, Childhood
Disintegrative Disorder, and Pervasive Developmental Disorder Not Otherwise Specified
(PDD-NOS). The diagnostic criteria for each disorder, as presented in the DSM-IV-TR, are listed
in Appendix A.
Common Characteristics of Autism Spectrum Disorders
As is evident from their definition and diagnostic criteria, ASDs affect essential human
behaviors such as social interaction, communication, imagination, and establishing relationships,
which typically result in life-long effects on learning, interpersonal interactions, independence,
and level of participation in the community (Autism, n.d.). According to the National Research
Council (2001), the level of impairment experienced by an individual with an ASD varies
according to their age of onset and the severity of their symptoms, as well as the absence or
presence of co-morbid psychiatric disorders. Across and within individuals, the manifestation of
an Autism Spectrum Disorder can vary over time: there is no single behavior that is always
typical or present in individuals with ASDs.
There are, however, several common behavioral characteristics that often are observed in
individuals with Autism Spectrum Disorders. First, speech and language difficulties, as well as
deficits in the use and understanding of nonverbal communication, are typically observed in
individuals on the spectrum. Although the severity of communication impairment varies across
the Autism Spectrum Disorders, all individuals with ASDs exhibit some of the following
behaviors: deficits in verbal language, such as failing to speak, repeating words or phrases heard,
and/or talking repetitively about one topic; atypical pitch, tone, prosody, and/or volume of
speech; failure to use spoken and body language to communicate; appearing not to listen,
even when spoken to directly; and failure to use nonverbal communication methods, such as
gesturing or pointing. In addition to expressive language deficits, individuals on the spectrum
also often experience difficulties with receptive language, or language comprehension (National
Research Council, 2001).
Cognitive and perceptual impairments also are often observed in individuals with Autism
Spectrum Disorders. Specifically, individuals on the spectrum often exhibit a here-and-now way
of thinking, which is typically very literal and repetitive in nature. They often demonstrate a lack
of curiosity about their environment and surroundings, and, at times, fail to attend to important
stimuli, focusing on irrelevant stimuli instead. An obsessive desire for sameness and repetition
may also be observed (National Research Council, 2001).
Deficits in reciprocal social interactions are the hallmark characteristics of all ASDs and
a variety of social deficits are typically observed in individuals on the spectrum. Common social
atypicalities include: resistance to being touched or held, failure to respond to name, an inability
to relate to peers and adults in an ordinary way (e.g., ignores or avoids people), failure to
appropriately modulate eye contact, lack of use of social smiling, and a general lack of
understanding of how other people think, feel, or view the world.
In addition to communication, cognitive and perceptual, and reciprocal social
impairments, individuals with ASDs also typically exhibit some degree of sensory processing
deficits and engagement in stereotyped behaviors. For example, those on the spectrum may
exhibit extreme fear reactions to loud noises, strangers, new situations, changes, or surprises;
may be under- or over-responsive to physical pain; and may demonstrate distinct food and
clothing preferences. Further, individuals with spectrum disorders may rock or spin objects as a
form of self-stimulatory behavior, may require compulsive adherence to specific routines, may
become preoccupied with one or a few objects, and may tantrum or exhibit other aggressive
behaviors when upset (National Research Council, 2001).
Assessment and Diagnosis of Autism Spectrum Disorders
Although there are clearly defined diagnostic criteria, difficulties exist in the diagnosis of
ASDs. Despite being neurological in nature, the neuro-physiological markers of ASDs have not
yet been clearly identified or documented. As a result, physicians, psychologists, and other
professionals charged with diagnosing ASDs are required to rely on a child’s observable patterns
of behavioral functioning in order to make a diagnosis (Lord & Risi, 1998). Reliance on
observable symptoms, however, can be challenging for several reasons. First, the symptoms of
autism/ASDs can differ dramatically across individuals and within individuals across time
(Lord, 2010; Tsai, 1992). Significant symptom overlap between Autistic Disorder and the
various ASDs can make differential diagnosis between disorders quite difficult, especially in
younger and older individuals (Lord & Risi, 1998; Lord & Volkmar, 2002). Further, symptom
overlap between ASDs and other physiological and psychological conditions, such as mental
retardation, other developmental disabilities, expressive and receptive language disorders,
Attention-Deficit/Hyperactivity Disorder (ADHD), and childhood-onset schizophrenia also
complicates differential diagnosis (American Psychiatric Association, 2000; Ghaziuddin, 2005;
Reaven, Hepburn, & Ross, 2008).
Due to the complexity of diagnosis, a multi-disciplinary approach to the diagnostic
assessment of Autism Spectrum Disorders is recommended (Filipek et al., 1999). Filipek et al.
recommended that each diagnostic evaluation should include a number of components, including
a comprehensive interview with parents and other caregivers in which a complete birth, medical,
family, and developmental history is obtained; direct observations of and interactions with the
child being assessed; assessment of the child’s adaptive and general behavioral functioning; and
direct assessment of the child’s speech/language/communication skills, cognitive functioning,
sensorimotor functioning, and academic functioning. Use of measures that are designed
specifically for the screening and diagnosis of ASDs is also strongly recommended (Filipek et
al., 1999; Risi et al., 2006).
Several specific screening and diagnostic measures for ASDs are widely used by
researchers and clinicians in the process of completing a multidisciplinary autism evaluation.
Two of the most commonly used rating scales at this time are the Childhood Autism Rating
Scale (CARS; Schopler, Reichler, & Rochen Renner, 1988) and the Gilliam Autism Rating Scale
(GARS; Gilliam, 1995). Although authors for both assessments have indicated that their scores
possess adequate reliability and validity for screening (CARS) and diagnostic (GARS) decisions
(Gilliam, 1995; Schopler et al., 1988), independent research has raised some questions regarding the
usefulness and diagnostic accuracy of these assessments. Specifically, Lord and Risi (1998)
noted that the CARS does not effectively differentiate individuals with communication deficits
and cognitive and behavioral difficulties related to autism from examinees with expressive
language delays, cognitive impairments, and general behavioral difficulties that are not due to a
pervasive developmental disorder.
In their investigation of the discriminative ability and diagnostic utility of the GARS,
Mazefsky and Oswald (2006) determined that the measure does not accurately discriminate
children with autism from those with non-developmental disabilities. A 2008 study conducted by
Sikora, Hartley, McCoy, Gerrard-Morris, and Dill confirmed the instrument’s failure to
consistently discriminate examinees on the autism spectrum from those who are not. More
concerning, however, was Mazefsky and Oswald’s conclusion that the GARS systematically
underestimates the probability that examinees are on the autism spectrum. A previous study of
the GARS conducted by South et al. (2002) presented similar concerns with the diagnostic
accuracy of the instrument.
In 2006, a second edition of the GARS was published by the test author (GARS-2;
Gilliam). In an attempt to address the systematic concerns of the GARS raised by independent
researchers, the GARS-2 was created with a new normative sample of participants (Montgomery,
Newton, & Smith, 2008). Substantial revisions were made to the instrument, including the
elimination of one of the four subscales found within the measure and the introduction of an
interview component to allow for the evaluation of the child’s development during early
childhood (Gilliam, 2006). Independent research on the technical adequacy of the GARS-2 has
yet to be completed.
Another popular autism diagnostic measure is the Autism Diagnostic Interview-Revised
(ADI-R; Lord, Rutter, & LeCouteur, 1994). The ADI-R is a standardized comprehensive
interview that can be completed with parents/primary caregivers and, consistent with the
recommendations of Filipek et al. (1999), requests information about the child’s birth, health,
and developmental history. It is designed for use with caregivers of children under evaluation
who demonstrate a developmental level of at least 2 years, 0 months of age.
Validation studies completed by the authors indicate that scores from the ADI-R reliably
and validly diagnose autism in children and adolescents (Rutter, LeCouteur, & Lord, 2003).
Independent research has also confirmed its technical adequacy (Cicchetti, Lord, Koenig, Klin,
& Volkmar, 2008; Noterdaeme, Mildenberger, Sitter, & Amorosa, 2002; Papanikolaou et al.,
2009). However, concerns regarding the ADI-R have also been documented. Ventola et al.
(2006) noted that the typical length of time required for appropriate administration of the ADI-R
(i.e., 90 to 150 minutes; Rutter, LeCouteur, & Lord, 2003) is prohibitive and may make it impractical
for use in school-based evaluations. In addition, unlike other diagnostic assessments currently in
use, the ADI-R does not differentiate between Autistic Disorder and other ASDs (LeCouteur,
Haden, Hammal, & McConachie, 2008).
Autism Diagnostic Observation Schedule
Perhaps the most widely used diagnostic assessment of autism, also considered the
current “gold standard” in autism assessment (Klein-Tasman, Risi, & Lord, 2007), is the Autism
Diagnostic Observation Schedule-Generic (ADOS-G; Lord, Rutter, DiLavore & Risi, 1999).
Designed for use with individuals who are thought to have an ASD, the ADOS-G is a
standardized assessment of communication, social interaction, play/imagination, and stereotyped
behaviors and interests. The original ADOS was designed to provide researchers and clinicians
with a standardized tool that could be used to record a child or adolescent’s social and
communicative behavior throughout the course of a comprehensive evaluation for an Autism
Spectrum Disorder. Since the time of its initial release, the ADOS has evolved in order to be
used with a broader range of examinees, both in terms of age and expressive language level, and
in a variety of settings (DiLavore, Lord, & Rutter, 1995), and in order to provide more consistent
differential diagnosis between children and adolescents on the autism spectrum and those with
other developmental disabilities that are not on the spectrum (Lord et al., 1999). Published in
1999, the most current version of the ADOS is the Autism Diagnostic Observation Schedule-
Generic (ADOS-G; Lord et al., 1999).
Development and evolution of the ADOS. First published in 1989, the Autism
Diagnostic Observation Schedule (ADOS; Lord et al.) was intended to be used in the differential
diagnosis of ASDs from other disorders, such as mental retardation, and typical childhood
development. It also was designed as a research tool to directly study the social behaviors and
communication patterns found in individuals with ASDs.
At the time of its initial release, the ADOS differed from other scales in two primary
ways (Lord et al., 1989). First, unlike other diagnostic measures of autism available at that time,
the ADOS was designed to focus examiners’ observations on clients’ social and communicative
functioning to identify the presence or absence of behaviors that are specific to autism. In
addition, the ADOS provided examiners with specific administration directions to guide
their own behavior in conjunction with the behavior of their examinees (Lord et al.).
Despite its advances, the original ADOS was limited because it could only be utilized
with examinees between the ages of 5 and 12 whose expressive language skills were, at a
minimum, developmentally consistent with those of a 3-year-old child (Lord et al., 2000).
However, individuals with autism frequently exhibit delays and deficits in all areas of language
acquisition, including receptive, expressive, and pragmatic (social) language. Further, the
majority of children are under 5 years of age when first referred for an autism assessment (Lord
et al.). Administration time of the ADOS was also lengthy due to its large number of items, and
completion of the assessment was often problematic for examiners, especially with younger and
more impaired children (Lord et al., 2000).
In an attempt to address these limitations, DiLavore, Lord, and Rutter (1995) developed
the Pre-Linguistic Autism Diagnostic Observation Scale (PL-ADOS), which was a downward
extension of the ADOS for use with verbal children between the ages of 2 and 4 and with
examinees of any age who do not exhibit spontaneous expressive language. Thus, the
combination of the PL-ADOS and ADOS increased the overall utility of the instrument system
by broadening the range of individuals with whom the ADOS could be used.
Limitations remained with the ADOS and PL-ADOS, however. Most notably, research
indicated that the PL-ADOS was not able to accurately differentiate between Autism Spectrum
Disorders and non-spectrum developmental delays in children of preschool age (Lord et al.,
2000). In addition, the ADOS did not include normative data for individuals above the age of 12,
and its items and activities were not developmentally appropriate for adolescents and adults. In
response to these needs, an updated measure (ADOS-G; Lord et al., 1999) was published in 1999
and is still in use today.
Autism Diagnostic Observation Schedule-Generic. The ADOS-G was superior to its
predecessors in several significant ways. As a replacement for both the ADOS and the PL-
ADOS, the instrument was designed for use with individuals across the lifespan. Instead of
consisting of a standard pool of items that is to be administered to all examinees (as was found in
the original ADOS and PL-ADOS), the ADOS-G is composed of a set of modules including
assessment activities that are appropriate for use with the individuals for whom the module was
designed. Modules were designed with consideration of both the chronological age and verbal
fluency of the examinee in order to minimize the potential bias of expressive language ability on
performance, as was observed in previous iterations of the instrument (Lord et al., 2000). In
addition, across the modules, scoring determinations are based on deviations from the
expectations of abilities given the examinee’s expressive language level in order to better
differentiate the social and communication difficulties that are related to language ability versus
other developmental concerns (Lord et al.). Unlike the standardization samples used for
normative comparisons of performance on the ADOS and the PL-ADOS, which only included
individuals with Autistic Disorder, the standardization sample for the ADOS-G included
individuals with Autistic Disorder, Asperger’s Disorder, and Pervasive Developmental Disorder
Not Otherwise Specified, allowing for the comparison of a participant’s performance to those
with a range of PDDs (Lord et al.).
The ADOS-G consists of four modules. Only one module is administered to an examinee
during a comprehensive evaluation. Each module includes items from four subscales:
Communication, Reciprocal Social Interaction, Play/Creativity/Imagination, and Stereotyped
Behaviors and Restricted Interests. However, each ADOS-G module is unique in its item
composition. Module 1 was designed for non-verbal examinees or for those who do not
consistently use spontaneous phrase speech (Lord et al., 2000). It is composed of 10 activities
(see Table B1 for a list of assessment activities by module), which result in ratings on 29
dimensions of functioning (see Table B2 for a list of rated dimensions by subscale for each of the
four modules). Module 2 was designed for use with “verbally fluent” (i.e., individuals who
“produce a range of flexible sentence types, provide language beyond the immediate context, and
describe logical connections within a sentence”) young children or older children who exhibit
some spontaneous phrase speech, but who are not “verbally fluent” (Lord, Rutter, DiLavore, &
Risi, 1999, p. 5). It is composed of 14 activities which are rated on 28 dimensions of functioning.
Older children and younger adolescents with regular use of fluent, spontaneous phrase speech are
administered Module 3, which consists of 13 activities and results in ratings on 28
dimensions of functioning. Module 4, designed for use with older adolescents and adults with
fluent expressive language abilities, is composed of 10 required and 5 optional activities that lead
to ratings on 31 dimensions of functioning. Unlike the other modules, the required activities in
Module 4 are not play-based and, instead, consist of a series of interview questions (Lord
et al.). According to Lord et al., the activities of Modules 1 and 2 are designed to allow for a
flexible, active assessment administration, whereas the administration of Modules 3 and 4 is
more structured.
Technical development of the Original Scoring Algorithm of ADOS-G. According to
Lord et al. (1999), items included in the Original Scoring Algorithm for each module were
selected from a larger pool of items included in the original version of the ADOS (Lord et al.,
1989) and the PL-ADOS (DiLavore et al., 1995) that assessed aspects of the DSM-IV/ICD10
diagnostic criteria for Autism Spectrum Disorders. From the initial pool, items were examined
for suitability. In addition, those that did not demonstrate adequate interrater reliability (i.e., r >
.80) and/or consistently result in scoring differences between participants with ASDs and those
without were discarded as potential scoring algorithm items. The remaining item pools were
submitted to exploratory factor analysis to further eliminate items that were outliers or that
demonstrated strong correlations to mental or chronological age (Lord et al.). Finally, ROC curve
analyses were conducted on the retained items to determine appropriate cut-scores for non-
Autism ASD and Autism classifications. Some items that “contributed to the possible assessment
of improvement over time” (p. 113) or that assessed behaviors of particular clinical importance
were retained on the instrument but not included in the final scoring algorithm.
Reliability evidence for scores from the ADOS-G Original Scoring Algorithm. Based on
the information provided by test authors in the administration manual (Lord et al., 1999), the
ADOS-G Original Scoring Algorithm consistently and accurately measures the symptoms and
characteristics of Autistic Disorder and non-autism Autism Spectrum Disorders, and
differentiates those with spectrum disorders from those without, and those with Autistic Disorder
from those with non-autism ASDs. Reliability analyses were conducted on individual items,
domain scores, and classification determinations. Item inter-rater reliabilities (i.e., kappa
coefficients) ranged from .55 to 1.0 for Module 1 (mean percent agreement = 91.5%), .48 to .93
for Module 2 (mean percent agreement = 89%), .46 to 1.0 for Module 3 (mean percent
agreement = 88.2%), and .41 to .93 for Module 4 (mean percent agreement = 88.25%). Inter-
rater reliability coefficients for the Social Interaction domain ranged from .88 to .97 across
modules, from .74 to .90 for the Communication domain across modules, and from .84 to .98
across modules for the Communication + Social Interaction Total used for diagnostic
classification determinations. Inter-rater agreement in diagnostic classifications for Autistic
Disorder versus non-spectrum disorders was 90% for Module 4, 91% for Module 2, and 100%
for Modules 1 and 3. Although inter-rater agreement in diagnostic classifications for non-autism
Autism Spectrum Disorders versus non-spectrum disorders was slightly lower (k = .84 to .93)
than observed for Autistic Disorder, it remained within an acceptable range. Test-retest
reliability coefficients were also reported for the Social Interaction (r = .78) and
Communication (r = .73) domain scores, and for the Communication + Social Interaction Total
score (r = .82) across modules, and interpreted by the authors as evidence of “excellent stability” of
measurement (Lord et al., p. 116). In addition, the internal consistency of items within each
domain was assessed (α = .86 to .91 for the Social Interaction domain; α = .74 to .84 for the
Communication domain; α = .47 to .65 for the Stereotyped Behaviors and Restricted Interests domain)
and determined by the authors to indicate good agreement (Lord et al.).
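The internal consistency coefficients reported above are Cronbach’s alpha values. As a minimal sketch of how such a coefficient is obtained from item ratings, the following Python function applies the standard formula, α = [k / (k − 1)] × [1 − (sum of item variances / variance of total scores)], where k is the number of items. The function name and the example ratings are hypothetical and are not drawn from the ADOS-G standardization data.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row of k item ratings per examinee)."""
    k = len(item_scores[0])
    # Transpose so that each tuple holds all examinees' ratings on one item.
    items = list(zip(*item_scores))
    item_variances = [variance(column) for column in items]
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings (0-2 scale) for four examinees on three items.
example_ratings = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 2, 1],
    [2, 1, 2],
]
example_alpha = cronbach_alpha(example_ratings)  # approximately .63
```

Because alpha rises with the number of items as well as with their intercorrelations, lower values for a short item set (such as a domain with few items) do not necessarily indicate poorer item quality.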
Validity evidence for scores from the ADOS-G Original Scoring Algorithm. The validity
of scores from the ADOS-G Original Scoring Algorithm has been investigated by the test
authors and independent researchers.
Structural validity. For each ADOS-G module, an exploratory factor analysis was run to
investigate the structural validity of the items included within the Original Scoring Algorithm.
The authors’ reports (Lord et al., 1999) indicated that, for each module, one major factor emerged,
onto which “almost all items in the Social Interaction and Communication domains loaded
highly” (p. 116). However, pattern coefficients and other information regarding factorability were
not provided. Other independent analyses of the structural validity of the ADOS-G Original
Scoring Algorithm have not been conducted to date.
Evidence of diagnostic accuracy. Diagnostic accuracy also was investigated by the test authors.
For each participant, the diagnostic classification based on his or her Communication and Social
Interaction Total Score on the ADOS-G Original Scoring Algorithm was compared to his or her
clinical diagnosis. Sensitivity and specificity were calculated for each module using Receiver
Operating Characteristic (ROC) curves. Across modules, sensitivity values ranged from .93 to
1.0 and specificity values ranged from .93 to 1.0 when differentiating Autistic Disorder from a
nonspectrum disorder; sensitivity from .90 to .97 and specificity from .87 to .94 when
differentiating all Autism Spectrum Disorders (including Autistic Disorder) from a nonspectrum
disorder; and sensitivity from .80 to .94 and specificity from .88 to .94 when differentiating a
non-autism Autism Spectrum Disorder from a nonspectrum disorder.
Mazefsky and Oswald (2006) also examined the diagnostic utility and discriminative
ability of the ADOS-G Original Scoring Algorithm with a clinical sample of 75 children (ranging
in age from 2 to 8 years) with and without ASDs. Results of the study indicated a 77 percent
agreement between participants’ diagnostic classifications obtained from the ADOS-G and their
clinical diagnoses provided by a multidisciplinary diagnostic team consisting of a child
psychiatrist, clinical psychologist, education specialist, speech/language pathologist, and
occupational therapist.
In addition, Ventola et al. (2006) examined the usefulness of the ADOS-G Original
Scoring Algorithm in diagnosing ASDs in toddlers and young children. Based on their results,
the authors reported that the ADOS-G demonstrates high levels of sensitivity and positive
predictive value when used with toddlers and young children under 3 years of age. In addition,
Ventola et al. indicated that they observed high levels of agreement between the diagnostic
classification determinations of the ADOS-G, the classification determinations of the CARS, and
diagnostic determinations made using the evaluators’ clinical judgments.
The research of Papanikolaou et al. (2009) provides further evidence of the diagnostic
accuracy of the ADOS-G Original Scoring Algorithm. Papanikolaou et al. compared the
diagnostic classification determination of the ADOS-G with the clinical diagnosis of 77 children
ranging in age from 2 to 22 years. According to Papanikolaou et al., results of these comparisons
indicated that participants’ diagnostic classifications on the ADOS-G demonstrated satisfactory
to excellent agreement with participants’ clinical diagnoses (k = .49 - .73). The specificity,
sensitivity, and positive predictive value of the ADOS-G’s diagnostic classifications were also
calculated and examined. Although the specificity (.85 to .95) and sensitivity (.77 to .90) values
were slightly lower than those reported by Lord et al. (2000), they were still
deemed to be within acceptable ranges by the authors (Papanikolaou et al.).
Additional investigations into the diagnostic accuracy of scores from the ADOS-G
Original Scoring Algorithm provide evidence to support their use in the accurate differentiation
of individuals with ASDs from those with receptive language disorders (Noterdaeme,
Mildenberger, Sitter, & Amorosa, 2002) and other mental health disorders, such as mood and
behavior disorders (Sikora, Hartley, McCoy, Gerrard-Morris, & Dill, 2008). However, according
to Reaven, Hepburn, and Ross (2008), scores derived from the ADOS-G Original Scoring
Algorithm are unable to accurately differentiate between children with an ASD and those with
active psychosis.
Research has also been conducted to investigate the agreement between a participant’s
ADOS-G Original Scoring Algorithm diagnostic classification and his or her diagnostic
classification on the ADI-R. Le Couteur, Haden, Hammal, and McConachie (2008) examined the
percent agreement between the diagnostic classifications of scores on the two instruments in a
sample of 101 preschoolers. Results of this study indicated that the ADOS-G and ADI-R scoring
algorithms yielded consistent diagnostic classifications 76 percent of the time (k = .52).
Tomanik, Pearson, Loveland, Lane, and Shaw (2007) also examined the percent agreement
between classification determinations of the ADOS-G Original Scoring Algorithm and the ADI-
R in a sample of 129 children and adolescents. Similar to the results reported by Le Couteur et
al., Tomanik et al.’s results indicated agreement between the ADOS-G and ADI-R 75 percent of
the time.
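Percent agreement and kappa answer different questions: the former is the raw proportion of matching classifications, whereas kappa discounts the agreement expected by chance given each instrument’s base rates. The following sketch computes both from a cross-classification table. The counts are invented for illustration (chosen so that observed agreement equals 76 percent) and are not the data of any study cited above; the function name is likewise hypothetical.

```python
def agreement_and_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square table of counts.

    table[i][j] is the number of cases placed in category i by the first
    instrument and in category j by the second instrument.
    """
    n = sum(sum(row) for row in table)
    size = len(table)
    observed = sum(table[i][i] for i in range(size)) / n
    row_props = [sum(table[i]) / n for i in range(size)]
    col_props = [sum(table[i][j] for i in range(size)) / n for j in range(size)]
    # Agreement expected by chance if the two classifications were independent.
    expected = sum(r * c for r, c in zip(row_props, col_props))
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: rows = instrument A (spectrum, non-spectrum),
# columns = instrument B (spectrum, non-spectrum).
hypothetical_table = [[40, 12], [12, 36]]
observed_agreement, kappa = agreement_and_kappa(hypothetical_table)
# observed_agreement is .76; kappa is approximately .52
```

With roughly balanced base rates, as in this invented table, about half of the cases would be expected to agree by chance, which is why 76 percent raw agreement corresponds to a kappa near .52.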
In summary, although the research conducted to date has adequately demonstrated the
diagnostic accuracy of the Original Scoring Algorithm and has documented acceptable levels of
classification agreement between the scoring algorithms on the ADOS-G and the ADI-R, other
forms of reliability and validity evidence are lacking at this time. Specifically, the literature
review did not yield any studies focused on the internal structure of the measurement tool.
Limitations of the ADOS-G Original Scoring Algorithm. Although research indicates
that scores from the ADOS-G Original Scoring Algorithm demonstrate adequate technical
properties for use and, in general, accurately categorize examinees’ performance (Gotham, Risi,
Pickles, & Lord, 2007), several criticisms of the ADOS-G Original Scoring Algorithm have been
reported in the literature. The ADOS-G authors also have identified several limitations of the
instrument over the last decade. Bishop and Norbury (2002) reported that the ADOS-G Original
Scoring Algorithm often over-classifies individuals with specific language impairments. In a
2004 study conducted by de Bildt et al., the ADOS-G Original Scoring Algorithm demonstrated
lower levels of sensitivity and specificity when used to discriminate individuals with mild mental
retardation from those with an Autism Spectrum Disorder. Gotham et al. also acknowledged
limitations of the ADOS-G Original Scoring Algorithm related to an examinee’s cognitive
ability. Specifically, they noted that the instrument currently does not take developmental
cognitive ability into account when selecting a module for administration or when scoring an
examinee’s performance, which may result in inaccurate diagnostic classifications for those with
lower mental functioning than expected based on their chronological age. In addition, Gotham et
al. reported that the Original Scoring Algorithm, which utilizes different items across modules,
makes comparisons of performance across modules difficult.
Of additional concern to Gotham et al. (2007) was the Original Scoring Algorithm’s lack
of consideration regarding an examinee’s engagement in restricted, repetitive behaviors (RRB).
Although items to assess RRB are included on the ADOS-G, they were intentionally excluded
from the Original Scoring Algorithm due to the authors’ concern over the short period of time
available to observe these behaviors throughout the ADOS-G administration. However, in a
review of the stability of ASD diagnoses over time, Lord et al. (2006) reported that the inclusion
of RRB in diagnostic determinations, even when only observed in a limited context,
independently contributes to diagnostic stability.
Revised scoring algorithm for the ADOS-G. In response to the current limitations of the
ADOS-G, Gotham et al. (2007) conducted a study to review and make changes to the Original
Scoring Algorithm in order to (a) improve the overall diagnostic accuracy of the instrument, (b)
address the identified concerns regarding the impact of cognitive ability, expressive language
level, and chronological age on an examinee’s performance, (c) include RRB in diagnostic
determinations, and (d) increase consistency of the conceptual items included in the scoring
algorithm across modules to allow for easier comparison of performance across modules.
Data from 1,630 cases (i.e., complete ADOS-G administrations) were used in the study’s
analyses. Data were obtained from 1,139 different participants. An unidentified number of
participants completed more than one ADOS-G administration, and the data from each of the
administrations were included in the analyses as a separate case. Participants ranged in age from
14 to 192 months at the time of ADOS-G administration, and completed the assessment as a part
of a diagnostic evaluation at a mid-western autism/communication disorders clinic or as a
research study participant recruited at several sites across the U.S. Fifty-six percent of
participants had a clinical diagnosis of Autistic Disorder, 27 percent were diagnosed with a
milder Autism Spectrum Disorder, and 17 percent had a diagnosis of a non-ASD developmental
delay. Only data from Module 1, 2, and 3 administrations were included in the analyses due to
the authors’ beliefs that older adolescents and adults on the Autism Spectrum exhibit distinct
behavior patterns and, as such, require separate examination (Gotham et al., 2007).
Technical development of the ADOS-G Revised Scoring Algorithm. When generating the
new diagnostic algorithms, researchers took several steps. First, they looked at the correlations
between total scores on the ADOS-G and chronological age, verbal ability, and mental age of
participants and then divided the sample by chronological age and language ability to create cells
that minimized the correlations between total scores and demographic variables. Once the new
cells were generated, the authors examined individual items within each of the modules and
selected those that best differentiated between clinical diagnoses for inclusion in the new scoring
algorithm. Selected items were also subjected to exploratory multi-factor item response analysis
to investigate factor structure and to organize the items into domains for each of the three
modules. The new models were then examined using confirmatory factor analysis (CFA), and
logistic regression was used to determine the “predictive value” of scores from each of the
identified factors to diagnostic determination. Finally, Receiver Operating Characteristic (ROC)
curve analysis was conducted to determine the sensitivity (i.e., accurate positive classifications,
or the percentage of participants with a clinical disorder that are accurately diagnosed as having
the disorder) and specificity (i.e., accurate negative classifications, or the percentage of
participants without a clinical disorder that are accurately diagnosed as not having the disorder)
of the original and the newly revised scoring algorithms within each of the generated cells
(Gotham et al., 2007).
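The sensitivity and specificity definitions given above can be made concrete with a short sketch of the calculation that underlies an ROC curve: each candidate cut score is applied to the total scores, and the resulting classifications are compared with clinical diagnoses. The sketch assumes that higher totals indicate greater impairment and that a total at or above the cut score produces a positive classification. All scores, diagnoses, and function names are hypothetical and do not reproduce the procedure or data of Gotham et al. (2007).

```python
def sens_spec_at_cutoff(scores, has_disorder, cutoff):
    """Return (sensitivity, specificity) when totals >= cutoff are classified positive."""
    tp = fn = fp = tn = 0
    for score, disorder in zip(scores, has_disorder):
        positive = score >= cutoff
        if disorder and positive:
            tp += 1  # has the disorder, classified as having it
        elif disorder:
            fn += 1  # has the disorder, missed by the cut score
        elif positive:
            fp += 1  # no disorder, incorrectly classified as having it
        else:
            tn += 1  # no disorder, correctly classified
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, has_disorder):
    """Sensitivity and specificity at every observed total score used as a cut score."""
    return [(c, *sens_spec_at_cutoff(scores, has_disorder, c))
            for c in sorted(set(scores))]


# Hypothetical total scores and clinical diagnoses for eight examinees.
example_scores = [3, 5, 7, 8, 10, 12, 14, 15]
example_diagnoses = [False, False, False, True, False, True, True, True]
example_roc = roc_points(example_scores, example_diagnoses)
```

In this invented sample, a cut score of 8 yields sensitivity of 1.0 and specificity of .75, whereas a cut score of 12 yields sensitivity of .75 and specificity of 1.0, illustrating the trade-off that cut-score selection must balance.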
Means and score distributions obtained using the original and revised scoring algorithms
were examined. In Module 1, because the range of possible scores of non-verbal children was
restricted due to their lack of expressive language, the authors suggested dividing Module 1 into
two algorithms: Module 1 No Words and Module 1 Some Words to correct for the influence of
expressive language level on performance. In Module 2, correlation analysis revealed that there
was a consistent inverse relationship between chronological age and ADOS total scores (i.e., as
participant age increased, total ADOS score decreased) in participants under the age of 5 years,
and a direct positive correlation between age and score (i.e., as participant age increased, total
ADOS score increased) in participants age 5 years and older. As such, the authors recommended
splitting Module 2 into two algorithms: Module 2 Younger than 5, and Module 2 Greater than or
Equal to 5, to correct for the effect of age on performance. Participants’ scores on Module 3 did
not appear to be highly correlated with any demographic variables, so no division of the module was
required (Gotham et al., 2007).
Structural validity. Exploratory factor analysis was completed on each of the five cells to
examine the structural validity of the items included in the Revised Scoring Algorithm. Across
the cells, a two-factor model was retained for interpretation. All items loaded saliently (i.e., >
.30) on one of the two factors across the five cells, and factors were significantly positively
correlated. Confirmatory factor analysis was used to determine if a 2-factor model fit the data
better than a 1-factor model. The authors reported that the Comparative Fit Index (CFI) values for
the 2-factor model ranged from .94 to .97 (CFI values greater than .90 indicate a good fit;
Skrondal & Rabe-Hesketh, 2004), and the Root Mean Square Error of Approximation (RMSEA)
values ranged from .08 to .09, suggesting an adequate model fit. Gotham et al. (2007) reported
that the 2-factor model produced a “substantially better fit than the 1-factor model” (p. 618),
although no corroborating data were presented. Thus, a 2-factor model was accepted by the
authors. Factors were labeled Social Affective (SA) domain and Restricted-Repetitive Behavior
(RRB) domain, representing the items that loaded onto each of the factors (Gotham et al.; see
Table B3 for a list of items included by factor for each of the modules).
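For readers less familiar with the fit indices named above, the following sketch shows the common chi-square-based definitions of the CFI and RMSEA. It is an illustration only: the chi-square values, degrees of freedom, and sample size are hypothetical, the function names are invented, and the estimator used by Gotham et al. (2007) for categorical item data may compute these indices differently. Some software also uses N rather than N − 1 in the RMSEA denominator.

```python
import math


def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation from a model chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index: model misfit relative to a baseline (null) model."""
    misfit_model = max(chi2_model - df_model, 0.0)
    misfit_null = max(chi2_null - df_null, 0.0)
    largest = max(misfit_model, misfit_null)
    return 1.0 if largest == 0 else 1.0 - misfit_model / largest


# Hypothetical values: tested model chi-square = 150 on 50 df,
# baseline model chi-square = 2000 on 66 df, N = 300.
example_rmsea = rmsea(150, 50, 300)    # approximately .08
example_cfi = cfi(150, 50, 2000, 66)   # approximately .95
```

Comparing a 1-factor and a 2-factor solution would involve computing these indices for each model; a claim that one model fits “substantially better” is normally supported by reporting both sets of values.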
Evidence of diagnostic accuracy. ROC curve analysis was used to estimate the sensitivity
and specificity of the original scoring algorithm, the revised SA + RRB algorithm, and the
revised SA factor only in diagnostic determination. Classifications from each of the three
algorithms were compared to the clinical diagnosis provided for each participant following the
completion of the evaluation process (see Table 1 for sensitivity and specificity values by
algorithm).
In general, the Revised Scoring Algorithms retained the high levels of sensitivity
demonstrated by the Original Scoring Algorithms across each of the five developmental cells. Increases in
sensitivity were observed when differentiating an ASD from a non-spectrum disorder in young
children with the revised algorithm. In addition, inclusion of the RRB factor improved the
predictive validity of the ADOS-G when classifying individuals with ASDs from those with non-
spectrum disorders (Gotham et al., 2007).
Given the results, Gotham et al. (2007) concluded that the revised scoring algorithm is a
useful option when interpreting an individual’s performance on the ADOS-G. In addition to the
increased sensitivity and predictive validity observed with some sub-groups of participants in the
sample, the revision of the algorithm into developmental cells also helps to lessen the effects of
verbal ability and age on participant performance and makes the items included in the scoring
algorithm more consistent across modules. However, the authors cautioned that more research is
needed given the study’s limitations (which included a small number of participants in the
sample without an Autistic Disorder or Autism Spectrum Disorder diagnosis, and the
interdependence of ADOS-G classification and resulting clinical diagnosis).
Table 1
Sensitivity and Specificity of Original and Revised Scoring Algorithms by Research Study
                 Autistic Disorder vs.                 Autism Spectrum Disorder vs.
                 Non-Spectrum Disorder                 Non-Spectrum Disorder
                 Original    SA + RRB    SA Only       Original    SA + RRB    SA Only
                 Algorithm   Revised     Revised       Algorithm   Revised     Revised
Study and Module Sens. Spec. Sens. Spec. Sens. Spec.   Sens. Spec. Sens. Spec. Sens. Spec.
Gotham, Risi, Pickles, & Lord (2007)
  1, No Words    .99   .55   .96   .72   .93   .72     .92   .37   .89   .49   .85   .43
  1, Some Words  .88   .96   .97   .91   .91   .93     .67   .84   .77   .82   .75   .79
  2, < 5         .97   .93   .98   .93   .95   .97     .76   .70   .84   .77   .80   .63
  2, ≥ 5         .96   .97   .98   .90   .92   .97     .86   .77   .83   .83   .72   .77
  3              .86   .89   .91   .84   .85   .87     .68   .77   .72   .76   .61   .78
Gotham et al. (2008)
  1, No Words    .89   .78   .86   .80   NA    NA      NA    NA    NA    NA    NA    NA
  1, Some Words  .73   .94   .89   .91   NA    NA      1.0   .80   .95   .75   NA    NA
  2, < 5         .85   1.0   .94   1.0   NA    NA      .88   1.0   .65   1.0   NA    NA
  2, ≥ 5         NA    NA    NA    NA    NA    NA      NA    NA    NA    NA    NA    NA
  3              .72   .96   .82   .92   NA    NA      .49   .89   .60   .88   NA    NA
Gray, Tonge, & Sweeney (2008)
  1, No Words    NA    NA    .98   .82   .98   .73     NA    NA    .92   .86   .94   .86
  1, Some Words  NA    NA    .89   .86   .88   .89     NA    NA    .78   .92   .76   .96
de Bildt et al. (2009)
  1, No Words    NA    NA    NA    NA    NA    NA      NA    NA    NA    NA    NA    NA
  1, Some Words  .82   .88   .92   .71   .90   .71     .86   .63   .86   .63   .86   .54
  2, < 5         NA    NA    NA    NA    NA    NA      NA    NA    NA    NA    NA    NA
  2, ≥ 5         .63   .92   .88   .76   .80   .82     .56   .64   .53   .62   .62   .48
  3              .73   .84   .87   .73   .82   .73     .64   .67   .68   .63   .70   .66
Oosterling et al. (2010)
  1, No Words    .90   .90   .83   .80   .81   .80     .88   .60   .76   .70   .82   .80
  1, Some Words  .52   1.0   .69   .98   .62   .98     .43   .89   .50   .86   .58   .81
  2, < 5         .44   1.0   .71   .93   .62   .97     .37   .97   .41   .83   .54   .73
  2, ≥ 5         .21   1.0   .57   .90   .50   .98     .45   .93   .64   .85   .73   .83
Molloy et al. (2011)
  1, No Words    .91   .65   .82   .65   NA    NA      .93   .29   .93   .29   NA    NA
  1, Some Words  .78   .81   .93   .69   NA    NA      .94   .56   1.0   .46   NA    NA
  2, < 5         .67   .92   .72   .81   NA    NA      .75   .81   .72   .60   NA    NA
  2, ≥ 5         .72   .95   .94   .65   NA    NA      .79   .68   .85   .60   NA    NA
  3              .77   .72   .92   .55   NA    NA      .87   .48   .87   .35   NA    NA
Additional reliability and validity evidence for the Revised Scoring Algorithm. Since the
publication of Gotham et al.’s (2007) article, six studies have been conducted to further
investigate the utility of the revised scoring algorithm when classifying an examinee’s
performance on the ADOS-G. Gotham et al. also conducted a second research project in 2008 in
order to attempt to replicate the results of their 2007 study with an independent data set.
In the 2008 study, participants (N = 1,259) ranged in age from 18 to 192 months and were
recruited from 11 different sites across the U.S. Similar to the original sample, the majority of
participants (76%) had clinical diagnoses of Autistic Disorder. Consistent with the methods of
the original sample, the current sample was divided into five developmental cells (Module 1 No
Words, Module 1 Some Words, Module 2 Younger than 5, Module 2 Greater than or Equal To 5,
and Module 3). Revised algorithm scores were generated from item scores, and the sensitivity
and specificity of the original and revised algorithms were calculated by developmental cell
using ROC curves. The factor structure of the items included in the revised scoring algorithm
was also investigated by developmental cell and compared to the 2007 sample.
Due to the extremely small number of data points (n = 17) within the Module 2 Greater
than or Equal to 5 cell, analyses were not conducted on this developmental cell. For the cells
included within the analyses, the authors reported that the 2-factor model
structure proposed for the revised scoring algorithm items within the 2007 study (Gotham et al.)
also satisfactorily fit the current data. However, negative factor loadings were observed for 2
items within the SA factor and for all items within the RRB factor in the Module 2 Younger than
5 developmental cell (Gotham et al., 2008), calling the suitability of the 2-factor structure into
question. Gotham et al. also reported that a CFA confirmed the satisfactory replication of the 2-
factor model across developmental cells within the current sample, although goodness-of-fit
information was not provided. Sensitivity and specificity values are reported in Table 1. In
general, results indicated that the predictive validity (both the sensitivity and specificity)
improved when the revised algorithm was utilized with the independent sample (Gotham et al.).
The biggest improvements in sensitivity between the original and the revised algorithms were
observed within the Module 1 Some Words cell when differentiating between Autistic Disorder
and a non-spectrum disorder, and within Module 3 when differentiating between an ASD and
non-spectrum disorder. Despite some challenges in the replication of the 2-factor model of the
revised scoring algorithm and sample size limitations that precluded analysis on one of the five
developmental cells, the authors concluded that the revised algorithms “better represent observed
diagnostic features through new domains, increase comparability between modules in algorithm
item content and number, and improve ADOS predictive validity for autism compared to
previous algorithms” (Gotham et al., p. 650).
Also in 2008, Gray, Tonge, and Sweeney conducted a research study designed to evaluate
the diagnostic validity of the ADOS and the Autism Diagnostic Interview-Revised (ADI-R;
Rutter, LeCouteur, & Lord, 2003) in a sample of young children with and without autism.
Although not a primary outcome of the study, Gray et al. examined the sensitivity, specificity,
positive predictive power, and negative predictive power of diagnostic classifications made with
the original and the revised scoring algorithms to determine if significant differences existed
between the two methods.
Australian children (N = 209; ages 20-55 months) served as participants for this study.
All participants had been referred for assessment at an early childhood health agency due to the
suspicion of autism or concerns regarding other developmental problems. All participants were
administered either Module 1 or Module 2 of the ADOS-G as a part of a developmental assessment
(also consisting of an assessment of cognitive ability, comprehensive language assessment, and
parent interview with the ADI-R). Following completion of the assessment, all data were
reviewed to arrive at a clinical diagnosis based on DSM-IV-TR criteria (American Psychiatric
Association, 2000). ADOS-G classifications, obtained using both the original and revised scoring
algorithms, were compared to the final clinical diagnosis to investigate the sensitivity and
specificity of the instrument.
Gray et al. (2008) reported that, when using the original scoring algorithm, out of the 209
participants, 18 were inappropriately classified on the ADOS-G as not having Autistic Disorder
or a less severe Autism Spectrum Disorder when they did in fact meet DSM-IV-TR criteria to
warrant a clinical diagnosis of Autistic Disorder or another PDD. In addition, 10 participants
with a final clinical diagnosis of a non-spectrum disorder were inappropriately classified as
having Autistic Disorder or an ASD on the ADOS-G when using the original scoring algorithm.
Similar to Gotham et al. (2007), Gray et al. evaluated the revised scoring algorithm using
classifications derived from examinees’ combined scores on the Social Affective (SA)
and Restricted-Repetitive (RRB) domains, as well as from their scores on the SA domain only
(see Table 1 for sensitivity and specificity values for the original and revised scoring algorithm).
The authors reported that there was a general improvement in sensitivity and efficiency of
classification with the revised algorithms (both SA+RRB and SA only), but lower specificity
was observed across the sample. Lower sensitivity was observed with the revised algorithms,
however, when classifying participants in Module 1 Some Words with an ASD compared to a
non-spectrum disorder. The positive predictive power of the original scoring algorithm (.91 - .96)
and the revised algorithm (.88 - .98) did not differ significantly. However, the revised algorithm
(.67 - .90) demonstrated greater negative predictive power than did the original scoring algorithm
(.64 - .81). Gray et al. also indicated that, across modules, no significant difference in sensitivity
was observed between making classification determinations using the SA+RRB algorithm and
the SA only algorithm. However, with non-verbal children (i.e., participants in the Module 1 No
Words sample), using the SA+RRB revised algorithm as the classification determinant led to the
highest diagnostic accuracy. Based on their mixed results, the lack of other independent
examinations of the efficacy of the revised scoring algorithms, and the interdependency of
ADOS-G and ADI-R scores in consensus clinical diagnoses, the authors concluded that future
research on the revised diagnostic algorithms is necessary.
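The four indices compared in these studies can all be derived from a single 2 x 2 cross-tabulation of instrument classification against clinical diagnosis. The following Python sketch is offered only as an illustration of the standard definitions. The class and function names are hypothetical, and the counts are invented for demonstration; they are not taken from Gray et al. (2008) or from any other study reviewed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cross-tabulation of instrument classification against clinical diagnosis."""
    true_pos: int   # classified on the spectrum, diagnosed on the spectrum
    false_neg: int  # classified non-spectrum, diagnosed on the spectrum
    false_pos: int  # classified on the spectrum, diagnosed non-spectrum
    true_neg: int   # classified non-spectrum, diagnosed non-spectrum


def diagnostic_accuracy(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, and positive/negative predictive power."""
    return {
        # Proportion of clinically diagnosed cases the instrument detects.
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        # Proportion of non-spectrum cases the instrument correctly rules out.
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        # Proportion of positive classifications that are correct.
        "positive_predictive_power": c.true_pos / (c.true_pos + c.false_pos),
        # Proportion of negative classifications that are correct.
        "negative_predictive_power": c.true_neg / (c.true_neg + c.false_neg),
    }


# Invented counts, for illustration only.
example = ConfusionCounts(true_pos=80, false_neg=20, false_pos=10, true_neg=90)
metrics = diagnostic_accuracy(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts, sensitivity is .80 and specificity is .90, whereas positive and negative predictive power are approximately .89 and .82. The example shows that the two pairs of indices answer different questions and need not move together.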
Overton, Fielding, and de Alba (2008) also set out to compare the differences in
diagnostic classification determinations of the original and the revised scoring algorithms to
determine if the revised algorithm would decrease the incidence of false positive and false
negative ADOS-G classifications with a small sample of students referred for psychoeducational
diagnostic evaluations.
Twenty-six Hispanic children (ranging in age from 20 to 192 months), referred for
school-based evaluations due to concerns regarding the possibility of neurodevelopmental or
psychological disorders, served as participants in this study. All participants were administered
either Module 1, Module 2, or Module 3 of the ADOS-G, depending on chronological age and
expressive-language level. Participants’ performance on the ADOS-G was initially scored using
the original scoring algorithm and was compared to the individual’s overall clinical diagnosis
provided at the conclusion of the evaluation process. At a later date, the same participants’
performance was rescored and classified using the revised scoring algorithm. The accuracy of
participants’ ADOS-G classification made with the revised algorithm was then compared to the
accuracy of classification made using the original algorithm.
When comparing ADOS-G classifications obtained using the original scoring algorithm
with concluding clinical diagnoses, four false positive and one false negative classification were
observed. When the revised scoring algorithm was applied, only one participant’s diagnostic
classification appropriately changed (from Autistic Disorder to Autism Spectrum Disorder). All
other participants’ classifications remained consistent across scoring algorithms (Overton et al.,
2008). Although several limitations of the study, most notably sample size, were noted by the
authors, these results cast some doubt on the superiority of the revised scoring algorithm over the
original scoring algorithm currently in use.
A comparison of the original and revised scoring algorithms was also conducted by de
Bildt et al. (2009). Specifically, these researchers conducted a study to determine how well the
classification determinations derived from the revised scoring algorithms (SA+RRB and SA
only) contribute to a clinical diagnosis of an Autism Spectrum Disorder or non-spectrum
disorder, when compared to the contribution of the original scoring algorithm.
Five hundred fifty-eight Dutch children, ranging in age from 13 to 198 months,
served as participants in this study. The majority of participants had a clinical diagnosis of
Autistic Disorder (35%) or unspecified Autism Spectrum Disorder (40%). Participants were
administered Module 1, Module 2, or Module 3 of the ADOS-G as part of an evaluation for
childhood psychiatric problems or when serving as a research participant in an epidemiological
study of Autism Spectrum Disorders in children with mental retardation. Each participant’s
performance on the ADOS-G was scored and classified using the original algorithm, SA only
revised algorithm, and SA+RRB revised algorithm. The sensitivity, specificity, and efficiency
(i.e., the percentage of cases correctly classified; an estimate of the balance between sensitivity
and specificity; de Bildt et al., 2009) of each participant’s ADOS-G classifications compared to
their clinical diagnosis were determined across the three scoring algorithms. Logistic regression
was also conducted to determine the relative contribution of each participant’s ADOS-G
classification by scoring algorithm to their clinical diagnosis (de Bildt et al.).
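Efficiency, as defined above, is simply the proportion of cases for which the instrument classification agrees with the clinical diagnosis. The sketch below illustrates that definition for three scoring algorithms applied to the same cases. It is a minimal example under stated assumptions: the function name, the dictionary keys, and all of the classification data are hypothetical and are not drawn from de Bildt et al. (2009).

```python
def efficiency(classifications, diagnoses):
    """Proportion of cases in which classification matches clinical diagnosis."""
    if len(classifications) != len(diagnoses) or len(diagnoses) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(c == d for c, d in zip(classifications, diagnoses))
    return correct / len(diagnoses)


# Invented data: True = on the spectrum, False = non-spectrum.
clinical = [True, True, True, False, False, True, False, True]
by_algorithm = {
    "original": [True, False, True, False, False, False, False, True],
    "sa_only":  [True, True, True, True, False, True, False, True],
    "sa_rrb":   [True, True, True, False, True, True, False, True],
}

results = {name: efficiency(pred, clinical) for name, pred in by_algorithm.items()}
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Because efficiency pools correct positive and correct negative classifications, two algorithms can reach the same efficiency with different error patterns, as the two hypothetical revised algorithms do here. For this reason efficiency is usually reported alongside sensitivity and specificity rather than in place of them.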
Results of the study varied across modules (Table 1). In general, however, use of the
revised scoring algorithms resulted in increased sensitivity and decreased specificity of
diagnostic classifications compared to the original scoring algorithm. The efficiency of each of
the scoring algorithms was determined to be comparable in Modules 1 and 2. Both of the revised
scoring algorithms produced higher classification efficiencies than the original scoring algorithm
in Module 3. Data from the regression analyses indicated that the diagnostic classifications of the
original and the SA only revised scoring algorithms contribute approximately equal variance to
clinical classification across modules. However, in Modules 2 and 3, participants’ scores on the
RRB factor were determined to contribute additional variance over and above that accounted for
by the classification made by either the original or the SA only revised scoring algorithms (de
Bildt et al., 2009).
Based on their results, the authors formed several conclusions. First, de Bildt et al. (2009)
asserted that utilizing the revised scoring algorithms to make classification determinations helps
to improve the sensitivity and specificity of those classifications in Modules 2 and 3 without
compromising the balance between the two. Consistent with the findings of Lord et al. (2006),
the authors also noted that the addition of RRB into the diagnostic scoring algorithm increases the
discriminative power of the ADOS-G with older and higher functioning individuals. Therefore,
de Bildt et al. indicated that the revised scoring algorithm provides advantages over the original
scoring algorithm.
Similar to Gotham et al. (2008), Oosterling et al. (2010) set out to replicate the results
of the Gotham et al. (2007) initial investigation of the revised scoring algorithm with an
independent sample. The authors aimed to examine whether or not the revised algorithms
improve the overall diagnostic validity of the ADOS-G.
Five hundred thirty-two cases, obtained from 426 Dutch participants, were included
in the analyses. Participants ranged in age from 15 to 144 months, and had the following clinical
diagnoses: Autistic Disorder (40%), PDD NOS (25%), Asperger’s Disorder (2%), or non-
spectrum developmental delays (30%). Three percent of participants did not possess a clinical
diagnosis. Each participant was administered either Module 1 or Module 2 of the ADOS-G
(based on their age and expressive-language ability) as a part of a comprehensive diagnostic
evaluation. Participants’ ADOS-G performance was scored and classified using the original and
revised (SA only and SA + RRB) scoring algorithms. Sensitivity, specificity, correct
classification rate, positive predictive value, and negative predictive value were calculated for the
original and the revised algorithms in relation to each participant’s clinical diagnosis to
determine diagnostic accuracy. A confirmatory factor analysis (CFA) was also completed to
determine the goodness-of-fit of the factor structure of the revised algorithm.
Sensitivity and specificity values for the original and revised scoring algorithms (SA only
and SA+RRB) are presented in Table 1. In general, results indicate that use of the revised scoring
algorithms produces a better balance between the sensitivity and specificity of diagnostic
classifications than observed when using the original scoring algorithm. However, the sensitivity
values obtained from determinations made using all three scoring algorithms were unacceptable,
according to the authors (Oosterling et al., 2010). In addition, specificity values obtained from
the revised scoring algorithms were generally higher than those obtained from the original
algorithm. Positive predictive values for the original scoring algorithm (.98 - 1.0 when
differentiating between Autistic Disorder and a non-spectrum disorder; .65 - .94 when
differentiating between non-Autism ASD and a non-spectrum disorder) were slightly higher than
those for the revised algorithm (.67 - .96 when differentiating between Autistic Disorder and a
non-spectrum disorder; .65 - .82 when differentiating between non-Autism ASD and a non-
spectrum disorder). Negative predictive values for the original algorithm (.61 - .78 when
differentiating between Autistic Disorder and a non-spectrum disorder; .53 - .76 when
differentiating between non-Autism ASD and a non-spectrum disorder) were also slightly higher
than those for the revised algorithm (.47 - .86 when differentiating between Autistic Disorder and
a non-spectrum disorder; .51 - .78 when differentiating between non-Autism ASD and a non-
spectrum disorder). Results of the CFA indicated that a 2-factor model appropriately fit the data
in Module 1 No Words, Module 1 Some Words, and Module 2 Younger than 5 (CFI = .96-1.0,
RMSEA = .04-.08), but not in Module 2 Greater than or Equal to 5 (CFI = .87, RMSEA = .14).
The authors also reported that the 2-factor model provided better fit than a proposed 1-factor
model, although no specific goodness-of-fit data were provided (Oosterling et al.).
In the discussion of their results, Oosterling et al. (2010) noted that the sensitivity values
obtained from the current sample were significantly lower, and the specificity values were higher
than the values that have been reported in other studies measuring the diagnostic accuracy of the
revised scoring algorithms. Although all sensitivity values were in the unacceptable range, values
did improve when using the revised algorithms (SA only for ASD classifications, SA+RRB for
Autistic Disorder classifications) for classification determination compared to the original
scoring algorithm. In general, the authors concluded that the revised algorithms offer better
diagnostic validity than the original scoring algorithms. Based on the current data, Oosterling et
al. also indicated that the SA only scoring algorithm results in the greatest diagnostic accuracy
when used to classify the performance of young, low-functioning individuals and older, high-
functioning individuals.
Most recently, Molloy, Murray, Akers, Mitchell, and Manning-Courtney (2011)
examined the sensitivity and specificity of the ADOS-G as it is typically used in a clinical setting
(i.e., as part of an initial diagnostic evaluation to confirm or rule out an Autism Spectrum
Disorder). ADOS-G data from 584 diagnostic evaluations were included in the analyses.
Participants, ranging in age from 26 to 198 months, were administered the ADOS-G in
conjunction with other assessment instruments. Twenty-six percent of participants had clinical
diagnoses of Autistic Disorder, 32% had non-autism ASD, and 44% had non-spectrum
disorders. Participants’ item scores on the ADOS-G were used to calculate domain scores for
both the original and the revised scoring algorithms, which were then used to make diagnostic
classifications. Following the completion and review of the diagnostic evaluation, participants
were assigned a clinical diagnosis by a psychologist or a developmental pediatrician. Finally,
each participant’s diagnostic classifications from both the original and the revised scoring
algorithm were compared to their clinical diagnosis in order to calculate sensitivity and
specificity.
Sensitivity and specificity values from the original and revised scoring algorithms can be
found in Table 1. When used to classify individuals with Autistic Disorder compared to those not
on the spectrum, the revised scoring algorithm generally produced higher levels of sensitivity
and lower levels of specificity, compared to the original scoring algorithm, across the five
developmental cells. Few differences were observed in the comparison of the sensitivity and
specificity values of the original and revised scoring algorithms when used to classify individuals
with ASD compared to those with non-spectrum disorders (Molloy et al., 2011).
Molloy et al. (2011) also compared current results with the sensitivity and specificity
values of the original and revised scoring algorithms reported in the original study (Gotham et
al., 2007) and by de Bildt et al. (2009). In general, sensitivity and specificity values obtained by
Molloy et al. (2011) were lower than those reported by Gotham et al. and de Bildt et al. across all
of the developmental cells. Although Gotham et al. and de Bildt et al. concluded that the revised
scoring algorithm is superior to the original, Molloy et al. reported that using the revised scoring
algorithm did not improve the predictive value of the ADOS-G. The authors hypothesized that
differences in examiner scoring, clinical decision-making, and sample composition (e.g., there
was a larger percentage of participants without a spectrum disorder in the current study as
compared to the samples analyzed by Gotham et al. and de Bildt et al.) may have contributed to
the differences in diagnostic accuracy reported across studies.
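One reason sample composition matters when predictive values are compared across studies is that, with sensitivity and specificity held fixed, positive and negative predictive values shift with the proportion of spectrum cases in the sample. The sketch below illustrates this general property. It is not a reanalysis of any study reviewed here: the function name is hypothetical, and the sensitivity, specificity, and base-rate values are invented for demonstration.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (positive, negative) predictive value for a given base rate.

    base_rate is the proportion of the sample with a spectrum diagnosis.
    """
    true_pos = sensitivity * base_rate
    false_neg = (1 - sensitivity) * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical instrument (sensitivity .90, specificity .80) in two samples.
ppv_high, npv_high = predictive_values(0.90, 0.80, base_rate=0.80)
ppv_low, npv_low = predictive_values(0.90, 0.80, base_rate=0.50)

print(f"base rate .80: PPV = {ppv_high:.2f}, NPV = {npv_high:.2f}")
print(f"base rate .50: PPV = {ppv_low:.2f}, NPV = {npv_low:.2f}")
```

In this illustration, lowering the proportion of spectrum cases from .80 to .50 reduces positive predictive value from about .95 to about .82 and raises negative predictive value from about .67 to about .89, even though the instrument itself is unchanged.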
Rationale for Present Study
The ADOS-G is widely used in the evaluation and diagnosis of Autism Spectrum
Disorders across clinical, research, and educational settings. However, little research on the
technical adequacy of the instrument has been conducted to date. The structural validity of the
original scoring algorithm has yet to be replicated, and few independent examinations of the
ADOS-G scores’ relations with other measures have been conducted. Thus, an independent
examination of the technical adequacy of the ADOS-G is necessary and timely.
Further, additional research is needed to determine if the revised diagnostic algorithm
results in greater diagnostic accuracy than the original algorithm. Current evidence regarding the
utility of the revised algorithm is mixed. Although Gotham et al. (2007, 2008) concluded that use
of the revised scoring algorithm increases levels of sensitivity and specificity, other studies have
reported no differences in diagnostic accuracy across scoring algorithms (Overton et al., 2008) or
increases in sensitivity coupled with decreases in specificity (Gray et al., 2008; Molloy et al.,
2011). Comparisons between the positive predictive power and negative predictive power across
algorithms are also missing from much of the available research. In addition, much of the
independent research that has been conducted on the revised scoring algorithm has utilized
samples of international participants (i.e., Dutch and Australian children) or homogeneous
samples (i.e., Hispanic children) within the U.S., calling the generalizability of study results to a
heterogeneous American population into question. Further, no large sample study has been
conducted that examines the utility of the revised scoring algorithm with a school-based sample.
In addition, a consistent limitation identified by several studies (Gotham et al., 2007;
Gotham et al., 2008; Gray et al., 2008) is the interdependency of scores from the ADOS-G in the
determination of diagnostic accuracy. Specifically, to determine the overall diagnostic accuracy
of the instrument, the classification accuracy of ADOS-G scores has been compared to
participants’ end clinical diagnoses, which were made, in part, based upon participants’ scores on
the ADOS-G. Further research is needed to determine the classification accuracy of scores from
the ADOS-G when compared to clinical diagnoses made without information regarding a
participant’s performance on the ADOS-G.
Purpose and Hypotheses
The primary purpose of this dissertation was to examine the validity and diagnostic
accuracy of ADOS-G scores for children and adolescents, ranging in age from 2 years through
17 years. Specifically, three types of evidence were examined as part of this study: structural
validity, relations with other variables, and accuracy of diagnosis. Based on these objectives, the
following research questions and hypotheses were tested in the study.
1a. Does a one-factor model best represent the internal structure of items included in the
ADOS-G Original Scoring Algorithm for each of the modules under investigation?
Hypothesis 1a. Consistent with the authors’ original model (Lord et al., 1999),
items included within the Original Scoring Algorithm of the ADOS-G will reflect
a uni-dimensional construct across modules.
1b. Does a two-factor model best represent the internal structure of items included in the
ADOS-G Revised Scoring Algorithm for each module under investigation?
Hypothesis 1b. Consistent with the findings of Gotham et al. (2007), items
included within the Revised Scoring Algorithm of the ADOS-G will reflect two
constructs across modules.
2. Across modules, do total scores on the ADOS-G (Original and Revised Scoring
Algorithms) demonstrate moderate to strong relationships with other measures of
autistic behavior and weaker relationships with measures of other behavioral
characteristics?
Hypothesis 2. Scores on the ADOS-G will demonstrate moderate to strong
relationships with scores from other measures of autistic behavior and weaker
relationships (i.e., weak to moderate correlations) with other measures of
behavioral functioning.
3. Across modules, does use of the Revised Scoring Algorithm result in greater
diagnostic accuracy of ADOS-G total scores than use of the Original Scoring
Algorithm?
Hypothesis 3. Consistent with the findings of Gotham et al. (2007), it is
hypothesized that the revised diagnostic algorithm on the ADOS-G will result in
greater diagnostic accuracy (i.e., correctly identify individuals who are on the
spectrum from those individuals who are not and, for those on the spectrum,
correctly differentiate between Autistic Disorder and non-autism ASD) than the
Original Scoring Algorithm.
4. Will there be differences in estimates of diagnostic accuracy made when comparing
ADOS-G classifications to clinical decisions made with and without scores from the
ADOS-G?
Hypothesis 4. Greater diagnostic accuracy of ADOS-G scores will be observed
when scores are compared to clinical diagnoses made with information from the
ADOS-G as compared to those made without information regarding a
participant’s performance on the ADOS-G.
Chapter 2. Method
Participants
An extant database was utilized to answer the research questions. This database included
582 children who were enrolled in a large, suburban public school district in the southern U.S.
and referred for a school-based psychoeducational evaluation. Because a revised scoring
algorithm was not proposed for Module 4 of the ADOS-G, only participants who were
administered Module 1, Module 2, or Module 3 were included in the analyses, resulting in a final
sample size of 462. At the time data were collected, participants ranged in age from 2 years,
10 months to 17 years, 9 months (M = 7 years, 3 months). All participants were either previously
diagnosed with, or suspected of having, an Autism Spectrum Disorder at the time of evaluation.
Demographic information for the participants is presented in Table 2. One hundred of the 462
participants were randomly selected for participation in the independent clinical diagnosis
diagnostic accuracy examination (Hypothesis 4). Participants in this analysis also ranged in age
from 34 to 213 months (M = 93 months). Demographic information for participants included in
the independent clinical diagnosis diagnostic accuracy examination is also presented in Table 2.
Complete ADOS-G item data were available for all 462 participants on one of the ADOS-G
modules. In accordance with the practices of Gotham et al. (2007), participants who were
administered Module 1 were divided into two groups for Revised Scoring Algorithm
comparisons based on expressive language ability (i.e., those who were nonverbal, and those with
some language production)1. Similarly, participants who were administered Module 2 were
divided into two groups for Revised Scoring Algorithm comparisons based on age (i.e.,
those who were younger than 5 years at the time of the administration, and those who were 5
years of age or older at the time of the administration)2.
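The grouping rule described above yields five developmental cells. The following sketch shows one way the assignment could be expressed. It is an illustration only: the function name, the `has_words` flag, and the cell labels are assumptions introduced here, and the exact operational criterion for "nonverbal" is not reproduced.

```python
from typing import Optional

AGE_CUTOFF_MONTHS = 5 * 12  # 5 years, expressed in months


def developmental_cell(module: int, age_months: int,
                       has_words: Optional[bool] = None) -> str:
    """Assign a participant to a Revised Scoring Algorithm developmental cell."""
    if module == 1:
        # Module 1 is split on expressive language ability.
        if has_words is None:
            raise ValueError("Module 1 requires an expressive-language flag")
        return "Module 1, Some Words" if has_words else "Module 1, No Words"
    if module == 2:
        # Module 2 is split on age at the time of administration.
        if age_months < AGE_CUTOFF_MONTHS:
            return "Module 2, Younger than 5"
        return "Module 2, Greater than or Equal to 5"
    if module == 3:
        return "Module 3"
    # Module 4 has no revised algorithm and was excluded from the analyses.
    raise ValueError("only Modules 1, 2, and 3 have a revised algorithm")


print(developmental_cell(1, 40, has_words=False))
print(developmental_cell(2, 59))
print(developmental_cell(2, 60))
print(developmental_cell(3, 150))
```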
Table 2
Demographic Characteristics of Total Sample (N = 462) and Independent Clinical Diagnosis Subsample (n = 100)

Characteristic                                              Total Sample (%)  Independent Diagnosis (%)
Gender
  Male                                                      85                83
  Female                                                    15                17
  Unavailable                                               <1                N/A
Ethnicity
  Caucasian                                                 59                61
  Black                                                     10                9
  Hispanic                                                  14                12
  Asian                                                     9                 9
  Other                                                     2                 3
  Unavailable                                               6                 6
Grade At Time of Evaluation
  Early Childhood (Not Enrolled in School District)         15                13
  Public Preschool in Referring School District             13                11
  Kindergarten                                              17                15
  Early Elementary (Grades 1-3)                             33                31
  Upper Elementary (Grades 4-5)                             15                24
  Middle School (Grades 6-8)                                7                 5
  High School (Grades 9-12)                                 <1                1
  Unavailable                                               <1                N/A
Special Education Eligibility
  Autism Only                                               27                16
  Speech Impaired Only                                      9                 9
  Mental Retardation Only                                   <1                1
  Emotional Disturbance Only                                2                 4
  Other Health Impaired Only                                4                 8
  Specific Learning Disability Only                         <1                1
  Autism and Speech Impaired                                38                39
  Mental Retardation and Speech Impaired                    1                 3
  Other Combination of Eligibilities                        13                18
  Autism, Mental Retardation, and Speech Impaired           3                 0
  No Eligibility                                            2                 1
  Unavailable                                               <1                N/A
Ending Clinical Diagnosis
  Autistic Disorder                                         26                20
  Asperger’s Disorder                                       13                16
  Pervasive Developmental Disorder Not Otherwise Specified  26                23
  Attention-Deficit/Hyperactivity Disorder                  6                 14
  Mood Disorder                                             2                 3
  Other Disorder                                            7                 4
  No Disability                                             13                20
  Unavailable                                               8                 N/A
ADOS-G Module
  Module 1                                                  18                15
    No Words Revised Scoring Algorithm                      4                 1
    Some Words Revised Scoring Algorithm                    14                14
  Module 2                                                  26                27
    Less Than 5 Years of Age                                10                8
    Greater Than or Equal To 5 Years of Age                 16                19
  Module 3                                                  56                58

Note: ADOS-G = Autism Diagnostic Observation Schedule-Generic
Measures
Several measures were administered to participants and their parents/teachers as part of a
school-based multidisciplinary team evaluation process.
Autism Diagnostic Observation Schedule – Generic (ADOS-G). The ADOS-G is a
semi-structured, standardized assessment of communication, social interaction, and play or
imaginative use of materials for individuals who are suspected of having autism or another
pervasive developmental disorder. The ADOS-G is hypothesized to assess skills in four domains.
Communication assesses characteristics such as vocalization, idiosyncratic use of words or
phrases, pointing, and gestures. Reciprocal Social Interaction measures behaviors such as eye
contact, facial expressions, shared enjoyment, showing, spontaneous initiation of joint attention,
response to joint attention, and quality of social overtures. Play measures functional play with
objects and imaginative play, and Stereotyped Behaviors and Restricted Interests tap
characteristics such as unusual sensory interest in play materials (e.g., sniffing), complex hand
and finger mannerisms, and repetitive interests or stereotyped behaviors. Communication and
Reciprocal Social Interaction are combined to create a Communication + Social Interaction
Total scale. Cut-off scores for autism and autism spectrum are applied to each scale in
determining the possible presence or lack thereof of an Autism Spectrum Disorder.
The ADOS-G is scored using a diagnostic algorithm that allows for the classification of
examinees into two categories: those who have the social and communication deficits consistent
with a diagnosis of Autism or an Autism Spectrum Disorder and those who do not (Lord et al.,
1999). In order to arrive at this classification, ratings are assigned by examiners for each of the
dimensions of functioning assessed throughout the ADOS administration (see Table 2 in
Appendix B for more information). Examiners score each dimension of functioning using either
a 3-point scale (0 - 2) or a 4-point scale (0 - 3), where a score of 0 represents typical functioning
for the participant’s age and developmental level, and a score of 2 or 3 represents highly atypical
functioning (Lord et al.). Next, all item scores of 3 are converted to scores of 2. Finally, the
examinee’s scores on selected items from the Communication and Reciprocal Social
Interaction subscales are summed and then compared against cut-scores for Autistic Disorder
and Autism Spectrum Disorder for each of the subscales and for a total scale score (obtained by
adding a participant’s scores on the Communication subscale with his/her scores on the
Reciprocal Social Interaction subscale). The communication and social items included in the
scoring algorithm vary across modules and are identified in Table B2. Although the ADOS-G
measures an examinee’s engagement in restricted, stereotyped, and/or repetitive behaviors and
imagination/creativity, the ADOS-G’s Original Scoring Algorithm does not utilize these items in
classification determination (Lord et al.).
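The scoring steps described above (recode scores of 3 to 2, sum the selected items within each subscale, and compare the subscale and total scores against cut-scores) can be sketched as follows. This is a minimal illustration, not the published algorithm: the item names and cut-score values are hypothetical placeholders, and the rule that a classification requires meeting all three cut-scores is an assumption made here for the sake of the example. The actual algorithm items and cut-scores vary by module and are given in the test manual (Lord et al., 1999).

```python
def recode_item(score: int) -> int:
    """Convert an item score of 3 to 2; leave scores of 0, 1, and 2 unchanged."""
    if score not in (0, 1, 2, 3):
        raise ValueError(f"unexpected item score: {score}")
    return min(score, 2)


def classify_original(item_scores, communication_items, social_items, cutoffs):
    """Sum recoded algorithm items and compare the sums against cut-scores."""
    communication = sum(recode_item(item_scores[i]) for i in communication_items)
    social = sum(recode_item(item_scores[i]) for i in social_items)
    total = communication + social
    # Check the more stringent classification first.
    # Assumption: all three cut-scores must be met for a classification.
    for label in ("Autistic Disorder", "Autism Spectrum Disorder"):
        cut = cutoffs[label]
        if (communication >= cut["communication"]
                and social >= cut["social"]
                and total >= cut["total"]):
            return label
    return "Non-spectrum"


# Hypothetical item names, scores, and cut-scores, for illustration only.
scores = {"comm_1": 3, "comm_2": 1, "comm_3": 0,
          "soc_1": 2, "soc_2": 3, "soc_3": 1, "soc_4": 0}
placeholder_cutoffs = {
    "Autistic Disorder": {"communication": 3, "social": 5, "total": 9},
    "Autism Spectrum Disorder": {"communication": 2, "social": 3, "total": 6},
}
result = classify_original(
    scores,
    communication_items=["comm_1", "comm_2", "comm_3"],
    social_items=["soc_1", "soc_2", "soc_3", "soc_4"],
    cutoffs=placeholder_cutoffs,
)
print(result)
```

In this invented case the recoded Communication sum is 3, the Social sum is 5, and the total is 8. The total falls short of the placeholder cut-score for Autistic Disorder, so the sketch returns the Autism Spectrum Disorder classification.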
Gilliam Autism Rating Scale, Second Edition (GARS-2). The GARS-2 was selected to
provide convergent validity evidence for scores on the ADOS-G. The GARS-2 (Gilliam, 2006) is
a screening instrument used for the assessment of behavior problems that may be indicative of
autism in individuals ages 3 to 22. Although only one form exists, it is designed for use with
parents, teachers, and/or other caregivers who have had regular, sustained contact with the child
being assessed for at least two weeks. The GARS-2 is composed of 42 items, which are
divided into three subscales: Stereotyped Behaviors, Communication, and Social Interaction.
Each item is scored on a 4-point scale of frequency (0 = Never Observed, 1 = Seldom Observed,
2 = Sometimes Observed, and 3 = Frequently Observed). Item scores are summed within each subscale,
resulting in raw scores that are converted to standard scores (M = 10, SD = 3). Standard scores
from each of the three subscales are then summed and converted into a full-scale Autism Index,
which has a mean of 100 and a standard deviation of 15.
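The first scoring step, summing item ratings into subscale raw scores, can be illustrated with a short sketch. It assumes 14 items per subscale (42 items across three subscales) rated from 0 to 3; the function name and the example ratings are hypothetical. The conversion of raw scores to standard scores and to the Autism Index relies on the normative tables in the examiner’s manual (Gilliam, 2006) and is deliberately not reproduced here.

```python
SUBSCALES = ("Stereotyped Behaviors", "Communication", "Social Interaction")
ITEMS_PER_SUBSCALE = 14  # 42 items divided evenly across three subscales


def raw_subscale_scores(ratings):
    """Sum 0-3 frequency ratings into one raw score per subscale."""
    raw = {}
    for subscale in SUBSCALES:
        items = ratings[subscale]
        if len(items) != ITEMS_PER_SUBSCALE:
            raise ValueError(f"{subscale}: expected {ITEMS_PER_SUBSCALE} items")
        if any(r not in (0, 1, 2, 3) for r in items):
            raise ValueError(f"{subscale}: ratings must be 0, 1, 2, or 3")
        raw[subscale] = sum(items)
    return raw


# Invented ratings, for illustration only.
example_ratings = {
    "Stereotyped Behaviors": [1] * 14,
    "Communication": [0, 1, 2, 3] * 3 + [0, 0],
    "Social Interaction": [3] * 14,
}
raw_scores = raw_subscale_scores(example_ratings)
print(raw_scores)
```

Under these assumptions each raw subscale score can range from 0 to 42. The standard scores and Autism Index derived from them are norm-referenced and cannot be computed from the raw scores alone.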
The Stereotyped Behaviors subscale is composed of 14 items that assess the frequency
with which a child exhibits stereotyped behaviors (e.g., hand/finger flapping or flicking and
spinning), motility disorders (e.g., prancing, toe-walking, and making lunging/darting
movements), and other unique or atypical behaviors (e.g., smells/sniffs “unscented” objects,
vocal self-stimulation, and licks/tastes/attempts to eat inedible objects) (Gilliam, 2006).
The Communication subscale is composed of 14 items that assess the frequency with
which a child exhibits the verbal behaviors (e.g., echoes/repeats words and phrases, repeats
unintelligible sounds, and uses pronouns/I inappropriately) and nonverbal behaviors (e.g., looks
away/avoids looking at a speaker when name is called and uses gestures instead of speech/sign to
obtain objects) that are symptomatic of autism (Gilliam, 2006).
The Social Interaction subscale is composed of 14 items that evaluate the child’s ability
to relate appropriately to people, objects, and events within his or her environment (Gilliam,
2006). Items assess the frequency with which a child responds atypically to typical social
situations (e.g., looks away when someone looks at him or her, looks unhappy when praised, and
looks through people), uses objects in an atypical fashion (e.g., lines up objects in a precise
fashion and becomes upset when the order is disturbed, and uses toys inappropriately), and
responds to his or her environment in an atypical way (e.g., behaves in an unreasonable fearful
manner, and does certain things repetitively or ritualistically).
Reliability and validity evidence provided in the GARS-2 Examiner’s Manual indicates
that it is a technically adequate instrument for the screening and diagnosis of individuals on the
autism spectrum (Gilliam, 2006). Adequate internal consistency (α > .80; Salvia & Ysseldyke,
2004) was reported for each of the three subscales and for the total scale. Test-retest reliability
coefficients for the three subscales (r = .70 - .90) and for the total scale (r = .88) demonstrate the
stability of respondents’ ratings over time on the GARS-2. Subscale and total scale ratings on
the GARS-2 were also compared to the total scale ratings on The Autism Behavior Checklist
(ABC; Krug, Arick, & Almond, 1993) and determined to demonstrate moderate to strong
concurrent relationships (r = .58 - .71).
Although not discussed in the examiner’s manual, the structural validity of the GARS-2
standardization sample was examined by Pandolfi, Magyar, and Dill (2010). Exploratory factor
analysis was conducted on the item data. Inconsistent with the author’s (Gilliam, 2006) three
conceptually-derived subscales, a four-factor solution provided the best overall model fit.
Confirmatory factor analysis supported the superiority of the four-factor model (χ² = 3,039.59, p
< .001; RMSEA = .08; CFI = .91) over the three-factor model (χ² = 4,861.33, p < .001; RMSEA
= .10; CFI = .84). However, the authors (Pandolfi et al., 2010) identified several limitations of their study,
including a smaller than preferred sample size (N = 496), the failure to independently confirm
participants’ ASD diagnoses, and the failure to include non-verbal participants in the analyses.
Given their results, the authors concluded that the GARS-2 subscales should be interpreted with
extreme caution because each subscale is possibly measuring multiple constructs. However, they
also indicated that additional research is needed to further evaluate the clinical utility of the
GARS-2.
Behavior Assessment System for Children, Second Edition (BASC-2). Several
subscales from the Behavior Assessment System for Children, Second Edition (BASC-2) were
used to provide convergent and discriminant validity evidence for scores on the ADOS-G. The
BASC-2 (Reynolds & Kamphaus, 2004) is a broadband behavioral rating scale that assesses the
domains of Externalizing Problems, Internalizing Problems, Adaptive Skills, and overall
Behavioral Symptoms in children and young adults aged 2 through 25 years. Each domain
consists of several subscales that assess specific classes of behavior within the larger domain.
Parent and teacher rating scales, and self-report of personality forms exist. Across forms,
respondents rate each item on a 4-point scale of frequency (0 = Never, 1 = Sometimes, 2 = Often,
3 = Almost Always). Items are summed across scales and broad domains, and raw scores are
converted to T scores (M = 50, SD = 10). Parent and teacher ratings on the following subscales
will be included in the analysis.
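As a point of reference for the T-score metric described above, the following minimal Python sketch shows the standard linear T-score transformation (M = 50, SD = 10). It is illustrative only. The function name, the raw score, and the normative mean and standard deviation are hypothetical, and published BASC-2 T scores are obtained from the publisher's norm tables rather than computed by the examiner.

```python
def to_t_score(raw, norm_mean, norm_sd):
    """Linear T score: 50 plus 10 times the z score in the reference group."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd
    return 50 + 10 * z

# Hypothetical raw score and normative values, for illustration only.
t_score = to_t_score(raw=18, norm_mean=12, norm_sd=4)
print(round(t_score, 1))
```

Under these assumed values, a raw score 1.5 standard deviations above the normative mean corresponds to a T score of 65.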
The Atypicality scale measures a child’s tendency to behave in ways that are considered
strange or odd by observers (Reynolds & Kamphaus, 2004). Items primarily focus on the child’s
awareness of his or her typical surroundings and apparent connection to his or her environment.
The scale also includes items that assess the frequency with which the child exhibits behaviors
that are consistent with autism symptomology, such as perseverative thought and behavior, social
disconnectedness, and engagement in stereotyped and repetitive motor mannerisms. According
to test authors (Reynolds & Kamphaus), T-scores in the At-Risk or Clinically Significant range
on the Atypicality scale may be suggestive of a developmental delay or Autism Spectrum
Disorder. Further, in the validation sample, young children’s scores on the Atypicality scale
demonstrated a moderate to strong concurrent relationship (r = .42 for parent reports, .77 for
teacher reports) with scores on the Pervasive Developmental Problems scale on the Achenbach
System of Empirically Based Assessment Child Behavior Checklist (ASEBA CBC; Achenbach &
Rescorla, 2000).
The Withdrawal scale measures a child’s tendency to evade others in order to avoid
social contact, and his or her general level of interest in making contact with others in a social
setting (Reynolds & Kamphaus, 2004). Items on this scale assess the child’s general social
difficulties with peers and his or her engagement in or avoidance of group activities. According
to Reynolds and Kamphaus (2004), the Withdrawal scale assesses a “core symptom of autism”
(p. 63) and, as such, scores in the At-Risk or Clinically Significant range on this scale provide
support to consider the possibility of an ASD. Similar to the Atypicality scale, moderate
concurrent relationships were observed between parent and teacher ratings on the Withdrawal
scale and their ratings on the Pervasive Developmental Problems scale on ASEBA Child
Behavior Checklist (r = .49 for parent reports, .57 for teacher reports).
Participants’ scores on the Anxiety scale will be used to examine discriminant validity
evidence for scores on the ADOS-G. The Anxiety scale measures a child’s tendency to be
nervous, fearful, or worried about real or imagined problems. Items on this scale assess the
child’s level of perfectionism, education-related fears, and social worries.
In general, evidence from the validation sample indicates that the BASC-2 is a
technically adequate tool for measuring behavioral functioning in children and adolescents
(Reynolds & Kamphaus, 2004). Adequate internal consistency (α > .80; Salvia & Ysseldyke,
2004) has been reported for all scale composites on both the parent and teacher rating scales for
children and adolescents aged 4 years and above. Test-retest reliability estimates on the Teacher
Rating Scales (TRS; r = .72 - .93 on the preschool form, r = .65 - .94 on the child form, and r =
.66 to .91 on the adolescent form) and the Parent Rating Scales (PRS; r = .66 - .88 on the
preschool form, r = .65 - .92 on the child form, and r = .72 to .92 on the adolescent form) reflect
an adequate to strong consistency of ratings over time. In addition, across parent and teacher
forms, scores on the Behavioral Symptoms Index (i.e., the composite score on the BASC-2 that
reflects the child’s overall level of problem behavior) demonstrate strong concurrent
relationships (r = .76 - .84) with the Total Problems composite score on the ASEBA Child
Behavior Checklist (Achenbach & Rescorla, 2000). Exploratory factor analyses conducted at the
subscale level also provide evidence of the structural validity of the BASC-2 Teacher and Parent
Rating Scales. Three- and four-factor solutions were extracted using Principal Axis factoring and
Varimax rotation, and examined for suitability for both the TRS and PRS. Consistent with
theory, four-factor solutions presented better model fit and were retained for the TRS and PRS
(Reynolds & Kamphaus).
Procedure
Participants received a multidisciplinary Autism Team evaluation by school district
personnel due to concerns regarding their social functioning, communication abilities, and/or
sensorimotor functioning. Each multidisciplinary Autism Team is composed of a Licensed
Specialist in School Psychology, Educational Diagnostician, Speech and Language Pathologist,
and Occupational Therapist, all of whom have specialized training in the assessment and
diagnosis of Autism Spectrum Disorders. In addition to general training on ASDs, all team
members have completed the standardized training on the administration and scoring of the
Autism Diagnostic Observation Schedule-Generic, which was facilitated by an ADOS-certified
trainer. At the completion of the ADOS-G training, all team members were required to reliably
score a video-taped ADOS-G administration to demonstrate their competence with the
assessment tool. No team members participated in ADOS-G administration and/or scoring prior
to completing the required training and demonstrating scoring competence.
As a part of the autism evaluation process, participants were each administered Module 1,
2, or 3 (depending on the participant’s age and expressive language ability) of the ADOS-G by
the multidisciplinary evaluation team. The ADOS-G was administered and scored in accordance
with the standardized procedures set forth by test authors. In addition to the administration of the
ADOS-G, each evaluation also included a number of other assessment activities. Participants’
parents and teachers participated in semi-structured clinical interviews in order for evaluation
personnel to gather information regarding the student’s past and current functioning across
settings. Parents and teachers also completed broadband behavioral rating scales (i.e., BASC-2)
and autism screening/diagnostic measures (i.e., GARS-2). In addition, direct observations of the
student within his or her educational setting were conducted by the evaluation team. Finally,
parents were asked to provide a detailed birth, health, and developmental history regarding their
child. Following the completion of the evaluation process, the multidisciplinary team reviewed
all assessment data and assigned each participant a clinical diagnosis in accordance with the
diagnostic criteria set forth by the Diagnostic and Statistical Manual of Mental Disorders,
Fourth Edition-Text Revision (DSM-IV-TR; American Psychiatric Association, 2004) and a
special education eligibility.
Educational files from all students who participated in multidisciplinary Autism Team
evaluations from January of 2007 through December 2011 were located and reviewed. Next,
demographic information; total scores, domain scores, and/or subscale scores from all
administered standardized assessments; item scores for each item on the ADOS-G,
Communication and Social Interaction domain scores, the Communication + Social Interaction
Total score, and the resulting diagnostic classification obtained from applying the original
scoring algorithm on the ADOS-G; and ending clinical diagnoses and special education
eligibility categories were entered into a database. Participants’ ADOS-G item scores were then
used to “rescore” their performance using the revised ADOS-G scoring algorithm. Participants’
Social Affective and Restricted-Repetitive Behavior domain scores, Social Affective + Restricted-
Repetitive Behavior Total scores, and the resulting diagnostic classification were also entered
into the database.
In order to compare diagnostic classifications of the ADOS-G with clinical diagnoses
made without results from the ADOS-G (Hypothesis 4), Licensed Specialists in School
Psychology and predoctoral psychology interns with formal training in autism assessment
reviewed assessment information from a sample of approximately 100 evaluations randomly
selected from the 462 evaluations included in the analyses of diagnostic accuracy. Evaluators
were provided with all of the assessment information obtained during the multidisciplinary team
evaluations with the exception of the participant’s ADOS-G scores and the evaluation team’s
diagnostic conclusions. Based on the other available information, trained clinicians assigned
each participant a clinical diagnosis, if appropriate.
Chapter 3. Results
Preliminary Analyses & Testing of Assumptions
ADOS-G item analyses. Items included in both the Original and Revised Scoring
Algorithms on the ADOS-G were examined for normality, linearity, multicollinearity, and the
presence of multivariate outliers by module. Mean item scores, standard deviations, and skew
and kurtosis values are presented in Tables 3-5. Item scores were considered to be skewed if skew
exceeded ±2 and kurtotic if kurtosis exceeded ±7 (Fabrigar, Wegener, MacCallum, & Strahan,
1999). Skew and kurtosis fell within normal limits for the majority of items across all three
modules. Item D-1 on Module 3 was mildly skewed (skew = 2.15), and Item A-3 on Module 3
was found to be both mildly skewed and kurtotic (skew = 2.86, kurtosis = 7.95). Item D-3 on
Module 1 (skew = 4.23, kurtosis = 17.17), Module 2 (skew = 6.11, kurtosis = 35.91), and
Module 3 (skew = 8.30, kurtosis = 72.38) was found to be moderately to severely skewed and
kurtotic. However, Item A-3 is not included in either the Original or Revised Scoring Algorithm
for Module 3, and Item D-3 is not included in either scoring algorithm across modules. Linearity
of item scores across modules was supported through the visual inspection of scatterplots. Visual
inspection of standard and reproduced correlation matrices confirmed the presence of moderate
to strong correlations between items and the absence of multicollinearity across modules.
Mahalanobis Distance Tests (Tabachnick & Fidell, 1996) were conducted across modules and
scoring algorithms to investigate the presence of multivariate outliers. No outliers were identified
in the Module 1, Original Scoring Algorithm (OSA); Module 1, Revised Scoring Algorithm
(RSA)¹; and Module 2-OSA. Two multivariate outliers were identified in the
Module 2-RSA², and seven outliers were identified in both the Module 3-OSA and the Module 3-RSA. For
each module with outliers, preliminary factor analyses were conducted with and without these
Table 3
Item Means, Standard Deviations, Skew and Kurtosis Values on Module 1 from the ADOS-G (N = 82)

| Item | M | SD | Skew | Kurtosis |
| --- | --- | --- | --- | --- |
| A-1: Overall level of non-echoed language | 1.18 | .79 | -.34 | -1.31 |
| A-2: Frequency of vocalizations to othersᶜ | 1.49 | .72 | -1.06 | -.27 |
| A-3: Intonation of vocalizations/verbalizations | .63 | .84 | .78 | -1.21 |
| A-4: Immediate echolalia | .89 | .90 | .22 | -1.76 |
| A-5: Stereotyped use of wordsᶜ | .55 | .83 | 1.02 | -.77 |
| A-6: Use of other’s body to communicateᵃ | .61 | .84 | .85 | -1.05 |
| A-7: Pointingᶜ | 1.39 | .81 | -.84 | -.96 |
| A-8: Gesturesᶜ | 1.16 | .84 | -.30 | -1.51 |
| B-1: Unusual eye contactᶜ | 1.54 | .85 | -1.27 | -.33 |
| B-2: Responsive Social Smile | 1.35 | .82 | -.74 | -1.11 |
| B-3: Facial expressions directed to othersᶜ | 1.33 | .75 | -.63 | -.97 |
| B-4: Integration of gaze/other behavior during socializationᵇ | 1.61 | .64 | -1.42 | .83 |
| B-5: Shared enjoyment in interactionᶜ | 1.02 | .88 | -.05 | -1.71 |
| B-6: Response to name | 1.32 | .83 | -.66 | -1.23 |
| B-7: Requesting | 1.13 | .73 | -.22 | -1.09 |
| B-8: Giving | 1.45 | .69 | -.87 | -.43 |
| B-9: Showingᶜ | 1.61 | .66 | -1.46 | .85 |
| B-10: Spontaneous initiation of joint attentionᶜ | 1.41 | .73 | -.83 | -.66 |
| B-11: Response to joint attentionᵃ | 1.22 | .80 | -.42 | -1.32 |
| B-12: Quality of social overturesᶜ | 1.65 | .62 | -1.56 | 1.32 |
| D-1: Unusual sensory interest in play materials/personᵇ | .83 | .84 | .34 | -1.52 |
| D-2: Hand/finger complex mannerismsᵇ | .62 | .83 | .81 | -1.04 |
| D-3: Self-injurious behavior | .10 | .40 | 4.23 | 17.17 |
| D-4: Repetitive interests/stereotyped behaviorsᵇ | .83 | .81 | .33 | -1.41 |

Note. Items are scored on a 0 to 2 point scale, where a score of 0 = no impairment and a score of 2 = significant impairment. ᵃItem included in the Original Scoring Algorithm only. ᵇItem included in the Revised Scoring Algorithm only. ᶜItem included in both scoring algorithms.
Table 4
Item Means, Standard Deviations, Skew, and Kurtosis Values on Module 2 from the ADOS-G (N = 118)

| Item | M | SD | Skew | Kurtosis |
| --- | --- | --- | --- | --- |
| A-1: Overall level of non-echoed language | .92 | .78 | .13 | -1.32 |
| A-2: Social overtures/maintenance of attentionᵃ | 1.08 | .86 | -.17 | -1.65 |
| A-3: Autism associated speech abnormalities | .82 | .84 | .35 | -1.51 |
| A-4: Immediate echolalia | .78 | .82 | .43 | -1.38 |
| A-5: Stereotyped use of wordsᶜ | .76 | .81 | .46 | -1.33 |
| A-6: Conversationᵃ | 1.38 | .81 | -.81 | -.98 |
| A-7: Pointingᶜ | .70 | .83 | .61 | -1.28 |
| A-8: Gesturesᶜ | .92 | .87 | .15 | -1.67 |
| B-1: Unusual eye contactᶜ | 1.25 | .97 | -.51 | -1.75 |
| B-2: Facial expressions directed to othersᶜ | .77 | .78 | .43 | -1.23 |
| B-3: Shared enjoyment in interactionᵇ | .60 | .79 | .84 | -.87 |
| B-4: Response to name | .64 | .79 | .73 | -1.02 |
| B-5: Showingᵇ | 1.03 | .77 | -.04 | -1.29 |
| B-6: Spontaneous initiation of joint attentionᶜ | .84 | .77 | .29 | -1.27 |
| B-7: Response to joint attention | .66 | .81 | .70 | -1.12 |
| B-8: Quality of social overturesᶜ | 1.06 | .78 | -.10 | -1.33 |
| B-9: Quality of social responseᵃ | 1.03 | .78 | -.04 | -1.34 |
| B-10: Amount of reciprocal social communicationᶜ | 1.24 | .86 | -.48 | -1.50 |
| B-11: Overall quality of rapportᶜ | .94 | .84 | .11 | -1.58 |
| D-1: Unusual sensory interest in play materials/personᵇ | .36 | .58 | 1.40 | .99 |
| D-2: Hand/finger complex mannerismsᵇ | .35 | .61 | 1.55 | 1.33 |
| D-3: Self-injurious behavior | .03 | .16 | 6.11 | 35.91 |
| D-4: Repetitive interests/stereotyped behaviorsᵇ | .52 | .71 | 1.01 | -.31 |

Note. Items are scored on a 0 to 2 point scale, where a score of 0 = no impairment and a score of 2 = significant impairment. ᵃItem included in the Original Scoring Algorithm only. ᵇItem included in the Revised Scoring Algorithm only. ᶜItem included in both scoring algorithms.
Table 5
Item Means, Standard Deviations, Skew, and Kurtosis Values on Module 3 from the ADOS-G (N = 262)

| Item | M | SD | Skew | Kurtosis |
| --- | --- | --- | --- | --- |
| A-1: Overall level of non-echoed language | .38 | .60 | 1.34 | .76 |
| A-2: Autism associated speech abnormalities | 1.09 | .74 | -.14 | -1.16 |
| A-3: Immediate echolalia | .15 | .40 | 2.86 | 7.95 |
| A-4: Stereotyped use of words/phrasesᶜ | .73 | .72 | .47 | -.98 |
| A-5: Offers information | .76 | .84 | .47 | -1.42 |
| A-6: Asks for information | 1.45 | .73 | -.93 | -.56 |
| A-7: Reporting of eventsᶜ | 1.05 | .83 | -.09 | -1.53 |
| A-8: Conversationᶜ | 1.34 | .77 | -.67 | -1.01 |
| A-9: Gesturesᶜ | .81 | .82 | .36 | -1.43 |
| B-1: Unusual eye contactᶜ | 1.25 | .97 | -.52 | -1.74 |
| B-2: Facial expressions directed to othersᶜ | .87 | .71 | .19 | -1.02 |
| B-3: Language production/linked nonverbal communication | .26 | .51 | 1.78 | 2.31 |
| B-4: Shared enjoyment in interactionᵇ | .81 | .82 | .36 | -1.43 |
| B-5: Empathy/comments on others’ emotions | 1.40 | .73 | -.77 | -.74 |
| B-6: Insightᵃ | 1.44 | .72 | -.89 | -.57 |
| B-7: Quality of social overturesᶜ | 1.21 | .69 | -.30 | -.88 |
| B-8: Quality of social responseᶜ | 1.11 | .67 | -.13 | -.78 |
| B-9: Amount of reciprocal social communicationᶜ | 1.18 | .79 | -.33 | -1.32 |
| B-10: Overall quality of rapportᶜ | 1.07 | .80 | -.12 | -1.40 |
| D-1: Unusual sensory interest in play materials/personᵇ | .23 | .51 | 2.15 | 3.79 |
| D-2: Hand/finger complex mannerismsᵇ | .29 | .59 | 1.87 | 2.34 |
| D-3: Self-injurious behavior | .03 | .20 | 8.30 | 72.38 |
| D-4: Excessive interest in specific topics/repetitive behaviorsᵇ | .66 | .80 | .70 | -1.07 |
| D-5: Compulsions or rituals | .28 | .57 | 1.96 | 2.69 |

Note. Items are scored on a 0 to 2 point scale, where a score of 0 = no impairment and a score of 2 = significant impairment. ᵃItem included in the Original Scoring Algorithm only. ᵇItem included in the Revised Scoring Algorithm only. ᶜItem included in both scoring algorithms.
outliers and the solutions did not vary significantly. Thus, data from all participants were
included in subsequent analyses.
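The multivariate outlier screen described above can be sketched in Python as follows. This is a minimal illustration and not the procedure as actually run for this study: the helper names and the eight cases of item scores are hypothetical, and the cutoff is an assumption. A common convention compares each squared Mahalanobis distance with the chi-square critical value at p < .001, with degrees of freedom equal to the number of variables (13.816 for two variables); the criterion applied here is not stated in the text.

```python
from statistics import mean

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def mahalanobis_sq(data):
    """Squared Mahalanobis distance of each case from the sample centroid."""
    n, p = len(data), len(data[0])
    means = [mean(row[j] for row in data) for j in range(p)]
    dev = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix (n - 1 in the denominator).
    cov = [[sum(d[j] * d[k] for d in dev) / (n - 1) for k in range(p)]
           for j in range(p)]
    return [sum(x * y for x, y in zip(d, solve(cov, d))) for d in dev]

# Hypothetical scores on two items (0 to 2 scale) for eight cases.
scores = [[0, 0], [1, 1], [2, 2], [0, 0], [1, 1], [2, 2], [1, 1], [0, 2]]
d2 = mahalanobis_sq(scores)

# Assumed cutoff: chi-square critical value at p < .001 with df = 2.
CUTOFF = 13.816
outliers = [i for i, d in enumerate(d2) if d > CUTOFF]
print([round(d, 3) for d in d2], outliers)
```

In this toy sample no case reaches the cutoff, because with eight cases the squared distance cannot exceed (n − 1)²/n = 6.125. The final case, which breaks the pattern of the others, has the largest distance.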
Total, scale, and subscale score analyses. Table 6 presents the means, standard
deviations, score ranges, skew, and kurtosis values for ADOS-G Total Scores, GARS-2 index
and subscale scores, and subscale scores on the BASC-2. Skew and kurtosis of scores fell within
acceptable ranges for all total/index/subscale scores. Linearity of item scores across modules and
scoring algorithms was supported through visual inspection of scatterplots. Thus, data from all
participants were included in subsequent analyses.
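The skew and kurtosis screening applied in these preliminary analyses can be illustrated with a short Python sketch that uses the ±2 (skew) and ±7 (kurtosis) criteria cited above. The estimator is an assumption: the sketch uses simple moment-based skew and excess kurtosis, whereas statistical packages typically apply small-sample corrections, so values will differ slightly from those reported in Tables 3-6. The function names and example scores are hypothetical.

```python
def skew_and_kurtosis(values):
    """Moment-based skew and excess kurtosis (no small-sample correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("scores have zero variance")
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

def screen(values, skew_limit=2.0, kurtosis_limit=7.0):
    """Return (skewed, kurtotic) flags using the +/-2 and +/-7 criteria."""
    skew, kurt = skew_and_kurtosis(values)
    return abs(skew) > skew_limit, abs(kurt) > kurtosis_limit

# Hypothetical item scores: a rarely endorsed item and a symmetric item.
rare_item = [0] * 48 + [2] * 2
symmetric_item = [0, 1, 1, 2]
print(screen(rare_item), screen(symmetric_item))
```

A rarely endorsed item, similar in shape to the self-injurious behavior item, is flagged on both criteria, whereas the symmetric item is not.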
Hypothesis 1: Factor Structure of the Original and Revised Scoring Algorithms
Exploratory Factor Analyses (EFA) were conducted (by module) on the items included in
the Original and Revised Scoring Algorithms of the ADOS-G. Based on the factor structures
outlined by the authors for the OSA (Lord et al., 1999) and RSA (Gotham et al., 2007), it was
expected that the items included in the OSA for each module would reflect a uni-dimensional
structure; whereas the items included in the Revised Scoring Algorithm would reflect a two-
factor structure.
To determine the adequacy of ADOS-G module items for factorability, several steps were
taken. First, the relationships between items were examined by generating a correlation matrix.
According to Tabachnick and Fidell (2007), a factorable correlation matrix should include
several sizable correlations. If the matrix was determined to be adequate, Bartlett’s Test of
Sphericity (Bartlett, 1950) was next conducted to test the null hypothesis that the correlation
matrix is an identity matrix. The Kaiser-Meyer-Olkin test of sampling adequacy (Kaiser, 1974)
was also calculated and examined to further investigate factorability. KMO values > .60 were
accepted as evidence of factorability (Kaiser).
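The two factorability checks named above can be sketched in Python as follows. This is an illustrative implementation of the standard formulas, not the software used in this study: Bartlett's statistic is computed as −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO index as the ratio of summed squared correlations to summed squared correlations plus summed squared partial correlations. The function names, the correlation matrix, and the sample size are hypothetical, and the p value is omitted because the Python standard library has no chi-square distribution.

```python
import math

def invert(matrix):
    """Return (inverse, determinant) using Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det
        scale = aug[col][col]
        det *= scale
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det

def bartlett_sphericity(corr, n_obs):
    """Chi-square statistic and df for H0: corr is an identity matrix."""
    p = len(corr)
    _, det = invert(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    p = len(corr)
    inv, _ = invert(corr)
    r2 = q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                q2 += partial ** 2
    return r2 / (r2 + q2)

# Hypothetical 3-item correlation matrix and sample size.
corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
chi2, df = bartlett_sphericity(corr, n_obs=82)
print(round(chi2, 3), df, round(kmo(corr), 3))
```

The degrees of freedom follow directly from the number of items, which is why the 12-item and 14-item analyses reported below have df = 66 and df = 91, respectively.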
Table 6
Participants’ Means, Standard Deviations, Score Range, Skew, and Kurtosis Values on the ADOS-G, GARS-2, and Selected Subscales from the BASC-2

| Scale/Subscale | M | SD | Rangeᵃ | Skew | Kurtosis |
| --- | --- | --- | --- | --- | --- |
| ADOS-G: Original Scoring Algorithms | | | | | |
| Total Score, M1 (N = 82) | 15.17 | 6.63 | 0 - 24 | -.83 | -.30 |
| Total Score, M2 (N = 118) | 11.98 | 8.07 | 0 - 24 | -.18 | -1.39 |
| Total Score, M3 (N = 262) | 12.06 | 6.07 | 0 - 22 | -.27 | -.94 |
| ADOS-G: Revised Scoring Algorithms | | | | | |
| Total Score, M1 (N = 66) | 15.77 | 7.49 | 0 - 27 | -.63 | -.57 |
| Total Score, M2 (N = 73) | 11.34 | 8.05 | 0 - 27 | .09 | -1.15 |
| Total Score, M3 (N = 261) | 12.59 | 6.79 | 0 - 27 | -.15 | -1.03 |
| GARS-2 Parent Ratings (N = 109) | | | | | |
| Autism Index | 80.71 | 18.94 | 40 - 130 | .28 | .36 |
| Stereotyped Behaviors | 6.73 | 2.96 | 1 - 15 | .54 | -.23 |
| Communication | 7.96 | 3.64 | 2 - 16 | .35 | -.84 |
| Social Interaction | 6.51 | 3.13 | 2 - 16 | .65 | -.11 |
| GARS-2 Teacher Ratings (N = 112) | | | | | |
| Autism Index | 85.38 | 18.86 | 40 - 132 | .06 | .23 |
| Stereotyped Behaviors | 6.56 | 2.94 | 0 - 16 | .37 | .33 |
| Communication | 8.60 | 3.78 | 0 - 18 | -.05 | -.27 |
| Social Interaction | 7.62 | 3.47 | 0 - 15 | .16 | -.96 |
| BASC-2 Parent Ratings (N = 261) | | | | | |
| Anxiety | 50.73 | 12.37 | 28 - 96 | .82 | .84 |
| Atypicality | 67.09 | 16.34 | 24 - 120 | .53 | .35 |
| Withdrawal | 63.62 | 14.80 | 33 - 120 | .51 | .53 |
| BASC-2 Teacher Ratings (N = 261) | | | | | |
| Anxiety | 54.10 | 14.19 | 38 - 103 | 1.20 | 1.11 |
| Atypicality | 73.64 | 17.55 | 36 - 120 | .29 | -.48 |
| Withdrawal | 67.97 | 13.49 | 38 - 100 | .12 | -.61 |

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic; GARS-2 = Gilliam Autism Rating Scale, Second Edition; BASC-2 = Behavior Assessment System for Children, Second Edition; BRIEF = Behavior Rating Inventory of Executive Function; M = Module. Total scores on the Original Scoring Algorithm were obtained by summing the Communications and Social Interaction total scores; Total scores on the Revised Scoring Algorithm were obtained by summing the Social Affect + Restricted Repetitive Behavior total scores. ᵃObserved score ranges.
Common factor analysis was conducted because the goal of the study is to identify the
latent structure of the ADOS-G items (Wegener & Fabrigar, 2000). Additionally, common factor
analysis has been reported to produce more accurate estimates of population parameters than
Principal Components Analysis (Widaman, 1993). Principal axis extraction was utilized
because it is less likely to be affected by multivariate nonnormality than other extraction
methods, such as maximum likelihood extraction (Briggs & MacCallum, 2003). Communalities
were initially estimated using squared multiple correlations, and the initial number of factors to
retain for rotation was based on theory, visual inspection of the scree test (Cattell, 1966), parallel
analysis (Horn, 1965), and minimum average partials (MAP; Velicer, 1976). Because factors are
assumed to be correlated, a Promax rotation with a k value of 4 was selected (Tataryn, Wood, &
Gorsuch, 1999). The final selection of factor structure was based on (a) salient
pattern/structure coefficients greater than or equal to .32; (b) a minimum of three items with
salient loadings per factor; (c) simple structure (i.e., items loaded saliently on a single factor only;
Thurstone, 1947); (d) resulting scale reliability estimates greater than or equal to .70; and (e)
theoretical convergence.
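Criteria (a) through (c) can be checked mechanically once a loading matrix is available. The Python sketch below is one hypothetical way to summarize a solution against them; it is not the procedure used in this study. The function name and the loadings are invented for illustration, and applying the .32 cutoff to the absolute value of each coefficient is an assumption.

```python
def summarize_loadings(loadings, cutoff=0.32, min_items=3):
    """Summarize salient loadings against the a priori retention criteria."""
    n_factors = len(next(iter(loadings.values())))
    salient_counts = [0] * n_factors
    cross_loading, non_loading = [], []
    for item, coefs in loadings.items():
        # Criterion (a): a coefficient is salient if |loading| >= cutoff.
        salient = [k for k, c in enumerate(coefs) if abs(c) >= cutoff]
        for k in salient:
            salient_counts[k] += 1
        # Criterion (c): simple structure fails for items salient on
        # more than one factor; items salient on no factor are listed too.
        if len(salient) > 1:
            cross_loading.append(item)
        elif not salient:
            non_loading.append(item)
    return {
        "salient_counts": salient_counts,
        # Criterion (b): at least min_items salient items per factor.
        "enough_items": all(c >= min_items for c in salient_counts),
        "cross_loading": cross_loading,
        "non_loading": non_loading,
    }

# Hypothetical two-factor pattern coefficients for seven items.
example = {
    "item1": (0.80, 0.05), "item2": (0.70, -0.10), "item3": (0.65, 0.10),
    "item4": (0.10, 0.70), "item5": (0.05, 0.60), "item6": (0.40, 0.45),
    "item7": (0.20, 0.10),
}
summary = summarize_loadings(example)
print(summary)
```

The summary reports cross-loading and non-loading items rather than issuing a verdict, because criteria (d) and (e), scale reliability and theoretical convergence, require judgment beyond the loading matrix.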
Module 1 - Original Scoring Algorithm (OSA). Data from the 12 items included in the
Module 1-OSA were submitted for common factor analysis (Principal Axis Factoring
extraction). Bartlett’s Test of Sphericity (χ² = 577.723, df = 66, p < .001) and the Kaiser-Meyer-
Olkin statistic (.899) indicated that the correlation matrix was adequate for factorability. In
addition, the correlation matrix (presented in Appendix C) of the aforementioned items was
reviewed and determined to contain several correlations above .30 (Tabachnick & Fidell, 2007).
Therefore, all reviewed statistics suggested that the correlation matrix was appropriate for factor
analysis.
MAP criteria, parallel analysis, and visual inspection of the scree plot recommended the
retention of one factor. Therefore, a one-factor solution was extracted and examined. The
resulting solution (Table 7) was adequate based on the standards set a priori. Eleven items loaded
saliently on the one-factor solution, with structure coefficients ranging from .35 to .89 (Mdn =
.73) and communalities ranging from .13 to .78 (Mdn = .56). The one-factor solution accounted
for 52 percent of the total variance between the items and was robust across extraction
(Unweighted Least Squares) methods.
Because one item (A-5) did not load on the one-factor solution and research has indicated
that over-factoring is better than under-factoring (Wood, Tataryn, & Gorsuch, 1996), a two-
factor solution was extracted, rotated (Promax rotation), and examined for adequacy. However,
simple structure was not observed in the two-factor solution. Specifically, there were two items
that saliently loaded on both factors, and three items did not load on any factor. Thus, the two-
factor solution was rejected.
Examination of the items that saliently loaded on the factor indicates that they each
measure an aspect of verbal or non-verbal attempts at initiating or sustaining social
communication. Thus, this factor was labeled Social Communication. The reliability estimate
(Cronbach’s α) of the scores on the Social Communication factor was .90, and with the exception
of Item A-5, all of the corrected inter-item correlations for each item on the scale were greater
than or equal to .34, with the majority of the correlations falling above .60. Further, item-total
statistics (see Appendix D) indicate that, with the exception of Item A-5, all of the items are
adding to the overall scale reliability and that deleting any of the items would not improve the
overall scale reliability.
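The reliability statistics reported here, Cronbach's α and the value of α if an item is deleted, follow from the item variances and the variance of the total score. A minimal Python sketch is given below. The function names and the five respondents' scores on three items are hypothetical and are not drawn from the study data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item-score rows."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_deleted(rows):
    """Alpha recomputed with each item removed in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(r) if j != drop] for r in rows])
        for drop in range(k)
    ]

# Hypothetical scores (0 to 2 scale) for five respondents on three items.
ratings = [[0, 0, 1], [1, 1, 1], [2, 2, 1], [2, 1, 2], [0, 1, 0]]
alpha = cronbach_alpha(ratings)
deleted = alpha_if_deleted(ratings)
print(round(alpha, 2), [round(a, 2) for a in deleted])
```

An item contributes to scale reliability when α drops after its removal. In this toy example, removing the first item lowers α, whereas removing either of the other two raises it slightly.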
Table 7
Structure Coefficients and Communalities for the ADOS-G Module 1 (Original Scoring Algorithm) Items (N = 82)

| Item | Structure Coefficient | Communality |
| --- | --- | --- |
| A-2: Frequency of vocalizations to others | .892 | .795 |
| A-5: Stereotyped use of words | .086 | .007 |
| A-6: Use of other’s body to communicate | .354 | .126 |
| A-7: Pointing | .749 | .561 |
| A-8: Gestures | .654 | .427 |
| B-1: Unusual eye contact | .732 | .536 |
| B-3: Facial expressions directed to others | .823 | .677 |
| B-5: Shared enjoyment in interaction | .727 | .528 |
| B-9: Showing | .780 | .609 |
| B-10: Spontaneous initiation of joint attention | .755 | .570 |
| B-11: Response to joint attention | .621 | .386 |
| B-12: Quality of social overtures | .831 | .690 |

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.
A follow-up analysis (Appendix E) was conducted to determine the suitability of the one-
factor solution when Item A-5 (i.e., the item that did not load saliently on the one-factor solution)
was deleted. All items loaded saliently on the solution, and it accounted for 58 percent of total
variance of Module 1 OSA items. In addition, the reliability estimate (Cronbach’s Alpha) for the
updated scale equaled .91.
Module 1 - Revised Scoring Algorithm (RSA). Data from the 14 items of the Module
1-RSA were also submitted for common factor analysis (Principal Axis Factoring extraction).
Bartlett’s Test of Sphericity (χ² = 532.557, df = 91, p < .001), Kaiser-Meyer-Olkin statistic
(.887), and the item correlation matrix (see Appendix C) all suggested that the correlation matrix
was adequate for factorability. MAP criteria, parallel analysis, and visual inspection of the scree
plot recommended the retention of one factor; however, the theoretical rationale reported by the
test authors (Gotham et al., 2007) specified the retention of two factors. Thus, solutions
containing one and two factors were examined.
The two-factor solution is presented in Table 8. Each of the 14 items loaded saliently and
singularly on the two-factor solution, and it accounted for 60 percent of the total variance. Ten
items were salient on Factor 1, with pattern coefficients ranging from .45 to .92 (Mdn = .75), and
four items were salient on Factor 2, with pattern coefficients ranging from .41 to .73 (Mdn =
.66). Communalities ranged from .17 to .76 (Mdn = .54), and the factor intercorrelation was .58.
The two-factor solution was robust across extraction (Unweighted Least Squares) and rotation
(Direct Oblimin) methods. Reliability estimates (Cronbach’s α) were .93 and .70 for Factor 1 and
Factor 2, respectively.
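For an oblique (Promax) solution, pattern and structure coefficients are linked through the factor intercorrelation matrix: the structure matrix equals the pattern matrix multiplied by the factor correlation matrix, and each communality is the sum of the products of an item's pattern and structure coefficients. The Python sketch below illustrates this with the pattern coefficients for Items A-2 and D-1 from Table 8 and the reported intercorrelation of .58. The function names are hypothetical, and small discrepancies from the tabled values are expected because the inputs are rounded.

```python
def structure_from_pattern(pattern, phi):
    """Structure coefficients S = P x Phi for an oblique solution."""
    k = len(phi)
    return [[sum(row[m] * phi[m][j] for m in range(k)) for j in range(k)]
            for row in pattern]

def communalities(pattern, structure):
    """Communality of each item: sum of pattern x structure products."""
    return [sum(p * s for p, s in zip(p_row, s_row))
            for p_row, s_row in zip(pattern, structure)]

# Pattern coefficients for Items A-2 and D-1 (Table 8) and the reported
# factor intercorrelation of .58.
pattern = [[0.917, -0.077], [0.017, 0.731]]
phi = [[1.0, 0.58], [0.58, 1.0]]

structure = structure_from_pattern(pattern, phi)
h2 = communalities(pattern, structure)
print([[round(s, 3) for s in row] for row in structure])
print([round(h, 3) for h in h2])
```

The recomputed values agree with the structure coefficients and communalities reported in Table 8 for these two items to within rounding error.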
The one-factor solution also was examined for suitability (Table 9). Twelve of the
fourteen items loaded saliently on the one-factor solution, with structure coefficients ranging
Table 8
Pattern Coefficients, Structure Coefficients, and Communalities for the ADOS-G Module 1 (Revised Scoring Algorithm) Items (N = 66)

| Item | Pattern: Factor 1 | Pattern: Factor 2 | Structure: Factor 1 | Structure: Factor 2 | Communality |
| --- | --- | --- | --- | --- | --- |
| A-2: Frequency of vocalizations/verbalizations to others | .917 | -.077 | .872 | .452 | .764 |
| A-5: Stereotyped use of words/phrases | .000 | .413 | .238 | .413 | .170 |
| A-7: Pointing | .688 | .036 | .709 | .433 | .503 |
| A-8: Gestures | .448 | .300 | .621 | .558 | .445 |
| B-1: Unusual eye contact | .646 | .169 | .743 | .541 | .571 |
| B-3: Facial expressions directed to others | .799 | .081 | .846 | .543 | .721 |
| B-4: Integration of gaze/other social behav. in social overtures | .868 | -.173 | .768 | .327 | .609 |
| B-5: Shared enjoyment in interactions | .537 | .218 | .663 | .528 | .471 |
| B-9: Showing | .807 | -.038 | .785 | .428 | .617 |
| B-10: Spontaneous initiation of joint attention | .750 | -.027 | .734 | .406 | .540 |
| B-12: Quality of social overtures | .864 | -.058 | .830 | .441 | .692 |
| D-1: Unusual sensory interests in person/objects | .017 | .731 | .439 | .741 | .549 |
| D-2: Hand/finger mannerisms | -.144 | .663 | .239 | .580 | .350 |
| D-4: Repetitive interests/stereotyped behaviors | .070 | .649 | .445 | .690 | .479 |

Note. Table presents the extraction of a two-factor solution using Principal Axis Extraction and Promax Rotation. Salient structure coefficients (> .32) are identified in bold. Due to an insufficient sample size, the Module 1, No Words Revised Scoring Algorithm was excluded from analyses of factor structure. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only.
Table 9
Structure Coefficients and Communalities for the ADOS-G Module 1 (Revised Scoring Algorithm) Items (N = 66)

| Item | Structure Coefficient | Communality |
| --- | --- | --- |
| A-2: Frequency of vocalizations/verbalizations to others | .846 | .716 |
| A-5: Stereotyped use of words/phrases | .288 | .083 |
| A-7: Pointing | .706 | .498 |
| A-8: Gestures | .654 | .427 |
| B-1: Unusual eye contact | .758 | .575 |
| B-3: Facial expressions directed to others | .848 | .720 |
| B-4: Integration of gaze/social behav. in social overtures | .727 | .528 |
| B-5: Shared enjoyment in interactions | .686 | .470 |
| B-9: Showing | .768 | .590 |
| B-10: Spontaneous initiation of joint attention | .721 | .519 |
| B-12: Quality of social overtures | .809 | .655 |
| D-1: Unusual sensory interests in person/objects | .512 | .262 |
| D-2: Hand/finger mannerisms | .314 | .099 |
| D-4: Repetitive interests/stereotyped behaviors | .512 | .262 |

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold. Due to an insufficient sample size, the Module 1, No Words Revised Scoring Algorithm was excluded from analyses of factor structure. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only.
from .51 to .85 (Mdn = .73) and communalities from .08 to .72 (Mdn = .50). The one-factor
solution accounted for 49% of the total variance between the items and was robust across
extraction (Unweighted Least Squares) methods.
Despite these findings, the two-factor solution was retained for interpretation instead of
the one-factor solution because it demonstrated theoretical convergence, higher communalities
and factor loadings, and accounted for a larger percent of total variance. In addition, every item
loaded saliently and singularly (i.e., loaded on one factor only) on the two-factor solution,
whereas two items did not load on the one-factor solution.
As a reflection of the items that saliently loaded on each factor, Factor 1 was labeled
Social Communication (SC) and Factor 2 was labeled Stereotyped/Repetitive Behaviors (SRB).
On the SC scale, all of the corrected inter-item correlations for each item (see Appendix D) were
greater than .61, with the majority of the correlations falling above .70, and all of the items on the
scale added to the overall scale reliability. Corrected inter-item correlations for each item on the
SRB scale exceeded .33. Item-total statistics indicate that three of the four items add to
overall scale reliability. One item (A-5), if deleted, would provide a very modest increase in
overall scale reliability (i.e., from .70 to .71).
Module 2 – Original Scoring Algorithm. Common factor analysis (Principal Axis
Factoring extraction) also was conducted on the 12 items included in the Module 2-OSA.
Bartlett’s Test of Sphericity (χ² = 1214.032, df = 66, p < .001) and the KMO statistic (.934) were
once again reviewed to determine the adequacy of the ADOS-G Module 2-OSA items for
factorability. In addition, the item correlation matrix (Appendix C) contained primarily moderate
to strong correlations between items, suggesting that the correlation matrix was appropriate for
factor analysis.
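The two factorability checks named above can be illustrated with a short Python sketch that computes Bartlett’s test statistic and the overall KMO value from a correlation matrix. This is a minimal sketch under stated assumptions, not the SPSS procedure used in the study: the function names and the simulated 12-item data set are hypothetical, and only NumPy is assumed. The degrees of freedom follow from p(p - 1)/2, which gives 66 for 12 items, the value reported above.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Return (chi-square, df) for Bartlett's test on a p x p correlation matrix R."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)             # log|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df

def kmo_overall(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                        # partial (anti-image) correlations
    off = ~np.eye(R.shape[0], dtype=bool)         # mask selecting off-diagonal cells
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

# Hypothetical data (not the study data): 118 cases, 12 items, one common factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=(118, 1))
items = 0.8 * factor + 0.6 * rng.normal(size=(118, 12))
R = np.corrcoef(items, rowvar=False)

chi2, df = bartlett_sphericity(R, n=118)
kmo = kmo_overall(R)
print(f"chi2 = {chi2:.1f}, df = {df}, KMO = {kmo:.3f}")
```

The chi-square value would then be referred to a chi-square distribution with the stated degrees of freedom to obtain the p value.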
MAP criteria, parallel analysis, visual inspection of the scree plot, and the authors’
theoretical rationale each recommended retention of one factor. Therefore, a one-factor solution
was extracted and examined. The resulting solution (Table 10) was examined for suitability and
determined to be adequate based on the standards set a priori. Each of the 12 items loaded
saliently on the one-factor solution, with structure coefficients ranging from .64 to .85 (Mdn =
.83) and communalities ranging from .41 to .79 (Mdn = .69). The one-factor solution accounted
for 66 percent of the total variance and was robust across extraction (Unweighted Least Squares)
methods.
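In a one-factor solution, each item’s communality is the square of its structure coefficient. The sketch below checks that relationship against the values printed in Table 10. The dictionary name and the rounding tolerance are assumptions introduced for illustration; the numbers are copied from the table.

```python
# Values copied from Table 10: item -> (structure coefficient, communality).
table10 = {
    "A-2": (0.848, 0.720), "A-5": (0.684, 0.468), "A-6": (0.828, 0.685),
    "A-7": (0.707, 0.499), "A-8": (0.810, 0.656), "B-1": (0.705, 0.496),
    "B-2": (0.791, 0.626), "B-6": (0.640, 0.409), "B-8": (0.887, 0.787),
    "B-9": (0.866, 0.749), "B-10": (0.889, 0.790), "B-11": (0.845, 0.714),
}

TOLERANCE = 0.002  # assumed allowance for three-decimal rounding in the table

# Largest gap between loading squared and the tabled communality.
max_gap = max(abs(loading ** 2 - h2) for loading, h2 in table10.values())
all_consistent = max_gap < TOLERANCE

for item, (loading, h2) in table10.items():
    print(f"{item:5s} loading^2 = {loading ** 2:.3f}  communality = {h2:.3f}")
```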
Consistent with the Module 1-OSA, examination of the items that saliently loaded on the
factor indicates that they each measure an aspect of verbal or nonverbal communication. Thus,
this factor was labeled Social Communication. The scale reliability estimate for the Social
Communication factor was .95 (Cronbach’s α), and the corrected inter-item correlation for each
item on the scale was greater than or equal to .62, with the majority of the correlations falling
above .77. Further, item-total statistics (see Appendix D) indicate that all of the items on the
scale are adding to the overall scale reliability and that deleting any of the items would not
improve the overall scale reliability.
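The reliability statistics discussed above (Cronbach’s α, the corrected correlation of each item with the rest of the scale, and α if the item is deleted) can be sketched as follows. This is an illustrative sketch only: the function names and simulated data are hypothetical, NumPy is assumed, and the item-level correlation computed here is the corrected item-total correlation, which is assumed to be the statistic the text refers to.

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an (n cases x k items) score matrix."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    sum_item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)

def item_total_stats(X):
    """For each item: (corrected item-total r, alpha if item deleted)."""
    X = np.asarray(X, dtype=float)
    results = []
    for j in range(X.shape[1]):
        rest = np.delete(X, j, axis=1)            # scale without item j
        r = np.corrcoef(X[:, j], rest.sum(axis=1))[0, 1]
        results.append((r, cronbach_alpha(rest)))
    return results

# Hypothetical data: three items share a factor, the fourth is pure noise.
rng = np.random.default_rng(1)
factor = rng.normal(size=(200, 1))
good_items = factor + 0.5 * rng.normal(size=(200, 3))
noise_item = rng.normal(size=(200, 1))
X = np.hstack([good_items, noise_item])

alpha_all = cronbach_alpha(X)
item_stats = item_total_stats(X)
for j, (r, alpha_deleted) in enumerate(item_stats):
    print(f"item {j}: r = {r:.2f}, alpha if deleted = {alpha_deleted:.2f}")
```

An item "adds to" scale reliability in this sense when α if deleted is lower than the full-scale α; in the simulated data, only the noise item fails that check.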
Module 2 - Revised Scoring Algorithm. Data from the 14 items of the Module 2-RSA
submitted for common factor analysis (Principal Axis Factoring extraction) also were determined
to be adequate for factorability based on the Bartlett’s Test of Sphericity (χ² = 711.141, df = 91, p
< .001), KMO statistic (.889), and review of the inter-item correlation matrix (Appendix C).
MAP criteria, parallel analysis, and visual inspection of the scree plot recommended the
retention of one factor, whereas the theoretical rationale reported by the test authors (Gotham et
al., 2007) supported the retention of two factors. Thus, solutions containing one and two factors
Table 10
Structure Coefficients and Communalities for the ADOS-G Module 2 (Original Scoring Algorithm) Items (N = 118)
Item Structure Coefficient Communality
A-2: Social overtures/maintenance of attention .848 .720
A-5: Stereotyped use of words .684 .468
A-6: Conversation .828 .685
A-7: Pointing .707 .499
A-8: Gestures .810 .656
B-1: Unusual eye contact .705 .496
B-2: Facial expressions directed to others .791 .626
B-6: Spontaneous initiation of joint attention .640 .409
B-8: Quality of social overtures .887 .787
B-9: Quality of social response .866 .749
B-10: Amount of reciprocal social communication .889 .790
B-11: Overall quality of rapport .845 .714
Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.
were examined. Although neither the two-factor nor one-factor solution resulted in optimal
model fit, both solutions met a priori criteria set for factor retention. Table 11 presents the pattern
coefficients, structure coefficients, and communalities for the two-factor solution. Structure
coefficients and communalities for the one-factor solution are also presented in Table 12.
Each of the 14 items loaded saliently on the two-factor solution. However, one item (A-8)
demonstrated salient loadings on both extracted factors. In addition, the Factor 1 pattern
coefficient for Item B-5 was slightly greater than one. The two-factor solution accounted for 64
percent of the total variance, and communalities ranged from .26 to .76 (Mdn = .61). Eleven
items were salient on Factor 1, with pattern coefficients ranging from .47 to 1.01 (Mdn = .73),
and four items were salient on Factor 2, with pattern coefficients ranging from .38 to .63 (Mdn =
.63). The factor intercorrelation was .66. The two-factor solution was robust across extraction
(Unweighted Least Squares) and rotation (Direct Oblimin) methods. Reliability estimates
(Cronbach’s α) were .94 and .70 for Factor 1 and Factor 2, respectively. On Factor 1, all of the
corrected inter-item correlations for each item were greater than or equal to .64, with the
majority of the correlations falling above .70, and all of the items on the scale added to the
overall scale reliability (Appendix D). On Factor 2, the corrected inter-item correlation for each
item was greater than or equal to .29. However, item-total statistics report that only three of the
four items are adding to overall scale reliability: one item (D-2), if deleted, would provide a
modest increase in overall scale reliability (i.e., from .70 to .74).
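For an oblique two-factor solution, structure coefficients equal the pattern coefficients post-multiplied by the factor intercorrelation matrix. The sketch below applies that identity to four rows of Table 11 using the reported factor intercorrelation of .66. The function name and tolerance are assumptions for illustration; the coefficients are copied from the table.

```python
PHI = 0.66  # factor intercorrelation reported in the text

# Values copied from Table 11: item -> (pattern F1, pattern F2).
pattern = {
    "A-5": (0.471, 0.271), "A-8": (0.514, 0.375),
    "B-5": (1.006, -0.243), "D-2": (-0.250, 0.578),
}
# Values copied from Table 11: item -> (structure F1, structure F2).
reported_structure = {
    "A-5": (0.650, 0.582), "A-8": (0.761, 0.714),
    "B-5": (0.846, 0.420), "D-2": (0.131, 0.413),
}

def structure_from_pattern(p1, p2, phi):
    """Structure = Pattern x Phi for two correlated factors."""
    return (p1 + phi * p2, phi * p1 + p2)

TOLERANCE = 0.005  # assumed allowance for rounding of phi and the coefficients

computed = {item: structure_from_pattern(p1, p2, PHI)
            for item, (p1, p2) in pattern.items()}
max_gap = max(abs(c - r)
              for item in pattern
              for c, r in zip(computed[item], reported_structure[item]))

for item, (s1, s2) in computed.items():
    print(f"{item}: computed structure = ({s1:.3f}, {s2:.3f})")
```

The B-5 row shows how a pattern coefficient slightly above one can coexist with a structure coefficient below one when the factors are correlated.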
The one-factor solution was also examined for suitability. Thirteen of the fourteen items
loaded saliently on the one-factor solution, with salient structure coefficients ranging from .44 to
.85 (Mdn = .78) and communalities ranging from .04 to .72 (Mdn = .61). The one-factor solution
accounted for 54 percent of the total variance between scale items and was robust across
Table 11
Pattern Coefficients, Structure Coefficients, and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items (N = 73)
Pattern Coefficients (Factor 1, Factor 2); Structure Coefficients (Factor 1, Factor 2)
Item Factor 1 Factor 2 Factor 1 Factor 2 Communality
A-5: Stereotyped use of words .471 .271 .650 .582 .464
A-7: Pointing .727 -.008 .722 .471 .521
A-8: Gestures .514 .375 .761 .714 .659
B-1: Unusual eye contact .502 .202 .635 .533 .426
B-2: Facial expressions directed to others .590 .263 .763 .651 .621
B-3: Shared enjoyment in interactions .642 .187 .765 .610 .605
B-5: Showing 1.006 -.243 .846 .420 .749
B-6: Spontaneous initiation of joint attention .910 -.282 .725 .318 .570
B-8: Quality of social overtures .920 -.077 .870 .530 .760
B-10: Amount of reciprocal social communication .870 -.016 .859 .557 .738
B-11: Overall quality of rapport .781 .090 .840 .604 .710
D-1: Unusual sensory interests in person/play materials -.026 .630 .389 .613 .376
D-2: Hand/finger complex mannerisms -.250 .578 .131 .413 .206
D-4: Repetitive interests/stereotyped behaviors .184 .627 .597 .748 .578
Note. Table presents the extraction of a two-factor solution using Principal Axis Extraction and Promax Rotation. Salient pattern coefficients (> .32) are identified in bold. Due to an insufficient sample size, the Module 2, Less Than 5 Years Revised Scoring Algorithm was excluded from analyses of factor structure. References to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only.
Table 12
Structure Coefficients and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items (N = 73)
Item Structure Coefficient Communality
A-5: Stereotyped use of words .674 .454
A-7: Pointing .714 .510
A-8: Gestures .792 .628
B-1: Unusual eye contact .653 .426
B-2: Facial expressions directed to others .785 .616
B-3: Shared enjoyment in interactions .780 .608
B-5: Showing .801 .642
B-6: Spontaneous initiation of joint attention .687 .460
B-8: Quality of social overtures .851 .724
B-10: Amount of reciprocal social communication .849 .721
B-11: Overall quality of rapport .844 .712
D-1: Unusual sensory interests in person/play materials .443 .196
D-2: Hand/finger complex mannerisms .187 .035
D-4: Repetitive interests/stereotyped behaviors .645 .416
Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold. Due to an insufficient sample size, the Module 2, Less Than 5 Years Revised Scoring Algorithm was excluded from analyses of factor structure. References to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only.
extraction (Unweighted Least Squares) methods. The reliability estimate (Cronbach’s α) of the
scores on the factor was .93, and, with the exception of Item D-2, all of the corrected inter-item
correlations for each item on the scale exceeded .43, with the majority of the correlations falling
above .76. Further, item-total statistics (Appendix D) indicate that, with the exception of Item
D-2, all of the items on the scale are adding to the overall scale reliability and that deleting any of
the remaining items would not improve overall reliability.
After examining the two solutions, the one-factor solution was retained for interpretation
instead of the two-factor solution due to the presence of a Heywood Case (i.e., a factor loading
greater than one; Costello & Osborne, 2005), lack of item singularity, failure to adhere to
theoretical rationale, and minimally acceptable scale reliability for one of the two extracted
factors. Examination of the items that saliently loaded on the factor indicates that they measure
aspects of social functioning, communication, and engagement in stereotyped repetitive
behaviors. Thus, the factor was labeled Autistic Characteristics.
A follow-up EFA (Appendix E) was conducted to determine the suitability of the one-
factor solution when Item D-2 (i.e., the item that did not load saliently on the one-factor solution)
was deleted. All items loaded saliently on the solution, and it accounted for 58 percent of total
variance of Module 2 RSA items. Cronbach’s α was .94 for the updated scale.
Module 3 - Original Scoring Algorithm. The correlation matrix of the 11 items was
also adequate for factorability (Bartlett’s Test of Sphericity [χ² = 1504.436, df = 55, p < .001];
KMO statistic = .933; multiple inter-item correlations > .30 [Tabachnick & Fidell, 2007; see
Appendix C]). All relevant criteria recommended the retention of a one-factor solution.
Therefore, a one-factor solution was extracted (Table 13), examined, and was determined to be
adequate based on the standards set a priori. Ten items loaded saliently on the one-factor
Table 13
Structure Coefficients and Communalities for the ADOS-G Module 3 (Original Scoring Algorithm) Items (N = 262)
Item Structure Coefficient Communality
A-4: Stereotyped use of words/phrases .304 .092
A-7: Reporting of events .621 .368
A-8: Conversation .795 .631
A-9: Gestures .647 .418
B-1: Unusual eye contact .555 .308
B-2: Facial expressions directed to others .752 .566
B-6: Shared enjoyment in interactions .651 .424
B-7: Quality of social overtures .833 .694
B-8: Quality of social response .819 .672
B-9: Amount of reciprocal social communication .806 .649
B-10: Overall quality of rapport .763 .582
Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.
solution, with salient structure coefficients ranging from .56 to .81 (Mdn = .75) and
communalities ranging from .09 to .69 (Mdn = .57). The one-factor solution accounted for 53
percent of the total variance and was robust across extraction (Unweighted Least Squares)
methods.
Because one item (A-4) did not load on the one-factor solution, a two-factor solution was
examined for adequacy. However, the two-factor solution was rejected because it did not
demonstrate simple structure (i.e., there were four items that saliently loaded on both factors and
each factor did not contain at least three items with salient pattern coefficients).
The retained factor was labeled Social Communication to reflect the content of the items
that saliently loaded on the factor. The reliability estimate (Cronbach’s α) of the Social
Communication scale was .90, and with the exception of item A-4, all of the corrected inter-item
correlations were greater than or equal to .34, with the majority of the correlations falling above
.55. Further, item-total statistics (Appendix D) indicate that, with the exception of item A-4, all
of the items on the scale are adding to the overall scale reliability.
The ten items with salient loadings were resubmitted for a follow-up EFA. The suitability
of the one-factor solution was confirmed (see Appendix E). Each of the items also demonstrated
salient loadings on the new solution, which accounted for 58 percent of the variance between
items. The reliability estimate (Cronbach’s Alpha) of the updated scale equaled .92.
Module 3 - Revised Scoring Algorithm. Data from the 14 items of the Module 3-RSA
also were submitted for common factor analysis using Principal Axis Factoring extraction. The
adequacy of the correlation matrix for factorability was established by the Bartlett’s Test of
Sphericity (χ² = 1680.750, df = 91, p < .001), KMO statistic (.926), and a review of the inter-item
correlation matrix (see Appendix C). MAP criteria, parallel analysis, and the visual inspection of
a scree plot recommended the retention of one factor, whereas the theoretical rationale reported
by the test authors (Gotham et al., 2007) supported the retention of two factors. Thus, solutions
containing one and two factors were extracted and examined.
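One of the empirical retention criteria named above, parallel analysis, can be sketched as follows. This is a minimal illustration of Horn’s procedure using eigenvalues of the full correlation matrix; the number of simulated data sets, the 95th-percentile threshold, the function name, and the simulated one-factor data are all assumptions, since the study’s exact settings are not stated in this section.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Count leading eigenvalues that exceed those obtained from random normal data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.normal(size=(n, p))
        random_eigs[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    n_factors = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break                                  # stop at the first eigenvalue not above chance
        n_factors += 1
    return n_factors, observed, threshold

# Hypothetical data: 261 cases, 14 items driven by a single common factor.
rng = np.random.default_rng(2)
factor = rng.normal(size=(261, 1))
items = 0.7 * factor + np.sqrt(1 - 0.49) * rng.normal(size=(261, 14))

n_factors, observed, threshold = parallel_analysis(items)
print("factors suggested:", n_factors)
```

When the empirical criteria and the theoretical rationale disagree, as described above, extracting both candidate solutions and comparing them is the approach the text follows.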
A two-factor solution could not be extracted within the 25 iterations allowed by
SPSS and, therefore, could not be considered. The one-factor solution (Table 14) was
determined to be adequate based on the standards set a priori. Twelve items loaded saliently on
the one-factor solution, with salient structure coefficients ranging from .56 to .81 (Mdn = .78)
and communalities ranging from .02 to .68 (Mdn = .57). Subsequent analyses determined that the
one-factor solution was robust across extraction (Unweighted Least Squares) methods.
The one-factor solution accounted for 45 percent of the total variance between the
Module 3-RSA items, and the reliability estimate (Cronbach’s α) of the scores on the factor was
.89. With the exception of items D-1 and D-2, corrected inter-item correlations for each item on
the scale were greater than or equal to .33, with the majority of the correlations falling above .70.
Item-total statistics (Appendix D) report that 12 of the 14 items (i.e., all items except for D-1 and
D-2) are adding to the overall scale reliability. In addition, deletion of items D-1 and D-2 from the
scale would increase scale reliability. Salient items were reviewed for content and determined to
measure aspects of social functioning, communication, and engagement in stereotyped repetitive
behaviors. Thus, the factor was labeled Autistic Characteristics.
A follow-up EFA (Appendix E) was conducted to determine the suitability of the one-
factor solution with the deletion of Items D-1 and D-2. All items loaded saliently on the new
solution, and it accounted for 51 percent of total variance of Module 3 RSA items. Cronbach’s
α was .90 for the updated scale.
Table 14
Structure Coefficients and Communalities for the ADOS-G Module 3 (Revised Scoring Algorithm) Items (N = 261)
Item Structure Coefficient Communality
A-4: Stereotyped use of words/phrases .327 .107
A-7: Reporting of events .616 .380
A-8: Conversation .781 .610
A-9: Gestures .662 .438
B-1: Unusual eye contact .548 .300
B-2: Facial expressions directed to others .774 .598
B-4: Shared enjoyment in interaction .824 .678
B-7: Quality of social overtures .810 .656
B-8: Quality of social response .806 .650
B-9: Amount of reciprocal social communication .758 .575
B-10: Overall quality of rapport .755 .570
D-1: Unusual sensory interest in play materials/person .236 .056
D-2: Hand/finger complex mannerisms .136 .019
D-4: Excessive interest in specific topics/repetitive behav .395 .156
Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.
Hypothesis 2: Relationships between Scores on the ADOS-G and Other Measures
Bivariate correlations (Pearson’s r) were calculated to determine the strength of the
relationships between participants’ total scores on the ADOS-G and parent and teacher ratings of
participants’ behavioral functioning on the GARS-2 (Autism Index and subscale scores), and on
select subscales of the BASC-2. Because the scores generated from the ADOS-G and behavior
rating scales do not use a common metric (e.g., standard scores or T scores), z scores were
calculated for each of the variables to allow for appropriate comparisons. Z scores for parent and
teacher ratings on the GARS-2 and BASC-2 were then correlated with ADOS-G total score z
scores across module and scoring algorithm. Correlations were calculated using the ADOS-G
total scores based on the OSA (Lord et al., 1999), RSA (Gotham et al., 2007), and Updated
Scoring Algorithm (i.e., the scoring algorithms identified from the factor analyses conducted in
the current sample with recommended item deletions for Module 1-OSA, Module 2-RSA, and
Module 3-OSA and RSA). Table 15 presents the validity coefficients representing the
relationships between participants’ total scores on Module 3 of the ADOS-G across scoring
algorithms, and parent and teacher ratings of participants’ behavior on the GARS-2. Table 16
presents the validity coefficients representing the relationships between participants’ total scores
on the ADOS-G across modules and scoring algorithms, and parent and teacher ratings of
participants’ behavior on select subscales of the BASC-2. Validity coefficients were interpreted
as follows: r < .30 = weak correlation, .30 ≤ r < .60 = moderate correlation, r ≥ .60 = strong
correlation (Cicchetti, 1994).
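The standardization and correlation steps described above can be sketched in plain Python. The function names and the five-value example vectors are hypothetical. The interpretation function assumes that the bands apply to the absolute value of r and that a coefficient of exactly .30 counts as moderate, which is how coefficients such as -.30 and -.32 are described elsewhere in this chapter.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize a list of scores to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def interpret(r):
    """Label a validity coefficient using the bands stated in the text (assumed on |r|)."""
    size = abs(r)
    if size < 0.30:
        return "weak"
    if size < 0.60:
        return "moderate"
    return "strong"

# Hypothetical scores on two measures with different metrics.
ados_totals = [1, 2, 3, 4, 5]
rating_scores = [2, 1, 4, 3, 5]

r_raw = pearson_r(ados_totals, rating_scores)
r_standardized = pearson_r(z_scores(ados_totals), z_scores(rating_scores))
print(round(r_raw, 3), round(r_standardized, 3), interpret(r_raw))
```

Because Pearson’s r is unchanged by linear rescaling, the coefficient computed from z scores equals the coefficient computed from raw scores.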
Module 1. Consistent with hypotheses, results of the correlational analysis indicate that
weak relationships exist between participants’ total scores on Module 1 of the ADOS-G, across
scoring algorithms, and parent and teacher ratings of participants’ behavior on the BASC-2
Table 15
Pearson Correlations between Participants’ Total Scores on the ADOS-G Original and Revised Scoring Algorithms for Module 3 and Parent and Teacher Ratings on the GARS-2
Module 3-OSA (Authors’ FS, Identified FS); Module 3-RSA (Authors’ FS, Identified FS)
Scale/Subscale Authors’ FS Identified FS Authors’ FS Identified FS
GARS-2: Parent Ratings (N = 72)
Autism Index -.15 .02 -.20 .03
Stereotyped Behaviors -.30 .06 -.34 .05
Communication -.15 -.06 -.19 .02
Social Interaction -.08 -.01 -.12 -.05
GARS-2: Teacher Ratings (N = 70)
Autism Index .08 .10 .12 .06
Stereotyped Behaviors .04 .09 .08 .04
Communication .15 .16 .18 .14
Social Interaction .08 .01 .11 -.03
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic. OSA = Original Scoring Algorithm. RSA = Revised Scoring Algorithm. FS = Factor Structure. GARS-2 = Gilliam Autism Rating Scale, Second Edition. *p < .05. **p < .01.
Table 16
Pearson Correlations between Participants’ Total Scores on the ADOS-G Original, Revised, and Updated Scoring Algorithms and Parent and Teacher Ratings on the BASC-2
BASC-2 Subscale: Anxiety (PRS, TRS); Atypicality (PRS, TRS); Withdrawal (PRS, TRS)
ADOS-G Module/Scoring Algorithm PRS TRS PRS TRS PRS TRS
Module 1 - Authors’ Algorithms
Original Scoring Algorithm -.24 -.25 .43** .13 .40** .23
Revised Scoring Algorithm -.23 -.21 .52** .13 .47** .20
Module 1 – Updated Algorithms
Original Scoring Algorithm -.12 -.32 .07 .05 .04 .01
Revised Scoring Algorithm N/A N/A N/A N/A N/A N/A
Module 2 - Authors’ Algorithms
Original Scoring Algorithm -.05 .06 .16 .37** .22 .19
Revised Scoring Algorithm -.27 .09 .14 .35* .21 .16
Module 2 – Updated Algorithms
Original Scoring Algorithm N/A N/A N/A N/A N/A N/A
Revised Scoring Algorithm .24 .10 .08 -.04 .16 -.04
Module 3 - Authors’ Algorithms
Original Scoring Algorithm -.15 -.18* -.02 .14 .09 -.09
Revised Scoring Algorithm -.14 -.21** 0 -.14 .12 -.07
Module 3 – Updated Algorithms
Original Scoring Algorithm .01 .07 .05 -.04 .16 .03
Revised Scoring Algorithm .03 .07 .01 -.05 .13 .01
Note. Sample size varied by module, scoring algorithm, and rater. N’s are as follows: Module 1-OSA PRS = 47, TRS = 45; Module 1-RSA PRS = 42, TRS = 32; Module 2-OSA PRS = 73, TRS = 79; Module 2-RSA PRS = 68, TRS = 67; Module 3-OSA and RSA PRS = 145, TRS = 148. OSA = Original Scoring Algorithm. RSA = Revised Scoring Algorithm. Lord et al. (1999) authored the Original Scoring Algorithm, and Gotham et al. (2007) authored the Revised Scoring Algorithm. Updates to the scoring algorithms for Module 1-RSA and Module 2-OSA were not needed based on data obtained from the EFAs conducted on these modules. ADOS-G = Autism Diagnostic Observation Schedule-Generic; BASC-2 = Behavior Assessment System for Children, Second Edition. Due to an insufficient sample size, the Module 1, No Words and Module 2, Less Than 5 Years Revised Scoring Algorithms were excluded from correlational analyses. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only. In addition, references to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only. *p < .05. **p < .01.
Anxiety subscale; and moderate relationships exist between total scores on the Module 1-OSA
and RSA and parent ratings on the BASC-2 Atypicality and Withdrawal subscales. However,
inconsistent with hypotheses, weak relationships were measured between teacher ratings on the
BASC-2 Atypicality and Withdrawal subscales and participants’ total scores on the Module 1-OSA
and RSA.
Correlations also were calculated between parent and teacher ratings on the BASC-2 and
total scores for the Module 1-Updated OSA. Consistent with hypotheses, results indicate that
weak relationships exist between participants’ total scores on the ADOS-G and parent ratings on
the BASC-2 Anxiety subscale. However, inconsistent with hypotheses, weak relationships were
also measured between parent and teacher ratings on the BASC-2 Atypicality and Withdrawal
subscales and total scores on the ADOS-G; and a negative moderate relationship was measured
between ADOS-G total scores and teacher ratings on the BASC-2 Anxiety subscale.
Module 2. Consistent with hypotheses, results of the correlational analysis indicate that
weak relationships exist between participants’ total scores on Module 2 of the ADOS-G, across
scoring algorithms, and parent and teacher ratings of participants’ behavior on the BASC-2
Anxiety subscale; and moderate relationships exist between total scores on the Module 2-OSA
and RSA and teacher ratings on the BASC-2 Atypicality subscale. Inconsistent with hypotheses,
weak relationships were also measured between parent and teacher ratings on the BASC-2
Withdrawal subscale and ADOS-G total score across scoring algorithms; and between total
scores on the Module 2-OSA and RSA and parent ratings on the BASC-2 Atypicality subscale.
Correlations were also calculated between parent and teacher ratings on the BASC-2 and
total scores for the Module 2-RSA using the Module 2 Updated RSA (no changes were
recommended for the Module 2-OSA). Weak relationships were measured between parent and
teacher ratings on the BASC-2 Anxiety subscale and ADOS-G total scores, which is consistent
with hypotheses. However, inconsistent with hypotheses, weak relationships were also measured
between parent and teacher ratings on the BASC-2 Atypicality and Withdrawal subscales and
ADOS-G total scores.
Module 3. On the GARS-2, inconsistent with hypotheses, weak negative relationships
were measured between participants’ total scores on the ADOS-G OSA and RSA, and parent
ratings on the Communication and Social Interaction subscales. Although moderate correlations
were measured between ADOS-G scores, across scoring algorithms, and parent ratings of
participants’ behavior on the Stereotyped Behaviors subscale, these relationships were also
negative, which is inconsistent with theoretical expectations. Parent ratings aggregated across the
three subscales (i.e., the GARS-2 Autism Index) were likewise weakly and negatively related to
participants’ total scores on the Module 3-OSA and RSA. Consistently weak
correlations were measured between teacher ratings across the subscales and Autism Index on
the GARS-2 and ADOS-G total scores across scoring algorithms. This result is also inconsistent
with hypotheses.
On the BASC-2, weak correlations were measured between total scores on the Module 3-
OSA and RSA and parent and teacher ratings on the Anxiety subscale, which was consistent with
predictions. However, inconsistent with predictions, weak relationships were also measured
between ADOS-G total scores, across algorithms, and parent and teacher ratings on the BASC-2
Atypicality and Withdrawal subscales.
Correlations were also calculated between parent and teacher ratings on the GARS-2 and
BASC-2, and total scores for the Module 3-OSA and RSA using the Updated Scoring
Algorithms. Inconsistent with hypotheses, weak relationships were measured between Module 3
Updated OSA and RSA total scores, and parent and teacher ratings on the GARS-2, across all
subscales and the Autism Index. Weak relationships were also measured between parent and
teacher ratings on the BASC-2, across subscales, and Module 3 Updated OSA and RSA total
scores. Although the weak relationships measured on the BASC-2 Anxiety subscale were
consistent with hypotheses, those on the BASC-2 Atypicality and Withdrawal subscales were
inconsistent with hypotheses.
Hypothesis 3: Comparisons of Diagnostic Accuracy Indicators across Scoring Algorithms
The purpose of Hypothesis 3 is to compare the diagnostic accuracy of scores obtained
with the OSA and RSA. It was hypothesized that participants’ scores on the RSA would result in
greater diagnostic accuracy than those on the OSA.
Receiver Operating Characteristic (ROC) curve analysis was conducted across modules,
scoring algorithms, and ADOS-G classification determinations (i.e., ASD vs. No Spectrum
Disorder, and Non-Autism ASD vs. Autistic Disorder) to determine the sensitivity (i.e., the
percentage of individuals that have a clinical diagnosis of Autistic Disorder/ASD that accurately
score above the Autistic Disorder/ASD cut-scores on the ADOS-G) and specificity (i.e., the
percentage of individuals without a clinical diagnosis of Autistic Disorder/ASD that accurately
score below the cut-scores for Autistic Disorder/ASD on the ADOS-G) of ADOS-G diagnostic
classifications. ROC plots portray the trade-off between sensitivity and specificity for a measure
across possible cut-scores, and overall accuracy is summarized by the area under the curve (AUC;
Strik, Honig, Lousberg, & Denollet, 2001). Simon (1999) suggested the following interpretation of AUC values: 0.50 to 0.75 (Fair);
0.75 to 0.92 (Good); 0.92 to 0.97 (Very Good); 0.97 to 1.00 (Excellent). Values are compared to
a null hypothesis of a “true area” equivalent to 0.50. Thus, AUC significance indicates that
sensitivity and specificity values statistically differ from random assignment. Positive predictive
power (i.e., the percentage of individuals scoring above the Autistic Disorder/ASD cut-scores on
the ADOS-G that also have a clinical diagnosis of Autistic Disorder/ASD), negative predictive
power (i.e., the percentage of individuals scoring below the Autistic Disorder/ASD cut-scores
that do not have a clinical diagnosis of Autistic Disorder/ASD), and the hit rate (proportion of
accurate positive and negative classifications) also were calculated for each module across scoring
algorithms and classification comparisons.
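The five accuracy indicators defined above all derive from the same two-by-two table that crosses the ADOS-G classification with the clinical diagnosis. The sketch below shows the arithmetic; the function name and the cell counts are hypothetical and are not taken from the study data.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Compute accuracy indicators from counts of true/false positives and negatives."""
    def ratio(numerator, denominator):
        # An indicator is undefined when no cases fall in its denominator.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # diagnosed cases scoring above the cut-score
        "specificity": ratio(tn, tn + fp),   # non-diagnosed cases scoring below the cut-score
        "ppv": ratio(tp, tp + fp),           # above cut-score and diagnosed
        "npv": ratio(tn, tn + fn),           # below cut-score and not diagnosed
        "hit_rate": ratio(tp + tn, tp + fp + fn + tn),
    }

# Hypothetical counts for illustration only.
result = diagnostic_accuracy(tp=45, fp=20, fn=5, tn=30)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

High sensitivity paired with low specificity, as in this hypothetical table, means that few diagnosed cases are missed but many non-diagnosed cases score above the cut-score.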
Results for comparisons of diagnostic accuracy across the OSA and RSA are first
presented. Comparisons of diagnostic accuracy using the Updated Scoring Algorithms are then
reported.
Original and Revised Scoring Algorithm Comparisons. Indicators of diagnostic
accuracy (i.e., ROC plot AUC values, specificity, sensitivity, positive predictive values, negative
predictive values, and hit rates) for participants’ total scores on the ADOS-G for Module 1,
Module 2, and Module 3 across scoring algorithms, using the OSA and RSA, are presented in
Table 17.
ASD vs. no spectrum disorder comparisons. AUC values for each of the modules across
scoring algorithms are greater than or equal to .50, suggesting that the sensitivity and specificity
values obtained are not simply the result of random assignment. Based on Simon’s (1999)
interpretation criteria, the overall diagnostic accuracy of the Module 1-OSA is Fair, whereas the
overall diagnostic accuracy of the Module 1-RSA, Module 2-OSA and RSA, and Module 3-OSA
and RSA is Good. AUC values are higher for the RSA than the OSA for Modules 1 and 3, and
higher for the OSA than the RSA for Module 2.
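The AUC interpretation applied above can be sketched as a simple lookup over the ASD vs. No Spectrum Disorder AUC values reported in Table 17. The function names are hypothetical, and the treatment of values falling exactly on a band boundary (assigned to the higher band) is an assumption, since the cited bands share endpoints. A rank-based AUC function is included to show what the statistic measures: the probability that a randomly chosen diagnosed case scores higher than a randomly chosen non-diagnosed case.

```python
def interpret_auc(auc):
    """Label an AUC value using Simon's (1999) bands as quoted in the text."""
    if auc < 0.50:
        return "Below chance"
    if auc < 0.75:
        return "Fair"
    if auc < 0.92:
        return "Good"
    if auc < 0.97:
        return "Very Good"
    return "Excellent"

def auc_from_scores(positive_scores, negative_scores):
    """Rank-based AUC: share of positive/negative pairs ordered correctly (ties count half)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positive_scores for n in negative_scores)
    return wins / (len(positive_scores) * len(negative_scores))

# AUC values copied from Table 17 (ASD vs. No Spectrum Disorder).
table17_asd = {
    ("Module 1", "OSA"): 0.733, ("Module 1", "RSA"): 0.849,
    ("Module 2", "OSA"): 0.839, ("Module 2", "RSA"): 0.806,
    ("Module 3", "OSA"): 0.787, ("Module 3", "RSA"): 0.799,
}
labels = {key: interpret_auc(value) for key, value in table17_asd.items()}
for (module, algorithm), label in labels.items():
    print(module, algorithm, label)
```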
Substantial differences were not observed between the sensitivity, specificity, positive
predictive values, negative predictive values, and hit rates obtained from applying the Original
Table 17
AUC Values, Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores on the Original and Revised Scoring Algorithms
AUC Values  Sensitivity  Specificity  PPV  NPV  Hit Rate
OSA RSA  OSA RSA  OSA RSA  OSA RSA  OSA RSA  OSA RSA
Autism Spectrum Disorder vs. No Spectrum Disorder (N = 400)
Module 1 .733 .849 1.00 1.00 .75 .75 .92 .93 1.00 1.00 .93 .94
Module 2 .839 .806 .89 .89 .62 .69 .78 .81 .78 .80 .78 .81
Module 3 .787 .799 .96 .91 .44 .49 .77 .78 .85 .78 .79 .77
Non-Autism ASDa vs. Autistic Disorder (N = 248)
Module 1 .671 .332 .90 N/Ab .17 N/Ab .65 N/Ab .50 N/Ab .63 N/Ab
Module 2 .674 .697 .95 1.00 .05 0 .49 .49 .50 0 .49 .49
Module 3 .675 .690 .96 .98 .23 .09 .33 .31 .93 .91 .44 .35
Note. Sens = sensitivity. Spec = specificity. PPV = positive predictive value. NPV = negative predictive value. HR = hit rate. AUC = Area under the curve. OSA = Original Scoring Algorithm. RSA = Revised Scoring Algorithm. ASD = Autism Spectrum Disorder. Autism Spectrum Disorders include diagnoses of Autistic Disorder, Pervasive Developmental Disorder, Not Otherwise Specified (PDD NOS), and Asperger’s Disorder. Due to an insufficient sample size, the Module 1, No Words and Module 2, Less Than 5 Years Revised Scoring Algorithms were excluded from the analyses of diagnostic accuracy. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only. In addition, references to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only. aNon-Autism ASD = PDD NOS and Asperger’s Disorder. bValues were not calculated because the AUC value indicated that the Module 1-RSA is not more effective than chance at differentiating between Non-Autism ASDs and Autistic Disorder.
and the Revised Scoring Algorithms for each of the three modules. Sensitivity values remained
consistent across scoring algorithms for Modules 1 and 2, and decreased slightly for Module 3
when the RSA was applied. Specificity values also remained consistent across scoring
algorithms for Module 1, and increased slightly for Modules 2 and 3 with the RSA. Positive
predictive values using the RSA were also slightly higher for each of the three modules than
those obtained from the OSA. Further, application of the RSA resulted in higher negative
predictive values and hit rates than those obtained from the OSA for Modules 1 and 2. However,
the negative predictive values and hit rates obtained from applying the OSA to Module 3 were
higher than those obtained from the RSA.
Non-Autism ASD vs. Autistic Disorder comparisons. AUC values for the Module 1-
OSA and for the OSA and RSA for Modules 2 and 3 are greater than .50, suggesting that the
sensitivity and specificity values obtained are not simply the result of random assignment.
However, the AUC value for the Module 1-RSA is less than .50, which suggests that scores from
this algorithm do not differentiate diagnoses better than would be expected by
chance. Based on Simon’s (1999) interpretation criteria, the overall diagnostic accuracy of the
Module 1-OSA, Module 2-OSA and RSA, and Module 3-OSA and RSA is Fair. Because of the
inadequate AUC value, further comparisons were not made between the differential diagnostic
accuracy of the Module 1-OSA and RSA. For Modules 2 and 3, the AUC values are higher for
the RSA than the OSA.
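To make the .50 benchmark concrete, the AUC can be read as the probability that a randomly chosen participant with the target diagnosis obtains a higher total score than a randomly chosen participant without it, with ties counted as one half. The sketch below computes that quantity directly. The function name and the example scores are hypothetical and are not drawn from the study data, and the ROC procedure actually used in the study may differ in detail.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Rank-based AUC: P(positive > negative) + 0.5 * P(tie)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0   # positive case outranks negative case
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical total scores for illustration only (not study data).
autistic_disorder = [14, 17, 12, 19]
non_autism_asd = [11, 14, 9, 13]
auc = auc_from_scores(autistic_disorder, non_autism_asd)
```

Under this reading, a value of .50 means the scores order the two groups no better than a coin flip, and a value below .50 means the ordering is, if anything, reversed.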
Use of the RSA, as compared to the OSA, results in higher levels of sensitivity and
negative predictive values across Modules 2 and 3. However, the RSA also consistently results in
lower specificity values across modules. In addition, positive predictive values and hit rates are
equivalent across algorithms for Module 2, and lower with the RSA in Module 3.
Updated Scoring Algorithms and Optimal Cut-Score Comparisons. ROC analyses
were also conducted to determine the overall diagnostic accuracy of ADOS-G total scores using
the Updated Scoring Algorithms (i.e., the scoring algorithms based on the item structure
identified from the factor analyses conducted in the current sample with recommended item
deletions for Module 1-OSA, Module 2-RSA, and Module 3-OSA and RSA) and to identify
appropriate cut-scores for the Updated Scoring Algorithms. In addition, ROC plots were
reviewed for the retained algorithms (i.e., Module 1-RSA and Module 2-OSA retained consistent
with authors’ recommendations) to identify optimal cut-scores (i.e., those maximizing sensitivity
and specificity). AUC values for the Updated Scoring Algorithms and optimal cut-scores for the
Updated and Retained Scoring Algorithms are presented in Table 18. Table 19 presents the
specificity, sensitivity, positive predictive values, negative predictive values, and hit rates of
ADOS-G scores from the Updated and Retained Scoring Algorithms.
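As an illustration of how an optimal cut-score might be located, the sketch below scans every observed total score as a candidate cut and keeps the one with the largest sum of sensitivity and specificity. Treating "maximizing sensitivity and specificity" as maximizing their sum is an assumption made here for illustration. The function name and the example data are hypothetical.

```python
def best_cut_score(scores, has_diagnosis):
    """Return (cut, sensitivity, specificity) with the largest sens + spec.

    A participant is classified positive when score >= cut.
    Ties between candidate cuts are resolved in favor of the lowest cut.
    """
    n_pos = sum(has_diagnosis)
    n_neg = len(has_diagnosis) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case in each group")
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, has_diagnosis) if d and s >= cut)
        tn = sum(1 for s, d in zip(scores, has_diagnosis) if not d and s < cut)
        sens = tp / n_pos
        spec = tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (cut, sens, spec)
    return best


# Hypothetical total scores and diagnoses for illustration only (not study data).
scores = [3, 5, 6, 8, 9, 12, 7, 4]
has_diagnosis = [False, False, True, True, True, True, False, False]
cut, sens, spec = best_cut_score(scores, has_diagnosis)
```

Because the tie-breaking rule favors the lowest candidate, this version leans toward sensitivity when two cuts perform equally well. A screening-oriented or confirmation-oriented application could reasonably choose a different rule.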
AUC values for each of the Updated Scoring Algorithms across ADOS-G classification
comparisons (i.e., ASD vs. No Spectrum Disorder and Non-Autism ASD vs. Autistic Disorder)
are greater than .50, suggesting that the Updated Algorithms are better than chance at
differentiating participants. Based on Simon’s (1999) interpretation criteria, when differentiating
participants with ASDs from those without, the Updated Algorithms demonstrate Fair (Module 1
Updated OSA) to Good (Module 2 Updated RSA and Module 3 Updated OSA and RSA)
diagnostic accuracy. When used to differentiate between participants with Non-Autism ASDs
and those with Autistic Disorder, the Updated Algorithms demonstrate Fair
overall diagnostic accuracy.
Sensitivity values for each of the Updated and Retained Scoring Algorithms exceed .75
and were determined to be adequate based on the standards recommended (i.e., sensitivity > .70)
Table 18
Updated Algorithm AUC Values and Optimal Cut-Scores for the ADOS-G Updated and Retained Scoring Algorithms
            Updated Algorithm AUC Values    Optimal Cut-Scores
            OSA       RSA                   OSA       RSA
ASD vs. No Spectrum Disorder (N = 400)
  Module 1  .720b     N/A                   6b        7c
  Module 2  N/A       .814b                 9c        9b
  Module 3  .773b     .799b                 8b        8b
Non-Autism ASDa vs. Autistic Disorder (N = 248)
  Module 1  .683b     N/A                   15b       N/Aa
  Module 2  N/A       .700b                 14c       12b
  Module 3  .661      .689                  12b       13b
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic; AUC = Area under the curve value. OSA = Original Scoring Algorithm. RSA = Revised Scoring Algorithm. Due to an insufficient sample size, the Module 1, No Words and Module 2, Less Than 5 Years Revised Scoring Algorithms were excluded from the analyses of diagnostic accuracy. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only. In addition, references to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only. aScores were not calculated for these conditions because the AUC value indicated that the sensitivity and specificity values were not significantly different from the Null Hypothesis. bValues obtained for the Updated Scoring Algorithm. cValues are from retained scoring algorithms (i.e., consistent with authors’ [Lord et al., 1999; Gotham et al., 2007] recommendations).
Table 19
Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores on the Updated and Retained Scoring Algorithms
            Sensitivity   Specificity   PPV           NPV           Hit Rate
            OSA    RSA    OSA    RSA    OSA    RSA    OSA    RSA    OSA    RSA
Autism Spectrum Disorder vs. No Spectrum Disorder (N = 400)
  Module 1  1.00b  1.00c  .52b   .75c   .84b   .93c   1.00b  1.00c  .87b   .94c
  Module 2  .80c   .89b   .66c   .77b   .80c   .85b   .79c   .82b   .79c   .84b
  Module 3  .87b   .88b   .53b   .64b   .79b   .83b   .67b   .74b   .76b   .80b
Non-Autism ASDa vs. Autistic Disorder (N = 248)
  Module 1  .76b   N/Ad   .44b   N/Ad   .79b   N/Ad   .44b   N/Ad   .66b   N/Ad
  Module 2  .84c   .94b   .40c   .19b   .57c   .54b   .73c   .75b   .62c   .56b
  Module 3  .81b   .89b   .33b   .33b   .35b   .37b   .79b   .88b   .48b   .50b
Note. PPV = positive predictive value. NPV = negative predictive value. HR = hit rate. OSA = Original Scoring Algorithm. RSA = Revised Scoring Algorithm. Due to an insufficient sample size, the Module 1, No Words and Module 2, Less Than 5 Years Revised Scoring Algorithms were excluded from the analyses of diagnostic accuracy. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only, and references to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only. aNon-Autism ASD = PDD NOS and Asperger’s Disorder. bValues obtained for the Updated Scoring Algorithm. cValues are from retained scoring algorithms (i.e., consistent with authors’ [Lord et al., 1999; Gotham et al., 2007] recommendations) using optimal cut-scores. dValues were not calculated because the AUC value indicated that the Module 1-RSA is not more effective than chance at differentiating between Non-Autism ASDs and Autistic Disorder.
by Matthey & Petrovski (2002). However, specificity values are consistently lower than
recommended (i.e., specificity > .80; Matthey & Petrovski).
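For reference, each of the indices reported in this section derives from the four cells of a 2 x 2 cross-classification of ADOS-G classification against diagnosis. The sketch below shows the standard definitions and applies the sensitivity > .70 and specificity > .80 standards cited above. The function names and the example counts are hypothetical and are not study data.

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Compute accuracy indices from the four cells of a 2 x 2 table.

    tp / fn: participants with the diagnosis scoring at or above / below the cut-score.
    fp / tn: participants without the diagnosis scoring at or above / below the cut-score.
    A ratio with a zero denominator is returned as None rather than guessed.
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "hit_rate": ratio(tp + tn, tp + fp + fn + tn),
    }


def meets_recommended_standards(indices, min_sens=0.70, min_spec=0.80):
    """Apply the sensitivity > .70 and specificity > .80 standards cited in the text."""
    sens, spec = indices["sensitivity"], indices["specificity"]
    return sens is not None and spec is not None and sens > min_sens and spec > min_spec


# Hypothetical counts for illustration only (not study data).
example = diagnostic_indices(tp=45, fp=20, fn=5, tn=30)
adequate = meets_recommended_standards(example)
```

In this hypothetical case sensitivity is adequate but specificity is not, which mirrors the general pattern described above: few missed cases at the cost of many false positives.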
Hypothesis 4: Diagnostic Accuracy of Independent Clinical Diagnoses
Measures of diagnostic accuracy (i.e., sensitivity, specificity, positive predictive values,
negative predictive values, and hit rates) were also calculated to determine the overall diagnostic
accuracy of ADOS-G scores (obtained using both the Original Scoring Algorithm, and the
Updated OSA and Retained OSA with optimal cut-score) when compared to practitioners’
clinical diagnoses made with and without results from participants’ performance on the ADOS-
G. It was hypothesized that greater diagnostic accuracy of ADOS-G scores would be observed
when scores are compared to clinical diagnoses made with knowledge of participants’ ADOS-G
classification (i.e., No Spectrum Disorder, Non-Autism ASD, or Autistic Disorder) as compared
to those made without information regarding ADOS-G performance.
Inter-rater agreement between end dichotomous diagnostic classifications (i.e., the
participant does or does not appear to exhibit an Autism Spectrum Disorder [ASD], including
Autistic Disorder) and differential diagnosis determinations (Non-Autism ASD vs. Autistic
Disorder) initially were calculated using kappa coefficients for the 100 participants included in
the analyses. Excellent inter-rater agreement (k = .77; Cicchetti, 1994) was demonstrated
between the diagnostic classifications of initial clinicians and independent reviewers; whereas
fair (k = .49) inter-rater agreement was demonstrated between differential diagnosis
determinations.
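The kappa coefficient corrects the raw proportion of agreement between two raters for the agreement expected by chance given each rater's base rates. A minimal sketch of that calculation follows. The function name and the example ratings are hypothetical, and the sketch assumes the unweighted form of the statistic.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical classifications."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical classifications for illustration only (not study data).
initial_clinician = ["ASD"] * 5 + ["No ASD"] * 5
independent_reviewer = ["ASD"] * 4 + ["No ASD"] * 5 + ["ASD"]
kappa = cohens_kappa(initial_clinician, independent_reviewer)
```

In this example the raters agree on 8 of 10 cases, but because half of that agreement would be expected by chance, kappa is lower than the raw agreement rate.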
Table 20 presents indicators of diagnostic accuracy for ADOS-G scores from the Original
Scoring Algorithm against clinical diagnoses made with and without participants’ classification
determinations on the ADOS-G. In general, inconsistent with hypotheses, ADOS-G scores
Table 20
Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores From the Original Scoring Algorithm Compared to Clinical Diagnoses Made With and Without the ADOS-G (N = 100)
            Sensitivity   Specificity   PPV           NPV           Hit Rate
            With   W/O    With   W/O    With   W/O    With   W/O    With   W/O
Autism Spectrum Disorder vs. No Spectrum Disorder (N = 100)
  Module 1  1.00   1.00   .75    .60    .92    .83    1.00   1.00   .93    .87
  Module 2  1.00   1.00   .77    .77    .82    .82    1.00   1.00   .89    .89
  Module 3  .88    .94    .46    .54    .70    .74    .73    .87    .71    .78
Non-Autism ASDa vs. Autistic Disorder (N = 58)
  Module 1  1.00   1.00   0      0      .64    .90    0      0      .64    .90
  Module 2  1.00   1.00   0      0      .50    .57    0      0      .50    .57
  Module 3  1.00   1.00   .33    .27    .27    .24    1.00   1.00   .47    .41
Note. With = Clinical diagnosis made with knowledge of the participants’ performance on the ADOS-G. W/O = Clinical diagnosis made without knowledge of the participants’ performance on the ADOS-G. PPV = positive predictive value. NPV = negative predictive value. ASD = Autism Spectrum Disorder. Autism Spectrum Disorders include diagnoses of Autistic Disorder, Pervasive Developmental Disorder, Not Otherwise Specified (PDD NOS), and Asperger’s Disorder. aNon-Autism ASD = PDD NOS and Asperger’s Disorder.
across modules and classification comparisons (i.e., ASD vs. No Spectrum Disorder and Non-Autism
ASD vs. Autistic Disorder) demonstrate similar levels of diagnostic accuracy when
compared to clinical diagnoses made with and without information regarding ADOS-G
performance.
Table 21 presents indicators of diagnostic accuracy for ADOS-G scores from the Updated
and Retained Scoring Algorithms against clinical diagnoses made with and without participants’
classification determinations on the ADOS-G. Consistent with expectations, variability is
observed between indicators of diagnostic accuracy across clinical diagnoses made with and
without ADOS-G performance. However, inconsistent with predictions, diagnoses made with
information regarding participants’ performance on the ADOS-G do not consistently result in
better diagnostic accuracy than do those diagnoses made without ADOS-G information.
Table 21
Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores From the Updated and Retained Original Scoring Algorithms Compared to Clinical Diagnoses Made With and Without the ADOS-G (N = 100)
            Sensitivity   Specificity   PPV           NPV           Hit Rate
            With   W/O    With   W/O    With   W/O    With   W/O    With   W/O
Autism Spectrum Disorder vs. No Spectrum Disorder (N = 100)
  Module 1b 1.00   1.00   .92    .60    .92    .83    1.00   1.00   .93    .87
  Module 2c 1.00   1.00   .77    .77    .82    .82    1.00   1.00   .89    .89
  Module 3b .71    .82    .58    .75    .71    .82    .58    .75    .66    .79
Non-Autism ASDa vs. Autistic Disorder (N = 58)
  Module 1b .29    .44    .25    1.00   .40    1.00   .17    .17    .27    .45
  Module 2c 1.00   .88    .43    .17    .64    .58    1.00   .50    .71    .57
  Module 3b .50    1.00   .64    .55    .21    .38    .70    1.00   .42    .64
Note. With = Clinical diagnosis made with knowledge of the participants’ performance on the ADOS-G. W/O = Clinical diagnosis made without knowledge of the participants’ performance on the ADOS-G. PPV = positive predictive value. NPV = negative predictive value. ASD = Autism Spectrum Disorder. Autism Spectrum Disorders include diagnoses of Autistic Disorder, Pervasive Developmental Disorder, Not Otherwise Specified (PDD NOS), and Asperger’s Disorder. aNon-Autism ASD = PDD NOS and Asperger’s Disorder. bValues obtained for the Updated Scoring Algorithm. cValues are from Retained Scoring Algorithm (i.e., consistent with authors’ [Lord et al., 1999; Gotham et al., 2007] recommendations) using optimal cut-scores.
Chapter 4. Discussion
Although currently considered a “gold-standard” (Kline-Tasman, Risi, & Lord, 2007) in
the diagnostic assessment of Autism and widely used across clinical and educational settings,
few independent studies have been conducted to date regarding the psychometric properties of
the Autism Diagnostic Observation Schedule-Generic (ADOS-G; Lord et al., 1999). As such, the
purpose of this study was to examine several lines of validity evidence (internal structure,
relationships with other variables, and diagnostic accuracy) for scores obtained from the ADOS-
G.
Hypothesis 1 predicted that, across modules, items included in the Original Scoring
Algorithm would reflect a uni-dimensional construct, whereas items included in the Revised
Scoring Algorithm would reflect two constructs across modules. Exploratory Factor Analysis
(EFA) was conducted to examine the structural validity of ADOS-G Modules 1, 2, and 3 using
both the Original Scoring Algorithm (OSA) and the Revised Scoring Algorithm (RSA). Hypothesis 1
was supported for the OSA across modules, but was not supported for the RSA across modules.
Hypothesis 2 predicted that scores on the ADOS-G would demonstrate moderate to
strong relationships with scores from other measures of autistic behavior and weaker
relationships with other measures of behavioral functioning. Correlational analyses were
conducted to examine the relations between participants’ total scores on the ADOS-G, across
scoring algorithms, and parent and teacher ratings of participants’ behavior on the GARS-2 and
on select subscales of the BASC-2. Hypothesis 2 was not consistently supported across
modules and scoring algorithms.
Hypothesis 3 predicted that, across modules, scores obtained from the Revised Scoring
Algorithm would demonstrate greater diagnostic accuracy than scores obtained from the Original
Scoring Algorithm. AUC values and indicators of diagnostic accuracy were reviewed across
modules and scoring algorithms to determine if differences exist between the Original and the
Revised Scoring Algorithms. Hypothesis 3 was partially supported across modules.
Hypothesis 4 predicted that greater diagnostic accuracy of ADOS-G scores would be
observed when scores were compared to clinical diagnoses made with participants’ results on the
ADOS-G as compared to those made without ADOS-G information. Hypothesis 4 was not
consistently supported across modules.
Structural Validity Evidence
Module 1. EFA confirmed a one-factor structure for items included within the OSA.
Items that loaded saliently on the one-factor solution primarily assess aspects of nonverbal social
communication (e.g., use of gesturing, eye contact, and the directing of facial expressions towards
others), which examinees may or may not pair with vocalizations. Only one item (A-2) distinctly
reflects a participant’s attempts at verbal communication. Based on item content, the extracted
factor was labeled Social Communication. Item A-5 did not load on the one-factor solution. In
addition, item-total statistics indicate that the same item is modestly detracting from overall scale
reliability for Module 1. As such, this item should not be included in the OSA for Module 1. The
resulting 11-item scale has been entitled the Module 1 Updated OSA.
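The item-total reasoning above can be illustrated with one common statistic, coefficient alpha recomputed with each item removed in turn. Whether this is the exact item-total statistic used in the study is an assumption. The function names and the item codes below are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha for a list of items, each a list of scores per respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical 0-2 item codes for five examinees (not study data).
items = [
    [0, 1, 2, 2, 1],
    [0, 1, 2, 1, 1],
    [1, 1, 2, 2, 0],
    [2, 0, 1, 0, 2],  # constructed to run against the other three items
]
full_alpha = cronbach_alpha(items)
alpha_without = alpha_if_item_deleted(items)
```

An item whose removal raises alpha, as the fourth item does here, is detracting from scale reliability. That evidence, combined with a non-salient factor loading, is the kind that supports dropping an item from a scale.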
Because Lord et al. (1999) did not provide specific information regarding the
factorability of ADOS-G items in the original sample, it is unclear if a similar pattern was
observed by the authors. However, Lord et al. reported that “almost all items” (p. 116) loaded on
the one-factor solution for each module, suggesting that one or more items also failed to load
saliently on the extracted factors in the original sample.
A two-factor solution was retained for the RSA. Examination of salient item loadings on
each of the factors in the two-factor solution revealed that the factors obtained in this study
retained similar items as the factors obtained by Gotham et al. (2007). Items retained on Factor 1
are similar to the items included in the Module 1-OSA and assess aspects of verbal and
nonverbal social communication (Social Communication factor); whereas the items retained on
Factor 2 reflect the participants’ engagement in stereotyped and repetitive behaviors (e.g.,
stereotyped use of words and hand/finger mannerisms; Stereotyped/Repetitive Behavior [SRB]
factor).
Module 2. EFA confirmed a one-factor structure for items included within the OSA.
Items that loaded saliently on the Module 2-OSA one-factor solution assess aspects of nonverbal
(e.g., use of gesturing, pointing, eye contact, and the direction of facial expressions towards
others) and verbal (i.e., engagement in reciprocal social communication and conversation) social
communication. Thus, the extracted factor was also labeled Social Communication. All items
loaded saliently on the retained factor and are contributing to the overall scale reliability, which
suggests that each item should be retained in the scale.
A one-factor solution also was retained for the Module 2-RSA. Retained items assess a
combination of social, communication, and stereotyped repetitive behaviors. However, all items
reflect aspects of the Autistic Disorder diagnostic criteria set forth by the DSM-IV-TR
(American Psychiatric Association, 2004). As such, the factor was labeled Autistic
Characteristics. Although inconsistent with hypotheses and Gotham et al.’s (2007) results,
current results are consistent with the results of Gotham et al.’s (2008) reexamination of the
RSA, in which the authors questioned the suitability of a two-factor solution for the Module 2
RSA.
One item (Item D-2, which assesses hand/finger mannerisms) did not load saliently on
the one-factor solution for the Module 2-RSA. Because item-total statistics also suggest that the
inclusion of Item D-2 in the factor solution is resulting in a mild decrease in overall scale
reliability, it is recommended that the item be excluded from the Module 2-RSA. The resulting
13-item scale (Module 2 Updated RSA) was used to test subsequent hypotheses.
Module 3. EFA confirmed a one-factor structure for items included within the OSA. The
items that loaded saliently on the one-factor solution primarily assess aspects of nonverbal social
communication (e.g., use of gesturing, eye contact, and the directing of facial expressions towards
others) and verbal social communication (e.g., reporting of events, conversation, reciprocal
social interactions). As such, the extracted factor was labeled Social Communication. Item A-4
did not load on the one-factor solution. In addition, item-total statistics indicated that the same
item is modestly detracting from overall scale reliability for Module 3. As such, this item should
not be included in the OSA for Module 3. The resulting 10-item scale was entitled the Module 3
Updated OSA.
A one-factor solution also was retained for the Module 3-RSA. A review of the items
retained on the one-factor solution indicates that the retained items assess a combination of
social, communication, and stereotyped repetitive behaviors, and all items reflect aspects of the
Autistic Disorder diagnostic criteria set forth by the DSM-IV-TR (American Psychiatric
Association, 2004). As such, the factor was labeled Autistic Characteristics.
Two items (Items D-1 and D-2) did not load saliently on the one-factor solution for the
Module 3-RSA, and item-total statistics indicate that the deletion of these items from the scale
would increase scale reliability. Based on this information, items D-1 and D-2 should be
removed from the Module 3-RSA. The resulting scale (Module 3 Updated RSA) was used to test
subsequent hypotheses.
Convergent and Discriminant Validity Evidence
To examine evidence of the convergent and discriminant validity of ADOS-G scores,
participants’ total scores obtained on the ADOS-G, using both the OSA and RSA, were
correlated with parent and teacher ratings of participants’ behavior on the GARS-2 and BASC-2.
Relationships also were measured between total scores on the Updated Scoring Algorithms and
respondents’ ratings on the GARS-2 (for Module 3) and the BASC-2. Inconsistent with
expectations, moderate to strong relationships were not consistently observed between ADOS-G
scores (OSA or RSA) and other measures of autistic behavior. Use of the Updated Scoring
Algorithms did not yield stronger relationships. In fact, although some moderate correlations
were observed between parent and teacher ratings on the Atypicality and Withdrawal subscales
on the BASC-2 and total scores on the Module 1 and 2 OSA and RSA, only weak relationships
were observed using the Updated Scoring Algorithms.
The ADOS-G is a unique instrument and, to date, is the only direct assessment of
Autism Spectrum Disorders in widespread use. As such, it is difficult to obtain appropriate
instruments against which to consistently compare the ADOS-G in order to obtain evidence of
convergent validity. Convenience selections used for this study (i.e., the GARS-2 and Atypicality
and Withdrawal subscales from the BASC-2) have some evidence to support their use as
measures of autistic functioning (Gilliam, 2006; Reynolds & Kamphaus, 2004). However, scores
from the ADOS-G, across scoring algorithms and modules, did not consistently demonstrate
moderate to strong relationships with scores on the GARS-2 or the BASC-2. This inconsistency
could exist for several reasons. First, given the questionable evidence for the
structural validity of the GARS-2 (Pandolfi et al., 2010), the GARS-2 subscales and the resulting
Autism Index may not consistently measure the intended constructs. In addition, the small
sample size available for the Module 1 and 2 comparisons may have produced imprecise
estimates of these relationships.
Another, and perhaps more plausible, explanation for inconsistencies between ADOS-G
total scores and parent and teacher ratings on the GARS-2 and BASC-2, however, is related to
differences between the instruments. The GARS-2 and BASC-2 are behavioral rating scales and
ratings are based on parents’ and teachers’ perceptions of a child’s typical behavioral
functioning. In contrast, the ADOS-G is a standardized direct assessment of behavior and is
scored by a trained observer based on a participant’s engagement or lack of engagement in
specific behaviors observed only during the ADOS-G administration. Although each reportedly
assesses autistic behavior, differences in scores may be related to differences in raters’
knowledge and awareness of autistic behaviors and/or the differences in length of opportunity in
which to observe behaviors. Further, parent and teacher ratings on the GARS-2 and BASC-2
may be influenced by a desire for a specific outcome (i.e., over-reporting concerns due to a desire
for special education eligibility, or under-reporting concerns due to the undesirability of
special education eligibility), whereas scores on the ADOS-G are theoretically objective in
nature. However, total scores on the ADOS-G may also be influenced by examiner bias.
Particularly in the case of re-evaluations, in which examiners are aware of an examinee’s current
ASD diagnosis, administrators may be biased towards or against “observing” the behavioral
characteristics under investigation.
Consistently weak relationships observed between parent and teacher ratings on the
Anxiety subscale of the BASC-2 and participants’ total scores on the ADOS-G across modules
and all scoring algorithms provide evidence of the discriminant validity of scores on the ADOS-G.
Evidence of Diagnostic Accuracy
Module 1. To compare diagnostic accuracy across the OSA (Lord et al., 1999) and RSA
(Gotham et al., 2007), AUC values were first interpreted. When making determinations between
ASDs and No Spectrum Disorder, AUC values indicate that the RSA results in greater overall
diagnostic accuracy than the OSA. When making determinations between Non-Autism ASDs
and Autistic Disorder, failure to reject the Null Hypothesis for the RSA indicates that scores
from the RSA are not accurately differentiating diagnoses better than would be expected by
chance, which suggests that its use is uninformative. However, similar concerns were not
observed for the OSA. Thus, the RSA is not resulting in better differential diagnostic accuracy
for Module 1 than the OSA.
In addition to the AUC values, specific indicators of diagnostic accuracy were compared
across scoring algorithms and were measured to be virtually identical across the OSA and RSA
when differentiating between all Autism Spectrum Disorders (ASD) and No Spectrum Disorder
for Module 1. Because results of the ROC analysis indicate that the RSA is not differentiating
participants with Non-Autism ASDs from those with Autistic Disorder beyond chance, further
comparative interpretations were not conducted.
The diagnostic accuracy of the Module 1 Updated OSA also was examined and compared
to that of the standard Module 1-OSA. The AUC value for the Module 1 Updated OSA is
slightly lower than the AUC value for the standard Module 1-OSA when differentiating between
ASDs and No Spectrum Disorders, suggesting that the Module 1 Updated OSA demonstrates
slightly lower overall diagnostic accuracy for this condition. However, when differentiating
between Non-Autism ASDs and Autistic Disorder, the AUC value for the Updated Algorithm
suggests that the Updated OSA demonstrates slightly higher overall diagnostic accuracy than
the standard OSA.
Other specific indicators of diagnostic accuracy also were reviewed. When
differentiating between participants with an ASD and those with No Spectrum Disorder, the
Module 1 Updated OSA demonstrates consistent levels of sensitivity and NPV, but lower
specificity, PPV, and hit rates than the standard OSA. When differentially diagnosing
participants with Non-Autism ASDs from those with Autistic Disorder, the Module 1 Updated
OSA demonstrates moderately lower levels of sensitivity and NPV, but moderately higher levels
of specificity, PPV, and hit rates than the standard Module 1-OSA. In fact, the Updated OSA for
this condition presents a better balance between sensitivity and specificity than does the standard
OSA.
Comparisons between the Retained Module 1-RSA (i.e., updated by applying optimal cut-scores
to maximize sensitivity and specificity) and the standard Module 1-RSA were also made
for the ASD vs. No Spectrum Disorder condition. Sensitivities, specificities, PPVs, NPVs, and
hit rates are virtually identical across the two scoring algorithms, which suggests that there is no
significant difference between the Retained and the standard Module 1-RSA.
Indicators of diagnostic accuracy also were reviewed to determine if values obtained
from the current sample using the OSA and RSA are consistent with the sensitivity and
specificity values originally obtained by the authors of the OSA (Lord et al., 1999) and the RSA
(Gotham et al., 2007). Sensitivity values for the Module 1-OSA from the current sample are
relatively consistent (see Table E2 in Appendix E) with those reported by Lord et al. (1999)
and adequate for diagnostic tests (Matthey & Petrovski, 2002). Specificity values for Module 1
from the current sample, however, are substantially lower than those reported by Lord et al. and
inadequate for diagnostic tests (Matthey & Petrovski). Consistent with the pattern of results for
the OSA, sensitivity values (see Table E3 in Appendix E) obtained from the current sample for
the RSA are adequate for diagnostic tests and slightly higher than those reported by Gotham et
al. However, also consistent with the data obtained for the OSA, specificity values obtained for
the current sample are inadequate for diagnostic tests and substantially lower than those reported
by test authors.
Module 2. AUC values also were interpreted for Module 2 across the OSA, RSA, and
classification determinations (i.e., ASD vs. No Spectrum Disorder, Non-Autism ASD vs.
Autistic Disorder). Based on the AUC values, scores from the OSA result in slightly better
diagnostic accuracy than scores from the RSA when differentiating participants with ASDs from
those with No Spectrum Disorder. However, scores from the RSA are resulting in slightly better
diagnostic accuracy than are scores from the OSA when differentiating participants with Non-
Autism ASDs from those with Autistic Disorder.
Specific indicators of diagnostic accuracy also were compared across the OSA and RSA
for Module 2. The specificity value, PPV, NPV, and hit rates are slightly higher for the RSA than
the OSA when differentiating between Autism Spectrum Disorders (ASD) and No Spectrum
Disorder. However, when engaging in differential diagnosis, use of the RSA results in a higher
level of sensitivity, consistent PPVs and hit rates, and lower levels of specificity and NPVs than
the OSA.
The diagnostic accuracy of the Module 2 Updated RSA also was examined and compared
to that of the standard Module 2-RSA. The AUC value for the Module 2 Updated RSA is slightly
higher than the AUC value for the standard RSA when differentiating between ASDs and No
Spectrum Disorders, and Non-Autism ASDs and Autistic Disorder, suggesting that the Module 2
Updated RSA demonstrates slightly higher overall diagnostic accuracy. When differentiating
between participants with an ASD and those with No Spectrum Disorder, the Module 2 Updated
RSA demonstrates consistent levels of sensitivity and higher levels of specificity, PPV, NPV,
and hit rates than the standard RSA. When differentiating participants with Non-Autism ASDs
from those with Autistic Disorder, the Module 2 Updated RSA demonstrates a
slightly lower level of sensitivity, but modest to moderate improvements in specificity, PPV,
NPV, and hit rates relative to the standard Module 2-RSA.
Comparisons also were made between the Retained Module 2-OSA (i.e., updated by
applying optimal cut-scores to maximize sensitivity and specificity) and the standard Module 2-
OSA. For the ASD vs. No Spectrum Disorder condition, the Retained OSA exhibits a slightly
lower level of sensitivity but slightly higher levels of specificity, PPV, NPV, and hit rate than
does the standard Module 2-OSA. When differentially diagnosing participants with Non-Autism
ASDs from those with Autistic Disorder, the Retained Module 2-OSA also demonstrates a
moderate decrease in sensitivity but moderate increases in specificity, PPV, NPV, and hit rates as
compared to the standard OSA. In general, the Retained Module 2-OSA for this condition
presents a better balance between sensitivity and specificity than does the standard OSA.
Indicators of diagnostic accuracy for Module 2 also were reviewed to determine if values
obtained from the current sample using the OSA and RSA are consistent with the sensitivity and
specificity values originally obtained by the authors of the OSA (Lord et al., 1999) and the RSA
(Gotham et al., 2007). Sensitivity values for the Module 2-OSA from the current sample (see
Table E2) are adequate for diagnostic tests (Matthey & Petrovski, 2002) and relatively consistent
with those reported by the test authors (Lord et al.). However, specificity values for Module 2
from the current sample are substantially lower than those reported by test authors (Lord et al.)
and inadequate for diagnostic tests (Matthey & Petrovski). Consistent with the pattern of results
for the OSA, sensitivity values (Table E3) obtained from the current sample for the RSA are
adequate for diagnostic tests and consistent with those reported by Gotham et al. However, also
consistent with the data obtained for the OSA, specificity values obtained for the current sample
are inadequate for diagnostic tests and substantially lower than those reported by test authors
(Gotham et al.).
Module 3. AUC values were also interpreted for Module 3 across the OSA and RSA and
diagnostic comparisons (i.e., ASD vs. No Spectrum Disorder, Non-Autism ASD vs. Autistic
Disorder). Based on the AUC values, scores from the RSA are resulting in slightly better
diagnostic accuracy than are scores from the OSA across comparisons.
Indicators of diagnostic accuracy for Module 3 also were compared across the OSA and
RSA. When differentiating between an Autism Spectrum Disorder (ASD) and No Spectrum
Disorder, use of the RSA results in modest increases in specificity and PPV, but modest
decreases in sensitivity, NPV, and hit rate. Similarly, when engaging in differential diagnosis
(i.e., differentiating between Non-Autism ASDs and Autistic Disorder), use of the RSA results in
modest increases in sensitivity but also modest decreases in PPV, NPV, and hit rate, and
moderate decreases in specificity as compared to the OSA.
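For readers less familiar with these indicators, the following minimal sketch shows how each is derived from a 2 x 2 cross-classification of ADOS-G classification against clinical diagnosis. The function name and the cell counts are hypothetical and chosen only for illustration; they are not values from the current sample.

```python
def accuracy_indicators(tp, fp, fn, tn):
    """Diagnostic accuracy indicators from a 2 x 2 table.

    tp: diagnosis present, instrument positive
    fp: diagnosis absent,  instrument positive
    fn: diagnosis present, instrument negative
    tn: diagnosis absent,  instrument negative
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),    # diagnosed cases correctly flagged
        "specificity": tn / (tn + fp),    # non-cases correctly cleared
        "ppv": tp / (tp + fp),            # positive results that are correct
        "npv": tn / (tn + fn),            # negative results that are correct
        "hit_rate": (tp + tn) / total,    # all correct classifications
    }


# Hypothetical counts (illustration only, not study data)
example = accuracy_indicators(tp=45, fp=20, fn=5, tn=30)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

In this hypothetical table sensitivity is high (.90) while specificity is low (.60), the same qualitative pattern described for the ADOS-G algorithms in this section: few true cases are missed, but a substantial share of non-cases are classified as positive.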
The diagnostic accuracy of the Module 3 Updated OSA and RSA was also examined
and compared to that of the standard Module 3-OSA and RSA. The AUC values for the Module
3 Updated OSA are slightly lower than the AUC value for the standard OSA when
differentiating between ASDs and No Spectrum Disorders and Non-Autism ASDs and Autistic
Disorder, suggesting that the standard Module 3-OSA demonstrates slightly higher overall
diagnostic accuracy. The AUC value for the Module 3 Updated RSA was consistent with that of
the standard RSA when differentiating individuals with ASDs from those with No Spectrum
Disorders, and slightly lower than that of the standard RSA when differentially diagnosing
participants with Non-Autism ASDs and those with Autistic Disorder.
When differentiating between participants with an ASD and those with No Spectrum
Disorder, the Updated Module 3-OSA and RSA both demonstrate modest decreases in sensitivity
and NPVs, but modest to moderate increases in specificity and PPV when compared to the
standard OSA and RSA. When differentiating participants with Non-Autism ASDs
from those with Autistic Disorder, the Updated Module 3-OSA and RSA again demonstrate
mildly to moderately lower levels of sensitivity and NPV than the standard OSA and RSA, but
modest to moderate improvements in specificity, PPV, and hit rates.
Indicators of diagnostic accuracy for the standard OSA and RSA Module 3 were
reviewed to determine if values obtained from the current sample are consistent with the
sensitivity and specificity values originally obtained by the authors of the OSA and the RSA.
Sensitivity values for the Module 3-OSA from the current sample are adequate for diagnostic
tests (Matthey & Petrovski, 2002) and consistently higher than those reported by the test authors
(Lord et al.). However, specificity values for the standard Module 3-OSA from the current sample
are substantially lower than those reported by test authors (Lord et al.) and determined to be
inadequate for diagnostic tests (Matthey & Petrovski). Consistent with the pattern of results for
the OSA, sensitivity values (Table E3) obtained from the current sample for the RSA are
adequate for diagnostic tests and higher than those reported by Gotham et al. However, also consistent
with the data obtained for the OSA, specificity values obtained for the current sample are
inadequate for diagnostic tests and substantially lower than those reported by test authors
(Gotham et al.).
Independent Clinical Diagnoses
A limitation consistently identified by many of the previous examinations of the
diagnostic accuracy of ADOS-G scores (de Bildt et al., 2009; Gotham et al., 2007; Gray et al.,
2008) is that the determination of diagnostic accuracy (comparing a participant’s ADOS-G
classification to their resulting clinical diagnosis) often has been confounded by the fact that the
two classifications were not independent because participants’ performance on the ADOS-G
was included as part of the data used to make the clinical diagnosis. In order to address this
limitation, indicators of diagnostic accuracy were calculated to determine the overall diagnostic
accuracy of ADOS-G scores when compared to practitioners’ clinical diagnoses made without
knowledge of participants’ performance on the ADOS-G. It was hypothesized that greater
diagnostic accuracy of ADOS-G scores would be observed when scores are compared to clinical
diagnoses made with participants’ performance information on the ADOS-G as compared to
those made without information regarding ADOS-G performance.
Diagnostic accuracy was calculated between clinical diagnoses made with and without
the ADOS-G and total scores on the ADOS-G obtained from applying the Module 1-OSA,
Module 2-OSA, and Module 3-OSA. A second set of comparisons was made between clinical
diagnoses made with and without ADOS-G information and total scores on the ADOS-G
obtained from applying the Module 1 Updated OSA, the Retained Module 2-OSA (i.e., updated
based on optimal cut-scores designed to maximize the balance between sensitivity and
specificity), and the Module 3 Updated OSA. Comparisons were not made for the Revised
Scoring Algorithm or Updated RSA due to sample restrictions.
In general, consistency is observed between the diagnostic accuracy of decisions made
with and without participants’ ADOS-G scores on the Module 1, 2, and 3-OSAs. Sensitivity
values are identical for the ASD vs. No Spectrum Disorder comparisons for Modules 1 and 2 -
OSA, and across all modules for the Non-Autism ASD vs. Autistic Disorders comparisons.
Across diagnostic comparisons, the following identical specificity values are also observed:
Module 2-OSA (ASD vs. No Spectrum Disorder), and Modules 1 and 2-OSA (Non-Autism ASD
vs. Autistic Disorder). Where differences between sensitivity and specificity do exist, clinical
diagnoses made with information from the ADOS-G did not consistently demonstrate higher
levels of diagnostic accuracy.
Across the Module 1 Updated OSA, Retained Module 2 OSA, and Module 3 Updated
OSA indicators of diagnostic accuracy, less consistency was observed. However, contrary to the
hypothesis, diagnostic accuracy was not consistently higher for decisions made with knowledge
of ADOS-G performance. For example, the Module 3 Updated OSA demonstrated higher levels
of sensitivity and consistent or higher levels of specificity for decisions made without ADOS-G
data across classification determinations (i.e., ASD vs. No Spectrum Disorder, and Non-Autism
ASD vs. Autistic Disorder).
In general, these results provide initial evidence that use of a participant’s performance
on the ADOS-G in clinical decision making does not substantially “overinflate” the reported
diagnostic accuracy of the instrument and, instead, suggest that the methods used in previous
studies likely yielded valid estimates of the diagnostic accuracy of ADOS-G scores.
Summary of Evidence by Module and Scoring Algorithm
Module 1. Results from the current study provide evidence in support of the structural
validity of the Module 1-OSA; however, exploratory factor analysis recommended the deletion
of one item that did not load saliently on the one-factor solution and that was decreasing overall
scale reliability. Based on this recommendation, the Module 1 Updated OSA was created and
considered in subsequent analyses. Correlational analyses indicated that total scores on the
Module 1-OSA and Module 1 Updated OSA did not consistently demonstrate moderate to strong
relationships with other measures of autistic behavior. In fact, results from the Module 1 Updated
OSA were less consistent with predictions than were results from the standard OSA. Total scores
from both the standard and the Updated algorithms consistently demonstrated weak relationships
with measures of other behavioral functioning, providing evidence of the discriminant validity of
ADOS-G scores.
Examinations of diagnostic accuracy evidence suggested that the overall diagnostic
accuracy of the standard Module 1-OSA is Fair. Although sensitivity values were high across
comparisons (i.e., differentiating participants with ASDs from those with No Spectrum Disorders,
and those with Non-Autism ASDs from those with Autistic Disorder), specificity values were
also consistently lower than recommended standards. Use of the Module 1 Updated OSA did not
improve the diagnostic accuracy of the Module 1-OSA for determining individuals with and
without ASDs. However, when engaging in differential diagnosis, use of the Updated OSA
resulted in greater overall diagnostic accuracy and a better balance between sensitivity and
specificity.
Exploratory factor analysis also provided evidence of the structural validity of the
Module 1-RSA. A two-factor structure was retained, and all items loaded saliently on one of the
two factors, suggesting that updates to the scoring algorithm are not needed at this time.
Correlational analysis failed to provide consistent evidence for the convergent validity of ADOS-
G scores, but consistent evidence of the discriminant validity of ADOS-G scores was obtained.
The diagnostic accuracy of the Module 1-RSA also was investigated. In general, the
overall diagnostic accuracy of the RSA was determined to be Good and the RSA demonstrated
higher levels of overall diagnostic accuracy compared to the standard Module 1-OSA and
Module 1 Updated OSA for differentiating between participants with and without ASDs.
Similarly, the Module 1-RSA demonstrated high levels of sensitivity but lower than adequate
levels of specificity. Optimal cut-scores were identified for the Retained Module 1-RSA and
applied in an attempt to provide a better balance between sensitivity and specificity. The
accuracy of the resulting Retained Module 1-RSA was not significantly different from that of the
standard RSA at differentiating between those with and without ASDs. However, the Module 1-
RSA was not found to accurately predict the differential diagnosis (i.e., determining if a
participant on the autism spectrum has a Non-Autism ASD or Autistic Disorder) of participants
at a rate higher than would be expected by chance, suggesting that it should not be used for this
purpose.
Module 2. Exploratory factor analysis conducted on the Module 2-OSA provided support
for the structural validity of the module’s one-factor structure. All items loaded saliently on the
one-factor structure, and the scale demonstrated adequate reliability. As such, no updates to the
Module 2-OSA were recommended. Correlational analyses provided evidence for the
discriminant validity of the Module 2-OSA total scores, but consistent evidence of the
convergent validity of ADOS-G scores was not obtained.
The diagnostic accuracy of total scores from the OSA was also examined by
classification comparisons (i.e., ASD vs. No Spectrum Disorder, and Non-Autism ASD vs.
Autistic Disorder). For the ASD vs. No Spectrum Disorder comparisons, the Module 2-OSA
demonstrated Good overall diagnostic accuracy and adequate sensitivity, PPV, NPV, and hit
rates. However, the specificity of the standard OSA was lower than the recommended level for
diagnostic assessment. When differentiating between participants with Non-Autism ASDs and
those with Autistic Disorder, the Module 2-OSA demonstrated Fair overall diagnostic accuracy.
Although the sensitivity of the standard OSA was high, inadequate levels of specificity, PPV,
NPV, and hit rates were observed. In an attempt to find a better balance between
sensitivity and specificity, optimal cut-scores were identified for the Module 2-OSA and were
applied, creating the Retained Module 2-OSA. Across diagnostic comparisons, the Retained
OSA resulted in modest decreases in sensitivity and moderate gains in specificity,
PPVs, NPVs, and hit rates compared to the standard Module 2-OSA. In general, when compared to
the standard OSA, the Retained Module 2-OSA presented a better balance between sensitivity
and specificity.
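The idea of an optimal cut-score can be sketched in code. This section does not spell out the exact criterion used to define "balance," so the example below assumes one plausible operationalization: choosing the candidate cut-score that minimizes the absolute difference between sensitivity and specificity (maximizing Youden's J, sensitivity + specificity - 1, is another common choice). It also assumes that a total at or above the cut-score yields a positive classification. All names and score values are hypothetical and are not study data.

```python
def sens_spec(cut, pos_totals, neg_totals):
    """Sensitivity and specificity when totals >= cut are classified positive."""
    sensitivity = sum(s >= cut for s in pos_totals) / len(pos_totals)
    specificity = sum(s < cut for s in neg_totals) / len(neg_totals)
    return sensitivity, specificity


def balanced_cut_score(pos_totals, neg_totals):
    """Return the observed total that minimizes |sensitivity - specificity|.

    Assumed criterion for illustration; the first (lowest) cut-score is
    returned if several candidates are equally balanced.
    """
    candidates = sorted(set(pos_totals) | set(neg_totals))

    def gap(cut):
        sensitivity, specificity = sens_spec(cut, pos_totals, neg_totals)
        return abs(sensitivity - specificity)

    return min(candidates, key=gap)


# Hypothetical algorithm totals (illustration only, not study data)
autistic_totals = [12, 9, 15, 7, 11, 10]
comparison_totals = [5, 8, 7, 3, 9, 6]

best_cut = balanced_cut_score(autistic_totals, comparison_totals)
print(best_cut, sens_spec(best_cut, autistic_totals, comparison_totals))
```

Raising a cut-score in this way trades sensitivity for specificity, which is consistent with the pattern reported for the Retained Module 2-OSA: a modest loss in sensitivity accompanied by gains in specificity.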
Structural validity evidence for the Module 2-RSA was also obtained through factor
analysis. A one-factor solution was determined to best fit the data, although one item did not load
saliently on the one-factor solution. As a result, the Module 2 Updated RSA was created and
considered in subsequent analyses. Correlational analyses indicated that total scores on the
Module 2-RSA and Module 2 Updated RSA did not consistently demonstrate moderate to strong
relationships with other measures of autistic behavior, and results from the Module 2 Updated
RSA were less consistent with predictions than were results from the Module 2-RSA. Total
scores from both the standard and the Updated algorithms consistently demonstrated weak
relationships with measures of other behavioral functioning, providing evidence of the
discriminant validity of ADOS-G scores.
The overall diagnostic accuracy of the Module 2-RSA was determined to be Good for
making ASD vs. No Spectrum Disorder comparisons, and Fair for differential diagnosis on the
autism spectrum. Across diagnostic comparisons, the RSA demonstrated high levels of
sensitivity but inadequate levels of specificity. Use of the Module 2 Updated RSA resulted in
slight improvements to overall levels of diagnostic accuracy across diagnostic comparisons. In
addition, use of the Updated RSA consistently resulted in modest to moderate increases in levels
of specificity, PPV, NPV, and hit rates as compared to the standard RSA.
Module 3. Results from the current study provided evidence in support of the structural
validity of the Module 3-OSA; however, exploratory factor analysis recommended the deletion
of one item that did not load saliently on the one-factor solution and that was decreasing overall
scale reliability. Based on this recommendation, the Module 3 Updated OSA was created and
considered in subsequent analyses. Consistent with predictions, correlational analyses indicated
that total scores on the Module 3 OSA consistently demonstrated weak relationships with
measures of other behavioral functioning, providing evidence of the discriminant validity of
ADOS-G scores. However, inconsistent with predictions, total scores on the OSA did not
consistently demonstrate moderate to strong relationships with other measures of autistic
behavior. Use of total scores from the Updated OSA did not result in greater consistency with
predictions.
The diagnostic accuracy of total scores from the OSA was also examined by
classification comparisons (i.e., ASD vs. No Spectrum Disorder, and Non-Autism ASD vs.
Autistic Disorder). For the ASD vs. No Spectrum Disorder comparisons, the Module 3-OSA
demonstrated Good overall diagnostic accuracy and adequate sensitivity, PPV, NPV, and hit
rates. However, the diagnostic accuracy of the Module 3 OSA was Fair for differential diagnosis
of participants on the autism spectrum, and specificity, PPV, and hit rates were inadequate based
on standards for diagnostic assessment (Matthey & Petrovski, 2002). Use of the Module 3
Updated OSA consistently resulted in modest decreases to overall diagnostic accuracy, but
increases to specificity over the standard Module 3 OSA and RSA.
Structural validity evidence was also provided for the Module 3-RSA based on factor
analysis. EFA recommended the deletion of two items that did not load saliently on the one-
factor solution and that were decreasing overall scale reliability. As a result of these
recommendations, the Module 3 Updated RSA was created and considered in subsequent
analyses. Inconsistent with predictions, correlational analyses indicated that total scores on the
Module 3-RSA and Module 3 Updated RSA did not consistently demonstrate moderate to strong
relationships with other measures of autistic behavior. Instead of improving the consistency of
measured relationships, results from the Module 3 Updated RSA were less consistent with
predictions than were results from the standard RSA. Total scores from both the standard and the
Updated algorithms consistently demonstrated weak relationships with measures of other
behavioral functioning, providing evidence of the discriminant validity of ADOS-G scores.
The diagnostic accuracy of total scores from the RSA was also examined by
classification comparisons (i.e., ASD vs. No Spectrum Disorder, and Non-Autism ASD vs.
Autistic Disorder). For the ASD vs. No Spectrum Disorder comparisons, the Module 3-RSA
demonstrated Good overall diagnostic accuracy; however, the diagnostic accuracy of the Module
3-RSA was Fair for the differential diagnosis of participants on the autism spectrum. Across
classification comparisons, the overall diagnostic accuracy of the RSA was slightly better than
that of the OSA. Use of the Module 3 Updated RSA resulted in consistent to slightly decreased
overall diagnostic accuracy as compared to the standard RSA. However, use of the Updated RSA
consistently resulted in higher levels of specificity and a better balance between sensitivity and
specificity than did the standard RSA.
Clinical Implications
In general, results of the current study confirm the structural validity and overall
diagnostic accuracy of the ADOS-G. However, the current research also highlights some of the
limitations of the ADOS-G. Low measured specificity values across modules and scoring
algorithms indicate that the ADOS-G is systematically overidentifying participants (i.e.,
indicating that students without ASDs are on the Autism Spectrum, and that students with Non-
Autism ASDs have Autistic Disorder). In addition, correlational analyses indicate that scores on
the ADOS-G do not demonstrate expected relationships with other quantitative measures of
autistic behavior. Further, some ADOS-G modules have stronger cumulative evidence to support
their use than others. Specifically, data from the current study suggest that, across scoring
algorithms and differential classifications, Module 3 consistently demonstrates greater
diagnostic accuracy than does Module 1. This result may be due to limitations with the Module 1
sample size or discrepancies between the utility of the activities designed to elicit the behaviors
under investigation across modules. However, differences may also be related to the
characteristics of the examinees for whom Module 1 was designed (i.e., young, nonverbal
children, whose limited verbal abilities may give the appearance of an ASD, even though they do
not have the disorder), as compared to the characteristics of examinees who are administered
Module 3 (i.e., older children and adolescents with fluent expressive language abilities). The
functioning of older individuals is more stable than is the functioning of young children, and, in
general, younger children, who often are not yet enrolled in formal schooling, have had far less
exposure to social situations outside of the home than older children and adolescents. This lack
of exposure to other children and adults may be responsible for measured social atypicalities
assessed on the ADOS-G.
Despite some limitations, the psychometric strengths of the ADOS-G provide support for
its continued use in school-based psychoeducational evaluations for the diagnosis of students
with Autism Spectrum Disorders. In addition, the qualitative insights obtained through the
administration of the ADOS-G (which were not examined as a part of the current study) are
valuable to clinicians and, at times, are a critical factor when making diagnostic decisions in
daily practice.
Clinicians, however, need to recognize the limitations of the instrument and respond
accordingly. For example, consistent with authors’ (Lord et al., 1999) recommendations, the
ADOS-G should always be administered as part of a multimodal autism assessment and never
used as the sole criterion for making a clinical diagnosis. Use of other measures and
assessment techniques (e.g., direct observations in a variety of different settings, and completing
structured interviews with parents and teachers) will allow clinicians to determine if performance
on the ADOS-G is consistent with, or discrepant from, a student’s typical functioning at home, at
school, and in the community. The timing of the ADOS-G administration in relation to the
completion of other assessment activities also should be considered, as it could influence a
clinician’s objectivity in their scoring of the ADOS-G. Further, age of the child and the module
administered also need to be considered when weighing the relative importance of a participant’s
performance on the ADOS-G in clinical diagnostic decision-making, especially if the
information obtained on the ADOS-G is discrepant with other data.
Based on the accumulated evidence, it is recommended that clinicians utilize the Updated
Original Scoring Algorithm when scoring an administration of Module 1. The Updated OSA
provides the best balance between sensitivity and specificity across diagnostic comparisons and,
unlike the standard RSA, can be used in the classification of ASDs and in the differential
diagnosis between ASDs. Despite its strengths, the Updated OSA still produces lower than
adequate levels of specificity. As such, clinicians should be mindful of this limitation and rely on
evidence from the ADOS in conjunction with other evaluation evidence when making a resulting
clinical diagnosis.
Based on the current study, it also is recommended that clinicians utilize the Updated
Revised Scoring Algorithm when scoring an administration of Module 2. Although the Module
2-OSA demonstrates slightly higher levels of overall diagnostic accuracy than the standard and
Updated RSA, particularly for making ASD vs. No Spectrum Disorder comparisons, use of the
Updated RSA provides the best balance between sensitivity and specificity across diagnostic
comparisons. Despite the better balance observed with the Updated RSA, specificity values are
still lower than recommended for diagnostic tests.
Further, evidence from the current study supports the use of the Module 3 Updated
Revised Scoring Algorithm when scoring an administration of the ADOS-G Module 3. The
Updated RSA provides better overall diagnostic accuracy than the standard and Updated OSA,
and a better balance between sensitivity and specificity than the standard RSA. Low specificity,
particularly when used for differential diagnosis, is a consistent limitation of all ADOS-G scoring
algorithms in the current sample, the Module 3 Updated RSA included. As reported above,
clinicians need to recognize the limitations of the ADOS and use it as one component of a
multimodal assessment battery.
Limitations
Results must be considered in the context of several limitations. First, the current study
features a convenience sample drawn from one large southwestern school district. In addition,
fewer participants were administered Modules 1 and 2 as compared to Module 3. The small
number of participants with data for Modules 1 and 2 became problematic when examining the
Revised Scoring Algorithm, which required further division of participants in each module into
two developmental cells. Once divided, the very small sample sizes for Module 1, No Words (N
= 16), and Module 2, Less Than 5 Years (N = 45), precluded analyses being conducted. In
general, small sample sizes for Modules 1 and 2 may be influencing the results of all of the
analyses conducted and leading to less robust/stable results. Of the analyses conducted, the small
samples were most problematic to the factor analyses of the Revised Scoring Algorithm. For
example, RSA Module 1 (N = 66) and Module 2 (N = 73) sample sizes were slightly below
Mundfrom, Shaw, and Ke’s (2005) sample size recommendation (N > 90) for factor analysis
with the extraction of 2 factors. However, module sample sizes for the current study were
consistent with those used by Lord et al. (1999) during their examination of the structural
validity of the Original Scoring Algorithm.
Limitations also resulted from the item-scoring method of the ADOS-G. EFA is best
suited to interval data and with a 5-point (Dawis, 1987) or a 7-point (Gorsuch, 1997) response
scale. However, scores on the ADOS-G are ordinal in nature and, after the systematic recoding
of scores of 3 to 2, only 3 response options remained per item. The truncated response scale may
be restricting the range of inter-item correlations and resulting in the underrepresentation of the
actual relationships between scale items. Despite these limitations, each of the six item
correlation matrices (i.e., Module 1 OSA, Module 1 RSA, Module 2 OSA, Module 2 RSA,
Module 3 OSA, and Module 3 RSA) was adequate for factorability based on criteria set a priori,
so analyses were conducted and interpreted.
As reported in the general discussion of the correlational analyses, a third limitation of
this study involves the inconsistencies between the ADOS-G and the other measures to which
ADOS-G scores were compared, especially the GARS-2. Although both the ADOS-G and the
GARS-2 are purported to measure the behavioral characteristics of Autism Spectrum Disorders, it
is questionable if direct observations of behavior made by trained clinicians during a specified
period of time, and parent and teacher perceptions of a child’s “typical” functioning, as is
assessed by the GARS-2, should be considered equivalent. In order to obtain accurate evidence
of the convergent and discriminant validity of ADOS-G scores, ADOS-G scores need to be
compared to other direct assessments of autistic behavior, also made by trained professionals.
As identified in other studies completed on the diagnostic accuracy of the ADOS (de
Bildt et al., 2009; Gotham et al., 2007; Gray et al., 2008; Overton et al., 2008), the determination
of diagnostic accuracy of the ADOS (as derived by comparing a participant’s ADOS-G
classification to their resulting clinical diagnosis) is confounded by the fact that the two
classifications are not independent: participants’ performance on the ADOS-G was one of the
assessment tools used to make the resulting clinical diagnosis. The obtained estimates of
diagnostic accuracy for the current study may be overinflated as a result of this confound.
However, based on the results of the diagnostic comparisons made with and without data from
the ADOS-G, it appears that this may not be the case.
Finally, there were limitations with the way data were collected for the independent
diagnostic comparisons. Specifically, clinicians were provided with a copy of each participant’s
comprehensive evaluation report (minus ADOS-G scores and diagnostic decisions) and asked to
use the remainder of the report data to determine if the student did or did not meet the criteria for
an Autism Spectrum Disorder. Although ADOS-G data (scores, test session observations) and
final diagnostic determinations were removed from the report, the way in which the evaluation
report was originally written may have been influenced by a participant’s performance on the
ADOS, which, in turn, may have influenced the clinical diagnostic decision made “independent”
of the ADOS.
Future Research
Current findings provide directions and questions for future research. First, it is
recommended that the current study be replicated to assess the generalizability of results. However, an
overall larger sample size, with more consistency between module sample sizes for each of the
scoring algorithms, is also recommended. Future research should use a sample composition
similar to the one used in the study (i.e., students referred for a school-based
psychoeducational diagnostic evaluation due to the suspicion of Autism/Autism Spectrum
Disorder). However, it would be wise to include participants from multiple school districts in
order to minimize the threat of any systematic differences reflected within participants or
assessment practices in a given school district.
Further analyses of the structural validity of the ADOS-G modules across scoring
algorithms are also recommended. Specifically, confirmatory factor analysis should be
conducted on the retained standard and updated factor structures to provide additional evidence
of the structural validity of the ADOS-G modules. In addition, more research should be
conducted on the Updated Scoring Algorithms identified in this study to determine if they
consistently result in greater levels of specificity and improved balance between sensitivity and
specificity across different samples.
Finally, although the results of the current study provide some evidence to suggest that
the diagnostic accuracy of decisions made with and without ADOS-G scores obtained from the
Original Scoring Algorithm are relatively consistent, the current study is the first to conduct this
comparison. As such, further research in this area is warranted. Use of a larger sample size, to
allow for the comparison of the independent diagnostic accuracy of both the Original and
Revised Scoring Algorithms, is recommended. Further, to allow for truly independent
comparisons, it is recommended that two clinicians participate in all aspects of a comprehensive
autism evaluation (with the exception of the ADOS-G administration) and then the clinician who
did not participate in the ADOS-G administration make the final determination regarding clinical
diagnosis. Scores on the ADOS-G can then be compared to the final clinical diagnosis made
independent of the ADOS-G.
Conclusions
Overall, the findings of this study add to the current body of evidence regarding the
validity and diagnostic accuracy of scores from the ADOS-G. Exploratory factor analysis
supported the structural validity of the Original and Revised Scoring Algorithms across the three
modules under investigation. However, item deletion was suggested for the majority of modules
and scoring algorithms to increase overall scale reliabilities. Item deletions from the Module 1-
OSA, Module 2-RSA, and Module 3 OSA and RSA resulted in the creation of four Updated
Scoring Algorithms, which were considered in subsequent analyses. Although evidence of
convergent validity was not obtained for ADOS-G scores in this current study, evidence of the
discriminant validity of total scores on the ADOS-G was obtained across modules and scoring
algorithms. A review of the indicators of diagnostic accuracy also indicated that scores from the
ADOS-G consistently demonstrate high levels of sensitivity across modules and scoring
algorithms, but inadequate levels of specificity. Although use of the Updated Scoring Algorithms
consistently improved the balance between sensitivity and specificity across modules,
“improved” levels of specificity were still lower than recommended for diagnostic tests. Low
levels of specificity may be related to the overlap in behavioral symptoms of ASDs and other
disorders (e.g., ADHD and Generalized Anxiety Disorder) that students who do not receive a
diagnosis of Autism often receive. Given the low specificity values observed in this study, it is
imperative that the ADOS-G be used as one part of a multimodal evaluation and not be the
sole criterion against which a diagnosis of ASD is made. Based on the accumulated evidence,
use of the Module 1 Updated Original Scoring Algorithm, Module 2 Updated Revised Scoring
Algorithm, and Module 3 Updated Revised Scoring Algorithm is recommended to clinicians.
References
Achenbach, T. M., & Rescorla, L. A. (2000). Manual for the ASEBA preschool forms and
profiles. Burlington, VT: University of Vermont, Research Center for Children, Youth,
and Families.
Allen, D. A. (1988). Autistic spectrum disorders: clinical presentation in preschool children.
Journal of Child Neurology, 3 (Suppl.), S48-56.
Allen, R. A., Robins, D. L., & Decker, S. L. (2008). Autism Spectrum Disorders: Neurobiology
and current assessment practices. Psychology in the Schools, 45, 905-917. doi:
10.1002/pits.20341
American Educational Research Association (1999). Standards for educational and
psychological testing. Washington, DC: Author.
American Psychiatric Association (1952). Diagnostic and statistical manual of mental disorders.
Washington, DC: Author.
American Psychiatric Association (1968). Diagnostic and statistical manual of mental disorders
(2nd ed.). Washington, DC: Author.
American Psychiatric Association (1980). Diagnostic and statistical manual of mental disorders
(3rd ed.). Washington, DC: Author.
American Psychiatric Association (1987). Diagnostic and statistical manual of mental disorders
(3rd ed., rev.). Washington, DC: Author.
American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders
(4th ed.). Washington, DC: Author.
American Psychiatric Association (2000). Diagnostic and statistical manual of mental disorders
(4th ed., Text Revision). Washington, DC: Author.
Autism (n.d.). Retrieved October 6, 2010 from http://www.apa.org/topics/autism/index.aspx.
Bartlett, M. S. (1950). Tests of significance in factor analysis. British Journal of Psychology
(Statistical Section), 3, 77-85. Retrieved from http://onlinelibrary.wiley.com/
journal/10.1111/(ISSN)2044-8295
Bishop, D. V., & Norbury, C. F. (2002). Exploring the borderlands of autistic disorder and
specific language impairment: a study using standardized diagnostic instruments. Journal
of Child Psychology and Psychiatry & Allied Disciplines, 43, 917-929. doi:
10.1111/1469-7610.00114
Briggs, N. E., & MacCallum, R. C. (2003). Recovery of weak common factors by maximum
likelihood and ordinary least squares estimation. Multivariate Behavioral Research, 38,
25-56. doi: 10.1207/S15327906MBR3801_2
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral
Research, 1, 245-276. doi: 10.1207/s15327906mbr0102_10
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and
standardized assessment instruments in psychology. Psychological Assessment, 6, 284-290.
doi: 10.1037/1040-3590.6.4.284
Cicchetti, D. V., Lord, C., Koenig, K., Klin, A., & Volkmar, F. R. (2008). Reliability of the ADI-
R: multiple examiners evaluate a single case. Journal of Autism and Developmental
Disorders, 38, 764-770. doi: 10.1007/s10803-007-0448-3
Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four
recommendations for getting the most from your analysis. Practical Assessment,
Research, & Evaluation, 10 (7). Retrieved from http://pareonline.net/pdf/v10n7.pdf.
Dawis, R. V. (1987). Scale construction. Journal of Counseling Psychology, 34, 481-489. doi:
10.1037/0022-0167.34.4.481
de Bildt, A., Sytema, S., Ketelaars, C., Kraijer, D., Mulder, E., Volkmar, F., & Minderaa, R.
(2004). Interrelationship between autism diagnostic observation schedule-generic
(ADOS-G), autism diagnostic interview-revised (ADI-R), and the diagnostic and
statistical manual of mental disorders (DSM-IV-TR) classification in children and
adolescents with mental retardation. Journal of Autism and Developmental Disorders, 34,
129-137. doi: 10.1007/s10803-009-0749-9
DiLavore, P., Lord, C., & Rutter, M. (1995). Pre-Linguistic Autism Diagnostic Observation
Schedule (PL-ADOS). Journal of Autism and Developmental Disorders, 25, 355-379.
doi: 10.1007/BF02179373
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use
of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-
299. doi: 10.1037/1082-989X.4.3.272
Filipek, P. A., Accardo, P. J., Baranek, G. T., Cook, E. H., Dawson, G., Gordon, B. et al. (1999).
The screening and diagnosis of Autism Spectrum Disorders. Journal of Autism and
Developmental Disorders, 29, 439-484.
Ghaziuddin, M. (2005). Mental health aspects of autism and Asperger syndrome. Philadelphia,
PA: Jessica Kingsley Publishers.
Gilliam, J. E. (1995). Gilliam Autism Rating Scale. Austin, TX: Pro-Ed.
Gilliam, J. E. (2006). Gilliam Autism Rating Scale (2nd ed.). Austin, TX: Pro-Ed.
Gioia, G. A., Isquith, P. K., Guy, S. C., & Kenworthy, L. (2000). Behavior Rating Inventory of
Executive Function Professional Manual. Lutz, FL: Psychological Assessment
Resources, Inc.
Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item analysis. Journal of
Personality Assessment, 68, 532-560. doi: 10.1207/s15327752jpa6803_5
Gotham, K., Risi, S., Dawson, G., Tager-Flusberg, H., Joseph, R., Carter, A. et al. (2008).
A replication of the Autism Diagnostic Observation Schedule revised algorithms. Journal
of the American Academy of Child and Adolescent Psychiatry, 47, 642-651. doi:
10.1097/CHI.0b013e31816bffb7
Gotham, K., Risi, S., Pickles, A., & Lord, C. (2006). The Autism Diagnostic Observation
Schedule: Revised algorithms for improved diagnostic validity. Journal of Autism and
Developmental Disorders, 37, 613-627. doi: 10.1007/s10803-006-0280-1
Gray, K. M., Tonge, B. J., & Sweeney, D. J. (2008). Using the autism diagnostic interview-
revised and the autism diagnostic observation schedule with young children with
developmental delays: evaluating diagnostic validity. Journal of Autism and
Developmental Disorders, 38, 657-667. doi: 10.1007/s10803-007-0432-y
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika,
30, 179-185. doi: 10.1007/BF02289447
Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39, 31-36. doi:
10.1007/BF02291575
Kanner, L. (1943). Autistic disturbances of affective contact. Nervous Child, 2, 217-250.
Klein-Tasman, B. P., Risi, S., & Lord, C. E. (2007). Effect of language and task demands on the
diagnostic effectiveness of the Autism Diagnostic Observation Schedule: the impact of
module choice. Journal of Autism and Developmental Disorders, 37, 1224-1234. doi:
10.1007/s10803-006-0266-z
Krug, D. A., Arick, J. R., & Almond, P. J. (1993). The Autism Screening Instrument. Austin,
TX: Pro-Ed.
Le Couteur, A., Haden, G., Hammal, D., & McConachie, H. (2008). Diagnosing autism spectrum
disorders in pre-school children using two standardized assessment instruments: the
ADI-R and the ADOS. Journal of Autism and Developmental Disorders, 38, 362-372.
doi: 10.1007/s10803-007-0403-3
Lord, C. E. (2010). Autism: from research to practice. American Psychologist, 65, 815-826. doi:
10.1037/0003-066X.65.8.815
Lord, C. & Risi, S. (1998). Frameworks and methods in diagnosing Autism Spectrum Disorders.
Mental Retardation and Developmental Disabilities Research Reviews, 4, 90-96. doi:
10.1002/(SICI)1098-2779(1998)4:2<90::AID-MRDD5>3.0.CO;2-0
Lord, C., Risi, S., DiLavore, P., Shulman, C., Thurm, A., & Pickles, A. (2006). Autism from two
to nine. Archives of General Psychiatry, 63, 694-701.
Lord, C., Risi, S., Lambrecht, L., Cook, E. H., Leventhal, B. L., DiLavore, P. C., et al. (2000).
The Autism Diagnostic Observation Schedule-Generic: A standard measure of social and
communication deficits associated with the spectrum of autism. Journal of Autism and
Developmental Disorders, 30, 205-223.
Lord, C., Rutter, M., DiLavore, P. C., & Risi, S. (1999). Autism Diagnostic Observation
Schedule. Los Angeles: Western Psychological Services.
Lord, C., Rutter, M., Goode, S., Heemsbergen, J., Jordan, H., Mawhood, L. et al. (1989).
Autism Diagnostic Observation Schedule: A standardized observation of communicative
and social behavior. Journal of Autism and Developmental Disorders, 19, 185-212. doi:
10.1007/BF02211841
Lord, C., Rutter, M., & LeCouteur, A. (1994). Autism Diagnostic Interview-Revised: A revised
version of a diagnostic interview for caregivers of individuals with possible pervasive
developmental disorders. Journal of Autism and Developmental Disorders, 24, 659-685.
doi: 10.1007/BF02172145
Lord, C. & Volkmar, F. (2002). Genetics of childhood disorders: XLII. Autism, part 1: Diagnosis
and assessment in Autism Spectrum Disorders. Journal of the American Academy of
Child and Adolescent Psychiatry, 41, 1-5. doi: 10.1097/00004583-200209000-00015
Matson, J. L., & Gonzalez, M. L. (2007). Autism Spectrum Disorder-Diagnostic for Children.
Baton Rouge, LA: Disability Consultants, LLC.
Matson, J. L., Gonzalez, M., Wilkins, J., & Rivet, T. T. (2008). Reliability of the Autism
Spectrum Disorder-Diagnostic for Children in spectrum disorders in children: an
overview. Research in Autism Spectrum Disorders, 2, 533-545. doi:
10.1016/j.rasd.2007.11.001
Matson, J. L., Gonzalez, M., & Wilkins, J. (2009). Validity study of the Autism
Spectrum Disorder-Diagnostic for Children (ASD-DC). Research in Autism Spectrum
Disorders, 3, 196-206. doi: 10.1016/j.rasd.2008.05.005
Matthey, S., & Petrovsky, P. (2002). The Children’s Depression Inventory: Error in cutoff scores
for screening purposes. Psychological Assessment, 14, 146-149. doi:
10.1037//1040-3590.14.2.146
Mazefsky, C. A., & Oswald, D. P. (2006). The discriminative ability and diagnostic utility of the
ADOS-G, ADI-R, and GARS for children in a clinical setting. Autism, 10, 533-549. doi:
10.1177/1362361306068505
McClure, I., Mackay, T., Mamdani, H., & McCaughey, R. (2010). A comparison of a specialist
autism spectrum disorder assessment team with local assessment teams. Autism, 14, 589-
603. doi: 10.1177/1362361310373369
Molloy, C. A., Murray, D. S., Akers, R., Mitchell, T., & Manning-Courtney, P. (2011). Use of
the Autism Diagnostic Observation Schedule (ADOS) in a clinical setting. Autism,
15, 143-162. doi: 10.1177/1362361310379241
Montgomery, J. M., Newton, B., & Smith, C. (2008). Test review: GARS-2: Gilliam Autism
Rating Scale Second Edition. Journal of Psychoeducational Assessment, 26, 395-401.
doi: 10.1177/0734282908317116
Mundfrom, D. J., Shaw, D. G., & Ke, T. L. (2005). Minimum sample size recommendations for
conducting factor analysis. International Journal of Testing, 5, 159-168. doi:
10.1207/s15327574ijt0502_4
Noterdaeme, M., Mildenberger, K., Sitter, S., & Amorosa, H. (2002). Parent information and
direct observation in the diagnosis of pervasive and specific developmental disorders.
Autism, 6, 159-168. doi: 10.1177/1362361302006002003
Oosterling, I., Roos, S., de Bildt, A., Rommelse, N., de Jonge, M., Visser, J.,…Buitelaar, J.
(2010). Improved diagnostic validity of the ADOS revised algorithms: a replication study
in an independent sample. Journal of Autism and Developmental Disorders, 40, 689-703.
doi: 10.1007/s10803-009-0915-0
Overton, T., Fielding, C., & de Alba, R. G. (2007). Brief report: Exploratory analysis of
the ADOS revised algorithm: Specificity and predictive value with Hispanic children
referred for autism spectrum disorders. Journal of Autism and Developmental Disorders,
38, 1166-1169. doi: 10.1007/s10803-007-0488-8
Pandolfi, V., Magyar, C. I., & Dill, C. A. (2010). Constructs assessed by the GARS-2: Factor
analysis data of the standardization sample. Journal of Autism and Developmental
Disorders, 40, 1118-1130. doi: 10.1007/s10803-010-0967-1
Papanikolaou, K., Paliokosta, E., Houliaras, G., Vgenopoulou, S., Giouroukou, E., Pehlivanidis,
A., …Tsiantis, I. (2009). Using the Autism Diagnostic Interview-Revised and the Autism
Diagnostic Observation Schedule-Generic for the diagnosis of autism spectrum disorders
in a Greek sample with a wide range of intellectual abilities. Journal of Autism and
Developmental Disorders, 39, 414-420. doi: 10.1007/s10803-008-0639-6
Reaven, J. A., Hepburn, S. L., & Ross, R. G. (2008). Use of the ADOS and the ADI-R in children
with psychosis: importance of clinical judgment. Clinical Child Psychology and
Psychiatry, 13, 81-94. doi: 10.1177/1359104507086343
Reynolds, C. R. & Kamphaus, R. W. (2004). Behavior Assessment System for Children (2nd ed.).
Circle Pines, MN: AGS Publishing.
Risi, S., Lord, C., Gotham, K., Corsello, C., Chrysler, C., Szatmari, P., …Pickles, A. (2006).
Combining information from multiple sources in the diagnosis of Autism Spectrum
Disorders. Journal of the American Academy of Child and Adolescent Psychiatry, 45,
1094-1103. doi: 10.1097/01.chi.0000227880.42780.0e
Rutter, M., LeCouteur, A., & Lord, C. (2003). Autism Diagnostic Interview-Revised. Los
Angeles, CA: Western Psychological Services.
Salvia, J., & Ysseldyke, J. E. (2004). Assessment in special and inclusive education (9th ed.).
Boston: Houghton Mifflin.
Schopler, E., Reichler, R. J., & Rochen Renner, B. R. (1988). The Childhood Autism Rating
Scale. Los Angeles, CA: Western Psychological Services.
Sikora, D. M., Hall, T. A., Hartley, S. L., Gerrard-Morris, A. E., & Cagle, S. (2008). Does parent
report of behavior differ across ADOS-G classifications: analysis of scores from the
CBCL and GARS. Journal of Autism and Developmental Disorders, 38, 440-448. doi:
10.1007/s10803-007-0407-z
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling. Boca Raton,
FL: Chapman & Hall/CRC Press.
South, M., Williams, B. J., McMahon, W. M., Owley, T., Filipek, P. A., Shernoff, E.,
…Ozonoff, S. (2002). Utility of the Gilliam Autism Rating Scale in research and clinical
populations. Journal of Autism and Developmental Disorders, 32, 593-599.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Allyn
and Bacon.
Tataryn, D. J., Wood, J. M., & Gorsuch, R. L. (1999). Setting the value of k in promax: A Monte
Carlo study. Educational and Psychological Measurement, 59, 384-391. doi:
10.1177/00131649921969938
Tomanik, S. S., Pearson, D. A., Loveland, K. A., Lane, D. M., & Shaw, J. B. (2007). Improving
the reliability of autism diagnoses: Examining the utility of adaptive behavior. Journal of
Autism and Developmental Disorders, 37, 921-928. doi: 10.1007/s10803-006-0227-6
Tsai, L. (1992). Diagnostic issues in high-functioning autism. In E. Schopler, & G. Mesibov
(Eds.), High functioning individuals with autism (pp. 11-40). New York: Plenum.
Velicer, W. F. (1976). Determining the number of components from the matrix of partial
correlations. Psychometrika, 41, 321-327. doi: 10.1007/BF02293557
Ventola, P. E., Kleinman, J., Pandey, J., Barton, M., Allen, S., Green, J.,…Fein, D. (2006).
Agreement among four diagnostic instruments for autism spectrum disorders in toddlers.
Journal of Autism and Developmental Disorders, 36, 839-847. doi: 10.1007/s10803-006-0128-8
Volkmar, F. R., Lord, C., Bailey, A., Schultz, R. T., & Klin, A. (2004). Autism and pervasive
developmental disorders. Journal of Child Psychology and Psychiatry, 45, 135-170. doi:
10.1046/j.0021-9630.2003.00317.x
Wegener, D. T., & Fabrigar, L. R. (2000). Analysis and design for nonexperimental data. In H.
T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality
psychology (pp. 412-450). New York: Cambridge University Press.
Widaman, K. F. (1993). Common factor analysis versus principal component analysis:
Differential bias in representing model parameters? Multivariate Behavioral Research,
28, 263-311. doi: 10.1207/s15327906mbr2803_1
Wood, J. M., Tataryn, D. J., & Gorsuch, R. L. (1996). Effects of under- and overextraction on
principal axis factor analysis with varimax rotation. Psychological Methods, 1, 354-365. doi:
10.1037/1082-989X.1.4.354
Footnotes
¹Due to an insufficient sample size (N = 16), the Module 1, No Words Revised Scoring
Algorithm was excluded from analyses of factor structure, correlational analyses, and analysis
of diagnostic accuracy. Subsequent references to the Module 1 Revised Scoring Algorithm
(Module 1-RSA) refer to the Module 1, Some Words Revised Scoring Algorithm only.
²Due to insufficient sample size (N = 45), the Module 2, Less Than 5 Years Revised
Scoring Algorithm was excluded from analyses of factor structure, correlational analyses, and
analysis of diagnostic accuracy. Subsequent references to the Module 2 Revised Scoring
Algorithm (Module 2-RSA) refer to the Module 2, Greater Than or Equal to 5 Years Revised
Scoring Algorithm only.
Appendix A
Diagnostic Criteria for Autism Spectrum Disorders as defined by the Diagnostic and Statistical
Manual of Mental Disorders, Fourth Edition-Text Revision (DSM-IV-TR; American Psychiatric
Association, 2000)
Diagnostic Criteria for Autistic Disorder
A total of six or more items from 1, 2, and 3, with at least two from 1, and one each from 2
and 3:
1. Qualitative impairment in social interaction, as manifested by at least two of the following:
a. Marked impairment in the use of multiple nonverbal behaviors such as eye-to-eye gaze,
facial expression, body postures, and gestures to regulate social interaction
b. Failure to develop peer relationships appropriate to developmental level
c. A lack of spontaneous seeking to share enjoyment, interests, or achievements with other
people
d. A lack of social or emotional reciprocity
2. Qualitative impairments in communication as manifested by at least one of the following:
a. Delay in, or total lack of, the development of spoken language (not accompanied by
attempts to communicate nonverbally)
b. In individuals with adequate speech, marked impairment in the ability to initiate or
sustain a conversation with others
c. Stereotyped and repetitive use of language or idiosyncratic language
d. Lack of varied, make-believe play or social imaginative play appropriate to
developmental level
3. Restricted repetitive and stereotyped patterns of behavior, interests, and activities, as
manifested by at least one of the following:
a. Encompassing preoccupation with one or more stereotyped and restricted patterns of
interest that is abnormal in intensity or focus
b. Apparently inflexible adherence to specific nonfunctional routines or rituals
c. Stereotyped and repetitive motor mannerisms
d. Persistent preoccupation with parts of objects
Delays or abnormal functioning in at least one of the following areas must be present prior to
age 3 years: social interaction, language as used in social communication, or symbolic or imaginative play.
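The item-count rule for Autistic Disorder can be expressed compactly. The sketch below (in Python; the function and argument names are hypothetical) checks only the counting requirement stated at the top of the criteria, assuming each argument is the number of items endorsed within the corresponding criterion. It does not represent the age-of-onset requirement or any element of clinical judgment.

```python
def meets_autistic_disorder_item_count(social, communication, repetitive):
    """Check the DSM-IV-TR item-count rule for Autistic Disorder.

    Each argument is the number of items (0-4) endorsed under
    criterion 1 (social interaction), criterion 2 (communication),
    and criterion 3 (restricted, repetitive behavior), respectively.
    """
    for count in (social, communication, repetitive):
        if not 0 <= count <= 4:
            raise ValueError("each criterion lists four items (a-d)")
    total = social + communication + repetitive
    # Six or more items overall, at least two from criterion 1,
    # and at least one each from criteria 2 and 3.
    return total >= 6 and social >= 2 and communication >= 1 and repetitive >= 1


# Hypothetical endorsement patterns.
print(meets_autistic_disorder_item_count(2, 2, 2))  # meets the count rule
print(meets_autistic_disorder_item_count(4, 2, 0))  # no item from criterion 3
```

The second call shows why the total alone is not sufficient: six items are endorsed, but none come from criterion 3.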
Diagnostic Criteria for Asperger’s Disorder
1. Qualitative impairment in social interaction, as manifested by at least two of the following:
a. Marked impairment in the use of multiple nonverbal behaviors such as eye-to-eye gaze,
facial expression, body postures, and gestures to regulate social interaction
b. Failure to develop peer relationships appropriate to developmental level
c. A lack of spontaneous seeking to share enjoyment, interests, or achievements with other
people
d. A lack of social or emotional reciprocity
2. Restricted repetitive and stereotyped patterns of behavior, interests, and activities, as
manifested by at least one of the following:
a. Encompassing preoccupation with one or more stereotyped and restricted patterns of
interest that is abnormal in intensity or focus
b. Apparently inflexible adherence to specific nonfunctional routines or rituals
c. Stereotyped and repetitive motor mannerisms
d. Persistent preoccupation with parts of objects
3. The disturbance causes clinically significant impairment in social, occupational, or other
important areas of functioning.
4. There is no clinically significant general delay in language (e.g., single words used by age 2
years, communicative phrases used by age 3 years).
5. There is no clinically significant delay in cognitive development or in the development of
age-appropriate self-help skills, adaptive behavior (other than social interaction), and
curiosity about the environment in childhood.
6. Criteria are not met for another specific Pervasive Developmental Disorder or Schizophrenia.
Diagnostic Criteria for Rett’s Disorder
1. All of the following are observed:
a. Apparently normal prenatal and perinatal development
b. Apparently normal psychomotor development through the first 5 months after birth
c. Normal head circumference at birth
2. Onset of all of the following after the period of normal development:
a. Deceleration of head growth between ages 5 and 48 months
b. Loss of previously acquired purposeful hand skills between the ages 5 and 30 months
with the subsequent development of stereotyped hand movements
c. Loss of social engagement early in the course (although often social interaction develops
later)
d. Appearance of poorly coordinated gait or trunk movements
e. Severely impaired expressive and receptive language development with severe
psychomotor retardation
Diagnostic Criteria for Childhood Disintegrative Disorder
1. Apparently normal development for at least the first 2 years after birth as manifested by the
presence of age-appropriate verbal and nonverbal communication, social relationships, play,
and adaptive behavior.
2. Clinically significant loss of previously acquired skills (before age 10 years) in at least two of
the following areas:
a. Expressive or receptive language
b. Social skills or adaptive behavior
c. Bowel or bladder control
d. Play
e. Motor skills
3. Abnormalities of functioning in at least two of the following areas:
a. Qualitative impairment in social interaction
b. Qualitative impairments in communication
c. Restricted, repetitive, and stereotyped patterns of behavior, interests, and activities,
including motor stereotypies and mannerisms
4. The disturbance is not better accounted for by another Pervasive Developmental Disorder or
by Schizophrenia.
Diagnostic Criteria for Pervasive Developmental Disorder Not Otherwise Specified
1. This category should be used when there is a severe and pervasive impairment in the
development of reciprocal social interaction associated with impairment in either verbal or
nonverbal communication skills or with the presence of stereotyped behavior, interests, and
activities, but the criteria are not met for a specific Pervasive Developmental Disorder,
Schizophrenia, Schizotypal Personality Disorder, or Avoidant Personality Disorder. For
example, this category includes “atypical autism,” that is, presentations that do not meet the criteria
for Autistic Disorder because of late age of onset, atypical symptomatology, or subthreshold
symptomatology, or all of these.
Appendix B
Table B1
Activities on the Autism Diagnostic Observation Schedule and their Purpose by Module (Lord, Rutter, DiLavore, & Risi, 1999)
Module 1: Free Play – used as a warm-up period in which the child can adjust to the testing environment and examiners, and to assess the child’s independent use of toys and engagement with the parent/caregiver and to determine the presence or absence of repetitive behaviors
Module 2: Construction Task – used as a warm-up activity and as an opportunity to observe the child’s interactive behavior during a structured task; it also allows for the observation of whether and how the child asks for help within the context of a structured task
Module 3: Construction Task – used as a warm-up activity and as an opportunity to observe the participant’s interactive behavior during a structured task; it also allows for the observation of whether and how the participant asks for help within the context of a structured task
Module 4: Construction Task – used as a warm-up activity and as an opportunity to observe the participant’s interactive behavior during a structured task; it also allows for the observation of whether and how the participant asks for help within the context of a structured task *optional activity for Module 4

Module 1: Response to Name – used to assess the child’s response to his/her name when it is purposefully called to gain his/her attention
Module 2: Response to Name – used to assess the child’s response to his/her name when it is purposefully called to gain his/her attention
Module 3: Make-Believe Play – used to observe the participant’s creative or imaginative use of miniature objects in an unstructured task
Module 4: Telling a Story From a Book – used to assess the participant’s ability to follow and comment on a sequential story in a picture book and to generate spoken language

Module 1: Response to Joint Attention – used to assess the child’s response to the examiner’s use of eye contact coordinated with facial orientation, verbalization, and pointing, in order to draw his/her attention to a distant object
Module 2: Make-Believe Play – used to observe the child’s creative or imaginative use of miniature objects in an unstructured task
Module 3: Joint Interactive Play – used to assess the degree and quality of the participant’s coordination of behavior and affect with the examiner in joint interactive play
Module 4: Description of a Picture – used to generate a sample of language and/or other communicative behaviors *optional activity for Module 4

Module 1: Bubble Play – used to elicit eye contact and vocalization from the child in coordination with his/her pointing or reaching in order to direct the attention of his/her parent/caregiver or the examiner to a distant object
Module 2: Joint Interactive Play – used to assess the degree and quality of the child’s coordination of behavior and affect with the examiner in joint interactive play
Module 3: Demonstration Task – used to assess the participant’s ability to communicate about a familiar series of actions using gesture or mime with accompanying language, and to report on a familiar event
Module 4: Conversation and Reporting – used to assess the participant’s ability to engage in a conversation with to-and-fro interchange, to describe an event or situation for which there are no visual cues, to gain a language sample in less structured circumstances than the picture task, and to evaluate the participant’s ability to recount a nonroutine event

Module 1: Anticipation of a Routine with Objects – used to assess the child’s anticipation and initiation of the repetition of an action routine with objects
Module 2: Conversation – used to assess the child’s ability to carry out a minimal conversation with back-and-forth interchange, and to generate a language sample in less structured circumstances than the other tasks
Module 3: Description of a Picture – used to generate a sample of language and/or other communicative behaviors
Module 4: Current Work or School – used to evaluate how the participant describes his/her current situation, and whether he/she understands his/her role in determining what will happen in the future *optional activity for Module 4

Module 1: Responsive Social Smile – used to assess the child’s smiling in response to a purely social overture from an adult
Module 2: Response to Joint Attention – used to assess the child’s response to the examiner’s use of eye contact coordinated with facial orientation, verbalization, and pointing, in order to draw his/her attention to a distant object
Module 3: Telling a Story From a Book – used to assess the participant’s ability to follow and comment on a sequential story in a picture book and to generate spoken language
Module 4: Social Difficulties and Annoyance – used to assess the participant’s insight into personal social difficulties and sense of responsibility for his/her own actions

Module 1: Anticipation of a Social Routine – used to assess the child’s anticipation of, request for, and participation in a social routine
Module 2: Demonstration Task – used to assess the child’s ability to communicate about a familiar series of actions using gesture or mime with accompanying language, and to report on a familiar event
Module 3: Cartoons – used to observe the way in which the participant narrates a story, uses gestures to enact events, and integrates gesture with gaze and language
Module 4: Emotions – used to probe the participant’s understanding of emotions, the contexts in which they arise, and his/her individual experience of emotions

Module 1: Functional and Symbolic Imitation – used to observe the child’s imitation of simple actions with real objects and with nonmeaningful placeholders for the same objects
Module 2: Description of a Picture – used to generate a sample of language and/or other communicative behaviors
Module 3: Conversation and Reporting – used to assess the participant’s ability to engage in a conversation with to-and-fro interchange, to describe an event or situation for which there are no visual cues, to gain a language sample in less structured circumstances than the picture task, and to evaluate the participant’s ability to recount a nonroutine event
Module 4: Demonstration Task – used to assess the participant’s ability to communicate about a familiar series of actions using gesture or mime with accompanying language, and to report on a familiar event

Module 1: Birthday Party – used to create an opportunity for the child to engage in functional and symbolic play
Module 2: Telling a Story From a Book – used to assess the child’s ability to follow and comment on a sequential story in a picture book and to generate spoken language
Module 3: Emotions – used to probe the participant’s understanding of emotions, the contexts in which they arise, and his/her individual experience of emotions
Module 4: Cartoons – used to observe the way in which the participant narrates a story, uses gestures to enact events, and integrates gesture with gaze and language *optional activity for Module 4

Module 1: Snack – used to give the child an opportunity to make requests in a familiar context
Module 2: Free Play – used to create a relaxed situation with no demands or intrusions, in which the child can have a break from the demands of the evaluation, and to assess the child’s independent use of toys and his/her engagement with an adult during free play in a new environment
Module 3: Social Difficulties and Annoyance – used to assess the participant’s insight into personal social difficulties and sense of responsibility for his/her own actions
Module 4: Break – used to give the participant a break from the social demands of the assessment and to provide an opportunity to observe his/her behavior in less structured circumstances

Module 2: Birthday Party – used to create an opportunity for the child to engage in functional and symbolic play
Module 3: Break – used to give the participant a break from the social demands of the assessment and to provide an opportunity to observe his/her behavior in less structured circumstances
Module 4: Daily Living – used to obtain factual information and background for the socioemotional questions, and to evaluate the participant’s understanding and views regarding money, residential arrangements, and leisure activities

Module 2: Snack – used to give the child an opportunity to make requests in a familiar context
Module 3: Friends and Marriage – used to obtain a detailed description of one or more relationships that the participant would describe as friendship, and also to obtain a general description of his/her understanding of the concept of friendship and the idea of establishing a family or building a long-term relationship as a couple
Module 4: Friends and Marriage – used to obtain a detailed description of one or more relationships that the participant would describe as friendship, and also to obtain a general description of his/her understanding of the concept of friendship and the idea of establishing a family or building a long-term relationship as a couple

Module 2: Anticipation of a Routine with Objects – used to assess the child’s anticipation and initiation of the repetition of an action routine with objects
Module 3: Loneliness – used to provide another opportunity to assess the participant’s insight into his/her social situation, and ability to describe his/her emotional reaction to it
Module 4: Loneliness – used to provide another opportunity to assess the participant’s insight into his/her social situation, and ability to describe his/her emotional reaction to it

Module 2: Bubble Play – used to elicit eye contact and vocalization from the child in coordination with his/her pointing or reaching in order to direct the attention of his/her parent/caregiver or the examiner to a distant object
Module 3: Creating a Story – used to observe creativity in a play-like situation that is appropriate for older children, adolescents, and adults
Module 4: Plans and Hopes – used to give the participant an opportunity to describe any goals or aspirations that he/she may have

Module 4: Creating a Story – used to observe creativity in a play-like situation that is appropriate for older children, adolescents, and adults
Table B2
Items Rated on the Autism Diagnostic Observation Schedule by Subdomain and Module (Lord, Rutter, DiLavore, & Risi, 1999)
Module
Subscale Module 1 Module 2 Module 3 Module 4
Language/Communication Overall level of non-echoed language
Overall level of non-echoed language
Overall level of non-echoed language
Overall level of non-echoed language
Frequency of vocalizations directed to othersa
Amount of social overtures/maintenance of attentiona
Speech abnormalities associated with autism (intonation/volume/rate)
Speech abnormalities associated with autism (intonation/volume/rate)
Intonation of vocalizations or verbalizations
Speech abnormalities associated with autism (intonation/volume/rate)
Stereotyped/idiosyncratic use of wordsa
Stereotyped/idiosyncratic use of wordsa
Immediate echolalia Immediate echolalia Immediate echolalia Immediate echolalia
Stereotyped/idiosyncratic use of wordsa Stereotyped/idiosyncratic use of wordsa
Offers information Offers information
Use of other’s body to communicatea
Conversationa Asks for information Asks for information
Pointinga Pointinga Reporting of eventsa Reporting of events
Language/Communication Gesturesa Descriptive, conventional, instrumental, or informational gesturesa
Conversationa Conversationa
Descriptive, conventional, instrumental, or informational gesturesa
Descriptive, conventional, instrumental, or informational gesturesa
Reciprocal Social Interaction
Shared enjoyment in interactiona
Showing Empathy/comments on others’ emotions
Communication of own affect
Responsive social smile Facial expressions directed to othersa
Facial expressions directed to othersa
Facial expressions directed to othersa
Facial expressions directed to othersa
Shared enjoyment in interaction
Language linked to nonverbal communication
Language linked to nonverbal communication
Requesting Response to joint attention Quality of social overturea Insight
Integration of gaze and other behaviors during social overtures
Response to name Shared enjoyment in interaction
Shared enjoyment in interaction
Unusual eye contacta Unusual eye contacta Unusual eye contacta Unusual eye contacta
Reciprocal Social Interaction
Response to name Spontaneous initiation of joint attentiona
Insighta Empathy/comments on others’ emotionsa
Giving Quality of social overturea Quality of social responsea Responsibilitya
Showinga Quality of social responsea Amount of reciprocal social communication
Quality of social overturea
Spontaneous initiation of joint attentiona
Amount of reciprocal social communicationa
Overall quality of rapporta Quality of social response
Response to joint attentiona
Overall quality of rapporta Amount of reciprocal social communicationa
Quality of social overturesa
Overall quality of rapport
Play/Imagination Functional play with objects
Functional play with objects
Imagination/creativity Imagination/creativity
Imagination/creativity Imagination/creativity
Stereotyped Behaviors and Restricted Interests
Unusual sensory interest in play material/person
Unusual sensory interest in play material/person
Unusual sensory interest in play material/person
Unusual sensory interest in play material/person
Hand/finger and other complex mannerisms
Hand/finger and other complex mannerisms
Hand/finger and other complex mannerisms
Hand/finger and other complex mannerisms
Self-injurious behavior Self-injurious behavior Self-injurious behavior Self-injurious behavior
Unusually repetitive interests or stereotyped behaviors
Unusually repetitive interests or stereotyped behaviors
Excessive interest in/reference to unusual/highly specified topics, objects or repetitive behaviors
Excessive interest in/reference to unusual/highly specified topics, objects or repetitive behaviors
Compulsions or rituals Compulsions or rituals
Other Abnormal Behavior Overactivity Overactivity Overactivity Overactivity
Tantrums, aggression, or disruptive behavior
Tantrums, aggression, or disruptive behavior
Tantrums, aggression, or disruptive behavior
Tantrums, aggression, or disruptive behavior
Anxiety Anxiety Anxiety Anxiety
Note. a = Item included in the original ADOS-G scoring algorithm.
Table B3
Items Included in the Revised Scoring Algorithm on the Autism Diagnostic Observation Schedule-Generic by Developmental Cell
Module
Factor
Module 1 No Words Module 1 Some Words
Module 2 Younger than 5
Module 2 Greater than or Equal To 5
Module 3
Social Affect Unusual eye contact Unusual eye contact Unusual eye contact Unusual eye contact Unusual eye contact
Integration of gaze and other behaviors during social overtures
Integration of gaze and other behaviors during social overtures
Amount of reciprocal social communication
Amount of reciprocal social communication
Amount of reciprocal social communication
Facial expressions directed to others
Facial expressions directed to others
Facial expressions directed to others
Facial expressions directed to others
Facial expressions directed to others
Frequency of vocalizations directed to others
Frequency of vocalizations directed to others
Overall quality of rapport
Overall quality of rapport
Overall quality of rapport
Shared enjoyment in interaction
Shared enjoyment in interaction
Shared enjoyment in interaction
Shared enjoyment in interaction
Shared enjoyment in interaction
Quality of social overtures
Quality of social overtures
Quality of social overtures
Quality of social overtures
Quality of social overtures
Social Affect Gestures Gestures Descriptive, conventional, or informational gestures
Descriptive, conventional, or informational gestures
Descriptive, conventional, or informational gestures
Showing Showing Showing Showing Quality of social response
Initiation of joint attention
Spontaneous initiation of joint attention
Spontaneous initiation of joint attention
Spontaneous initiation of joint attention
Reporting of events
Response to joint attention
Pointing Pointing Pointing Pointing
Restricted Repetitive Behaviors
Unusual sensory interest in play material/person
Unusual sensory interest in play material/person
Unusual sensory interest in play material/person
Unusual sensory interest in play material/person
Unusual sensory interest in play material/person
Intonation of vocalizations or verbalizations
Stereotyped use of words or phrases
Stereotyped use of words or phrases
Stereotyped use of words or phrases
Stereotyped use of words or phrases
Restricted Repetitive Behaviors
Unusually repetitive interests or stereotyped behaviors
Unusually repetitive interests or stereotyped behaviors
Unusually repetitive interests or stereotyped behaviors
Unusually repetitive interests or stereotyped behaviors
Excessive interest in/reference to unusual/highly specified topics, objects or repetitive behaviors
Note. Revised algorithms obtained from Gotham, Risi, Pickles, & Lord (2007).
Appendix C
Table C1
Correlation Matrix of Items Included in the ADOS-G Module 1, Original Scoring Algorithm (N = 82)
Item A-2 A-5 A-6 A-7 A-8 B-1 B-3 B-5 B-9 B-10 B-11 B-12
A-2 -
A-5 .06 -
A-6 .36 -.01 -
A-7 .64 .08 .26 -
A-8 .52 .16 .25 .65 -
B-1 .69 .15 .26 .52 .38 -
B-3 .72 .12 .22 .57 .46 .67 -
B-5 .61 .05 .25 .54 .63 .51 .62 -
B-9 .74 .01 .28 .54 .45 .69 .63 .49 -
B-10 .66 -.01 .24 .63 .49 .43 .71 .58 .54 -
B-11 .54 -.05 .28 .47 .52 .44 .47 .52 .42 .49 -
B-12 .81 .09 .30 .58 .44 .63 .71 .52 .78 .63 .46 -
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.
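The lower-triangular layout of Table C1 can be expanded into a full symmetric matrix and summarized numerically. The sketch below is illustrative only: it transcribes the Table C1 coefficients, rebuilds the matrix, and computes a standardized alpha from the mean inter-item correlation, alpha = k * r / (1 + (k - 1) * r). The standardized coefficient is an assumption introduced here for illustration and may differ from alpha values computed on raw item scores; the function names are hypothetical.

```python
# Lower triangle of Table C1 (Module 1, Original Scoring Algorithm), one row per item.
ITEMS = ["A-2", "A-5", "A-6", "A-7", "A-8", "B-1",
         "B-3", "B-5", "B-9", "B-10", "B-11", "B-12"]
LOWER = [
    [],
    [.06],
    [.36, -.01],
    [.64, .08, .26],
    [.52, .16, .25, .65],
    [.69, .15, .26, .52, .38],
    [.72, .12, .22, .57, .46, .67],
    [.61, .05, .25, .54, .63, .51, .62],
    [.74, .01, .28, .54, .45, .69, .63, .49],
    [.66, -.01, .24, .63, .49, .43, .71, .58, .54],
    [.54, -.05, .28, .47, .52, .44, .47, .52, .42, .49],
    [.81, .09, .30, .58, .44, .63, .71, .52, .78, .63, .46],
]


def full_matrix(lower):
    """Expand a lower triangle into a symmetric matrix with a unit diagonal."""
    k = len(lower)
    matrix = [[1.0] * k for _ in range(k)]
    for i, row in enumerate(lower):
        for j, r in enumerate(row):
            matrix[i][j] = matrix[j][i] = r
    return matrix


def standardized_alpha(lower):
    """Standardized alpha from the mean of the off-diagonal correlations."""
    k = len(lower)
    pairs = [r for row in lower for r in row]
    mean_r = sum(pairs) / len(pairs)
    return k * mean_r / (1 + (k - 1) * mean_r)


matrix = full_matrix(LOWER)
print(f"{len(ITEMS)} items, standardized alpha = {standardized_alpha(LOWER):.3f}")
```

The same two functions apply to Tables C2 through C6 once their lower triangles are transcribed in the same row order.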
Table C2
Correlation Matrix of Items Included in the ADOS-G Module 1, Revised Scoring Algorithm (N = 66)
Item A-2 A-5 A-7 A-8 B-1 B-3 B-4 B-5 B-9 B-10 B-12 D-1 D-2 D-4
A-2 -
A-5 .18 -
A-7 .59 .22 -
A-8 .52 .26 .67 -
B-1 .67 .27 .47 .37 -
B-3 .71 .24 .54 .45 .66 -
B-4 .62 .14 .57 .41 .68 .71 -
B-5 .56 .19 .48 .65 .49 .59 .47 -
B-9 .72 .12 .49 .44 .66 .62 .61 .45 -
B-10 .62 .09 .60 .49 .41 .69 .55 .51 .53 -
B-12 .80 .21 .54 .44 .60 .71 .56 .50 .76 .63 -
D-1 .35 .29 .32 .42 .41 .40 .25 .38 .36 .34 .32 -
D-2 .19 .16 .06 .27 .31 .26 .10 .23 .23 .17 .17 .48 -
D-4 .34 .37 .37 .35 .43 .43 .24 .36 .32 .31 .39 .48 .39 -
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.
Table C3
Correlation Matrix of Items Included in the ADOS-G Module 2, Original Scoring Algorithm (N = 118)
Item A-2 A-5 A-6 A-7 A-8 B-1 B-2 B-6 B-8 B-9 B-10 B-11
A-2 -
A-5 .52 -
A-6 .72 .61 -
A-7 .62 .57 .54 -
A-8 .69 .64 .65 .61 -
B-1 .56 .56 .60 .42 .59 -
B-2 .68 .45 .60 .56 .66 .62 -
B-6 .61 .37 .50 .59 .43 .30 .56 -
B-8 .81 .56 .73 .58 .67 .66 .70 .66 -
B-9 .71 .58 .69 .59 .69 .65 .72 .52 .76 -
B-10 .74 .60 .86 .55 .70 .63 .65 .57 .81 .78 -
B-11 .69 .58 .65 .62 .73 .56 .70 .54 .71 .79 .74 -
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.
Table C4
Correlation Matrix of Items Included in the ADOS-G Module 2, Revised Scoring Algorithm (N = 73)
Item A-5 A-7 A-8 B-1 B-2 B-3 B-5 B-6 B-8 B-10 B-11 D-1 D-2 D-4
A-5 -
A-7 .59 -
A-8 .60 .52 -
B-1 .56 .42 .58 -
B-2 .43 .55 .59 .59 -
B-3 .42 .55 .65 .40 .69 -
B-5 .54 .59 .60 .49 .57 .63 -
B-6 .35 .62 .39 .30 .58 .52 .72 -
B-8 .53 .56 .63 .64 .66 .64 .73 .66 -
B-10 .60 .56 .65 .62 .60 .60 .71 .62 .85 -
B-11 .53 .60 .73 .52 .67 .77 .66 .53 .69 .75 -
D-1 .25 .20 .38 .23 .43 .50 .30 .27 .30 .28 .35 -
D-2 .19 .09 .16 .21 .19 .06 -.03 .01 .18 .24 .05 .32 -
D-4 .56 .51 .62 .36 .54 .49 .46 .39 .43 .43 .52 .50 .26 -
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.
Table C5
Correlation Matrix of Items Included in the ADOS-G Module 3, Original Scoring Algorithm (N = 261)
Item A-4 A-7 A-8 A-9 B-1 B-2 B-6 B-7 B-8 B-9 B-10
A-4 -
A-7 .16 -
A-8 .22 .53 -
A-9 .09 .52 .49 -
B-1 .18 .26 .42 .38 -
B-2 .29 .43 .58 .52 .53 -
B-6 .22 .48 .52 .36 .33 .44 -
B-7 .27 .48 .62 .52 .49 .62 .55 -
B-8 .36 .48 .62 .50 .45 .61 .59 .72 -
B-9 .21 .49 .74 .53 .44 .58 .48 .68 .64 -
B-10 .17 .47 .60 .53 .41 .58 .52 .64 .61 .62 -
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.
Table C6
Correlation Matrix of Items Included in the ADOS-G Module 3, Revised Scoring Algorithm (N = 262)
Item A-4 A-7 A-8 A-9 B-1 B-2 B-4 B-7 B-8 B-9 B-10 D-1 D-2 D-4
A-4 -
A-7 .16 -
A-8 .22 .53 -
A-9 .09 .52 .49 -
B-1 .18 .27 .42 .38 -
B-2 .29 .43 .58 .52 .53 -
B-4 .26 .50 .53 .56 .36 .65 -
B-7 .27 .48 .62 .52 .49 .62 .57 -
B-8 .36 .48 .62 .50 .45 .61 .57 .72 -
B-9 .21 .49 .74 .53 .44 .58 .59 .68 .64 -
B-10 .17 .47 .60 .53 .41 .58 .61 .64 .61 .62 -
D-1 .10 .17 .14 .10 .11 .21 .27 .21 .18 .18 .16 -
D-2 .05 .06 .11 .15 .12 .03 .09 .15 .15 .14 .07 .07 -
D-4 .44 .24 .30 .22 .17 .31 .32 .32 .37 .28 .23 .07 -.02 -
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.
Appendix D
Table D1
Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Original Scoring Algorithm One-Factor Solutions
Module 1 (N = 82) Module 2 (N = 118) Module 3 (N = 262)
Item Corrected Item-Total Correlations Cronbach’s α if Item Deleted Corrected Item-Total Correlations Cronbach’s α if Item Deleted Corrected Item-Total Correlations Cronbach’s α if Item Deleted
A-2 .835 .879 .824 .946 N/A N/A
A-4 N/A N/A N/A N/A .283 .912
A-5 .081 .917 .673 .951 N/A N/A
A-6 .335 .905 .806 .947 N/A N/A
A-7 .719 .884 .692 .950 .588 .897
A-8 .651 .888 .791 .947 .750 .888
A-9 N/A N/A N/A N/A .613 .896
B-1 .695 .885 .684 .951 .525 .903
B-2 N/A N/A .773 .948 .726 .890
B-3 .774 .882 N/A N/A N/A N/A
B-5 .698 .885 N/A N/A N/A N/A
B-6 N/A N/A .619 .952 .617 .895
B-7 N/A N/A N/A N/A .788 .887
B-8 N/A N/A .865 .945 .780 .887
B-9 .719 .886 .842 .946 .757 .887
B-10 .701 .886 .862 .945 .716 .889
B-11 .589 .891 .822 .946 N/A N/A
B-12 .775 .884 N/A N/A N/A N/A
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic
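For readers unfamiliar with the two statistics tabulated above, the following is a minimal sketch of how a corrected item-total correlation and Cronbach's alpha if item deleted can be computed. It is not the software used in this study. The item scores in `HYPOTHETICAL_ITEMS` are invented for illustration and are not drawn from the sample, and all function names are assumptions.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def item_analysis(items):
    """Return (corrected item-total r, alpha if item deleted) for each item."""
    results = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]
        # "Corrected" means the item is excluded from the total it is correlated with.
        rest_total = [sum(person) for person in zip(*rest)]
        results.append((pearson(col, rest_total), cronbach_alpha(rest)))
    return results


# Hypothetical 0-2 item codes for six examinees (rows of the table are items here).
HYPOTHETICAL_ITEMS = [
    [0, 1, 2, 0, 2, 1],
    [0, 1, 1, 1, 2, 0],
    [1, 1, 2, 0, 2, 1],
    [0, 0, 1, 2, 1, 2],
]

for number, (r, alpha) in enumerate(item_analysis(HYPOTHETICAL_ITEMS), start=1):
    print(f"item {number}: corrected r = {r:.3f}, alpha if deleted = {alpha:.3f}")
```

An item whose removal raises alpha above the full-scale value, as Item A-5 does for Module 1 in Table D1, contributes little shared variance to the total score.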
Table D2
Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Revised Scoring Algorithm One-Factor Solutions
Module 1 (N = 66) Module 2 (N = 73) Module 3 (N = 262)
Item Corrected Item-Total Correlations Cronbach’s α if Item Deleted Corrected Item-Total Correlations Cronbach’s α if Item Deleted Corrected Item-Total Correlations Cronbach’s α if Item Deleted
A-2 .775 .897 N/A N/A N/A N/A
A-4 N/A N/A N/A N/A .324 .893
A-5 .299 .916 .661 .925 N/A N/A
A-7 .662 .901 .686 .925 .578 .883
A-8 .647 .902 .767 .922 .732 .876
A-9 N/A N/A N/A N/A .622 .881
B-1 .720 .899 .634 .927 .513 .888
B-2 N/A N/A .764 .922 .734 .876
B-3 .794 .896 .742 .923 N/A N/A
B-4 .663 .902 N/A N/A .718 .876
B-5 .660 .901 .758 .923 N/A N/A
B-6 N/A N/A .636 .926 N/A N/A
B-7 N/A N/A N/A N/A .778 .875
B-8 N/A N/A .819 .920 .772 .875
B-9 .704 .900 N/A N/A .753 .874
B-10 .661 .901 .820 .920 .701 .877
B-11 N/A N/A .801 .921 N/A N/A
B-12 .743 .899 N/A N/A N/A N/A
D-1 .538 .906 .441 .931 .225 .895
D-2 .330 .914 .192 .937 .128 .898
D-4 .536 .906 .640 .926 .385 .892
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.
Table D3
Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Revised Scoring Algorithm Two-Factor Solutions
Module 1 (N = 66) Module 2 (N = 73)
Corrected Item-Total Correlations
Cronbach’s α if Item Deleted
Corrected Item-Total Correlations
Cronbach’s α if Item Deleted
Item Factor 1 Factor 2 Factor 1 Factor 2 Factor 1 Factor 2 Factor 1 Factor 2
A-2 .826 .914 N/A N/A N/A N/A
A-5 .339 .710 .646 .937
A-7 .697 .921 .696 .935
A-8 .623 .925 .750 .518 .933 .625
B-1 .698 .922 .639 .938
B-2 N/A N/A N/A N/A .747 .933
B-3 .804 .915 .736 .933
B-4 .732 .920 N/A N/A N/A N/A
B-5 .662 .923 .789 .931
B-6 N/A N/A N/A N/A .656 .936
B-8 N/A N/A N/A N/A .841 .929
B-9 .741 .919 N/A N/A N/A N/A
B-10 .703 .920 .836 .929
B-11 N/A N/A N/A N/A .818 .930
B-12 .778 .918 N/A N/A N/A N/A
D-1 .563 .563 .526 .622
D-2 .438 .640 .290 .744
D-4 .561 .562 .656 .525
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic
Appendix E
Table E1
Structure Coefficients and Communalities for the ADOS-G Module 1 (Original Scoring Algorithm) Items with Deletion of Item A-5 (N = 82)
Item Structure Coefficient Communality
A-2: Frequency of vocalizations to others .892 .796
A-6: Use of other’s body to communicate .355 .126
A-7: Pointing .749 .560
A-8: Gestures .652 .425
B-1: Unusual eye contact .730 .533
B-3: Facial expressions directed to others .822 .676
B-5: Shared enjoyment in interaction .727 .529
B-9: Showing .781 .610
B-10: Spontaneous initiation of joint attention .756 .572
B-11: Response to joint attention .623 .388
B-12: Quality of social overtures .830 .689
Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.
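In a one-factor solution, each item's communality equals the square of its structure coefficient, so the two columns of Table E1 can be checked against each other. The short sketch below performs that check on the tabled values and applies the .32 salience threshold stated in the note. The tolerance of .002 is an assumption made here to allow for rounding of the printed three-decimal values, and the function name is hypothetical.

```python
# (structure coefficient, communality) pairs transcribed from Table E1.
TABLE_E1 = {
    "A-2": (.892, .796),
    "A-6": (.355, .126),
    "A-7": (.749, .560),
    "A-8": (.652, .425),
    "B-1": (.730, .533),
    "B-3": (.822, .676),
    "B-5": (.727, .529),
    "B-9": (.781, .610),
    "B-10": (.756, .572),
    "B-11": (.623, .388),
    "B-12": (.830, .689),
}
SALIENCE_THRESHOLD = 0.32  # threshold given in the table note


def max_discrepancy(table):
    """Largest absolute gap between coefficient squared and tabled communality."""
    return max(abs(coef ** 2 - comm) for coef, comm in table.values())


salient_items = [item for item, (coef, _) in TABLE_E1.items()
                 if coef > SALIENCE_THRESHOLD]

print(f"largest rounding gap: {max_discrepancy(TABLE_E1):.4f}")
print(f"salient items: {len(salient_items)} of {len(TABLE_E1)}")
```

The same check can be applied to Tables E2 through E4, which also report one-factor solutions.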
Table E2
Structure Coefficients and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items with Deletion of Item D-2 (N = 73)
Item Structure Coefficient Communality
A-5: Stereotyped use of words .672 .451
A-7: Pointing .716 .512
A-8: Gestures .792 .627
B-1: Unusual eye contact .650 .422
B-2: Facial expressions directed to others .783 .614
B-3: Shared enjoyment in interactions .782 .612
B-5: Showing .807 .652
B-6: Spontaneous initiation of joint attention .682 .465
B-8: Quality of social overtures .850 .723
B-10: Amount of reciprocal social communication .847 .717
B-11: Overall quality of rapport .848 .719
D-1: Unusual sensory interests in person/play materials .436 .190
D-4: Repetitive interests/stereotyped behaviors .640 .410
Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold. Due to an insufficient sample size, the Module 2, Less Than 5 Years Revised Scoring Algorithm was excluded from analyses of factor structure. References to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only.
Table E3
Structure Coefficients and Communalities for the ADOS-G Module 3 (Original Scoring Algorithm) Items with Deletion of Item A-4 (N = 262)
Item Structure Coefficient Communality
A-7: Reporting of events .623 .389
A-8: Conversation .796 .634
A-9: Gestures .654 .427
B-1: Unusual eye contact .555 .308
B-2: Facial expressions directed to others .748 .560
B-6: Shared enjoyment in interactions .649 .422
B-7: Quality of social overtures .831 .691
B-8: Quality of social response .811 .658
B-9: Amount of reciprocal social communication .808 .653
B-10: Overall quality of rapport .768 .589
Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.
Table E4
Structure Coefficients and Communalities for the ADOS-G Module 3 (Revised Scoring Algorithm) Items with Deletion of Items D-1 and D-2 (N = 261)
Item Structure Coefficient Communality
A-4: Stereotyped use of words/phrases .327 .107
A-7: Reporting of events .617 .380
A-8: Conversation .781 .610
A-9: Gestures .663 .440
B-1: Unusual eye contact .549 .301
B-2: Facial expressions directed to others .775 .600
B-4: Shared enjoyment in interaction .751 .564
B-7: Quality of social overtures .822 .675
B-8: Quality of social response .808 .653
B-9: Amount of reciprocal social communication .806 .650
B-10: Overall quality of rapport .759 .577
D-4: Excessive interest in specific topics/repetitive behaviors .395 .156
Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.
Appendix F
Table F1
Cut Scores Used for ADOS-G Classification Determinations by Module and Scoring Algorithm
Cut Scores
Module Original Scoring Algorithm (C + SI Total Score) Revised Scoring Algorithm (SA + RRB Total Score)
Module 1
Non-Autism ASDa 7 8
Autistic Disorder 12 12
Module 2
Non-Autism ASDa 8 8
Autistic Disorder 12 9
Module 3
Non-Autism ASDa 7 7
Autistic Disorder 10 9
Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic; C + SI = Communications + Social Interaction Total Score; SA + RRB = Social Affect + Restricted Repetitive Behavior Total Score. aNon-Autism ASD = PDD NOS and Asperger’s Disorder.
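The cut scores in Table F1 can be read as a simple lookup from module and scoring algorithm to a classification. The sketch below is illustrative only and is not the scoring procedure from the ADOS-G manual. It assumes that a total score at or above a cut score meets that classification, and it uses only the total-score cuts listed in the table. The function and dictionary names are hypothetical.

```python
# (Non-Autism ASD cut, Autistic Disorder cut) keyed by (algorithm, module), from Table F1.
CUT_SCORES = {
    ("original", 1): (7, 12),
    ("original", 2): (8, 12),
    ("original", 3): (7, 10),
    ("revised", 1): (8, 12),
    ("revised", 2): (8, 9),
    ("revised", 3): (7, 9),
}


def classify(total_score, module, algorithm):
    """Map a total score to a classification label.

    Assumption: a score equal to or greater than a cut score meets it.
    """
    asd_cut, autism_cut = CUT_SCORES[(algorithm, module)]
    if total_score >= autism_cut:
        return "Autistic Disorder"
    if total_score >= asd_cut:
        return "Non-Autism ASD"
    return "No Spectrum Disorder"


print(classify(9, module=2, algorithm="original"))
print(classify(9, module=2, algorithm="revised"))
```

The two calls use the same total score and module; they differ only in the algorithm, which shows how the choice of algorithm can change the resulting label.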
Table F2
Sensitivity and Specificity Values of Scores on the Original Scoring Algorithm from the Current Sample and Lord et al.’s (1999) Original Sample
Sensitivity: Current Sample, Original Sample
Specificity: Current Sample, Original Sample
Non-Autism ASD vs. No Spectrum Disorder (N = 294)
Module 1 1.00 .94 .75 .94
Module 2 .83 .89 .62 .88
Module 3 .95 .80 .44 .94
Autistic Disorder vs. No Spectrum Disorder (N = 233)
Module 1 .90 1.00 .75 1.00
Module 2 .90 .95 .79 .94
Module 3 .94 .90 .66 1.00
Autistic Disorder + ASD vs. No Spectrum Disorder (N = 400)
Module 1 1.00 .97 .52 .94
Module 2 .90 .95 .67 .87
Module 3 .96 .90 .44 .94
Table F3
Sensitivity and Specificity Values of Scores on the Revised Scoring Algorithm from the Current Sample and Previous Studies
Sensitivity: Current Sample, Gotham et al. (2007), Gotham et al. (2008), Molloy et al. (2011)
Specificity: Current Sample, Gotham et al. (2007), Gotham et al. (2008), Molloy et al. (2011)
Non-Autism ASD vs. NSD (N = 294)
Module 1a 1.00 .77 .95 1.00 .75 .82 .75 .46
Module 2b .83 .83 NA .85 .70 .83 NA .60
Module 3 .90 .72 .60 .87 .49 .76 .88 .35
Autistic Disorder vs. NSD (N = 234)
Module 1a .97 .97 .89 .93 .75 .91 .91 .69
Module 2b .95 .98 NA .94 .76 .90 NA .65
Module 3 .94 .91 .82 .92 .65 .84 .92 .55
Note. NSD = No Spectrum Disorder. Studies presented for direct comparison are those with sample sizes consistent with the current sample (N > 300) and conducted with American children and adolescents. aDescribes scores from the Module 1, Some Words Revised Scoring Algorithm only. bDescribes scores from the Module 2, ≥ 5 Years Revised Scoring Algorithm only.
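Sensitivity and specificity, as tabulated in Tables F2 and F3, are proportions derived from a two-by-two classification table: sensitivity is the share of children with a spectrum diagnosis who score at or above the cut score, and specificity is the share of children without one who score below it. The sketch below shows the arithmetic. The counts are hypothetical and are not taken from the current sample; the function names are assumptions.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of diagnosed cases correctly classified by the instrument."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of non-spectrum cases correctly classified by the instrument."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts: 50 children with a spectrum diagnosis, 50 without.
sens = sensitivity(true_positives=45, false_negatives=5)
spec = specificity(true_negatives=30, false_positives=20)

print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A pattern of high sensitivity with lower specificity, like the one in this hypothetical example, means that few true cases are missed but a larger share of non-spectrum children exceed the cut score.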
Appendix G
Curriculum Vitae
Melissa A. Reid, M.Ed.
203-530-9567
EDUCATION
The Pennsylvania State University, University Park, PA
Doctor of Philosophy, School Psychology August 2012 (Anticipated)
GPA: 3.91/4.0
Master of Education, School Psychology December 2006
Southern Connecticut State University, New Haven, CT
Bachelor of Science, Psychology May 2003
GPA: 3.97/4.0
PROFESSIONAL LICENSURE/CERTIFICATION
Licensed Specialist in School Psychology – Texas State Board of Examiners of Psychologists August 2009 – present
Certified School Psychologist – Pennsylvania Department of Education January 2008 – present
EMPLOYMENT EXPERIENCES
Lewisville Independent School District, Lewisville, TX
Licensed Specialist in School Psychology 8/2009 – present
Pre-doctoral Psychology Intern (APA Accredited Internship) 8/2008 – 8/2009
The Pennsylvania State University, University Park, PA
Graduate Assistant, Penn State Outreach Market Research 7/2007 – 7/2008
Teaching Assistant, SPSY 559 (Cognitive Assessment) 1/2007 – 5/2007
Graduate Assistant, CEDAR Clinic Staff 8/2005 – 5/2007
Research Assistant, Dr. Richard Carlson 5/2005 – 8/2005
Teaching Assistant, IST 210 (Database Management Systems) 1/2005 – 5/2005
The Second Mile, State College, PA
Freelance Program Evaluator 2/2006 – 8/2007
Yale University School of Medicine, New Haven, CT
Research Assistant, Substance Abuse Research Center 9/2003 – 8/2004
PROFESSIONAL MEMBERSHIPS
National Association of School Psychologists 5/2005 – present
American Psychological Association, Student Affiliate 8/2008 – present
Texas Association of School Psychologists, Student Member 9/2008 – present
Dallas/Fort Worth Regional Association of School Psychologists 9/2008 – present