
The Pennsylvania State University

The Graduate School

College of Education

VALIDITY AND DIAGNOSTIC ACCURACY OF SCORES FROM THE AUTISM

DIAGNOSTIC OBSERVATION SCHEDULE-GENERIC

A Dissertation in

School Psychology

by

Melissa A. Reid

© 2012 Melissa A. Reid

Submitted in Partial Fulfillment of the Requirements

for the Degree of

Doctor of Philosophy

August 2012

The dissertation of Melissa A. Reid was reviewed and approved* by the following:

James C. DiPerna
Associate Professor of Education
Professor in Charge of the Program of School Psychology
Dissertation Adviser

Richard Hazler
Professor of Education

Robert Stevens
Professor of Education

Beverly J. Vandiver
Associate Professor of Education

*Signatures are on file in the Graduate School

Abstract

The purpose of this study was to examine the internal structure, relationships with other

variables, and diagnostic accuracy of scores on the Autism Diagnostic Observation Schedule –

Generic (ADOS-G; Lord et al., 1999) for the purpose of diagnostic decision-making. Participants

were 462 children enrolled in a public school district in the southern U.S. who were referred for

a school-based psychoeducational evaluation. Four hypotheses were tested with mixed results.

The first prediction was that items included in the Original Scoring Algorithm (OSA) would

reflect a uni-dimensional construct, and items included in the Revised Scoring Algorithm (RSA)

would reflect two constructs across modules. Exploratory factor analysis confirmed the one-

factor structure of the OSA across modules. However, a two-factor structure was not retained for

the Module 2 or Module 3 RSA. Second, it was predicted that total scores on the ADOS-G,

across modules and scoring algorithms, would demonstrate moderate to strong relations with

scores from other measures of autistic behavior, and weak relations with measures of emotional

functioning. Weak relationships were consistently measured between participants’ scores on the

ADOS-G across modules and algorithms and other measures of autistic and emotional

functioning. Third, it was predicted that scores obtained from application of the RSA would

result in greater diagnostic accuracy than those obtained from the OSA. Receiver Operating

Characteristic (ROC) curve analysis was conducted to determine the sensitivity and specificity of ADOS-G

scores. Consistent with hypotheses, the RSA typically resulted in greater diagnostic accuracy,

and a better balance between sensitivity and specificity than did the OSA. Finally, the fourth

hypothesis, which predicted that the diagnostic accuracy of the ADOS-G would be lower with an

independent criterion relative to an interdependent criterion, was not consistently supported. In

general, results of the current study confirm the structural validity and overall diagnostic

accuracy of the ADOS-G, but also highlight some of the limitations of the instrument. Despite its

limitations, it was concluded that the strengths of the ADOS-G provide support for its continued

use in school-based psychoeducational evaluations for the diagnosis of students with Autism

Spectrum Disorders.
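The diagnostic accuracy indicators named in the abstract (sensitivity, specificity, positive and negative predictive values, and hit rate) can be illustrated with a minimal sketch. The scores, criterion diagnoses, and cut-score below are hypothetical and are not drawn from the study data; the sketch also assumes that a total score at or above the cut-score counts as a positive classification.

```python
from math import nan


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else nan


def diagnostic_accuracy(total_scores, criterion_positive, cut_score):
    """Compare cut-score classifications with a criterion diagnosis.

    total_scores: algorithm total scores (hypothetical values here).
    criterion_positive: True where the criterion diagnosis is positive.
    cut_score: scores >= cut_score are classified as positive (assumption).
    """
    tp = fp = tn = fn = 0
    for score, has_diagnosis in zip(total_scores, criterion_positive):
        predicted_positive = score >= cut_score
        if predicted_positive and has_diagnosis:
            tp += 1
        elif predicted_positive and not has_diagnosis:
            fp += 1
        elif not predicted_positive and not has_diagnosis:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": ratio(tp, tp + fn),   # positives correctly identified
        "specificity": ratio(tn, tn + fp),   # negatives correctly identified
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
        "hit_rate": ratio(tp + tn, tp + fp + tn + fn),  # overall agreement
    }


# Hypothetical data for eight examinees; not taken from the dissertation.
scores = [3, 8, 12, 15, 6, 10, 2, 14]
criterion = [False, True, True, True, False, False, False, True]
result = diagnostic_accuracy(scores, criterion, cut_score=7)
print(result)
```

Raising the hypothetical cut-score would typically trade sensitivity for specificity, which is the balance that ROC analysis is used to examine.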

Table of Contents

List of Tables

List of Appendices

Acknowledgements

Chapter 1. Introduction and Literature Review

Definition of Autism Spectrum Disorders

Common Characteristics of Autism Spectrum Disorders

Assessment and Diagnosis of Autism Spectrum Disorders

Autism Diagnostic Observation Schedule

Development and Evolution of the ADOS

Autism Diagnostic Observation Schedule-Generic

Rationale for Present Study

Purpose and Hypotheses

Chapter 2. Method

Participants

Measures

Autism Diagnostic Observation Schedule-Generic

Gilliam Autism Rating Scale, Second Edition

Behavior Assessment System for Children, Second Edition

Procedure

Chapter 3. Results

Preliminary Analyses and Testing of Assumptions

ADOS-G Item Analysis

Total, Scale, and Subscale Score Analysis

Hypothesis 1: Factor Structure of the Original and Revised Scoring Algorithms

Module 1-Original Scoring Algorithm

Module 1-Revised Scoring Algorithm

Module 2-Original Scoring Algorithm

Module 2-Revised Scoring Algorithm

Module 3-Original Scoring Algorithm

Module 3-Revised Scoring Algorithm

Hypothesis 2: Relationships between Scores on the ADOS-G and Other Measures

Module 1

Module 2

Module 3

Hypothesis 3: Comparisons of Diagnostic Accuracy Indicators Across Scoring Algorithms

Original and Revised Scoring Algorithm Comparisons

Updated Scoring Algorithms and Optimal Cut-Score Comparisons

Hypothesis 4: Diagnostic Accuracy of Independent Clinical Diagnoses

Chapter 4. Discussion

Structural Validity Evidence

Module 1

Module 2

Module 3

Convergent and Discriminant Validity Evidence

Evidence of Diagnostic Accuracy

Module 1

Module 2

Module 3

Independent Clinical Diagnoses

Summary of Evidence by Module and Scoring Algorithm

Module 1

Module 2

Module 3

Clinical Implications

Limitations

Future Research

Conclusions

References

Footnotes

List of Tables

Table 1. Sensitivity and Specificity of Original and Revised Scoring Algorithms by Research Study

Table 2. Demographic Characteristics of Total Sample (N = 462) and Independent Clinical Diagnosis Subsample (N = 100)

Table 3. Item Means, Standard Deviations, Skew and Kurtosis Values on Module 1 from the ADOS-G (N = 82)

Table 4. Item Means, Standard Deviations, Skew and Kurtosis Values on Module 2 from the ADOS-G (N = 118)

Table 5. Item Means, Standard Deviations, Skew and Kurtosis Values on Module 3 from the ADOS-G (N = 262)

Table 6. Participants’ Means, Standard Deviations, Score Range, Skew, and Kurtosis Values on the ADOS-G, GARS-2, and Selected Subscales from the BASC-2

Table 7. Structure Coefficients and Communalities for the ADOS-G Module 1 (Original Scoring Algorithm) Items (N = 82)

Table 8. Pattern Coefficients, Structure Coefficients, and Communalities for the ADOS-G Module 1 (Revised Scoring Algorithm) Items (N = 66)

Table 9. Structure Coefficients and Communalities for the ADOS-G Module 1 (Revised Scoring Algorithm) Items (N = 66)

Table 10. Structure Coefficients and Communalities for the ADOS-G Module 2 (Original Scoring Algorithm) Items (N = 118)

Table 11. Pattern Coefficients, Structure Coefficients, and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items (N = 73)

Table 12. Structure Coefficients and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items (N = 73)

Table 13. Structure Coefficients and Communalities for the ADOS-G Module 3 (Original Scoring Algorithm) Items (N = 262)

Table 14. Structure Coefficients and Communalities for the ADOS-G Module 3 (Revised Scoring Algorithm) Items (N = 261)

Table 15. Pearson Correlations between Participants’ Total Scores on the ADOS-G Original and Revised Scoring Algorithms for Module 3 and Parent and Teacher Ratings on the GARS-2

Table 16. Pearson Correlations between Participants’ Total Scores on the ADOS-G Original, Revised, and Updated Scoring Algorithms and Parent and Teacher Ratings on the BASC-2

Table 17. AUC Values, Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores on the Original and Revised Scoring Algorithm

Table 18. AUC Values and Optimal Cut-Scores for the ADOS-G Updated and Retained Scoring Algorithms

Table 19. Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores on the Updated and Retained Scoring Algorithms

Table 20. Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores on the Original and Revised Scoring Algorithm Compared to Clinical Diagnoses Made With and Without the ADOS-G (N = 100)

Table 21. Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores from the Updated and Retained Original Scoring Algorithms Compared to Clinical Diagnoses Made With and Without the ADOS-G (N = 100)

List of Appendices

Appendix A: DSM-IV-TR Diagnostic Criteria for Autism Spectrum Disorders

Appendix B

Table B1: Activities on the Autism Diagnostic Observation Schedule and their Purpose by Module (Lord, Rutter, DiLavore, & Risi, 1999)

Table B2: Items Rated on the Autism Diagnostic Observation Schedule by Subdomain and Module (Lord, Rutter, DiLavore, & Risi, 1999)

Table B3: Items Included in the Revised Scoring Algorithm on the Autism Diagnostic Observation Schedule-Generic by Developmental Cell

Appendix C

Table C1: Correlation Matrix of Items Included in the ADOS-G Module 1, Original Scoring Algorithm (N = 82)

Table C2: Correlation Matrix of Items Included in the ADOS-G Module 1, Revised Scoring Algorithm (N = 66)

Table C3: Correlation Matrix of Items Included in the ADOS-G Module 2, Original Scoring Algorithm (N = 118)

Table C4: Correlation Matrix of Items Included in the ADOS-G Module 2, Revised Scoring Algorithm (N = 73)

Table C5: Correlation Matrix of Items Included in the ADOS-G Module 3, Original Scoring Algorithm (N = 261)

Table C6: Correlation Matrix of Items Included in the ADOS-G Module 3, Revised Scoring Algorithm (N = 262)

Appendix D

Table D1: Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Original Scoring Algorithm One-Factor Solutions

Table D2: Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Revised Scoring Algorithm One-Factor Solutions

Table D3: Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Revised Scoring Algorithm Two-Factor Solutions

Appendix E

Table E1: Structure Coefficients and Communalities for the ADOS-G Module 1 (Original Scoring Algorithm) Items with Deletion of Item A-5 (N = 82)

Table E2: Structure Coefficients and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items with Deletion of Item D-2 (N = 73)

Table E3: Structure Coefficients and Communalities for the ADOS-G Module 3 (Original Scoring Algorithm) Items with Deletion of Item A-4 (N = 262)

Table E4: Structure Coefficients and Communalities for the ADOS-G Module 3 (Revised Scoring Algorithm) Items with Deletion of Items D-1 and D-2 (N = 261)

Appendix F

Table F1: Cut Scores Used for ADOS-G Classification Determinations by Module and Scoring Algorithm

Table F2: Sensitivity and Specificity Values of Scores on the Original Scoring Algorithm from the Current Sample and Lord et al.’s (1999) Original Sample

Table F3: Sensitivity and Specificity Values of Scores on the Revised Scoring Algorithm from the Current Sample and Previous Studies

Appendix G. Curriculum Vitae

Acknowledgements

There are many people who have assisted me throughout the process of completing my

graduate education and my dissertation that deserve thanks for their efforts. First, I want to thank

Dr. James DiPerna, my adviser and dissertation chair, for all of his guidance, encouragement,

and faith over the last eight years. I sincerely thank you Jim for not giving up on me, even when I

had given up on myself. I truly appreciate all you have done and know that I would not be

writing acknowledgements to a completed dissertation without you. I would also like to thank

the other members of my doctoral committee, Drs. Richard Hazler, Robert Stevens, and Beverly

Vandiver, for their feedback over the years and contributions to my dissertation. To my

wonderful graduate school cohort, especially Miranda Freberg, Anne McGinnis, and Erin Meyer,

I never would have survived graduate school without you ladies! Thank you for your

collaboration and friendship over the years.

Thank you to the administrative staff of the Lewisville Independent School District,

Department of Special Education for allowing me to use district data to complete my

dissertation. I’d also like to thank my colleagues in Psychological Services who assisted me with

data collection and evaluation review. Special thanks to Robin Chaney, Jennifer Key, Jill

Littleton, Jessica Martin, Amorette Miller, Linda Pedersen, Shannon Spence, and Kimberly

Ward for providing me with endless support and friendship while I was attempting to “kill Earl”.

Thank you, Linda, for asking me about my dissertation progress each week in supervision,

despite the inevitable outcome, and for always holding me accountable for working on it.

Jennifer, thank you for reminding me that I would have never forgiven myself if I didn’t finish

what I started. You both played a special role in helping me get to the place that I am at today.

To my other family and friends who have provided me with love and support throughout

this long journey, your contributions have been greatly appreciated. My greatest thanks are to my

mother, Patricia Reid, to whom this work is dedicated. I owe all that I am and all that I have

achieved to you, and I wish that you were here to share in my greatest accomplishment. I hope

you are looking down on me with pride.

Chapter 1. Introduction and Literature Review

The Autism Diagnostic Observation Schedule-Generic (ADOS-G; Lord, Rutter,

DiLavore & Risi, 1999) is one of the most widely utilized diagnostic instruments in the direct

assessment of the social, communicative, and sensorimotor symptoms of Autism Spectrum

Disorders (ASD) in both clinical and educational settings. Despite its popularity and widespread

use, little independent research regarding the psychometric properties of ADOS-G scores has

been conducted to date. Thus, the purpose of this study was to examine the internal structure,

relationships with other variables, and diagnostic accuracy of ADOS-G scores for the purpose of

diagnostic decision making.

The following literature review begins with a brief overview of Autism Spectrum

Disorders (ASDs) and information on current assessment and diagnostic practices used in the

diagnosis of ASDs. The next section synthesizes existing research regarding the psychometric

properties of ADOS-G scores. This chapter then concludes with the rationale, purpose, and

primary hypotheses for the study.

Relevant studies were identified by searching PsycINFO and PsycARTICLES

databases with “ADOS” as the primary search term. This search yielded 191 studies that

included the ADOS as a key study descriptor. The search was narrowed by selecting only studies

that were published in a peer-reviewed journal, resulting in 166 possible articles for inclusion in

the synthesis. Abstracts were reviewed to identify research studies that examined reliability

and/or validity evidence (e.g., stability of measurement across examiners and/or time and

internal consistency of assessment items; evidence of test structure and diagnostic accuracy) for

ADOS scores as a study objective. If study outcomes were not clearly identified within the

abstract, full text was reviewed for clarification. Based on the abstract review, the vast majority

of the research studies featured the ADOS-G as a diagnostic measure of ASDs rather than

examining the instrument or its technical adequacy as a study outcome. As a result, only 17

studies were identified that met the criteria for inclusion in the synthesis.

Definition of Autism Spectrum Disorders

Autism is a general term often used to describe a group of disorders formally called

Pervasive Developmental Disorders (PDDs) and commonly referred to as Autism Spectrum

Disorders (ASDs). ASDs can be defined as “cognitive and neuro-behavioral disorders, including,

but not limited to, three core-defining features: impairments in socialization, impairments in

verbal and nonverbal communication, and restricted and repetitive patterns of behaviors”

(Filipek et al., 1999, p. 439). In a recent report published by the Centers for Disease Control and

Prevention (CDC; 2009), it was noted that ASDs affect approximately 1 in 110 children in the

United States. Symptoms of ASDs, which often include deficits in the use and understanding of

verbal and nonverbal communication, literal and repetitive patterns of thought, and sensory

processing deficits (Autism, n.d.), are typically present from birth or very early in development.

However, diagnosis often does not take place prior to the age of 2 years (Lord et al., 2006).

First reported by Kanner in 1943 as a “syndrome of autistic disturbances”, ASDs were

initially identified in case histories of children between the ages of 2 and 8 years that shared

“unique and previously unreported patterns of behavior, including social remoteness,

obsessiveness, stereotypy, and echolalia” (Filipek et al., 1999, p. 442). Although included in the

first and second editions of the Diagnostic and Statistical Manual of Mental Disorders (DSM;

American Psychiatric Association, 1952; 1968), ASDs were characterized as “psychotic

reactions in children, manifesting primarily autism” and were classified as “schizophrenic

reaction or schizophrenia, childhood type” (American Psychiatric Association, 1968, p. 28).

However, following the publication of the Diagnostic and Statistical Manual of Mental

Disorders, Third Edition (DSM-III; American Psychiatric Association, 1980), ASDs were

reclassified and reconceptualized. The term Pervasive Developmental Disorder (PDD) was first

introduced in the DSM-III, as was the differentiation between ASD and childhood schizophrenia

and other forms of psychoses (Filipek et al.). The terms Autistic Disorder and Pervasive

Developmental Disorder Not Otherwise Specified (PDD-NOS) were introduced in the

Diagnostic and Statistical Manual of Mental Disorders, Third Edition, Revised (DSM-III-R;

American Psychiatric Association, 1987).

According to the current Diagnostic and Statistical Manual of Mental Disorders, Fourth

Edition-Text Revision (DSM-IV-TR; American Psychiatric Association, 2000), there are five

distinct ASDs or PDDs: Autistic Disorder, Asperger’s Disorder, Rett’s Disorder, Childhood

Disintegrative Disorder, and Pervasive Developmental Disorder Not Otherwise Specified

(PDD-NOS). The diagnostic criteria for each disorder, as listed within the DSM-IV-TR, are provided in Appendix A.

Common Characteristics of Autism Spectrum Disorders

As is evident from their definition and diagnostic criteria, ASDs affect essential human

behaviors such as social interaction, communication, imagination, and establishing relationships,

which typically result in life-long effects on learning, interpersonal interactions, independence,

and level of participation in the community (Autism, n.d.). According to the National Research

Council (2001), the level of impairment experienced by an individual with an ASD varies

according to their age of onset and the severity of their symptoms, as well as the absence or

presence of co-morbid psychiatric disorders. Across and within individuals, the manifestation of

an Autism Spectrum Disorder can vary over time: there is no single behavior that is always

typical or present in individuals with ASDs.

There are, however, several common behavioral characteristics that often are observed in

individuals with Autism Spectrum Disorders. First, speech and language difficulties, as well as

deficits in the use and understanding of nonverbal communication, are typically observed in

individuals on the spectrum. Although the severity of communication impairment varies across

the Autism Spectrum Disorders, all individuals with ASDs exhibit some of the following

behaviors: deficits in verbal language, such as failing to speak, repeating words or phrases heard,

and/or talking repetitively about one topic; atypical pitch, tone, prosody, and/or volume of

speech; failure to use spoken and body language to communicate; appearing not to listen,

even when spoken to directly; and failure to use nonverbal communication methods, such as

gesturing or pointing. In addition to expressive language deficits, individuals on the spectrum

also often experience difficulties with receptive language, or language comprehension (National

Research Council, 2001).

Cognitive and perceptual impairments also are often observed in individuals with Autism

Spectrum Disorders. Specifically, individuals on the spectrum often exhibit a here-and-now way

of thinking, which is typically very literal and repetitive in nature. They often demonstrate a lack

of curiosity about their environment and surroundings, and, at times, fail to attend to important

stimuli, focusing on irrelevant stimuli instead. An obsessive desire for sameness and repetition

may also be observed (National Research Council, 2001).

Deficits in reciprocal social interactions are the hallmark characteristics of all ASDs and

a variety of social deficits are typically observed in individuals on the spectrum. Common social

atypicalities include: resistance to being touched or held, failure to respond to name, an inability

to relate to peers and adults in an ordinary way (e.g., ignores or avoids people), failure to

appropriately modulate eye contact, lack of use of social smiling, and a general lack of

understanding of how other people think, feel, or view the world.

In addition to communication, cognitive and perceptual, and reciprocal social

impairments, individuals with ASDs also typically exhibit some degree of sensory processing

deficits and engagement in stereotyped behaviors. For example, those on the spectrum may

exhibit extreme fear reactions to loud noises, strangers, new situations, changes, or surprises;

may be under- or over-responsive to physical pain; and may demonstrate distinct food and

clothing preferences. Further, individuals with spectrum disorders may rock or spin objects as a

form of self-stimulatory behavior, may require compulsive adherence to specific routines, may

become preoccupied with one or a few objects, and may tantrum or exhibit other aggressive

behaviors when upset (National Research Council, 2001).

Assessment and Diagnosis of Autism Spectrum Disorders

Although there are clearly defined diagnostic criteria, difficulties exist in the diagnosis of

ASDs. Despite being neurological in nature, the neuro-physiological markers of ASDs have not

yet been clearly identified or documented. As a result, physicians, psychologists, and other

professionals charged with diagnosing ASDs are required to rely on a child’s observable patterns

of behavioral functioning in order to make a diagnosis (Lord & Risi, 1998). Reliance on

observable symptoms, however, can be challenging for several reasons. First, the symptoms of

autism/ASDs can differ dramatically across individuals and within individuals across time

(Lord, 2010; Tsai, 1992). Significant symptom overlap between Autistic Disorder and the

various ASDs can make differential diagnosis between disorders quite difficult, especially in

younger and older individuals (Lord & Risi, 1998; Lord & Volkmar, 2002). Further, symptom

overlap between ASDs and other physiological and psychological conditions, such as mental

retardation, other developmental disabilities, expressive and receptive language disorders,

Attention-Deficit/Hyperactivity Disorder (ADHD), and childhood-onset schizophrenia also

complicates differential diagnosis (American Psychiatric Association, 2000; Ghaziuddin, 2005;

Reaven, Hepburn, & Ross, 2008).

Due to the complexity of diagnosis, a multi-disciplinary approach to the diagnostic

assessment of Autism Spectrum Disorders is recommended (Filipek et al., 1999). Filipek et al.

recommended that each diagnostic evaluation should include a number of components, including

a comprehensive interview with parents and other caregivers in which a complete birth, medical,

family, and developmental history is obtained; direct observations of and interactions with the

child being assessed; assessment of the child’s adaptive and general behavioral functioning; and

direct assessment of the child’s speech/language/communication skills, cognitive functioning,

sensorimotor functioning, and academic functioning. Use of measures that are designed

specifically for the screening and diagnosis of ASDs is also strongly recommended (Filipek et

al.; Risi et al., 2006).

Several specific screening and diagnostic measures for ASDs are widely used by

researchers and clinicians in the process of completing a multidisciplinary autism evaluation.

Two of the most commonly used rating scales at this time are the Childhood Autism Rating

Scale (CARS; Schopler, Reichler, & Rochen Renner, 1988) and the Gilliam Autism Rating Scale

(GARS; Gilliam, 1995). Although authors for both assessments have indicated that their scores

possess adequate reliability and validity for screening (CARS) and diagnostic (GARS) decisions

(Gilliam; Schopler et al.), independent research has raised some questions regarding the

usefulness and diagnostic accuracy of these assessments. Specifically, Lord and Risi (1998)

noted that the CARS does not effectively differentiate individuals with communication deficits

and cognitive and behavioral difficulties related to autism from examinees with expressive

language delays, cognitive impairments, and general behavioral difficulties that are not due to a

pervasive developmental disorder.

In their investigation of the discriminative ability and diagnostic utility of the GARS,

Mazefsky and Oswald (2006) determined that the measure does not accurately discriminate

children with autism from those with non-developmental disabilities. A 2008 study conducted by

Sikora, Hartley, McCoy, Gerrard-Morris, and Dill confirmed the instrument’s failure to

consistently discriminate examinees on the autism spectrum from those who are not. More

concerning, however, was Mazefsky and Oswald’s conclusion that the GARS systematically

underestimates the probability that examinees are on the autism spectrum. A previous study of

the GARS conducted by South et al. (2002) presented similar concerns with the diagnostic

accuracy of the instrument.

In 2006, a second edition of the GARS was published by the test author (GARS-2;

Gilliam). In an attempt to address the systematic concerns of the GARS raised by independent

researchers, the GARS-2 was created with a new normative sample of participants (Montgomery,

Newton, & Smith, 2008). Substantial revisions were made to the instrument, including the

elimination of one of the four subscales found within the measure and the introduction of an

interview component to allow for the evaluation of the child’s development during early

childhood (Gilliam, 2006). Independent research on the technical adequacy of the GARS-2 has

yet to be completed.

Another popular autism diagnostic measure is the Autism Diagnostic Interview-Revised

(ADI-R; Lord, Rutter, & LeCouteur, 1994). The ADI-R is a standardized comprehensive

interview that can be completed with parents/primary caregivers and, consistent with the

recommendations of Filipek et al. (1999), requests information about the child’s birth, health,

and developmental history. It is designed for use with caregivers of children under evaluation

who demonstrate a developmental level of at least 2 years, 0 months of age.

Validation studies completed by the authors indicate that scores from the ADI-R reliably

and validly diagnose autism in children and adolescents (Rutter, LeCouteur, & Lord, 2003).

Independent research has also confirmed its technical adequacy (Cicchetti, Lord, Koenig, Klin,

& Volkmar, 2008; Noterdaeme, Mildenberger, Sitter, & Amorosa, 2002; Papanikolaou et al.,

2009). However, concerns regarding the ADI-R have also been documented. Ventola et al.

(2006) noted that the typical length of time required for appropriate administration of the ADI-R

(i.e., 90 to 150 minutes; Rutter, LeCouteur, & Lord, 2003) is prohibitive and may make it impractical

for use in school-based evaluations. In addition, unlike other diagnostic assessments currently in

use, the ADI-R does not differentiate between Autistic Disorder and other ASDs (LeCouteur,

Haden, Hammal, & McConachie, 2008).

Autism Diagnostic Observation Schedule

Perhaps the most widely used diagnostic assessment of autism, also considered the

current “gold standard” in autism assessment (Klein-Tasman, Risi, & Lord, 2007), is the Autism

Diagnostic Observation Schedule-Generic (ADOS-G; Lord, Rutter, DiLavore & Risi, 1999).

Designed for use with individuals who are thought to have an ASD, the ADOS-G is a

standardized assessment of communication, social interaction, play/imagination, and stereotyped

behaviors and interests. The original ADOS was designed to provide researchers and clinicians

with a standardized tool that could be used to record a child or adolescent’s social and

communicative behavior throughout the course of a comprehensive evaluation for an Autism

Spectrum Disorder. Since the time of its initial release, the ADOS has evolved in order to be

used with a broader range of examinees, both in terms of age and expressive language level, and

in a variety of settings (DiLavore, Lord, & Rutter, 1995), and in order to provide more consistent

differential diagnosis between children and adolescents on the autism spectrum and those with

other developmental disabilities that are not on the spectrum (Lord et al., 1999). Published in

1999, the most current version of the ADOS is the Autism Diagnostic Observation Schedule-

Generic (ADOS-G; Lord et al., 1999).

Development and evolution of the ADOS. First published in 1989, the Autism

Diagnostic Observation Schedule (ADOS; Lord et al.) was intended to be used in the differential

diagnosis of ASDs from other disorders, such as mental retardation, and typical childhood

development. It also was designed as a research tool to directly study the social behaviors and

communication patterns found in individuals with ASDs.

At the time of its initial release, the ADOS differed from other scales in two primary

ways (Lord et al., 1989). First, unlike other diagnostic measures of autism available at that time,

the ADOS was designed to focus examiners’ observations on clients’ social and communicative

functioning to identify the presence or absence of behaviors that are specific to autism. Second,

the ADOS provided examiners with specific administration directions to guide

their own behavior in conjunction with the behavior of their examinees (Lord et al.).

Despite its advances, the original ADOS was limited because it could only be utilized

with examinees between the ages of 5 and 12 whose expressive language skills were, at a

minimum, developmentally consistent with those of a 3-year-old child (Lord et al., 2000).

However, individuals with autism frequently exhibit delays and deficits in all areas of language

acquisition, including receptive, expressive, and pragmatic (social) language. Further, the

majority of children are under 5 years of age when first referred for an autism assessment (Lord

et al.). Administration time of the ADOS was also lengthy due to its large number of items, and

completion of the assessment was often problematic for examiners, especially with younger and

more impaired children (Lord et al., 2000).

In an attempt to address these limitations, DiLavore, Lord, and Rutter (1995) developed

the Pre-Linguistic Autism Diagnostic Observation Schedule (PL-ADOS), which was a downward

extension of the ADOS for use with verbal children between the ages of 2 and 4 and with

examinees of any age who do not exhibit spontaneous expressive language. Thus, the

combination of the PL-ADOS and ADOS increased the overall utility of the instrument system

by broadening the range of individuals with whom the ADOS could be used.

Limitations remained with the ADOS and PL-ADOS, however. Most notably, research

indicated that the PL-ADOS was not able to accurately differentiate between Autism Spectrum

Disorders and non-spectrum developmental delays in children of preschool age (Lord et al.,

2000). In addition, the ADOS did not include normative data for individuals above the age of 12,

and its items and activities were not developmentally appropriate for adolescents and adults. In

response to these needs, an updated measure (ADOS-G; Lord et al., 1999) was published in 1999

and is still in use today.

Autism Diagnostic Observation Schedule-Generic. The ADOS-G was superior to its

predecessors in several significant ways. As a replacement for both the ADOS and the PL-

ADOS, the instrument was designed for use with individuals across the lifespan. Instead of

consisting of a standard pool of items that is to be administered to all examinees (as was found in

the original ADOS and PL-ADOS), the ADOS-G is composed of a set of modules including

assessment activities that are appropriate for use with the individuals for whom the module was

designed. Modules were designed with consideration of both the chronological age and verbal

fluency of the examinee in order to minimize the potential bias of expressive language ability on

performance, as was observed in previous iterations of the instrument (Lord et al., 2000). In

addition, across the modules, scoring determinations are based on deviations from the

expectations of abilities given the examinee’s expressive language level in order to better

differentiate the social and communication difficulties that are related to language ability versus

other developmental concerns (Lord et al.). Unlike the standardization samples used for

normative comparisons of performance on the ADOS and the PL-ADOS, which only included

individuals with Autistic Disorder, the standardization sample for the ADOS-G included

individuals with Autistic Disorder, Asperger’s Disorder, and Pervasive Developmental Disorder

Not Otherwise Specified, allowing for the comparison of a participant’s performance to those

with a range of PDDs (Lord et al.).

The ADOS-G consists of four modules. Only one module is administered to an examinee

during a comprehensive evaluation. Each module includes items from four subscales:

Communication, Reciprocal Social Interaction, Play/Creativity/Imagination, and Stereotyped

Behaviors and Restricted Interests. However, each ADOS-G module is unique in its item

composition. Module 1 was designed for non-verbal examinees or for those who do not

consistently use spontaneous phrase speech (Lord et al., 2000). It is composed of 10 activities

(see Table B1 for a list of assessment activities by module), which result in ratings on 29

dimensions of functioning (see Table B2 for a list of rated dimensions by subscale for each of the

four modules). Module 2 was designed for use with young children or older children who exhibit

some spontaneous phrase speech but who are not “verbally fluent” (i.e., individuals who do not

“produce a range of flexible sentence types, provide language beyond the immediate context, and

describe logical connections within a sentence”; Lord, Rutter, DiLavore, & Risi, 1999, p. 5).

It is composed of 14 activities, which are rated on 28 dimensions of functioning.

Older children and younger adolescents with regular use of fluent, spontaneous phrase speech are

administered Module 3, which is composed of 13 activities and results in ratings on 28

dimensions of functioning. Module 4, designed for use with older adolescents and adults with

fluent expressive language abilities, is composed of 10 required and 5 optional activities that lead

to ratings on 31 dimensions of functioning. Unlike the other modules, the required activities in

Module 4 are not play-based and, instead, consist of a series of interview questions (Lord

et al.). According to Lord et al., the activities of Modules 1 and 2 are designed to allow for a

flexible, active assessment administration, whereas the administration of Modules 3 and 4 is

more structured.

Technical development of the Original Scoring Algorithm of ADOS-G. According to

Lord et al. (1999), items included in the Original Scoring Algorithm for each module were

selected from a larger pool of items included in the original version of the ADOS (Lord et al.,

1989) and the PL-ADOS (DiLavore et al., 1995) that assessed aspects of the DSM-IV/ICD10

diagnostic criteria for Autism Spectrum Disorders. From the initial pool, items were examined

for suitability. In addition, those that did not demonstrate adequate interrater reliability (i.e., r >

.80) and/or consistently result in scoring differences between participants with ASDs and those

without were discarded as potential scoring algorithm items. The remaining item pools were

submitted to exploratory factor analysis to further eliminate items that were outliers or that

demonstrated strong correlations to mental or chronological age (Lord et al.). Finally, ROC curve

analyses were conducted on the retained items to determine appropriate cut-scores for non-

Autism ASD and Autism classifications. Some items that “contributed to the possible assessment

of improvement over time” (p. 113) or that assessed behaviors of particular clinical importance

were retained on the instrument but not included in the final scoring algorithm.
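
The cut-score step described above can be illustrated with a minimal sketch. The code assumes that a total score at or above a candidate cut-score is classified as positive, and it selects the cut-score that maximizes Youden's J (sensitivity + specificity - 1). Both the selection rule and the function names are assumptions made for illustration, since the criterion actually applied by Lord et al. (1999) is not described here, and the scores are hypothetical rather than ADOS-G data.

```python
def sens_spec_at(cut, scores_with, scores_without):
    """Sensitivity and specificity when a total score >= cut is classified positive."""
    true_pos = sum(s >= cut for s in scores_with)
    true_neg = sum(s < cut for s in scores_without)
    return true_pos / len(scores_with), true_neg / len(scores_without)


def best_cut(scores_with, scores_without):
    """Return (cut, sensitivity, specificity) for the cut that maximizes Youden's J."""
    candidates = sorted(set(scores_with) | set(scores_without))
    best = None
    for cut in candidates:
        sens, spec = sens_spec_at(cut, scores_with, scores_without)
        j = sens + spec - 1  # Youden's J (assumed selection criterion)
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1:]


# Hypothetical total scores, not taken from any study
asd_scores = [7, 9, 10, 12, 12, 14, 15, 17]
non_asd_scores = [1, 2, 3, 3, 5, 6, 6, 10]

cut, sens, spec = best_cut(asd_scores, non_asd_scores)
print(cut, sens, spec)
```

With these invented scores, a cut-score of 7 classifies every examinee in the ASD group correctly while misclassifying one of the eight examinees in the comparison group.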

Reliability evidence for scores from the ADOS-G Original Scoring Algorithm. Based on

the information provided by test authors in the administration manual (Lord et al., 1999), the

ADOS-G Original Scoring Algorithm consistently and accurately measures the symptoms and

characteristics of Autistic Disorder and non-autism Autism Spectrum Disorders, and

differentiates those with spectrum disorders from those without, and those with Autistic Disorder

from those with non-autism ASDs. Reliability analyses were conducted on individual items,

domain scores, and classification determinations. Item inter-rater reliabilities (i.e., kappa

coefficients) ranged from .55 to 1.0 for Module 1 (mean percent agreement = 91.5%), .48 to .93

for Module 2 (mean percent agreement = 89%), .46 to 1.0 for Module 3 (mean percent

agreement = 88.2%), and .41 to .93 for Module 4 (mean percent agreement = 88.25%). Inter-

rater reliability coefficients for the Social Interaction domain ranged from .88 to .97 across

modules, from .74 to .90 for the Communication domain across modules, and from .84 to .98

across modules for the Communication + Social Interaction Total used for diagnostic

classification determinations. Inter-rater agreement in diagnostic classifications for Autistic

Disorder versus non-spectrum disorders was 90% for Module 4, 91% for Module 2, and 100%

for Modules 1 and 3. Although inter-rater agreement in diagnostic classifications for non-autism

Autism Spectrum Disorders versus non-spectrum disorders was slightly lower (k = .84 to .93)

than observed for Autistic Disorder, it remained within an acceptable range.

Test-retest reliability coefficients were also reported for the Social Interaction (r = .78) and

Communication (r = .73) domain scores, and for the Communication + Social Interaction Total

score (r = .82) across modules, and interpreted by authors as evidence of “excellent stability” of

measurement (Lord et al., p. 116). In addition, the internal consistency of items within each

domain was assessed (α = .86 to .91 for the Social Interaction domain; α = .74 to .84 for the

Communication domain; α = .47 to .65 for the Stereotyped Behaviors and Restricted Interests)

and determined by authors to indicate good agreement (Lord et al.).
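
The internal consistency values above are coefficient alpha estimates. As a minimal sketch of how such a coefficient is computed, assuming hypothetical item ratings rather than ADOS-G data, alpha can be obtained from the item variances and the variance of the total score. The function name and the ratings are illustrative only.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding one score per examinee."""
    k = len(item_scores)
    # Total score for each examinee (sum across items)
    totals = [sum(person) for person in zip(*item_scores)]
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Hypothetical 0-2 ratings on three items for five examinees
items = [
    [0, 1, 2, 2, 0],
    [0, 1, 2, 1, 0],
    [1, 1, 2, 2, 0],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why a domain with few or heterogeneous items tends to produce a lower coefficient.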

Validity evidence for scores from the ADOS-G Original Scoring Algorithm. Validity

analyses on scores from the ADOS-G Original Scoring Algorithm have been investigated by test

authors and independent researchers.

Structural validity. For each ADOS-G module, an exploratory factor analysis was run to

investigate the structural validity of the items included within the Original Scoring Algorithm.

The authors’ reports (Lord et al., 1999) indicated that, for each module, one major factor emerged,

onto which “almost all items in the Social Interaction and Communication domains loaded

highly” (p. 116). However, pattern coefficients and other information regarding factorability were

not provided. Other independent analyses of the structural validity of the ADOS-G Original

Scoring Algorithm have not been conducted to date.

Evidence of diagnostic accuracy. Diagnostic accuracy also was investigated by authors.

For each participant, the diagnostic classification based on his or her Communication and Social

Interaction Total Score on the ADOS-G Original Scoring Algorithm was compared to his or her

clinical diagnosis. Sensitivity and specificity were calculated for each module using Receiver

Operating Characteristic (ROC) curves. Across modules, sensitivity values ranged from .93 to

1.0 and specificity values ranged from .93 to 1.0 when differentiating Autistic Disorder from a

nonspectrum disorder; sensitivity from .90 to .97 and specificity from .87 to .94 when

differentiating all Autism Spectrum Disorders (including Autistic Disorder) from a nonspectrum

disorder; and sensitivity from .80 to .94 and specificity from .88 to .94 when differentiating a

non-autism Autism Spectrum Disorder from a nonspectrum disorder.

Mazefsky and Oswald (2006) also examined the diagnostic utility and discriminative

ability of the ADOS-G Original Scoring Algorithm with a clinical sample of 75 children (ranging

in age from 2 to 8 years) with and without ASDs. Results of the study indicated a 77 percent

agreement between participants’ diagnostic classifications obtained from the ADOS-G and their

clinical diagnoses provided by a multidisciplinary diagnostic team consisting of a child

psychiatrist, clinical psychologist, education specialist, speech/language pathologist, and

occupational therapist.

In addition, Ventola et al. (2006) examined the usefulness of the ADOS-G Original

Scoring Algorithm in diagnosing ASDs in toddlers and young children. Based on their results,

the authors reported that the ADOS-G demonstrates high levels of sensitivity and positive

predictive value when used with toddlers and young children under 3 years of age. In addition,

Ventola et al. indicated that they observed high levels of agreement between the diagnostic

classification determinations of the ADOS-G, the classification determinations of the CARS, and

diagnostic determinations made using the evaluators’ clinical judgments.

The research of Papanikolaou et al. (2009) provides further evidence of the diagnostic

accuracy of the ADOS-G Original Scoring Algorithm. Papanikolaou et al. compared the

diagnostic classification determination of the ADOS-G with the clinical diagnosis of 77 children

ranging in age from 2 to 22 years. According to Papanikolaou et al., results of these comparisons

indicated that participants’ diagnostic classifications on the ADOS-G demonstrated satisfactory

to excellent agreement with participants’ clinical diagnoses (k = .49 - .73). The specificity,

sensitivity, and positive predictive value of the ADOS-G’s diagnostic classifications were also

calculated and examined. Although the specificity (.85 - .95) and sensitivity (.77 - .90) values

were measured to be slightly lower than those reported by Lord et al. (2000), they were still

deemed to be within acceptable ranges by the authors (Papanikolaou et al.).

Additional investigations into the diagnostic accuracy of scores from the ADOS-G

Original Scoring Algorithm provide evidence to support their use in the accurate differentiation

of individuals with ASDs from those with receptive language disorders (Noterdaeme,

Mildenberger, Sitter, & Amorosa, 2002) and other mental health disorders, such as mood and

behavior disorders (Sikora, Hartley, McCoy, Gerrard-Morris, & Dill, 2008). However, according

to Reaven, Hepburn, and Ross (2008), scores derived from the ADOS-G Original Scoring

Algorithm are unable to accurately differentiate between children with an ASD and those with

active psychosis.

Research has also been conducted to investigate the agreement between a participant’s

ADOS-G Original Scoring Algorithm diagnostic classification and his or her diagnostic

classification on the ADI-R. Le Couteur, Haden, Hammal, and McConachie (2008) examined the

percent agreement between the diagnostic classifications of scores on the two instruments in a

sample of 101 preschoolers. Results of this study indicated that the ADOS-G and ADI-R scoring

algorithms yielded consistent diagnostic classifications 76 percent of the time (k = .52).

Tomanik, Pearson, Loveland, Lane, and Shaw (2007) also examined the percent agreement

between classification determinations of the ADOS-G Original Scoring Algorithm and the ADI-

R in a sample of 129 children and adolescents. Similar to the results reported by Le Couteur et

al., Tomanik et al.’s results indicated agreement between the ADOS-G and ADI-R 75 percent of

the time.
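
Percent agreement and kappa answer different questions, because kappa discounts the agreement expected by chance. The sketch below shows the computation for a 2 x 2 cross-classification of two instruments. The cell counts are hypothetical and were chosen only so that raw agreement equals 76 percent; they are not the data from either study cited above, and the function name is illustrative.

```python
def cohen_kappa(table):
    """table[i][j]: cases classified i by instrument A and j by instrument B.

    Returns (observed agreement, kappa).
    """
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts (rows: instrument A; columns: instrument B)
table = [[38, 12],
         [12, 38]]
agreement, kappa = cohen_kappa(table)
print(round(agreement, 2), round(kappa, 2))
```

In this invented table, half of the observed agreement would be expected by chance, so a raw agreement of .76 corresponds to a kappa of .52.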

In summary, although the research conducted to date has adequately demonstrated the

diagnostic accuracy of the Original Scoring Algorithm and has documented acceptable levels of

classification agreement between the scoring algorithms on the ADOS-G and the ADI-R, other

forms of reliability and validity evidence are lacking at this time. Specifically, the literature

review did not yield any studies focused on the internal structure of the measurement tool.

Limitations of the ADOS-G Original Scoring Algorithm. Although research indicates

that scores from the ADOS-G Original Scoring Algorithm demonstrate adequate technical

properties for use and, in general, accurately categorize examinees’ performance (Gotham, Risi,

Pickles & Lord, 2007), several criticisms of the ADOS-G Original Scoring Algorithm have been

reported in the literature. The ADOS-G authors also have identified several limitations of the

instrument over the last decade. Bishop and Norbury (2002) reported that the ADOS-G Original

Scoring Algorithm often over-classifies individuals with specific language impairments. In a

2004 study conducted by de Bildt et al., the ADOS-G Original Scoring Algorithm demonstrated

lower levels of sensitivity and specificity when used to discriminate individuals with mild mental

retardation from those with an Autism Spectrum Disorder. Gotham et al. also acknowledged

limitations of the ADOS-G Original Scoring Algorithm related to an examinee’s cognitive

ability. Specifically, they noted that the instrument currently does not take developmental

cognitive ability into account when selecting a module for administration or when scoring an

examinee’s performance, which may result in inaccurate diagnostic classifications for those with

lower mental functioning than expected based on their chronological age. In addition, Gotham et

al. reported that the Original Scoring Algorithm, which utilizes different items across modules,

makes comparisons of performance across modules difficult.

Of additional concern to Gotham et al. (2007) was the Original Scoring Algorithm’s lack

of consideration regarding an examinee’s engagement in restricted, repetitive behaviors (RRB).

Although items to assess RRB are included on the ADOS-G, they were intentionally excluded

from the Original Scoring Algorithm due to the authors’ concern over the short period of time

available to observe these behaviors throughout the ADOS-G administration. However, in a

review of the stability of ASD diagnoses over time, Lord et al. (2006) reported that the inclusion

of RRB in diagnostic determinations, even when only observed in a limited context,

independently contributes to diagnostic stability.

Revised scoring algorithm for the ADOS-G. In response to the current limitations of the

ADOS-G, Gotham et al. (2007) conducted a study to review and make changes to the Original

Scoring Algorithm in order to (a) improve the overall diagnostic accuracy of the instrument, (b)

address the identified concerns regarding the impact of cognitive ability, expressive language

level, and chronological age on an examinee’s performance, (c) include RRB in diagnostic

determinations, and (d) increase consistency of the conceptual items included in the scoring

algorithm across modules to allow for easier comparison of performance across modules.

Data from 1,630 cases (i.e., complete ADOS-G administrations) were used in the study’s

analyses. Data were obtained from 1,139 different participants. An unidentified number of

participants completed more than one ADOS-G administration, and the data from each of the

administrations were included in the analyses as a separate case. Participants ranged in age from

14 to 192 months at the time of ADOS-G administration, and completed the assessment as a part

of a diagnostic evaluation at a mid-western autism/communication disorders clinic or as a

research study participant recruited at several sites across the U.S. Fifty-six percent of

participants had a clinical diagnosis of Autistic Disorder, 27 percent were diagnosed with a

milder Autism Spectrum Disorder, and 17 percent had a diagnosis of a non-ASD developmental

delay. Only data from Module 1, 2, and 3 administrations were included in the analyses due to

the authors’ beliefs that older adolescents and adults on the Autism Spectrum exhibit distinct

behavior patterns and, as such, require separate examination (Gotham et al., 2007).

Technical development of the ADOS-G Revised Scoring Algorithm. When generating the

new diagnostic algorithms, researchers took several steps. First, they looked at the correlations

between total scores on the ADOS-G and chronological age, verbal ability, and mental age of

participants and then divided the sample by chronological age and language ability to create cells

that minimized the correlations between total scores and demographic variables. Once the new

cells were generated, the authors examined individual items within each of the modules and

selected those that best differentiated between clinical diagnoses for inclusion in the new scoring

algorithm. Selected items were also subjected to exploratory multi-factor item response analysis

to investigate factor structure and to organize the items into domains for each of the three

modules. The new models were then examined using confirmatory factor analysis (CFA), and

logistic regression was used to determine the “predictive value” of scores from each of the

identified factors to diagnostic determination. Finally, Receiver Operating Characteristic (ROC)

curve analysis was conducted to determine the sensitivity (i.e., accurate positive classifications,

or the percentage of participants with a clinical disorder that are accurately diagnosed as having

the disorder) and specificity (i.e., accurate negative classifications, or the percentage of

participants without a clinical disorder that are accurately diagnosed as not having the disorder)

of the original and the newly revised scoring algorithms within each of the generated cells

(Gotham et al., 2007).
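
The definitions of sensitivity and specificity given above can be expressed directly in terms of classification counts. The following is a minimal sketch under assumed, hypothetical counts; the function name and the numbers are illustrative and do not come from Gotham et al. (2007).

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Compute sensitivity and specificity from classification counts."""
    # Proportion of examinees with the disorder who are classified as having it
    sensitivity = true_pos / (true_pos + false_neg)
    # Proportion of examinees without the disorder who are classified as not having it
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical cell: 100 examinees with a clinical diagnosis, 50 without
sens, spec = sensitivity_specificity(true_pos=90, false_neg=10,
                                     true_neg=40, false_pos=10)
print(sens, spec)
```

Because the two indices are computed within separate groups, a cell with few non-spectrum participants can yield an unstable specificity estimate even when sensitivity is estimated precisely.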

Means and score distributions obtained using the original and revised scoring algorithms

were examined. In Module 1, because the range of possible scores of non-verbal children was

restricted due to their lack of expressive language, the authors suggested dividing Module 1 into

two algorithms: Module 1 No Words and Module 1 Some Words to correct for the influence of

expressive language level on performance. In Module 2, correlation analysis revealed that there

was a consistent inverse relationship between chronological age and ADOS total scores (i.e., as

participant age increased, total ADOS score decreased) in participants under the age of 5 years,

and a direct positive correlation between age and score (i.e., as participant age increased, total

ADOS score increased) in participants age 5 years and older. As such, authors recommended

splitting Module 2 into two algorithms: Module 2 Younger than 5, and Module 2 Greater than or

Equal to 5, to correct for the effect of age on performance. Participants’ scores on Module 3 did

not appear to be highly correlated with any demographic variables, so no division of the module was

required (Gotham et al., 2007).

Structural validity. Exploratory factor analysis was completed on each of the five cells to

examine the structural validity of the items included in the Revised Scoring Algorithm. Across

the cells, a two-factor model was retained for interpretation. All items loaded saliently (i.e., >

.30) on one of the two factors across the five cells, and factors were significantly positively

correlated. Confirmatory factor analysis was used to determine if a 2-factor model fit the data

better than a 1-factor model. Authors reported that the Comparative Fit Index (CFI) values for

the 2-factor model ranged from .94 to .97 (CFI values greater than .90 indicate a good fit;

Skrondal & Rabe-Hesketh, 2004), and the Root Mean Square Error of Approximation (RMSEA)

values ranged from .08 to .09, suggesting an adequate model fit. Gotham et al. (2007) reported

that the 2-factor model produced a “substantially better fit than the 1-factor model” (p. 618),

although no corroborating data were presented. Thus, a 2-factor model was accepted by the

authors. Factors were labeled Social Affective (SA) domain and Restricted-Repetitive Behavior

(RRB) domain, representing the items that loaded onto each of the factors (Gotham et al.; see

Table B3 for a list of items included by factor for each of the modules).
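
For readers less familiar with the fit indices cited above, the sketch below computes the CFI and RMSEA from model chi-square values using their commonly cited definitions. It is an illustration only: the chi-square values, degrees of freedom, and sample size are hypothetical and are not taken from Gotham et al. (2007), and some software packages use N rather than N - 1 in the RMSEA denominator.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (N - 1 convention assumed)."""
    return sqrt(max(chi2 - df, 0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to a baseline (independence) model.

    Assumes the baseline model misfits (chi2_base > df_base); otherwise
    the index is undefined.
    """
    d_model = max(chi2_model - df_model, 0)
    d_base = max(chi2_base - df_base, d_model, 0)
    return 1 - d_model / d_base


# Hypothetical values for a fitted model and its baseline model
fit_rmsea = rmsea(chi2=180.0, df=76, n=201)
fit_cfi = cfi(chi2_model=180.0, df_model=76, chi2_base=2091.0, df_base=91)
print(round(fit_cfi, 3), round(fit_rmsea, 3))
```

These invented inputs produce a CFI near .95 and an RMSEA near .08, values of the same order as those reported for the 2-factor model.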

Evidence of diagnostic accuracy. ROC curve analysis was used to estimate the sensitivity

and specificity of the original scoring algorithm, the revised SA + RRB algorithm, and the

revised SA factor only in diagnostic determination. Classifications from each of the three

algorithms were compared to the clinical diagnosis provided for each participant following the

completion of the evaluation process (see Table 1 for sensitivity and specificity values by

algorithm).

In general, the Revised Scoring Algorithms retained the high levels of sensitivity

demonstrated by the Original Scoring Algorithms across each of the five developmental cells. Increases in

sensitivity were observed when differentiating an ASD from a non-spectrum disorder in young

children with the revised algorithm. In addition, inclusion of the RRB factor improved the

predictive validity of the ADOS-G when classifying individuals with ASDs from those with non-

spectrum disorders (Gotham et al., 2007).

Given the results, Gotham et al. (2007) concluded that the revised scoring algorithm is a

useful option when interpreting an individual’s performance on the ADOS-G. In addition to the

increased sensitivity and predictive validity observed with some sub-groups of participants in the

sample, the revision of the algorithm into developmental cells also helps to lessen the effects of

verbal ability and age on participant performance and makes the items included in the scoring

algorithm more consistent across modules. However, the authors cautioned that more research is

needed given the study’s limitations (which included a small number of participants in the

sample without an Autistic Disorder or Autism Spectrum Disorder diagnosis, and the

interdependence of ADOS-G classification and resulting clinical diagnosis).

Table 1

Sensitivity and Specificity of Original and Revised Scoring Algorithms by Research Study

                Autistic Disorder vs.                  Autism Spectrum Disorder vs.
                Non-Spectrum Disorder                  Non-Spectrum Disorder
                Original    SA + RRB    SA Only        Original    SA + RRB    SA Only
Research Study  Algorithm   Revised     Revised        Algorithm   Revised     Revised
  Module        Sens. Spec. Sens. Spec. Sens. Spec.    Sens. Spec. Sens. Spec. Sens. Spec.

Gotham, Risi, Pickles, & Lord (2007)
  1, No Words   .99   .55   .96   .72   .93   .72      .92   .37   .89   .49   .85   .43
  1, Some Words .88   .96   .97   .91   .91   .93      .67   .84   .77   .82   .75   .79
  2, < 5        .97   .93   .98   .93   .95   .97      .76   .70   .84   .77   .80   .63
  2, ≥ 5        .96   .97   .98   .90   .92   .97      .86   .77   .83   .83   .72   .77
  3             .86   .89   .91   .84   .85   .87      .68   .77   .72   .76   .61   .78

Gotham et al. (2008)
  1, No Words   .89   .78   .86   .80   NA    NA       NA    NA    NA    NA    NA    NA
  1, Some Words .73   .94   .89   .91   NA    NA       1.0   .80   .95   .75   NA    NA
  2, < 5        .85   1.0   .94   1.0   NA    NA       .88   1.0   .65   1.0   NA    NA
  2, ≥ 5        NA    NA    NA    NA    NA    NA       NA    NA    NA    NA    NA    NA
  3             .72   .96   .82   .92   NA    NA       .49   .89   .60   .88   NA    NA

Gray, Tonge, & Sweeney (2008)
  1, No Words   NA    NA    .98   .82   .98   .73      NA    NA    .92   .86   .94   .86
  1, Some Words NA    NA    .89   .86   .88   .89      NA    NA    .78   .92   .76   .96

de Bildt et al. (2009)
  1, No Words   NA    NA    NA    NA    NA    NA       NA    NA    NA    NA    NA    NA
  1, Some Words .82   .88   .92   .71   .90   .71      .86   .63   .86   .63   .86   .54
  2, < 5        NA    NA    NA    NA    NA    NA       NA    NA    NA    NA    NA    NA
  2, ≥ 5        .63   .92   .88   .76   .80   .82      .56   .64   .53   .62   .62   .48
  3             .73   .84   .87   .73   .82   .73      .64   .67   .68   .63   .70   .66

Oosterling et al. (2010)
  1, No Words   .90   .90   .83   .80   .81   .80      .88   .60   .76   .70   .82   .80
  1, Some Words .52   1.0   .69   .98   .62   .98      .43   .89   .50   .86   .58   .81
  2, < 5        .44   1.0   .71   .93   .62   .97      .37   .97   .41   .83   .54   .73
  2, ≥ 5        .21   1.0   .57   .90   .50   .98      .45   .93   .64   .85   .73   .83

Molloy et al. (2011)
  1, No Words   .91   .65   .82   .65   NA    NA       .93   .29   .93   .29   NA    NA
  1, Some Words .78   .81   .93   .69   NA    NA       .94   .56   1.0   .46   NA    NA
  2, < 5        .67   .92   .72   .81   NA    NA       .75   .81   .72   .60   NA    NA
  2, ≥ 5        .72   .95   .94   .65   NA    NA       .79   .68   .85   .60   NA    NA
  3             .77   .72   .92   .55   NA    NA       .87   .48   .87   .35   NA    NA

Additional reliability and validity evidence for the Revised Scoring Algorithm. Since the

publication of Gotham et al.’s (2007) article, six studies have been conducted to further

investigate the utility of the revised scoring algorithm when classifying an examinee’s

performance on the ADOS-G. Gotham et al. also conducted a second research project in 2008 in

order to attempt to replicate the results of their 2007 study with an independent data set.

In the 2008 study, participants (N = 1,259) ranged in age from 18 to 192 months and were

recruited from 11 different sites across the U.S. Similar to the original sample, the majority of

participants (76%) had clinical diagnoses of Autistic Disorder. Consistent with the methods of

the original sample, the current sample was divided into five developmental cells (Module 1 No

Words, Module 1 Some Words, Module 2 Younger than 5, Module 2 Greater than or Equal To 5,

and Module 3). Revised algorithm scores were generated from item scores, and the sensitivity

and specificity of the original and revised algorithms were calculated by developmental cell

using ROC curves. The factor structure of the items included in the revised scoring algorithm

was also investigated by developmental cell and compared to the 2007 sample.
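
As a minimal sketch of how sensitivity and specificity can be obtained across candidate cut-scores in a ROC-style analysis, the example below sweeps every observed score as a possible cut-score. The function name, the scores, and the diagnoses are hypothetical illustrations and are not data from Gotham et al. (2008); the sketch assumes that a classification is positive when the algorithm total meets or exceeds the cut-score.

```python
def roc_points(scores, has_disorder):
    """Return (cut, sensitivity, specificity) for every observed cut-score.

    A case is classified positive when its score is >= the cut-score
    (an assumption made for this illustration).
    """
    positives = sum(has_disorder)
    negatives = len(has_disorder) - positives
    points = []
    for cut in sorted(set(scores)):
        # True positives: cases with the disorder at or above the cut.
        tp = sum(1 for s, d in zip(scores, has_disorder) if d and s >= cut)
        # True negatives: cases without the disorder below the cut.
        tn = sum(1 for s, d in zip(scores, has_disorder) if not d and s < cut)
        points.append((cut, tp / positives, tn / negatives))
    return points


# Hypothetical algorithm totals and clinical diagnoses (True = on the spectrum).
scores = [3, 5, 7, 8, 10, 12, 14, 15]
has_disorder = [False, False, False, True, False, True, True, True]

for cut, sens, spec in roc_points(scores, has_disorder):
    print(f"cut >= {cut:2d}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cut-score in this sketch trades sensitivity for specificity, which is the balance the studies reviewed here compare across scoring algorithms.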

Due to the extremely small number of data points (n = 17) within the Module 2 Greater

than or Equal to 5 cell, analyses were not conducted on this developmental cell. For the cells

included within the analyses, the authors reported that the 2-factor model

structure proposed for the revised scoring algorithm items within the 2007 study (Gotham et al.)

also satisfactorily fit the current data. However, negative factor loadings were observed for 2

items within the SA factor and for all items within the RRB factor in the Module 2 Younger than

5 developmental cell (Gotham et al., 2008), calling the suitability of the 2-factor structure into

question. Gotham et al. also reported that a CFA confirmed the satisfactory replication of the

2-factor model across developmental cells within the current sample, although goodness-of-fit

information was not provided. Sensitivity and specificity values are reported in Table 1. In

general, results indicated that the predictive validity (both the sensitivity and specificity)

improved when the revised algorithm was utilized with the independent sample (Gotham et al.).

The biggest improvements in sensitivity between the original and the revised algorithms were

observed within the Module 1 Some Words cell when differentiating between Autistic Disorder

and a non-spectrum disorder, and within Module 3 when differentiating between an ASD and

non-spectrum disorder. Despite some challenges in the replication of the 2-factor model of the

revised scoring algorithm and sample size limitations that precluded analysis on one of the five

developmental cells, the authors concluded that the revised algorithms “better represent observed

diagnostic features through new domains, increase comparability between modules in algorithm

item content and number, and improve ADOS predictive validity for autism compared to

previous algorithms” (Gotham et al., 2008, p. 650).

Also in 2008, Gray, Tonge, and Sweeney conducted a research study designed to evaluate

the diagnostic validity of the ADOS and the Autism Diagnostic Interview-Revised (ADI-R;

Rutter, LeCouteur, & Lord, 2003) in a sample of young children with and without autism.

Although not a primary outcome of the study, Gray et al. examined the sensitivity, specificity,

positive predictive power, and negative predictive power of diagnostic classifications made with

the original and the revised scoring algorithms to determine if significant differences existed

between the two methods.

Australian children (N = 209; ages 20-55 months) served as participants for this study.

All participants had been referred for assessment at an early childhood health agency due to the

suspicion of autism or concerns regarding other developmental problems. All participants were

administered either Module 1 or Module 2 of the ADOS-G as a part of a developmental assessment

(also consisting of an assessment of cognitive ability, comprehensive language assessment, and

parent interview with the ADI-R). Following completion of the assessment, all data were

reviewed to arrive at a clinical diagnosis based on DSM-IV-TR criteria (American Psychiatric

Association, 2000). ADOS-G classifications, obtained using both the original and revised scoring

algorithms, were compared to the final clinical diagnosis to investigate the sensitivity and

specificity of the instrument.
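
The accuracy indices compared throughout these studies can all be derived from a 2 x 2 table that crosses the ADOS-G classification with the clinical diagnosis. The sketch below is offered only as an illustration: the function name and the counts are hypothetical and do not come from any of the studies reviewed here.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Compute accuracy indices from a 2 x 2 classification table.

    tp: classified on the spectrum, clinical diagnosis on the spectrum
    fp: classified on the spectrum, clinical diagnosis non-spectrum
    fn: classified non-spectrum, clinical diagnosis on the spectrum
    tn: classified non-spectrum, clinical diagnosis non-spectrum
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),     # proportion of true cases detected
        "specificity": tn / (tn + fp),     # proportion of non-cases ruled out
        "ppv": tp / (tp + fp),             # positive predictive power
        "npv": tn / (tn + fn),             # negative predictive power
        "efficiency": (tp + tn) / total,   # proportion correctly classified
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
result = diagnostic_accuracy(tp=80, fp=10, fn=20, tn=90)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```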

Gray et al. (2008) reported that, when using the original scoring algorithm, out of the 209

participants, 18 were inappropriately classified on the ADOS-G as not having Autistic Disorder

or a less severe Autism Spectrum Disorder when they did in fact meet DSM-IV-TR criteria to

warrant a clinical diagnosis of Autistic Disorder or another PDD. In addition, 10 participants

with a final clinical diagnosis of a non-spectrum disorder were inappropriately classified as

having Autistic Disorder or an ASD on the ADOS-G when using the original scoring algorithm.

Similar to Gotham et al. (2007), Gray et al. examined the utility of the revised scoring algorithm

using examinees’ classifications derived from their combined scores on the Social Affective (SA)

and Restricted-Repetitive (RRB) domains, as well as from their scores on the SA domain only

(see Table 1 for sensitivity and specificity values for the original and revised scoring algorithm).

The authors reported that there was a general improvement in sensitivity and efficiency of

classification with the revised algorithms (both SA+RRB and SA only), but lower specificity

was observed across the sample. Lower sensitivity was observed with the revised algorithms,

however, when classifying participants in Module 1 Some Words with an ASD compared to a

non-spectrum disorder. The positive predictive power of the original scoring algorithm (.91 - .96)

and the revised algorithm (.88 - .98) did not differ significantly. However, the revised algorithm

(.67 - .90) demonstrated greater negative predictive power than did the original scoring algorithm

(.64 - .81). Gray et al. also indicated that, across modules, no significant difference in sensitivity

was observed between making classification determinations using the SA+RRB algorithm and

the SA only algorithm. However, with non-verbal children (i.e., participants in the Module 1 No

Words sample), using the SA+RRB revised algorithm as the classification determinant led to the

highest diagnostic accuracy. Based on their mixed results, the lack of other independent

examinations of the efficacy of the revised scoring algorithms, and the interdependency of

ADOS-G and ADI-R scores in consensus clinical diagnoses, the authors concluded that future

research on the revised diagnostic algorithms is necessary.

Overton, Fielding, and de Alba (2008) also set out to compare the differences in

diagnostic classification determinations of the original and the revised scoring algorithms to

determine if the revised algorithm would decrease the incidence of false positive and false

negative ADOS-G classifications with a small sample of students referred for psychoeducational

diagnostic evaluations.

Twenty-six Hispanic children (ranging in age from 20 to 192 months), referred for

school-based evaluations due to concerns regarding the possibility of neurodevelopmental or

psychological disorders, served as participants in this study. All participants were administered

either Module 1, Module 2, or Module 3 of the ADOS-G, depending on chronological age and

expressive-language level. Participants’ performance on the ADOS-G was initially scored using

the original scoring algorithm and was compared to the individual’s overall clinical diagnosis

provided at the conclusion of the evaluation process. At a later date, the same participants’

performance was rescored and classified using the revised scoring algorithm. The accuracy of

participants’ ADOS-G classification made with the revised algorithm was then compared to the

accuracy of classification made using the original algorithm.

When comparing ADOS-G classifications obtained using the original scoring algorithm

with concluding clinical diagnoses, four false positive and one false negative classification were

observed. When the revised scoring algorithm was applied, only one participant’s diagnostic

classification appropriately changed (from Autistic Disorder to Autism Spectrum Disorder). All

other participants’ classifications remained consistent across scoring algorithms (Overton et al.,

2008). Although several limitations of the study, most notably sample size, were noted by the

authors, these results cast some doubt on the superiority of the revised scoring algorithm over the

original scoring algorithm currently in use.

A comparison of the original and revised scoring algorithms was also conducted by de

Bildt et al. (2009). Specifically, these researchers conducted a study to determine how well the

classification determinations derived from the revised scoring algorithms (SA+RRB and SA

only) contribute to a clinical diagnosis of an Autism Spectrum Disorder or non-spectrum

disorder, when compared to the contribution of the original scoring algorithm.

Five hundred fifty-eight Dutch children, ranging in age from 13 to 198 months,

served as participants in this study. The majority of participants had a clinical diagnosis of

Autistic Disorder (35%) or unspecified Autism Spectrum Disorder (40%). Participants were

administered Module 1, Module 2, or Module 3 of the ADOS-G as part of an evaluation for

childhood psychiatric problems or when serving as a research participant in an epidemiological

study of Autism Spectrum Disorders in children with mental retardation. Each participant’s

performance on the ADOS-G was scored and classified using the original algorithm, SA only

revised algorithm, and SA+RRB revised algorithm. The sensitivity, specificity, and efficiency

(i.e., the percentage of cases correctly classified; an estimate of the balance between sensitivity

and specificity; de Bildt et al., 2009) of each participant’s ADOS-G classifications compared to

their clinical diagnosis were determined across the three scoring algorithms. Logistic regression

was also conducted to determine the relative contribution of each participant’s ADOS-G

classification by scoring algorithm to their clinical diagnosis (de Bildt et al.).

Results of the study varied across modules (Table 1). In general, however, use of the

revised scoring algorithms resulted in increased sensitivity and decreased specificity of

diagnostic classifications compared to the original scoring algorithm. The efficiency of each of

the scoring algorithms was determined to be comparable in Modules 1 and 2. Both of the revised

scoring algorithms produced higher classification efficiencies than the original scoring algorithm

in Module 3. Data from the regression analyses indicated that the diagnostic classifications of the

original and the SA only revised scoring algorithms contribute approximately equal variance to

clinical classification across modules. However, in Modules 2 and 3, participants’ scores on the

RRB factor were determined to contribute additional variance over and above that accounted for

by the classification made by either the original or the SA only revised scoring algorithms (de

Bildt et al., 2009).

Based on their results, the authors formed several conclusions. First, de Bildt et al. (2009)

asserted that utilizing the revised scoring algorithms to make classification determinations helps

to improve the sensitivity and specificity of those classifications in Modules 2 and 3 without

compromising the balance between the two. Consistent with the findings of Lord et al. (2006),

the authors also noted that the addition of RRB into the diagnostic scoring algorithm increases the

discriminative power of the ADOS-G with older and higher functioning individuals. Therefore,

de Bildt et al. indicated that the revised scoring algorithm provides advantages over the original

scoring algorithm.

Similar to Gotham et al. (2008), Oosterling et al. (2010) set out to replicate the results

of the Gotham et al. (2007) initial investigation of the revised scoring algorithm with an

independent sample. The authors aimed to examine whether or not the revised algorithms

improve the overall diagnostic validity of the ADOS-G.

Five hundred thirty-two cases, obtained from 426 Dutch participants, were included

in the analyses. Participants ranged in age from 15 to 144 months, and had the following clinical

diagnoses: Autistic Disorder (40%), PDD NOS (25%), Asperger’s Disorder (2%), or

non-spectrum developmental delays (30%). Three percent of participants did not possess a clinical

diagnosis. Each participant was administered either Module 1 or Module 2 of the ADOS-G

(based on their age and expressive-language ability) as a part of a comprehensive diagnostic

evaluation. Participants’ ADOS-G performance was scored and classified using the original and

revised (SA only and SA + RRB) scoring algorithms. Sensitivity, specificity, correct

classification rate, positive predictive value, and negative predictive value were calculated for the

original and the revised algorithms in relation to each participant’s clinical diagnosis to

determine diagnostic accuracy. A confirmatory factor analysis (CFA) was also completed to

determine the goodness-of-fit of the factor structure of the revised algorithm.

Sensitivity and specificity values for the original and revised scoring algorithms (SA only

and SA+RRB) are presented in Table 1. In general, results indicate that use of the revised scoring

algorithms produces a better balance between the sensitivity and specificity of diagnostic

classifications than observed when using the original scoring algorithm. However, the sensitivity

values obtained from determinations made using all three scoring algorithms were unacceptable,

according to the authors (Oosterling et al., 2010). In addition, specificity values obtained from

the revised scoring algorithms were generally higher than those obtained from the original

algorithm. Positive predictive values for the original scoring algorithm (.98 – 1.0 when

differentiating between Autistic Disorder and a non-spectrum disorder; .65 - .94 when

differentiating between non-Autism ASD and a non-spectrum disorder) were slightly higher than

those for the revised algorithm (.67 - .96 when differentiating between Autistic Disorder and a

non-spectrum disorder; .65 - .82 when differentiating between non-Autism ASD and a

non-spectrum disorder). Negative predictive values for the original algorithm (.61 - .78 when

differentiating between Autistic Disorder and a non-spectrum disorder; .53 - .76 when

differentiating between non-Autism ASD and a non-spectrum disorder) were also slightly higher

than those for the revised algorithm (.47 - .86 when differentiating between Autistic Disorder and

a non-spectrum disorder; .51 - .78 when differentiating between non-Autism ASD and a

non-spectrum disorder). Results of the CFA indicated that a 2-factor model appropriately fit the data

in Module 1 No Words, Module 1 Some Words, and Module 2 Younger than 5 (CFI = .96-1.0,

RMSEA = .04-.08), but not in Module 2 Greater than or Equal to 5 (CFI = .87, RMSEA = .14).

The authors also reported that the 2-factor model provided better fit than a proposed 1-factor model,

although no specific goodness-of-fit data were provided (Oosterling et al.).

In the discussion of their results, Oosterling et al. (2010) noted that the sensitivity values

obtained from the current sample were significantly lower, and the specificity values were higher

than the values that have been reported in other studies measuring the diagnostic accuracy of the

revised scoring algorithms. Although all sensitivity values were in the unacceptable range, values

did improve when using the revised algorithms (SA only for ASD classifications, SA+RRB for

Autistic Disorder classifications) for classification determination compared to the original

scoring algorithm. In general, the authors concluded that the revised algorithms offer better

diagnostic validity than the original scoring algorithms. Based on the current data, Oosterling et

al. also indicated that the SA only scoring algorithm results in the greatest diagnostic accuracy

when used to classify the performance of young, low-functioning individuals and older,

high-functioning individuals.

Most recently, Molloy, Murray, Akers, Mitchell, and Manning-Courtney (2011)

examined the sensitivity and specificity of the ADOS-G as it is typically used in a clinical setting

(i.e., as part of an initial diagnostic evaluation to confirm or rule out an Autism Spectrum

Disorder). ADOS-G data from 584 diagnostic evaluations were included in the analyses.

Participants, ranging in age from 26 to 198 months, were administered the ADOS-G in

conjunction with other assessment instruments. Twenty-six percent of participants had clinical

diagnoses of Autistic Disorder, 32% had non-autism ASD, and 44% had non-spectrum

disorders. Participants’ item-scores on the ADOS-G were used to calculate domain scores for

both the original and the revised scoring algorithms, which were then used to make diagnostic

classifications. Following the completion and review of the diagnostic evaluation, participants

were assigned a clinical diagnosis by a psychologist or a developmental pediatrician. Finally,

each participant’s diagnostic classifications from both the original and the revised scoring

algorithm were compared to their clinical diagnosis in order to calculate sensitivity and

specificity.

Sensitivity and specificity values from the original and revised scoring algorithms can be

found in Table 1. When used to classify individuals with Autistic Disorder compared to those not

on the spectrum, the revised scoring algorithm generally produced higher levels of sensitivity

and lower levels of specificity, compared to the original scoring algorithm, across the five

developmental cells. Few differences were observed in the comparison of the sensitivity and

specificity values of the original and revised scoring algorithms when used to classify individuals

with ASD compared to those with non-spectrum disorders (Molloy et al., 2011).

Molloy et al. (2011) also compared current results with the sensitivity and specificity

values of the original and revised scoring algorithms reported in the original study (Gotham et

al., 2007) and by de Bildt et al. (2009). In general, sensitivity and specificity values obtained by

Molloy et al. (2011) were lower than those reported by Gotham et al. and de Bildt et al. across all

of the developmental cells. Although Gotham et al. and de Bildt et al. concluded that the revised

scoring algorithm is superior to the original, Molloy et al. reported that using the revised scoring

algorithm did not improve the predictive value of the ADOS-G. The authors hypothesized that

differences in examiner scoring, clinical decision-making, and sample composition (e.g., there

was a larger percentage of participants without a spectrum disorder in the current study as

compared to the samples analyzed by Gotham et al. and de Bildt et al.) may have contributed to

the differences in diagnostic accuracy reported across studies.
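
One way in which sample composition can affect reported predictive values follows directly from Bayes' theorem: with sensitivity and specificity held constant, positive and negative predictive power change with the proportion of the sample that has the disorder. The sketch below illustrates this relationship only; the function name, the sensitivity and specificity values, and the base rates are hypothetical and are not taken from Molloy et al. (2011) or any other study reviewed here.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) implied by sensitivity, specificity, and base rate."""
    tp = sensitivity * base_rate                # true positives (proportion)
    fp = (1 - specificity) * (1 - base_rate)    # false positives
    fn = (1 - sensitivity) * base_rate          # false negatives
    tn = specificity * (1 - base_rate)          # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical instrument with fixed sensitivity (.90) and specificity (.80),
# evaluated in two hypothetical samples that differ only in base rate.
for base_rate in (0.50, 0.75):
    ppv, npv = predictive_values(0.90, 0.80, base_rate)
    print(f"base rate = {base_rate:.2f}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumed values, the sample with the higher base rate yields a higher PPV and a lower NPV, even though the instrument itself is unchanged.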

Rationale for Present Study

The ADOS-G is widely used in the evaluation and diagnosis of Autism Spectrum

Disorders across clinical, research, and educational settings. However, little research on the

technical adequacy of the instrument has been conducted to date. The structural validity of the

original scoring algorithm has yet to be replicated, and few independent examinations of the

ADOS-G scores’ relations with other measures have been conducted. Thus, an independent

examination of the technical adequacy of the ADOS-G is necessary and timely.

Further, additional research is needed to determine if the revised diagnostic algorithm

results in greater diagnostic accuracy than the original algorithm. Current evidence regarding the

utility of the revised algorithm is mixed. Although Gotham et al. (2007, 2008) concluded that use

of the revised scoring algorithm increases levels of sensitivity and specificity, other studies have

reported no differences in diagnostic accuracy across scoring algorithms (Overton et al., 2008) or

increases in sensitivity coupled with decreases in specificity (Gray et al., 2008; Molloy et al.,

2011). Comparisons between the positive predictive power and negative predictive power across

algorithms are also missing from much of the available research. In addition, much of the

independent research that has been conducted on the revised scoring algorithm has utilized

samples of international participants (i.e., Dutch and Australian children) or homogeneous

samples (i.e., Hispanic children) within the U.S., calling the generalizability of study results to a

heterogeneous American population into question. Further, no large sample study has been

conducted that examines the utility of the revised scoring algorithm with a school-based sample.

In addition, a consistent limitation identified by several studies (Gotham et al., 2007;

Gotham et al., 2008; Gray et al., 2008) is the interdependency of scores from the ADOS-G in the

determination of diagnostic accuracy. Specifically, to determine the overall diagnostic accuracy

of the instrument, the classification accuracy of ADOS-G scores has been compared to

participants’ end clinical diagnoses, which were made, in part, based upon participants’ scores on

the ADOS-G. Further research is needed to determine the classification accuracy of scores from

the ADOS-G when compared to clinical diagnoses made without information regarding a

participant’s performance on the ADOS-G.

Purpose and Hypotheses

The primary purpose of this dissertation was to examine the validity and diagnostic

accuracy of ADOS-G scores for children and adolescents, ranging in age from 2 years through

17 years. Specifically, three types of evidence were examined as part of this study: structural

validity, relations with other variables, and accuracy of diagnosis. Based on these objectives, the

following research questions and hypotheses were tested in the study.

1a. Does a one-factor model best represent the internal structure of items included in the

ADOS-G Original Scoring Algorithm for each of the modules under investigation?

Hypothesis 1a. Consistent with the authors’ original model (Lord et al., 1999),

items included within the Original Scoring Algorithm of the ADOS-G will reflect

a uni-dimensional construct across modules.

1b. Does a two-factor model best represent the internal structure of items included in the

ADOS-G Revised Scoring Algorithm for each module under investigation?

Hypothesis 1b. Consistent with the findings of Gotham et al. (2007), items

included within the Revised Scoring Algorithm of the ADOS-G will reflect two

constructs across modules.

2. Across modules, do total scores on the ADOS-G (Original and Revised Scoring

Algorithms) demonstrate moderate to strong relationships with other measures of

autistic behavior and weaker relationships with measures of other behavioral

characteristics?

Hypothesis 2. Scores on the ADOS-G will demonstrate moderate to strong

relationships with scores from other measures of autistic behavior and weaker

relationships (i.e., weak to moderate correlations) with other measures of

behavioral functioning.

3. Across modules, does use of the Revised Scoring Algorithm result in greater

diagnostic accuracy of ADOS-G total scores than use of the Original Scoring

Algorithm?

Hypothesis 3. Consistent with the findings of Gotham et al. (2007), it is

hypothesized that the revised diagnostic algorithm on the ADOS-G will result in

greater diagnostic accuracy (i.e., correctly identify individuals who are on the

spectrum from those individuals who are not and, for those on the spectrum,

correctly differentiate between Autistic Disorder and non-autism ASD) than the

Original Scoring Algorithm.

4. Will there be differences in estimates of diagnostic accuracy made when comparing

ADOS-G classifications to clinical decisions made with and without scores from the

ADOS-G?

Hypothesis 4. Greater diagnostic accuracy of ADOS-G scores will be observed

when scores are compared to clinical diagnoses made with information from the

ADOS-G as compared to those made without information regarding a

participant’s performance on the ADOS-G.

Chapter 2. Method

Participants

An extant database was utilized to answer the research questions. This database included

582 children who were enrolled in a large, suburban public school district in the southern U.S.

and referred for a school-based psychoeducational evaluation. Because a revised scoring

algorithm was not proposed for Module 4 of the ADOS-G, only participants who were

administered Module 1, Module 2, or Module 3 were included in the analyses, resulting in a final

sample size of 462. At the time data were collected, participants ranged in age from 2 years, 10

months to 17 years, 9 months (M = 7 years, 3 months). All participants were either previously

diagnosed with, or suspected of having, an Autism Spectrum Disorder at the time of evaluation.

Demographic information for the participants is presented in Table 2. One hundred of the 462

participants were randomly selected for participation in the independent clinical diagnosis

diagnostic accuracy examination (Hypothesis 4). Participants in this analysis ranged in age

from 34 to 213 months (M = 93 months). Demographic information for participants included in

the independent clinical diagnosis diagnostic accuracy examination is also presented in Table 2.

Complete ADOS-G item data were available for all 462 participants on one of the ADOS-G

modules. In accordance with the practices of Gotham et al. (2007), participants who were

administered Module 1 were divided into two groups for Revised Scoring Algorithm

comparisons based on expressive language ability (i.e., those who were nonverbal, and those with

some language production)1. Similarly, participants who were administered Module 2 were also

divided into two groups for Revised Scoring Algorithm comparisons based on age (i.e.,

those who were younger than 5 years at the time of the administration, and those who were 5

years of age or older at the time of the administration)2.
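
The grouping described above can be sketched as a simple assignment rule. This is a minimal illustration, not the procedure used to build the database: the function name and its arguments are hypothetical, expressive language is reduced to a single yes/no flag, and age is assumed to be recorded in months with 60 months marking the fifth birthday.

```python
def developmental_cell(module, has_words=None, age_months=None):
    """Assign a participant to one of the five developmental cells.

    module:     ADOS-G module administered (1, 2, or 3)
    has_words:  hypothetical flag for any language production (Module 1 only)
    age_months: age at administration in months (Module 2 only)
    """
    if module == 1:
        if has_words is None:
            raise ValueError("Module 1 requires expressive language information")
        return "Module 1 Some Words" if has_words else "Module 1 No Words"
    if module == 2:
        if age_months is None:
            raise ValueError("Module 2 requires age at administration")
        if age_months < 60:  # younger than 5 years
            return "Module 2 Younger than 5"
        return "Module 2 Greater than or Equal to 5"
    if module == 3:
        return "Module 3"
    # No revised scoring algorithm was proposed for Module 4.
    raise ValueError("No developmental cell is defined for this module")


print(developmental_cell(1, has_words=False))
print(developmental_cell(2, age_months=72))
print(developmental_cell(3))
```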

Table 2
Demographic Characteristics of Total Sample (N = 462) and Independent Clinical Diagnosis Subsample (n = 100)

                                                              Total Sample        Independent Diagnosis
Characteristic                                                %                   %

Gender
  Male                                                        85                  83
  Female                                                      15                  17
  Unavailable                                                 <1                  N/A

Ethnicity
  Caucasian                                                   59                  61
  Black                                                       10                  9
  Hispanic                                                    14                  12
  Asian                                                       9                   9
  Other                                                       2                   3
  Unavailable                                                 6                   6

Grade At Time of Evaluation
  Early Childhood (Not Enrolled in School District)           15                  13
  Public Preschool in Referring School District               13                  11
  Kindergarten                                                17                  15
  Early Elementary (Grades 1-3)                               33                  31
  Upper Elementary (Grades 4-5)                               15                  24
  Middle School (Grades 6-8)                                  7                   5
  High School (Grades 9-12)                                   <1                  1
  Unavailable                                                 <1                  N/A

Special Education Eligibility
  Autism Only                                                 27                  16
  Speech Impaired Only                                        9                   9
  Mental Retardation Only                                     <1                  1
  Emotional Disturbance Only                                  2                   4
  Other Health Impaired Only                                  4                   8
  Specific Learning Disability Only                           <1                  1
  Autism and Speech Impaired                                  38                  39
  Mental Retardation and Speech Impaired                      1                   3
  Other Combination of Eligibilities                          13                  18
  Autism, Mental Retardation, and Speech Impaired             3                   0
  No Eligibility                                              2                   1
  Unavailable                                                 <1                  N/A

Ending Clinical Diagnosis
  Autistic Disorder                                           26                  20
  Asperger’s Disorder                                         13                  16
  Pervasive Developmental Disorder Not Otherwise Specified    26                  23
  Attention-Deficit/Hyperactivity Disorder                    6                   14
  Mood Disorder                                               2                   3
  Other Disorder                                              7                   4
  No Disability                                               13                  20
  Unavailable                                                 8                   N/A

ADOS-G Module
  Module 1                                                    18                  15
    No Words Revised Scoring Algorithm                        4                   1
    Some Words Revised Scoring Algorithm                      14                  14
  Module 2                                                    26                  27
    Less Than 5 Years of Age                                  10                  8
    Greater Than or Equal To 5 Years of Age                   16                  19
  Module 3                                                    56                  58

Note: ADOS-G = Autism Diagnostic Observation Schedule-Generic

Measures

Several measures were administered to participants and their parents/teachers as part of a

school-based multidisciplinary team evaluation process.

Autism Diagnostic Observation Schedule – Generic (ADOS-G). The ADOS-G is a

semi-structured, standardized assessment of communication, social interaction, and play or

imaginative use of materials for individuals who are suspected of having autism or another

pervasive developmental disorder. The ADOS-G is hypothesized to assess skills in four domains.

Communication assesses characteristics such as vocalization, idiosyncratic use of words or

phrases, pointing, and gestures. Reciprocal Social Interaction measures behaviors such as eye

contact, facial expressions, shared enjoyment, showing, spontaneous initiation of joint attention,

response to joint attention, and quality of social overtures. Play measures functional play with

objects and imaginative play, and Stereotyped Behaviors and Restricted Interests tap

characteristics such as unusual sensory interest in play materials (e.g., sniffing), complex hand

and finger mannerisms, and repetitive interests or stereotyped behaviors. Communication and

Reciprocal Social Interaction are combined to create a Communication + Social Interaction

Total scale. Cut-off scores for autism and autism spectrum are applied to each scale in

determining the possible presence or absence of an Autism Spectrum Disorder.

The ADOS-G is scored using a diagnostic algorithm that allows for the classification of

examinees into two categories: those who have the social and communication deficits consistent

with a diagnosis of Autism or an Autism Spectrum Disorder and those who do not (Lord et al.,

1999). In order to arrive at this classification, ratings are assigned by examiners for each of the

dimensions of functioning assessed throughout the ADOS-G administration (see Table 2 in

Appendix B for more information). Examiners score each dimension of functioning using either

a 3-point scale (0 - 2) or a 4-point scale (0 - 3), where a score of 0 represents typical functioning

for the participant’s age and developmental level, and a score of 2 or 3 represents highly atypical

functioning (Lord et al.). Next, all item scores of 3 are converted to scores of 2. Finally, the

examinee’s scores on selected items from the Communication and Reciprocal Social

Interaction subscales are summed and then compared against cut-scores for Autistic Disorder

and Autism Spectrum Disorder for each of the subscales and for a total scale score (obtained by

adding a participant’s scores on the Communication subscale with his/her scores on the

Reciprocal Social Interaction subscale). The communication and social items included in the

scoring algorithm vary across modules and are identified in Table B2. Although the ADOS-G

measures an examinee’s engagement in restricted, stereotyped, and/or repetitive behaviors and

imagination/creativity, the ADOS-G’s Original Scoring Algorithm does not utilize these items in

classification determination (Lord et al.).
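
The scoring sequence described above (recode ratings of 3 to 2, sum the algorithm items within each domain, then compare the sums against cut-scores) can be sketched in Python. This is a minimal illustration under stated assumptions rather than the published algorithm: the item lists and cut-score values below are hypothetical placeholders (the actual items appear in Table B2 and the actual cut-scores in Lord et al., 1999), the function names are introduced here for illustration only, and the sketch assumes that a classification requires meeting the Communication, Reciprocal Social Interaction, and Total cut-scores together.

```python
def recode(score):
    """Convert an item rating of 3 to 2; leave ratings of 0-2 unchanged."""
    return 2 if score == 3 else score


def classify(ratings, comm_items, social_items, cutoffs):
    """Sum recoded algorithm items and compare the sums against cut-scores.

    ratings      -- dict mapping item label to the examiner's rating (0-3)
    comm_items   -- labels of the Communication items in the algorithm
    social_items -- labels of the Reciprocal Social Interaction items
    cutoffs      -- dict: classification -> (comm, social, total) minimums
    """
    comm = sum(recode(ratings[i]) for i in comm_items)
    social = sum(recode(ratings[i]) for i in social_items)
    total = comm + social
    # Check the stricter classification first (assumed ordering).
    for label in ("autism", "autism spectrum"):
        c_min, s_min, t_min = cutoffs[label]
        if comm >= c_min and social >= s_min and total >= t_min:
            return label, comm, social, total
    return "non-spectrum", comm, social, total


# Hypothetical item lists and cut-scores, for illustration only.
COMM_ITEMS = ["A-2", "A-7", "A-8"]
SOCIAL_ITEMS = ["B-1", "B-5", "B-9"]
CUTOFFS = {"autism": (4, 4, 9), "autism spectrum": (2, 3, 6)}

ratings = {"A-2": 3, "A-7": 1, "A-8": 2, "B-1": 2, "B-5": 3, "B-9": 1}
result = classify(ratings, COMM_ITEMS, SOCIAL_ITEMS, CUTOFFS)
print(result)
```

Because ratings of 3 are recoded before summing, the two items rated 3 in this made-up example each contribute 2 points to their domain score.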

Gilliam Autism Rating Scale, Second Edition (GARS-2). The GARS-2 was selected to

provide convergent validity evidence for scores on the ADOS-G. The GARS-2 (Gilliam, 2006) is

a screening instrument used for the assessment of behavior problems that may be indicative of

autism in individuals ages 3 to 22. Although only one form exists, it is designed for use with

parents, teachers, and/or other caregivers who have had regular, sustained contact with the child

being assessed for at least two weeks. The GARS-2 is composed of 42 items, which are

divided into three subscales: Stereotyped Behaviors, Communication, and Social Interaction.

Each item is scored on a 4-point scale of frequency (0 = Never Observed, 1 = Seldom Observed,

2 = Sometimes Observed, and 3 = Frequently Observed). Item scores are summed within each subscale,

resulting in raw scores that are converted to standard scores (M = 10, SD = 3). Standard scores

from each of the three subscales are then summed and converted into a full-scale Autism Index,

which has a mean of 100 and a standard deviation of 15.

The Stereotyped Behaviors subscale is composed of 14 items that assess the frequency

with which a child exhibits stereotyped behaviors (e.g., hand/finger flapping or flicking and

spinning), motility disorders (e.g., prancing, toe-walking, and making lunging/darting

movements), and other unique or atypical behaviors (e.g., smells/sniffs “unscented” objects,

vocal self-stimulation, and licks/tastes/attempts to eat inedible objects) (Gilliam, 2006).

The Communication subscale is composed of 14 items that assess the frequency with

which a child exhibits the verbal behaviors (e.g., echoes/repeats words and phrases, repeats

unintelligible sounds, and uses pronouns/I inappropriately) and nonverbal behaviors (e.g., looks

away/avoids looking at a speaker when name is called and uses gestures instead of speech/sign to

obtain objects) that are symptomatic of autism (Gilliam, 2006).

The Social Interaction subscale is composed of 14 items that evaluate the child’s ability

to relate appropriately to people, objects, and events within his or her environment (Gilliam,

2006). Items assess the frequency with which a child responds atypically to typical social

situations (e.g., looks away when someone looks at him or her, looks unhappy when praised, and

looks through people), uses objects in an atypical fashion (e.g., lines up objects in a precise

fashion and becomes upset when the order is disturbed, and uses toys inappropriately), and

responds to his or her environment in an atypical way (e.g., behaves in an unreasonably fearful

manner, and does certain things repetitively or ritualistically).

Reliability and validity evidence provided in the GARS-2 Examiner’s Manual indicates

that it is a technically adequate instrument for the screening and diagnosis of individuals on the

autism spectrum (Gilliam, 2006). Adequate internal consistency (α > .80; Salvia & Ysseldyke,

2004) was reported for each of the three subscales and for the total scale. Test-retest reliability

coefficients for the three subscales (r = .70 - .90) and for the total scale (r = .88) demonstrate the

stability of respondents’ ratings over time on the GARS-2. Subscale and total scale ratings on

the GARS-2 were also compared to the total scale ratings on The Autism Behavior Checklist

(ABC; Krug, Arick, & Almond, 1993) and determined to demonstrate moderate to strong

concurrent relationships (r = .58 - .71).

Although not discussed in the examiner’s manual, the structural validity of scores from the

GARS-2 standardization sample was examined by Pandolfi, Magyar, and Dill (2010). Exploratory factor

analysis was conducted on the item data. Inconsistent with the author’s (Gilliam, 2006) three

conceptually-derived subscales, a four-factor solution provided the best overall model fit.

Confirmatory factor analysis confirmed the superiority of the four-factor model (χ² = 3,039.59, p

< .001; RMSEA = .08; CFI = .91) over the three-factor model (χ² = 4,861.33, p < .001; RMSEA

= .10; CFI = .84). However, the authors (Pandolfi et al.) identified several limitations to their study,

including a smaller than preferred sample size (N = 496), the failure to independently confirm

participants’ ASD diagnoses, and the failure to include non-verbal participants in the analyses.

Given their results, the authors concluded that the GARS-2 subscales should be interpreted with

extreme caution because each subscale is possibly measuring multiple constructs. However, they

also indicated that additional research is needed to further evaluate the clinical utility of the

GARS-2.

Behavior Assessment System for Children, Second Edition (BASC-2). Several

subscales from the Behavior Assessment System for Children, Second Edition (BASC-2) were

used to provide convergent and discriminant validity evidence for scores on the ADOS-G. The

BASC-2 (Reynolds & Kamphaus, 2004) is a broadband behavioral rating scale that assesses the

domains of Externalizing Problems, Internalizing Problems, Adaptive Skills, and overall

Behavioral Symptoms in children and young adults aged 2 through 25 years. Each domain

consists of several subscales that assess specific classes of behavior within the larger domain.

Parent and teacher rating scales, and self-report of personality forms exist. Across forms,

respondents rate each item on a 4-point scale of frequency (0 = Never, 1 = Sometimes, 2 = Often,

3 = Almost Always). Items are summed across scales and broad domains, and raw scores are

converted to T scores (M = 50, SD = 10). Parent and teacher ratings on the following subscales

were included in the analyses.

The Atypicality scale measures a child’s tendency to behave in ways that are considered

strange or odd by observers (Reynolds & Kamphaus, 2004). Items primarily focus on the child’s

awareness of his or her typical surroundings and apparent connection to his or her environment.

The scale also includes items that assess the frequency with which the child exhibits behaviors

that are consistent with autism symptomology, such as perseverative thought and behavior, social

disconnectedness, and engagement in stereotyped and repetitive motor mannerisms. According

to test authors (Reynolds & Kamphaus), T-scores in the At-Risk or Clinically Significant range

on the Atypicality scale may be suggestive of a developmental delay or Autism Spectrum

Disorder. Further, in the validation sample, young children’s scores on the Atypicality scale

demonstrated a moderate to strong concurrent relationship (r = .42 for parent reports, .77 for

teacher reports) with scores on the Pervasive Developmental Problems scale on the Achenbach

System of Empirically Based Assessment Child Behavior Checklist (ASEBA CBC; Achenbach &

Rescorla, 2000).

The Withdrawal scale measures a child’s tendency to evade others in order to avoid

social contact, and his or her general level of interest in making contact with others in a social

setting (Reynolds & Kamphaus, 2004). Items on this scale assess the child’s general social

difficulties with peers and his or her engagement in or avoidance of group activities. According

to Reynolds and Kamphaus (2004), the Withdrawal scale assesses a “core symptom of autism”

(p. 63) and, as such, scores in the At-Risk or Clinically Significant range on this scale provide

support to consider the possibility of an ASD. Similar to the Atypicality scale, moderate

concurrent relationships were observed between parent and teacher ratings on the Withdrawal

scale and their ratings on the Pervasive Developmental Problems scale on the ASEBA Child

Behavior Checklist (r = .49 for parent reports, .57 for teacher reports).

Participants’ scores on the Anxiety scale were used to examine discriminant validity

evidence for scores on the ADOS-G. The Anxiety scale measures a child’s tendency to be

nervous, fearful, or worried about real or imagined problems. Items on this scale assess the

child’s level of perfectionism, education-related fears, and social worries.

In general, evidence from the validation sample indicates that the BASC-2 is a

technically adequate tool for measuring behavioral functioning in children and adolescents

(Reynolds & Kamphaus, 2004). Adequate internal consistency (α > .80; Salvia & Ysseldyke,

2004) has been reported for all scale composites on both the parent and teacher rating scales for

children and adolescents ages 4 years and above. Test-retest reliability estimates on the Teacher

Rating Scales (TRS; r = .72 - .93 on the preschool form, r = .65 - .94 on the child form, and r =

.66 to .91 on the adolescent form) and the Parent Rating Scales (PRS; r = .66 - .88 on the

preschool form, r = .65 - .92 on the child form, and r = .72 to .92 on the adolescent form) reflect

adequate to strong consistency of ratings over time. In addition, across parent and teacher

forms, scores on the Behavioral Symptoms Index (i.e., the composite score on the BASC-2 that

reflects the child’s overall level of problem behavior) demonstrate strong concurrent

relationships (r = .76 - .84) with the Total Problems composite score on the ASEBA Child

Behavior Checklist (Achenbach & Rescorla, 2000). Exploratory factor analyses conducted at the

subscale level also provide evidence of the structural validity of the BASC-2 Teacher and Parent

Rating Scales. Three- and four-factor solutions were extracted using Principal Axis factoring and

Varimax rotation, and examined for suitability for both the TRS and PRS. Consistent with

theory, four-factor solutions presented better model fit and were retained for the TRS and PRS

(Reynolds & Kamphaus).

Procedure

Participants received a multidisciplinary Autism Team evaluation by school district

personnel due to concerns regarding their social functioning, communication abilities, and/or

sensorimotor functioning. Each multidisciplinary Autism Team is composed of a Licensed

Specialist in School Psychology, Educational Diagnostician, Speech and Language Pathologist,

and Occupational Therapist, all of whom have specialized training in the assessment and

diagnosis of Autism Spectrum Disorders. In addition to general training on ASDs, all team

members have completed the standardized training on the administration and scoring of the

Autism Diagnostic Observation Schedule-Generic, which was facilitated by an ADOS-certified

trainer. At the completion of the ADOS-G training, all team members were required to reliably

score a video-taped ADOS-G administration to demonstrate their competence with the

assessment tool. No team members participated in ADOS-G administration and/or scoring prior

to completing the required training and demonstrating scoring competence.

As a part of the autism evaluation process, participants were each administered Module 1,

2, or 3 (depending on the participant’s age and expressive language ability) of the ADOS-G by

the multidisciplinary evaluation team. The ADOS-G was administered and scored in accordance

with the standardized procedures set forth by test authors. In addition to the administration of the

ADOS-G, each evaluation also included a number of other assessment activities. Participants’

parents and teachers participated in semi-structured clinical interviews in order for evaluation

personnel to gather information regarding the student’s past and current functioning across

settings. Parents and teachers also completed broadband behavioral rating scales (i.e., BASC-2)

and autism screening/diagnostic measures (i.e., GARS-2). In addition, direct observations of the

student within his or her educational setting were conducted by the evaluation team. Finally,

parents were asked to provide a detailed birth, health, and developmental history regarding their

child. Following the completion of the evaluation process, the multidisciplinary team reviewed

all assessment data and assigned each participant a clinical diagnosis in accordance with the

diagnostic criteria set forth by the Diagnostic and Statistical Manual of Mental Disorders,

Fourth Edition-Text Revision (DSM-IV-TR; American Psychiatric Association, 2004), as well as a

special education eligibility category.

Educational files from all students who participated in multidisciplinary Autism Team

evaluations from January 2007 through December 2011 were located and reviewed. Next,

demographic information; total scores, domain scores, and/or subscale scores from all

administered standardized assessments; item scores for each item on the ADOS-G,

Communication and Social Interaction domain scores, the Communication + Social Interaction

Total score, and the resulting diagnostic classification obtained from applying the original

scoring algorithm on the ADOS-G; and ending clinical diagnoses and special education

eligibility categories were entered into a database. Participants’ ADOS-G item scores were then

used to “rescore” their performance using the revised ADOS-G scoring algorithm. Participants’

Social Affective and Restricted-Repetitive Behavior domain scores, Social Affective + Restricted-

Repetitive Behavior Total scores, and the resulting diagnostic classification were also entered

into the database.

In order to compare diagnostic classifications of the ADOS-G with clinical diagnoses

made without results from the ADOS-G (Hypothesis 4), Licensed Specialists in School

Psychology and predoctoral psychology interns with formal training in autism assessment

reviewed assessment information from a sample of approximately 100 evaluations randomly

selected from the 462 evaluations included in the analyses of diagnostic accuracy. Evaluators

were provided with all of the assessment information obtained during the multidisciplinary team

evaluations with the exception of the participant’s ADOS-G scores and the evaluation team’s

diagnostic conclusions. Based on the other available information, trained clinicians assigned

each participant a clinical diagnosis, if appropriate.

Chapter 3. Results

Preliminary Analyses & Testing of Assumptions

ADOS-G item analyses. Items included in both the Original and Revised Scoring

Algorithms on the ADOS-G were examined for normality, linearity, multicollinearity, and the

presence of multivariate outliers by module. Mean item scores, standard deviations, and skew

and kurtosis values are presented in Tables 3-5. Item scores were considered skewed if skew

values exceeded ±2 and kurtotic if kurtosis values exceeded ±7 (Fabrigar, Wegener, MacCallum, & Strahan,

1999). Skew and kurtosis fell within normal limits for the majority of items across all three

modules. Item D-1 on Module 3 was mildly skewed (skew = 2.15) and Item A-3 on Module 3

was found to be both mildly skewed and kurtotic (skew = 2.86, kurtosis = 7.95). Item D-3 on

Module 1 (skew = 4.23, kurtosis = 17.17), Module 2 (skew = 6.11, kurtosis = 35.91), and

Module 3 (skew = 8.30, kurtosis = 72.38) was found to be moderately to severely skewed and

kurtotic. However, Item A-3 is not included in either the Original or Revised Scoring Algorithm

for Module 3, and Item D-3 is not included in either scoring algorithm across modules. Linearity

of item scores across modules was supported through the visual inspection of scatterplots. Visual

inspection of standard and reproduced correlation matrices confirmed the presence of moderate

to strong correlations between items and the absence of multicollinearity across modules.

Mahalanobis Distance Tests (Tabachnick & Fidell, 1996) were conducted across modules and

scoring algorithms to investigate the presence of multivariate outliers. No outliers were identified

in the Module 1, Original Scoring Algorithm (OSA); Module 1, Revised Scoring Algorithm

(RSA)¹; and Module 2-OSA. Two multivariate outliers were identified in the

Module 2-RSA²; and seven outliers were identified in both the Module 3-OSA and RSA. For

each module with outliers, preliminary factor analyses were conducted with and without these

Table 3

Item Means, Standard Deviations, Skew and Kurtosis Values on Module 1 from the ADOS-G (N = 82)

Item                                                                 M     SD    Skew   Kurtosis
A-1: Overall level of non-echoed language                         1.18    .79    -.34      -1.31
A-2: Frequency of vocalizations to othersᶜ                        1.49    .72   -1.06       -.27
A-3: Intonation of vocalizations/verbalizations                    .63    .84     .78      -1.21
A-4: Immediate echolalia                                           .89    .90     .22      -1.76
A-5: Stereotyped use of wordsᶜ                                     .55    .83    1.02       -.77
A-6: Use of other’s body to communicateᵃ                           .61    .84     .85      -1.05
A-7: Pointingᶜ                                                    1.39    .81    -.84       -.96
A-8: Gesturesᶜ                                                    1.16    .84    -.30      -1.51
B-1: Unusual eye contactᶜ                                         1.54    .85   -1.27       -.33
B-2: Responsive Social Smile                                      1.35    .82    -.74      -1.11
B-3: Facial expressions directed to othersᶜ                       1.33    .75    -.63       -.97
B-4: Integration of gaze/other behavior during socializationᵇ     1.61    .64   -1.42        .83
B-5: Shared enjoyment in interactionᶜ                             1.02    .88    -.05      -1.71
B-6: Response to name                                             1.32    .83    -.66      -1.23
B-7: Requesting                                                   1.13    .73    -.22      -1.09
B-8: Giving                                                       1.45    .69    -.87       -.43
B-9: Showingᶜ                                                     1.61    .66   -1.46        .85
B-10: Spontaneous initiation of joint attentionᶜ                  1.41    .73    -.83       -.66
B-11: Response to joint attentionᵃ                                1.22    .80    -.42      -1.32
B-12: Quality of social overturesᶜ                                1.65    .62   -1.56       1.32
D-1: Unusual sensory interest in play materials/personᵇ            .83    .84     .34      -1.52
D-2: Hand/finger complex mannerismsᵇ                               .62    .83     .81      -1.04
D-3: Self-injurious behavior                                       .10    .40    4.23      17.17
D-4: Repetitive interests/stereotyped behaviorsᵇ                   .83    .81     .33      -1.41

Note. Items are scored on a 0 to 2 point scale, where a score of 0 = no impairment and a score of 2 = significant impairment. ᵃItem included in the Original Scoring Algorithm only. ᵇItem included in the Revised Scoring Algorithm only. ᶜItem included in both scoring algorithms.

Table 4

Item Means, Standard Deviations, Skew, and Kurtosis Values on Module 2 from the ADOS-G (N = 118)

Item                                                                 M     SD    Skew   Kurtosis
A-1: Overall level of non-echoed language                          .92    .78     .13      -1.32
A-2: Social overtures/maintenance of attentionᵃ                   1.08    .86    -.17      -1.65
A-3: Autism associated speech abnormalities                        .82    .84     .35      -1.51
A-4: Immediate echolalia                                           .78    .82     .43      -1.38
A-5: Stereotyped use of wordsᶜ                                     .76    .81     .46      -1.33
A-6: Conversationᵃ                                                1.38    .81    -.81       -.98
A-7: Pointingᶜ                                                     .70    .83     .61      -1.28
A-8: Gesturesᶜ                                                     .92    .87     .15      -1.67
B-1: Unusual eye contactᶜ                                         1.25    .97    -.51      -1.75
B-2: Facial expressions directed to othersᶜ                        .77    .78     .43      -1.23
B-3: Shared enjoyment in interactionᵇ                              .60    .79     .84       -.87
B-4: Response to name                                              .64    .79     .73      -1.02
B-5: Showingᵇ                                                     1.03    .77    -.04      -1.29
B-6: Spontaneous initiation of joint attentionᶜ                    .84    .77     .29      -1.27
B-7: Response to joint attention                                   .66    .81     .70      -1.12
B-8: Quality of social overturesᶜ                                 1.06    .78    -.10      -1.33
B-9: Quality of social responseᵃ                                  1.03    .78    -.04      -1.34
B-10: Amount of reciprocal social communicationᶜ                  1.24    .86    -.48      -1.50
B-11: Overall quality of rapportᶜ                                  .94    .84     .11      -1.58
D-1: Unusual sensory interest in play materials/personᵇ            .36    .58    1.40        .99
D-2: Hand/finger complex mannerismsᵇ                               .35    .61    1.55       1.33
D-3: Self-injurious behavior                                       .03    .16    6.11      35.91
D-4: Repetitive interests/stereotyped behaviorsᵇ                   .52    .71    1.01       -.31

Note. Items are scored on a 0 to 2 point scale, where a score of 0 = no impairment and a score of 2 = significant impairment. ᵃItem included in the Original Scoring Algorithm only. ᵇItem included in the Revised Scoring Algorithm only. ᶜItem included in both scoring algorithms.

Table 5

Item Means, Standard Deviations, Skew, and Kurtosis Values on Module 3 from the ADOS-G (N = 262)

Item                                                                 M     SD    Skew   Kurtosis
A-1: Overall level of non-echoed language                          .38    .60    1.34        .76
A-2: Autism associated speech abnormalities                       1.09    .74    -.14      -1.16
A-3: Immediate echolalia                                           .15    .40    2.86       7.95
A-4: Stereotyped use of words/phrasesᶜ                             .73    .72     .47       -.98
A-5: Offers information                                            .76    .84     .47      -1.42
A-6: Asks for information                                         1.45    .73    -.93       -.56
A-7: Reporting of eventsᶜ                                         1.05    .83    -.09      -1.53
A-8: Conversationᶜ                                                1.34    .77    -.67      -1.01
A-9: Gesturesᶜ                                                     .81    .82     .36      -1.43
B-1: Unusual eye contactᶜ                                         1.25    .97    -.52      -1.74
B-2: Facial expressions directed to othersᶜ                        .87    .71     .19      -1.02
B-3: Language production/linked nonverbal communication            .26    .51    1.78       2.31
B-4: Shared enjoyment in interactionᵇ                              .81    .82     .36      -1.43
B-5: Empathy/comments on others’ emotions                         1.40    .73    -.77       -.74
B-6: Insightᵃ                                                     1.44    .72    -.89       -.57
B-7: Quality of social overturesᶜ                                 1.21    .69    -.30       -.88
B-8: Quality of social responseᶜ                                  1.11    .67    -.13       -.78
B-9: Amount of reciprocal social communicationᶜ                   1.18    .79    -.33      -1.32
B-10: Overall quality of rapportᶜ                                 1.07    .80    -.12      -1.40
D-1: Unusual sensory interest in play materials/personᵇ            .23    .51    2.15       3.79
D-2: Hand/finger complex mannerismsᵇ                               .29    .59    1.87       2.34
D-3: Self-injurious behavior                                       .03    .20    8.30      72.38
D-4: Excessive interest in specific topics/repetitive behaviorsᵇ   .66    .80     .70      -1.07
D-5: Compulsions or rituals                                        .28    .57    1.96       2.69

Note. Items are scored on a 0 to 2 point scale, where a score of 0 = no impairment and a score of 2 = significant impairment. ᵃItem included in the Original Scoring Algorithm only. ᵇItem included in the Revised Scoring Algorithm only. ᶜItem included in both scoring algorithms.

outliers and the solutions did not vary significantly. Thus, data from all participants were

included in subsequent analyses.
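
The skew and kurtosis screening rule applied to the item scores (absolute skew above 2, absolute kurtosis above 7) can be illustrated with a short Python sketch. The source does not state which estimator or software produced the reported values, so the moment-based estimators below (with kurtosis expressed as excess kurtosis, so that a normal distribution scores 0) are an assumption, as are the function names and the made-up score vectors.

```python
def skew_kurtosis(scores):
    """Return moment-based skew and excess kurtosis for a list of scores."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


def flag_item(scores, skew_limit=2.0, kurtosis_limit=7.0):
    """Return (is_skewed, is_kurtotic) using the cut-offs cited in the text."""
    skew, kurt = skew_kurtosis(scores)
    return abs(skew) > skew_limit, abs(kurt) > kurtosis_limit


# Hypothetical item scores on the 0-2 scale, for illustration only.
balanced_item = [0, 1, 2] * 10       # evenly spread ratings
rare_item = [0] * 96 + [2] * 4       # behavior observed in very few examinees

print(flag_item(balanced_item))
print(flag_item(rare_item))
```

An item that is rated 0 for nearly every examinee, as in the second made-up vector, is the kind of distribution that exceeds both cut-offs.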

Total, scale, and subscale score analyses. Table 6 presents the means, standard

deviations, score ranges, skew, and kurtosis values for ADOS-G Total Scores, GARS-2 index

and subscale scores, and subscale scores on the BASC-2. Skew and kurtosis of scores fell within

acceptable ranges for all total/index/subscale scores. Linearity of item scores across modules and

scoring algorithms was supported through visual inspection of scatterplots. Thus, data from all

participants were included in subsequent analyses.

Hypothesis 1: Factor Structure of the Original and Revised Scoring Algorithms

Exploratory Factor Analyses (EFA) were conducted (by module) on the items included in

the Original and Revised Scoring Algorithms of the ADOS-G. Based on the factor structures

outlined by the authors for the OSA (Lord et al., 1999) and RSA (Gotham et al., 2007), it was

expected that the items included in the OSA for each module would reflect a uni-dimensional

structure, whereas the items included in the Revised Scoring Algorithm would reflect a

two-factor structure.

To determine the adequacy of ADOS-G module items for factorability, several steps were

taken. First, the relationships between items were examined by generating a correlation matrix.

According to Tabachnick and Fidell (2007), a factorable correlation matrix should include

several sizable correlations. If the matrix was determined to be adequate, Bartlett’s Test of

Sphericity (Bartlett, 1950) was next conducted to test the null hypothesis that the correlation

matrix is an identity matrix. The Kaiser-Meyer-Olkin test of sampling adequacy (Kaiser, 1974)

was also calculated and examined to further investigate factorability. KMO values > .60 were

accepted as evidence of factorability (Kaiser).
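
As a rough illustration of the first factorability check, Bartlett's statistic can be computed from a correlation matrix R with p items and n examinees as χ² = -[(n - 1) - (2p + 5)/6] ln|R|, with df = p(p - 1)/2. The Python sketch below is a minimal, assumption-laden example: the 3 x 3 correlation matrix is made up, the function names are introduced here for illustration, and the KMO statistic is not computed. The degrees-of-freedom formula does reproduce the values reported later for the 12-item and 14-item Module 1 analyses (66 and 91).

```python
from math import log


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square, df) for Bartlett's Test of Sphericity."""
    p = len(corr)
    chi2 = -((n_obs - 1) - (2 * p + 5) / 6) * log(determinant(corr))
    df = p * (p - 1) // 2
    return chi2, df


# Hypothetical correlation matrix for three items, for illustration only.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
chi2, df = bartlett_sphericity(corr, n_obs=82)
print(round(chi2, 2), df)

# Degrees of freedom for 12 and 14 items.
print(12 * 11 // 2, 14 * 13 // 2)
```

An identity matrix has a determinant of 1, so the statistic is 0 and the null hypothesis of no shared variance cannot be rejected; larger correlations shrink the determinant and raise the statistic.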

Table 6

Participants’ Means, Standard Deviations, Score Range, Skew, and Kurtosis Values on the ADOS-G, GARS-2, and Selected Subscales from the BASC-2

Scale/Subscale                      M     SD    Rangeᵃ   Skew  Kurtosis

ADOS-G: Original Scoring Algorithms
  Total Score, M1 (N = 82)      15.17   6.63    0 - 24   -.83      -.30
  Total Score, M2 (N = 118)     11.98   8.07    0 - 24   -.18     -1.39
  Total Score, M3 (N = 262)     12.06   6.07    0 - 22   -.27      -.94

ADOS-G: Revised Scoring Algorithms
  Total Score, M1 (N = 66)      15.77   7.49    0 - 27   -.63      -.57
  Total Score, M2 (N = 73)      11.34   8.05    0 - 27    .09     -1.15
  Total Score, M3 (N = 261)     12.59   6.79    0 - 27   -.15     -1.03

GARS-2 Parent Ratings (N = 109)
  Autism Index                  80.71  18.94  40 - 130    .28       .36
  Stereotyped Behaviors          6.73   2.96    1 - 15    .54      -.23
  Communication                  7.96   3.64    2 - 16    .35      -.84
  Social Interaction             6.51   3.13    2 - 16    .65      -.11

GARS-2 Teacher Ratings (N = 112)
  Autism Index                  85.38  18.86  40 - 132    .06       .23
  Stereotyped Behaviors          6.56   2.94    0 - 16    .37       .33
  Communication                  8.60   3.78    0 - 18   -.05      -.27
  Social Interaction             7.62   3.47    0 - 15    .16      -.96

BASC-2 Parent Ratings (N = 261)
  Anxiety                       50.73  12.37   28 - 96    .82       .84
  Atypicality                   67.09  16.34  24 - 120    .53       .35
  Withdrawal                    63.62  14.80  33 - 120    .51       .53

BASC-2 Teacher Ratings (N = 261)
  Anxiety                       54.10  14.19  38 - 103   1.20      1.11
  Atypicality                   73.64  17.55  36 - 120    .29      -.48
  Withdrawal                    67.97  13.49  38 - 100    .12      -.61

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic; GARS-2 = Gilliam Autism Rating Scale, Second Edition; BASC-2 = Behavior Assessment System for Children, Second Edition; BRIEF = Behavior Rating Inventory of Executive Function; M = Module. Total scores on the Original Scoring Algorithm were obtained by summing the Communication and Social Interaction total scores; Total scores on the Revised Scoring Algorithm were obtained by summing the Social Affect + Restricted Repetitive Behavior total scores. ᵃObserved score ranges.

Common factor analysis was conducted because the goal of the study was to identify the

latent structure of the ADOS-G items (Wegener & Fabrigar, 2000). Additionally, common factor

analysis has been reported to produce more accurate estimates of population parameters than

Principal Components Analysis (Widaman, 1993). Principal axis extraction was utilized

because it is less likely to be affected by multivariate nonnormality than other extraction

methods, such as maximum likelihood extraction (Briggs & MacCallum, 2003). Communalities

were initially estimated using squared multiple correlations, and the initial number of factors to

retain for rotation was based on theory, visual inspection of the scree test (Cattell, 1966), parallel

analysis (Horn, 1965), and minimum average partials (MAP; Velicer, 1976). Because factors are

assumed to be correlated, a Promax rotation with a k value of 4 was selected (Tataryn, Wood, &

Gorsuch, 1999). The final selection of factor structure was based on (a) salient

pattern/structure coefficients greater than or equal to .32; (b) a minimum of three items with

salient loadings per factor; (c) simple structure (i.e., items loaded saliently on a single factor only;

Thurstone, 1947); (d) resulting scale reliability estimates greater than or equal to .70; and (e)

theoretical convergence.

Module 1 - Original Scoring Algorithm (OSA). Data from the 12 items included in the

Module 1-OSA were submitted for common factor analysis (Principal Axis Factoring

extraction). Bartlett’s Test of Sphericity (χ² = 577.723, df = 66, p < .001) and the

Kaiser-Meyer-Olkin statistic (.899) indicated that the correlation matrix was adequate for factorability. In

addition, the correlation matrix (presented in Appendix C) of the aforementioned items was

reviewed and determined to contain several correlations above .30 (Tabachnick & Fidell, 2007).

Therefore, all reviewed statistics suggested that the correlation matrix was appropriate for factor

analysis.

MAP criteria, parallel analysis, and visual inspection of the scree plot recommended the

retention of one factor. Therefore, a one-factor solution was extracted and examined. The

resulting solution (Table 7) was adequate based on the standards set a priori. Eleven items loaded

saliently on the one-factor solution, with structure coefficients ranging from .35 to .89 (Mdn =

.73) and communalities ranging from .13 to .78 (Mdn = .56). The one-factor solution accounted

for 52 percent of the total variance among the items and was robust when an alternative

extraction method (Unweighted Least Squares) was used.
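
The salience and simple-structure criteria can be checked mechanically. The Python sketch below applies the .32 cut-off to the one-factor structure coefficients reported in Table 7; the function name and the grouping labels are assumptions introduced for illustration, and the same function would accept several coefficients per item for a multi-factor solution.

```python
SALIENT = 0.32  # salience cut-off stated in the text

# Structure coefficients copied from Table 7 (one-factor solution).
loadings = {
    "A-2": [0.892], "A-5": [0.086], "A-6": [0.354], "A-7": [0.749],
    "A-8": [0.654], "B-1": [0.732], "B-3": [0.823], "B-5": [0.727],
    "B-9": [0.780], "B-10": [0.755], "B-11": [0.621], "B-12": [0.831],
}


def sort_items(loadings, cutoff=SALIENT):
    """Group items by the number of factors on which they load saliently."""
    groups = {"none": [], "single": [], "cross": []}
    for item, coefs in loadings.items():
        n_salient = sum(abs(c) >= cutoff for c in coefs)
        if n_salient == 0:
            groups["none"].append(item)    # no salient loading
        elif n_salient == 1:
            groups["single"].append(item)  # consistent with simple structure
        else:
            groups["cross"].append(item)   # violates simple structure
    return groups


groups = sort_items(loadings)
print(groups["none"])
print(len(groups["single"]))
```

With the Table 7 values, Item A-5 is the only item below the cut-off and the remaining eleven items load saliently, which matches the description above.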

Because one item (A-5) did not load on the one-factor solution and research has indicated

that over-factoring is better than under-factoring (Wood, Tataryn, & Gorsuch, 1996), a two-

factor solution was extracted, rotated (Promax rotation), and examined for adequacy. However,

simple structure was not observed in the two-factor solution. Specifically, there were two items

that saliently loaded on both factors, and three items did not load on any factor. Thus, the two-

factor solution was rejected.

Examination of the items that saliently loaded on the factor indicates that they each

measure an aspect of verbal or non-verbal attempts at initiating or sustaining social

communication. Thus, this factor was labeled Social Communication. The reliability estimate

(Cronbach’s α) of the scores on the Social Communication factor was .90, and with the exception

of Item A-5, all of the corrected inter-item correlations for each item on the scale were greater

than or equal to .34, with the majority of the correlations falling above .60. Further, item-total

statistics (see Appendix D) indicate that, with the exception of Item A-5, all of the items are

adding to the overall scale reliability and that deleting any of the items would not improve the

overall scale reliability.
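
For readers unfamiliar with the reliability statistics cited here, Cronbach's α and the "alpha if item deleted" statistic can be sketched as follows. This is a minimal Python illustration using made-up scores for three items and six examinees, not the study data; the function names are assumptions introduced for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of per-item score lists,
    each covering the same examinees in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per examinee
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed with each item removed in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical scores on the 0-2 scale, for illustration only.
items = [
    [0, 1, 2, 2, 1, 0],  # item 1
    [0, 1, 2, 1, 1, 0],  # item 2, tracks item 1 closely
    [2, 0, 1, 0, 2, 1],  # item 3, unrelated to the other two
]

print(cronbach_alpha(items))
print(alpha_if_deleted(items))
```

In this made-up example, removing the unrelated third item raises α, which is the pattern that would flag an item such as A-5 as not contributing to scale reliability.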

Page 78: VALIDITY AND DIAGNOSTIC ACCURACY OF SCORES FROM THE …

65

Table 7 Structure Coefficients and Communalities for the ADOS-G Module 1(Original Scoring Algorithm) Items (N = 82) Item Structure Coefficient Communality

A-2: Frequency of vocalizations to others .892 .795

A-5: Stereotyped use of words .086 .007

A-6: Use of other’s body to communicate .354 .126

A-7: Pointing .749 .561

A-8: Gestures .654 .427

B-1: Unusual eye contact .732 .536

B-3: Facial expressions directed to others .823 .677

B-5: Shared enjoyment in interaction .727 .528

B-9: Showing .780 .609

B-10: Spontaneous initiation of joint attention .755 .570

B-11: Response to joint attention .621 .386

B-12: Quality of social overtures .831 .690

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.
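As a consistency check on Table 7, recall that in a one-factor solution each item's communality is the square of its structure coefficient. The sketch below transcribes the Table 7 values and applies the salience cutoff of .32 stated in the table note. The variable names are illustrative, and small gaps are expected because the published coefficients are rounded to three decimals.

```python
# (structure coefficient, communality) pairs transcribed from Table 7
table_7 = {
    "A-2": (0.892, 0.795),
    "A-5": (0.086, 0.007),
    "A-6": (0.354, 0.126),
    "A-7": (0.749, 0.561),
    "A-8": (0.654, 0.427),
    "B-1": (0.732, 0.536),
    "B-3": (0.823, 0.677),
    "B-5": (0.727, 0.528),
    "B-9": (0.780, 0.609),
    "B-10": (0.755, 0.570),
    "B-11": (0.621, 0.386),
    "B-12": (0.831, 0.690),
}
SALIENT = 0.32  # cutoff given in the table note

# Items whose structure coefficient exceeds the salience cutoff
salient_items = [item for item, (loading, _) in table_7.items() if abs(loading) > SALIENT]

# Largest difference between loading squared and the reported communality
max_gap = max(abs(loading ** 2 - h2) for loading, h2 in table_7.values())

print(f"{len(salient_items)} of {len(table_7)} items are salient")
print(f"largest |loading^2 - communality| = {max_gap:.4f}")
```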


A follow-up analysis (Appendix E) was conducted to determine the suitability of the one-

factor solution when Item A-5 (i.e., the item that did not load saliently on the one-factor solution)

was deleted. All items loaded saliently on the solution, and it accounted for 58 percent of total

variance of Module 1 OSA items. In addition, the reliability estimate (Cronbach’s Alpha) for the

updated scale equaled .91.

Module 1 - Revised Scoring Algorithm (RSA). Data from the 14 items of the Module

1-RSA were also submitted for common factor analysis (Principal Axis Factoring extraction).

Bartlett’s Test of Sphericity (χ² = 532.557, df = 91, p < .001), Kaiser-Meyer-Olkin statistic

(.887), and the item correlation matrix (see Appendix C) all suggested that the correlation matrix

was adequate for factorability. MAP criteria, parallel analysis, and visual inspection of the scree

plot recommended the retention of one factor; however, the theoretical rationale reported by the

test authors (Gotham et al., 2007) specified the retention of two factors. Thus, solutions

containing one and two factors were examined.
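The degrees of freedom reported for Bartlett's test in this chapter follow directly from the number of items: for p items, df = p(p - 1)/2. The sketch below checks this for the 11-, 12-, and 14-item algorithms and shows the usual form of the test statistic. The function names are hypothetical, and the determinant of the correlation matrix is not reported in the text, so the statistic is shown only as a formula with a trivial input.

```python
import math


def bartlett_df(p):
    """Degrees of freedom for Bartlett's test of sphericity with p variables."""
    return p * (p - 1) // 2


def bartlett_chi_square(n, p, det_r):
    """Bartlett's statistic from sample size n, p variables, and det(R)."""
    return -(n - 1 - (2 * p + 5) / 6) * math.log(det_r)


for p in (11, 12, 14):
    print(f"{p} items -> df = {bartlett_df(p)}")

# An identity correlation matrix (determinant 1) gives a statistic of zero,
# which is the null hypothesis the test evaluates.
print(bartlett_chi_square(n=66, p=14, det_r=1.0))
```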

The two-factor solution is presented in Table 8. Each of the 14 items loaded saliently and

singularly on the two-factor solution, and it accounted for 60 percent of the total variance. Ten

items were salient on Factor 1, with pattern coefficients ranging from .45 to .92 (Mdn = .75), and

four items were salient on Factor 2, with pattern coefficients ranging from .41 to .73 (Mdn =

.66). Communalities ranged from .17 to .76 (Mdn = .54), and the factor intercorrelation was .58.

The two-factor solution was robust across extraction (Unweighted Least Squares) and rotation

(Direct Oblimin) methods. Reliability estimates (Cronbach’s α) were .93 and .70 for Factor 1 and

Factor 2, respectively.
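In an oblique two-factor solution, structure coefficients can be recovered from the pattern coefficients and the factor intercorrelation (structure = pattern × Φ), and each communality follows from the same quantities. The sketch below applies this to three rows transcribed from Table 8, using the reported intercorrelation of .58. The function names are illustrative, and small discrepancies are expected because the published values are rounded.

```python
PHI = 0.58  # factor intercorrelation reported for the Module 1-RSA two-factor solution

# item: (pattern F1, pattern F2, reported structure F1, reported structure F2, reported communality)
rows = {
    "A-2": (0.917, -0.077, 0.872, 0.452, 0.764),
    "A-8": (0.448, 0.300, 0.621, 0.558, 0.445),
    "D-1": (0.017, 0.731, 0.439, 0.741, 0.549),
}


def structure_from_pattern(p1, p2, phi):
    """Structure coefficients for two correlated factors."""
    return p1 + phi * p2, phi * p1 + p2


def communality(p1, p2, phi):
    """Communality of an item in an oblique two-factor solution."""
    return p1 ** 2 + p2 ** 2 + 2 * phi * p1 * p2


for item, (p1, p2, s1, s2, h2) in rows.items():
    c1, c2 = structure_from_pattern(p1, p2, PHI)
    print(f"{item}: structure ({c1:.3f}, {c2:.3f}) vs reported ({s1}, {s2}); "
          f"communality {communality(p1, p2, PHI):.3f} vs reported {h2}")
```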

The one-factor solution also was examined for suitability (Table 9). Twelve of the

fourteen items loaded saliently on the one-factor solution, with structure coefficients ranging


Table 8

Pattern Coefficients, Structure Coefficients, and Communalities for the ADOS-G Module 1 (Revised Scoring Algorithm) Items (N = 66)

Item    Pattern Factor 1    Pattern Factor 2    Structure Factor 1    Structure Factor 2    Communality

A-2: Frequency of vocalizations/verbalizations to others .917 -.077 .872 .452 .764

A-5: Stereotyped use of words/phrases .000 .413 .238 .413 .170

A-7: Pointing .688 .036 .709 .433 .503

A-8: Gestures .448 .300 .621 .558 .445

B-1: Unusual eye contact .646 .169 .743 .541 .571

B-3: Facial expressions directed to others .799 .081 .846 .543 .721

B-4: Integration of gaze/other social behav. in social overtures .868 -.173 .768 .327 .609

B-5: Shared enjoyment in interactions .537 .218 .663 .528 .471

B-9: Showing .807 -.038 .785 .428 .617

B-10: Spontaneous initiation of joint attention .750 -.027 .734 .406 .540

B-12: Quality of social overtures .864 -.058 .830 .441 .692


D-1: Unusual sensory interests in person/objects .017 .731 .439 .741 .549

D-2: Hand/finger mannerisms -.144 .663 .239 .580 .350

D-4: Repetitive interests/stereotyped behaviors .070 .649 .445 .690 .479

Note. Table presents the extraction of a two-factor solution using Principal Axis Extraction and Promax Rotation. Salient structure coefficients (> .32) are identified in bold. Due to an insufficient sample size, the Module 1, No Words Revised Scoring Algorithm was excluded from analyses of factor structure. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only.


Table 9

Structure Coefficients and Communalities for the ADOS-G Module 1 (Revised Scoring Algorithm) Items (N = 66)

Item    Structure Coefficient    Communality

A-2: Frequency of vocalizations/verbalizations to others .846 .716

A-5: Stereotyped use of words/phrases .288 .083

A-7: Pointing .706 .498

A-8: Gestures .654 .427

B-1: Unusual eye contact .758 .575

B-3: Facial expressions directed to others .848 .720

B-4: Integration of gaze/social behav. in social overtures .727 .528

B-5: Shared enjoyment in interactions .686 .470

B-9: Showing .768 .590

B-10: Spontaneous initiation of joint attention .721 .519

B-12: Quality of social overtures .809 .655

D-1: Unusual sensory interests in person/objects .512 .262

D-2: Hand/finger mannerisms .314 .099

D-4: Repetitive interests/stereotyped behaviors .512 .262

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold. Due to an insufficient sample size, the Module 1, No Words Revised Scoring Algorithm was excluded from analyses of factor structure. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only.


from .51 to .85 (Mdn = .73) and communalities from .08 to .72 (Mdn = .50). The one-factor

solution accounted for 49 percent of the total variance between the items and was robust across

extraction (Unweighted Least Squares) methods.

Despite these findings, the two-factor solution was retained for interpretation instead of

the one-factor solution because it demonstrated theoretical convergence, higher communalities

and factor loadings, and accounted for a larger percent of total variance. In addition, every item

loaded saliently and singularly (i.e., loaded on one factor only) on the two-factor solution,

whereas two items did not load on the one-factor solution.

As a reflection of the items that saliently loaded on each factor, Factor 1 was labeled

Social Communication (SC) and Factor 2 was labeled Stereotyped/Repetitive Behaviors (SRB).

On the SC scale, all of the corrected inter-item correlations for each item (see Appendix D) were

greater than .61, with the majority of the correlations falling above .70, and all of the items on the

scale added to the overall scale reliability. Corrected inter-item correlations for each item on the

SRB scale exceeded .33. Item-total statistics report that three of the four items are adding to

overall scale reliability. One item (A-5), if deleted, would provide a very modest increase in

overall scale reliability (i.e., from .70 to .71).

Module 2 – Original Scoring Algorithm. Common factor analysis (Principal Axis

Factoring extraction) also was conducted on the 12 items included in the Module 2-OSA.

Bartlett’s Test of Sphericity (χ² = 1214.032, df = 66, p < .001) and the KMO statistic (.934) were

once again reviewed to determine the adequacy of the ADOS-G Module 2-OSA items for

factorability. In addition, the item correlation matrix (Appendix C) contained primarily moderate

to strong correlations between items, suggesting that the correlation matrix was appropriate for

factor analysis.


MAP criteria, parallel analysis, visual inspection of the scree plot, and the authors’

theoretical rationale each recommended retention of one factor. Therefore, a one-factor solution

was extracted and examined. The resulting solution (Table 10) was examined for suitability and

determined to be adequate based on the standards set a priori. Each of the 12 items loaded

saliently on the one-factor solution, with structure coefficients ranging from .64 to .85 (Mdn =

.83) and communalities ranging from .41 to .79 (Mdn = .69). The one-factor solution accounted

for 66 percent of the total variance and was robust across extraction (Unweighted Least Squares)

methods.

Consistent with the Module 1-OSA, examination of the items that saliently loaded on the

factor indicates that they each measure an aspect of verbal or nonverbal communication. Thus,

this factor was labeled Social Communication. The scale reliability estimate (Cronbach’s α) for the Social

Communication factor was .95, and the corrected inter-item correlation for each

item on the scale was greater than or equal to .62, with the majority of the correlations falling

above .77. Further, item-total statistics (see Appendix D) indicate that all of the items on the

scale are adding to the overall scale reliability and that deleting any of the items would not

improve the overall scale reliability.

Module 2 - Revised Scoring Algorithm. Data from the 14 items of the Module 2-RSA

submitted for common factor analysis (Principal Axis Factoring extraction) also were determined

to be adequate for factorability based on the Bartlett’s Test of Sphericity (χ² = 711.141, df = 91, p

< .001), KMO statistic (.889), and review of the inter-item correlation matrix (Appendix C).

MAP criteria, parallel analysis, and visual inspection of the scree plot recommended the

retention of one factor, whereas the theoretical rationale reported by the test authors (Gotham et

al., 2007) supported the retention of two factors. Thus, solutions containing one and two factors


Table 10

Structure Coefficients and Communalities for the ADOS-G Module 2 (Original Scoring Algorithm) Items (N = 118)

Item    Structure Coefficient    Communality

A-2: Social overtures/maintenance of attention .848 .720

A-5: Stereotyped use of words .684 .468

A-6: Conversation .828 .685

A-7: Pointing .707 .499

A-8: Gestures .810 .656

B-1: Unusual eye contact .705 .496

B-2: Facial expressions directed to others .791 .626

B-6: Spontaneous initiation of joint attention .640 .409

B-8: Quality of social overtures .887 .787

B-9: Quality of social response .866 .749

B-10: Amount of reciprocal social communication .889 .790

B-11: Overall quality of rapport .845 .714

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.


were examined. Although neither the two-factor nor one-factor solution resulted in optimal

model fit, both solutions met a priori criteria set for factor retention. Table 11 presents the pattern

coefficients, structure coefficients, and communalities for the two-factor solution. Structure

coefficients and communalities for the one-factor solution are presented in Table 12.

Each of the 14 items loaded saliently on the two-factor solution. However, one item (A-8)

demonstrated salient loadings on both extracted factors. In addition, the Factor 1 pattern

coefficient for Item B-5 was slightly greater than one. The two-factor solution accounted for 64

percent of the total variance, and communalities ranged from .26 to .76 (Mdn = .61). Eleven

items were salient on Factor 1, with pattern coefficients ranging from .47 to 1.01 (Mdn = .73),

and four items were salient on Factor 2, with pattern coefficients ranging from .38 to .63 (Mdn =

.63). The factor intercorrelation was .66. The two-factor solution was robust across extraction

(Unweighted Least Squares) and rotation (Direct Oblimin) methods. Reliability estimates

(Cronbach’s α) were .94 and .70 for Factor 1 and Factor 2, respectively. On Factor 1, all of the

corrected inter-item correlations for each item were greater than or equal to .64, with the

majority of the correlations falling above .70, and all of the items on the scale are adding to the

overall scale reliability (Appendix D). On Factor 2, the corrected inter-item correlation for each

item was greater than or equal to .29. However, item-total statistics report that only three of the

four items are adding to overall scale reliability: one item (D-2), if deleted, would provide a

modest increase in overall scale reliability (i.e., from .70 to .74).

The one-factor solution was also examined for suitability. Thirteen of the fourteen items

loaded saliently on the one-factor solution, with salient structure coefficients ranging from .44 to

.85 (Mdn = .78) and communalities ranging from .04 to .72 (Mdn = .61). The one-factor solution

accounted for 54 percent of the total variance between scale items and was robust across


Table 11

Pattern Coefficients, Structure Coefficients, and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items (N = 73)

Item    Pattern Factor 1    Pattern Factor 2    Structure Factor 1    Structure Factor 2    Communality

A-5: Stereotyped use of words .471 .271 .650 .582 .464

A-7: Pointing .727 -.008 .722 .471 .521

A-8: Gestures .514 .375 .761 .714 .659

B-1: Unusual eye contact .502 .202 .635 .533 .426

B-2: Facial expressions directed to others .590 .263 .763 .651 .621

B-3: Shared enjoyment in interactions .642 .187 .765 .610 .605

B-5: Showing 1.006 -.243 .846 .420 .749

B-6: Spontaneous initiation of joint attention .910 -.282 .725 .318 .570

B-8: Quality of social overtures .920 -.077 .870 .530 .760

B-10: Amount of reciprocal social communication .870 -.016 .859 .557 .738

B-11: Overall quality of rapport .781 .090 .840 .604 .710


D-1: Unusual sensory interests in person/play materials -.026 .630 .389 .613 .376

D-2: Hand/finger complex mannerisms -.250 .578 .131 .413 .206

D-4: Repetitive interests/stereotyped behaviors .184 .627 .597 .748 .578

Note. Table presents the extraction of a two-factor solution using Principal Axis Extraction and Promax Rotation. Salient pattern coefficients (> .32) are identified in bold. Due to an insufficient sample size, the Module 2, Less Than 5 Years Revised Scoring Algorithm was excluded from analyses of factor structure. References to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only.


Table 12

Structure Coefficients and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items (N = 73)

Item    Structure Coefficient    Communality

A-5: Stereotyped use of words .674 .454

A-7: Pointing .714 .510

A-8: Gestures .792 .628

B-1: Unusual eye contact .653 .426

B-2: Facial expressions directed to others .785 .616

B-3: Shared enjoyment in interactions .780 .608

B-5: Showing .801 .642

B-6: Spontaneous initiation of joint attention .687 .460

B-8: Quality of social overtures .851 .724

B-10: Amount of reciprocal social communication .849 .721

B-11: Overall quality of rapport .844 .712

D-1: Unusual sensory interests in person/play materials .443 .196

D-2: Hand/finger complex mannerisms .187 .035

D-4: Repetitive interests/stereotyped behaviors .645 .416

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold. Due to an insufficient sample size, the Module 2, Less Than 5 Years Revised Scoring Algorithm was excluded from analyses of factor structure. References to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only.


extraction (Unweighted Least Squares) methods. The reliability estimate (Cronbach’s α) of the

scores on the factor was .93, and, with the exception of Item D-2, all of the corrected inter-item

correlations for each item on the scale exceeded .43, with the majority of the correlations falling

above .76. Further, item-total statistics (Appendix D) indicate that, with the exception of Item D-

2, all of the items on the scale are adding to the overall scale reliability and that deleting any of

the items would not improve overall reliability.

After examining the two solutions, the one-factor solution was retained for interpretation

instead of the two-factor solution due to the presence of a Heywood Case (i.e., a factor loading

greater than one; Costello & Osborne, 2005), lack of item singularity, failure to adhere to

theoretical rationale, and minimally acceptable scale reliability for one of the two extracted

factors. Examination of the items that saliently loaded on the factor indicates that they measure

aspects of social functioning, communication, and engagement in stereotyped repetitive

behaviors. Thus, the factor was labeled Autistic Characteristics.
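Two of the criteria used in this decision, a pattern coefficient greater than one and an item that is salient on both factors, can be screened for mechanically. The sketch below does so for the pattern coefficients transcribed from Table 11. The variable names are hypothetical, and applying the .32 salience cutoff to absolute values is an assumption made here for illustration.

```python
# item: (pattern coefficient on Factor 1, pattern coefficient on Factor 2), from Table 11
pattern = {
    "A-5": (0.471, 0.271), "A-7": (0.727, -0.008), "A-8": (0.514, 0.375),
    "B-1": (0.502, 0.202), "B-2": (0.590, 0.263), "B-3": (0.642, 0.187),
    "B-5": (1.006, -0.243), "B-6": (0.910, -0.282), "B-8": (0.920, -0.077),
    "B-10": (0.870, -0.016), "B-11": (0.781, 0.090),
    "D-1": (-0.026, 0.630), "D-2": (-0.250, 0.578), "D-4": (0.184, 0.627),
}
SALIENT = 0.32  # salience cutoff from the table note

# Items salient on each factor (assumption: cutoff applied to absolute values)
salient_f1 = [item for item, (f1, _) in pattern.items() if abs(f1) > SALIENT]
salient_f2 = [item for item, (_, f2) in pattern.items() if abs(f2) > SALIENT]

# Items salient on both factors violate simple structure
cross_loading = [item for item in salient_f1 if item in salient_f2]

# Any coefficient above 1 in absolute value is flagged as out of range
out_of_range = [item for item, pair in pattern.items() if max(abs(v) for v in pair) > 1]

print("salient on Factor 1:", len(salient_f1))
print("salient on Factor 2:", len(salient_f2))
print("salient on both factors:", cross_loading)
print("coefficient greater than one:", out_of_range)
```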

A follow-up EFA (Appendix E) was conducted to determine the suitability of the one-

factor solution when Item D-2 (i.e., the item that did not load saliently on the one-factor solution)

was deleted. All items loaded saliently on the solution, and it accounted for 58 percent of total

variance of Module 2 RSA items. Cronbach’s Alpha = .94 for the updated scale.

Module 3 - Original Scoring Algorithm. The correlation matrix of the 11 items was

also adequate for factorability (Bartlett’s Test of Sphericity [χ² = 1504.436, df = 55, p < .001];

KMO statistic = .933; multiple inter-item correlations > .30 [Tabachnick & Fidell, 2007; see

Appendix C]). All relevant criteria recommended the retention of a one-factor solution.

Therefore, a one-factor solution was extracted (Table 13), examined, and was determined to be

adequate based on the standards set a priori. Ten items loaded saliently on the one-factor


Table 13

Structure Coefficients and Communalities for the ADOS-G Module 3 (Original Scoring Algorithm) Items (N = 262)

Item    Structure Coefficient    Communality

A-4: Stereotyped use of words/phrases .304 .092

A-7: Reporting of events .621 .368

A-8: Conversation .795 .631

A-9: Gestures .647 .418

B-1: Unusual eye contact .555 .308

B-2: Facial expressions directed to others .752 .566

B-6: Shared enjoyment in interactions .651 .424

B-7: Quality of social overtures .833 .694

B-8: Quality of social response .819 .672

B-9: Amount of reciprocal social communication .806 .649

B-10: Overall quality of rapport .763 .582

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.


solution, with salient structure coefficients ranging from .56 to .81 (Mdn = .75) and

communalities ranging from .09 to .69 (Mdn = .57). The one-factor solution accounted for 53

percent of the total variance and was robust across extraction (Unweighted Least Squares)

methods.

Because one item (A-4) did not load on the one-factor solution, a two-factor solution was

examined for adequacy. However, the two-factor solution was rejected because it did not

demonstrate simple structure (i.e., there were four items that saliently loaded on both factors and

each factor did not contain at least three items with salient pattern coefficients).

The retained factor was labeled Social Communication to reflect the content of the items

that saliently loaded on the factor. The reliability estimate (Cronbach’s α) of the Social

Communication scale was .90, and with the exception of item A-4, all of the corrected inter-item

correlations were greater than or equal to .34, with the majority of the correlations falling above

.55. Further, item-total statistics (Appendix D) indicate that, with the exception of item A-4, all

of the items on the scale are adding to the overall scale reliability.

The ten items with salient loadings were resubmitted for a follow-up EFA. The suitability

of the one-factor solution was confirmed (see Appendix E). Each of the items also demonstrated

salient loadings on the new solution, which accounted for 58 percent of the variance between

items. The reliability estimate (Cronbach’s Alpha) of the updated scale equaled .92.

Module 3 - Revised Scoring Algorithm. Data from the 14 items of the Module 3-RSA

also were submitted for common factor analysis using Principal Axis Factoring extraction. The

adequacy of the correlation matrix for factorability was established by the Bartlett’s Test of

Sphericity (χ² = 1680.750, df = 91, p < .001), KMO statistic (.926), and a review of the inter-item

correlation matrix (see Appendix C). MAP criteria, parallel analysis, and the visual inspection of


a scree plot recommended the retention of one factor, whereas the theoretical rationale reported

by the test authors (Gotham et al., 2007) supported the retention of two factors. Thus, solutions

containing one and two factors were extracted and examined.

A two-factor solution was unable to be extracted within the 25 iterations allowed by

SPSS and, therefore, could not be considered. The one-factor solution (Table 14) was

determined to be adequate based on the standards set a priori. Twelve items loaded saliently on

the one-factor solution, with salient structure coefficients ranging from .56 to .81 (Mdn = .78)

and communalities ranging from .02 to .68 (Mdn = .57). Subsequent analyses determined that the

one-factor solution was robust across extraction (Unweighted Least Squares) methods.

The one-factor solution accounted for 45 percent of the total variance between the

Module 3-RSA items, and the reliability estimate (Cronbach’s α) of the scores on the factor was

.89. With the exception of items D-1 and D-2, corrected inter-item correlations for each item on

the scale were greater than or equal to .33, with the majority of the correlations falling above .70.

Item-total statistics (Appendix D) report that 12 of the 14 items (i.e., all items except for D-1 and

D-2) are adding to the overall scale reliability. In addition, deletion of items D-1 and D-2 from the

scale would increase scale reliability. Salient items were reviewed for content and determined to

measure aspects of social functioning, communication, and engagement in stereotyped repetitive

behaviors. Thus, the factor was labeled Autistic Characteristics.

A follow-up EFA (Appendix E) was conducted to determine the suitability of the one-

factor solution with the deletion of Items D-1 and D-2. All items loaded saliently on the new

solution, and it accounted for 51 percent of total variance of Module 3 RSA items. Cronbach’s

Alpha = .90 for the updated scale.


Table 14

Structure Coefficients and Communalities for the ADOS-G Module 3 (Revised Scoring Algorithm) Items (N = 261)

Item    Structure Coefficient    Communality

A-4: Stereotyped use of words/phrases .327 .107

A-7: Reporting of events .616 .380

A-8: Conversation .781 .610

A-9: Gestures .662 .438

B-1: Unusual eye contact .548 .300

B-2: Facial expressions directed to others .774 .598

B-4: Shared enjoyment in interaction .824 .678

B-7: Quality of social overtures .810 .656

B-8: Quality of social response .806 .650

B-9: Amount of reciprocal social communication .758 .575

B-10: Overall quality of rapport .755 .570

D-1: Unusual sensory interest in play materials/person .236 .056

D-2: Hand/finger complex mannerisms .136 .019

D-4: Excessive interest in specific topics/repetitive behav .395 .156

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.


Hypothesis 2: Relationships between Scores on the ADOS-G and Other Measures

Bivariate correlations (Pearson’s r) were calculated to determine the strength of the

relationships between participants’ total scores on the ADOS-G and parent and teacher ratings of

participants’ behavioral functioning on the GARS-2 (Autism Index and subscale scores), and on

select subscales of the BASC-2. Because the scores generated from the ADOS-G and behavior

rating scales do not use a common metric (e.g., standard scores or T scores), z scores were

calculated for each of the variables to allow for appropriate comparisons. Z scores for parent and

teacher ratings on the GARS-2 and BASC-2 were then correlated with ADOS-G total score z

scores across module and scoring algorithm. Correlations were calculated using the ADOS-G

total scores based on the OSA (Lord et al., 1999), RSA (Gotham et al., 2007), and Updated

Scoring Algorithm (i.e., the scoring algorithms identified from the factor analyses conducted in

the current sample with recommended item deletions for Module 1-OSA, Module 2-RSA, and

Module 3-OSA and RSA). Table 15 presents the validity coefficients representing the

relationships between participants’ total scores on Module 3 of the ADOS-G across scoring

algorithms, and parent and teacher ratings of participants’ behavior on the GARS-2. Table 16

presents the validity coefficients representing the relationships between participants’ total scores

on the ADOS-G across modules and scoring algorithms, and parent and teacher ratings of

participants’ behavior on select subscales of the BASC-2. Validity coefficients were interpreted

as follows: r < .30 = weak correlation, .30 ≤ r ≤ .59 = moderate correlation, r ≥ .60 = strong

correlation (Cicchetti, 1994).
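The standardization and correlation steps described above can be sketched as follows. This is a minimal illustration, not the SPSS analysis used in the study: the function names and the six pairs of scores are hypothetical, and coefficients are rounded to two decimals before the interpretive bands are applied. Because Pearson's r is unchanged by linear rescaling, the sketch also checks that raw and standardized scores yield the same coefficient.

```python
from math import sqrt
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strength(r):
    """Interpretive band for a validity coefficient, using its absolute size."""
    size = round(abs(r), 2)
    if size < 0.30:
        return "weak"
    if size < 0.60:
        return "moderate"
    return "strong"


# Hypothetical observation totals and rating-scale scores for six children
observation_totals = [4, 7, 9, 12, 15, 18]
rating_scores = [52, 48, 61, 57, 70, 66]

r_raw = pearson_r(observation_totals, rating_scores)
r_z = pearson_r(z_scores(observation_totals), z_scores(rating_scores))
print(f"r (raw) = {r_raw:.3f}, r (z scores) = {r_z:.3f}, {strength(r_raw)}")
```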

Module 1. Consistent with hypotheses, results of the correlational analysis indicate that

weak relationships exist between participants’ total scores on Module 1 of the ADOS-G, across

scoring algorithms, and parent and teacher ratings of participants’ behavior on the BASC-2


Table 15

Pearson Correlations between Participants’ Total Scores on the ADOS-G Original and Revised Scoring Algorithms for Module 3 and Parent and Teacher Ratings on the GARS-2

Scale/Subscale    Module 3-OSA: Authors’ FS    Module 3-OSA: Identified FS    Module 3-RSA: Authors’ FS    Module 3-RSA: Identified FS

GARS-2: Parent Ratings (N = 72)

Autism Index -.15 .02 -.20 .03

Stereotyped Behaviors -.30 .06 -.34 .05

Communication -.15 -.06 -.19 .02

Social Interaction -.08 -.01 -.12 -.05

GARS-2: Teacher Ratings (N = 70)

Autism Index .08 .10 .12 .06

Stereotyped Behaviors .04 .09 .08 .04

Communication .15 .16 .18 .14

Social Interaction .08 .01 .11 -.03

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic. OSA = Original Scoring Algorithm. RSA = Revised Scoring Algorithm. FS = Factor Structure. GARS-2 = Gilliam Autism Rating Scale, Second Edition. *p < .05. **p < .01.


Table 16

Pearson Correlations between Participants’ Total Scores on the ADOS-G Original, Revised, and Updated Scoring Algorithms and Parent and Teacher Ratings on the BASC-2

ADOS-G Module/Scoring Algorithm    Anxiety PRS    Anxiety TRS    Atypicality PRS    Atypicality TRS    Withdrawal PRS    Withdrawal TRS

Module 1 - Authors’ Algorithms

Original Scoring Algorithm -.24 -.25 .43** .13 .40** .23

Revised Scoring Algorithm -.23 -.21 .52** .13 .47** .20

Module 1 – Updated Algorithms

Original Scoring Algorithm -.12 -.32 .07 .05 .04 .01

Revised Scoring Algorithm N/A N/A N/A N/A N/A N/A

Module 2 - Authors’ Algorithms

Original Scoring Algorithm -.05 .06 .16 .37** .22 .19

Revised Scoring Algorithm -.27 .09 .14 .35* .21 .16

Module 2 – Updated Algorithms

Original Scoring Algorithm N/A N/A N/A N/A N/A N/A

Revised Scoring Algorithm .24 .10 .08 -.04 .16 -.04

Module 3 - Authors’ Algorithms

Original Scoring Algorithm -.15 -.18* -.02 .14 .09 -.09

Revised Scoring Algorithm -.14 -.21** .00 -.14 .12 -.07

Module 3 – Updated Algorithms

Original Scoring Algorithm .01 .07 .05 -.04 .16 .03


Revised Scoring Algorithm .03 .07 .01 -.05 .13 .01

Note. Sample size varied by module, scoring algorithm, and rater. N’s are as follows: Module 1-OSA PRS = 47, TRS = 45; Module 1-RSA PRS = 42, TRS = 32; Module 2-OSA PRS = 73, TRS = 79; Module 2-RSA PRS = 68, TRS = 67; Module 3-OSA and RSA PRS = 145, TRS = 148. OSA = Original Scoring Algorithm. RSA = Revised Scoring Algorithm. Lord et al. (1999) authored the Original Scoring Algorithm, and Gotham et al. (2007) authored the Revised Scoring Algorithm. Updates to the scoring algorithms for Module 1-RSA and Module 2-OSA were not needed based on data obtained from the EFAs conducted on these modules. ADOS-G = Autism Diagnostic Observation Schedule-Generic; BASC-2 = Behavior Assessment System for Children, Second Edition. Due to an insufficient sample size, the Module 1, No Words and Module 2, Less Than 5 Years Revised Scoring Algorithms were excluded from correlational analyses. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only. In addition, references to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only. *p < .05. **p < .01.
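The significance flags in Table 16 depend on both the size of the coefficient and the sample size for that cell. The sketch below approximates a two-tailed p value with the Fisher z transformation for three coefficients transcribed from the table, paired with the sample sizes given in the note. This approximation is an assumption chosen for illustration and is not necessarily the test used to produce the table, so values close to a threshold may be classified differently. The function names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_two_tailed_p(r, n):
    """Approximate two-tailed p value for a correlation via the Fisher z transformation."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def stars(p):
    """Significance flag in the style of the table note."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# (label, r, N) transcribed from Table 16 and its note
examples = [
    ("Module 1-OSA, PRS Atypicality", 0.43, 47),
    ("Module 3-OSA, TRS Anxiety", -0.18, 148),
    ("Module 1-OSA, TRS Atypicality", 0.13, 45),
]

for label, r, n in examples:
    p = approx_two_tailed_p(r, n)
    print(f"{label}: r = {r:+.2f}, N = {n}, approximate p = {p:.4f} {stars(p)}")
```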


Anxiety subscale; and moderate relationships exist between total scores on the Module 1-OSA

and RSA and parent ratings on the BASC-2 Atypicality and Withdrawal subscales. However,

inconsistent with hypotheses, weak relationships were measured between teacher ratings on the

BASC-2 Atypicality and Withdrawal subscales and participants’ total scores on the Module 1-OSA

and RSA.

Correlations also were calculated between parent and teacher ratings on the BASC-2 and

total scores for the Module 1-Updated OSA. Consistent with hypotheses, results indicate that

weak relationships exist between participants’ total scores on the ADOS-G and parent ratings on

the BASC-2 Anxiety subscale. However, inconsistent with hypotheses, weak relationships were

also measured between parent and teacher ratings on the BASC-2 Atypicality and Withdrawal

subscales and total scores on the ADOS-G; and a negative moderate relationship was measured

between ADOS-G total scores and teacher ratings on the BASC-2 Anxiety subscale.

Module 2. Consistent with hypotheses, results of the correlational analysis indicate that

weak relationships exist between participants’ total scores on Module 2 of the ADOS-G, across

scoring algorithms, and parent and teacher ratings of participants’ behavior on the BASC-2

Anxiety subscale; and moderate relationships exist between total scores on the Module 2-OSA

and RSA and teacher ratings on the BASC-2 Atypicality subscale. Inconsistent with hypotheses,

weak relationships were also measured between parent and teacher ratings on the BASC-2

Withdrawal subscale and ADOS-G total score across scoring algorithms; and between total

scores on the Module 2-OSA and RSA and parent ratings on the BASC-2 Atypicality subscale.

Correlations were also calculated between parent and teacher ratings on the BASC-2 and

total scores for the Module 2-RSA using the Module 2 Updated RSA (no changes were

recommended for the Module 2-OSA). Weak relationships were measured between parent and

teacher ratings on the BASC-2 Anxiety subscale and ADOS-G total scores, which is consistent

with hypotheses. However, inconsistent with hypotheses, weak relationships were also measured

between parent and teacher ratings on the BASC-2 Atypicality and Withdrawal subscales and

ADOS-G total scores.

Module 3. On the GARS-2, inconsistent with hypotheses, weak negative relationships

were measured between participants’ total scores on the ADOS-G OSA and RSA, and parent

ratings on the Communication and Social Interaction subscales. Although moderate correlations

were measured between ADOS-G scores, across scoring algorithms, and parent ratings of

participants’ behavior on the Stereotyped Behaviors subscale, these relationships were also

negative, which is inconsistent with theoretical expectations. Parent ratings across the three

subscales resulted in weak negative relationships between total scores on the GARS-2 (i.e., the

Autism Index) and participants’ total scores on the Module 3-OSA and RSA. Consistently weak

correlations were measured between teacher ratings across the subscales and Autism Index on

the GARS-2 and ADOS-G total scores across scoring algorithms. This result is also inconsistent

with hypotheses.

On the BASC-2, weak correlations were measured between total scores on the Module 3-

OSA and RSA and parent and teacher ratings on the Anxiety subscale, which was consistent with

predictions. However, inconsistent with predictions, weak relationships were also measured

between ADOS-G total scores, across algorithms, and parent and teacher ratings on the BASC-2

Atypicality and Withdrawal subscales.

Correlations were also calculated between parent and teacher ratings on the GARS-2 and

BASC-2, and total scores for the Module 3-OSA and RSA using the Updated Scoring

Algorithms. Inconsistent with hypotheses, weak relationships were measured between Module 3

Updated OSA and RSA total scores, and parent and teacher ratings on the GARS-2, across all

subscales and the Autism Index. Weak relationships were also measured between parent and

teacher ratings on the BASC-2, across subscales, and Module 3 Updated OSA and RSA total

scores. Although the weak relationships measured on the BASC-2 Anxiety subscale were

consistent with hypotheses, those on the BASC-2 Atypicality and Withdrawal subscales were

inconsistent with hypotheses.

Hypothesis 3: Comparisons of Diagnostic Accuracy Indicators across Scoring Algorithms

The purpose of Hypothesis 3 is to compare the diagnostic accuracy of scores obtained

with the OSA and RSA. It was hypothesized that participants’ scores on the RSA would result in

greater diagnostic accuracy than those on the OSA.

Receiver Operating Characteristic (ROC) curve analysis was conducted across modules,

scoring algorithms, and ADOS-G classification determinations (i.e., ASD vs. No Spectrum

Disorder, and Non-Autism ASD vs. Autistic Disorder) to determine the sensitivity (i.e., the

percentage of individuals that have a clinical diagnosis of Autistic Disorder/ASD that accurately

score above the Autistic Disorder/ASD cut-scores on the ADOS-G) and specificity (i.e., the

percentage of individuals without a clinical diagnosis of Autistic Disorder/ASD that accurately

score below the cut-scores for Autistic Disorder/ASD on the ADOS-G) of ADOS-G diagnostic

classifications. ROC plots portray the sensitivity and specificity of a measure across possible cut-scores;

overall accuracy is summarized by the area under the curve (AUC; Strik, Honig, Lousberg, & Denollet,

2001). Simon (1999) suggested the following interpretation of AUC values: 0.50 to 0.75 (Fair);

0.75 to 0.92 (Good); 0.92 to 0.97 (Very Good); 0.97 to 1.00 (Excellent). Values are compared to

a null hypothesis of a “true area” equivalent to 0.50. Thus, AUC significance indicates that

sensitivity and specificity values statistically differ from random assignment. Positive predictive

power (i.e., the percentage of individuals scoring above the Autistic Disorder/ASD cut-scores on

the ADOS-G that also have a clinical diagnosis of Autistic Disorder/ASD), negative predictive

power (i.e., the percentage of individuals scoring below the Autistic Disorder/ASD cut-scores

that do not have a clinical diagnosis of Autistic Disorder/ASD), and the hit rate (proportion of

accurate positive and negative classifications) also were calculated for each module across scoring

algorithms and classification comparisons.
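
The indicator definitions given above can be illustrated with a minimal sketch. It assumes that each participant has already been cross-classified by clinical diagnosis and by ADOS-G cut-score into a 2 x 2 table; the function name, class name, and example counts are hypothetical and are not drawn from the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticIndicators:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    hit_rate: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a row or column of the table is empty.
    return numerator / denominator if denominator else float("nan")


def diagnostic_indicators(tp: int, fp: int, fn: int, tn: int) -> DiagnosticIndicators:
    """Compute accuracy indicators from the four cells of a 2 x 2 table.

    tp: clinical diagnosis present, classified positive by the cut-score
    fp: clinical diagnosis absent, classified positive by the cut-score
    fn: clinical diagnosis present, classified negative by the cut-score
    tn: clinical diagnosis absent, classified negative by the cut-score
    """
    total = tp + fp + fn + tn
    return DiagnosticIndicators(
        sensitivity=_ratio(tp, tp + fn),  # diagnosed cases correctly flagged
        specificity=_ratio(tn, tn + fp),  # non-cases correctly cleared
        ppv=_ratio(tp, tp + fp),          # flagged cases that are diagnosed
        npv=_ratio(tn, tn + fn),          # cleared cases that are not diagnosed
        hit_rate=_ratio(tp + tn, total),  # all correct classifications
    )


# Hypothetical counts for illustration only.
example = diagnostic_indicators(tp=45, fp=10, fn=5, tn=40)
print(example)
```

Sensitivity and specificity are conditioned on the clinical diagnosis, whereas the predictive values are conditioned on the ADOS-G classification; the predictive values therefore depend on the base rate of the diagnosis in the sample.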

Results for comparisons of diagnostic accuracy across the OSA and RSA are first

presented. Comparisons of diagnostic accuracy using the Updated Scoring Algorithms are then

reported.

Original and Revised Scoring Algorithm Comparisons. Indicators of diagnostic

accuracy (i.e., ROC plot AUC values, specificity, sensitivity, positive predictive values, negative

predictive values, and hit rates) for participants’ total scores on the ADOS-G for Module 1,

Module 2, and Module 3 across scoring algorithms, using the OSA and RSA, are presented in

Table 17.

ASD vs. no spectrum disorder comparisons. AUC values for each of the modules across

scoring algorithms are greater than .50, suggesting that the sensitivity and specificity

values obtained are not simply the result of random assignment. Based on Simon’s (1999)

interpretation criteria, the overall diagnostic accuracy of the Module 1-OSA is Fair, whereas the

overall diagnostic accuracy of the Module 1-RSA, Module 2-OSA and RSA, and Module 3-OSA

and RSA is Good. AUC values are higher for the RSA than the OSA for Modules 1 and 3, and

higher for the OSA than the RSA for Module 2.
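
As an illustration of how the interpretation criteria attributed to Simon (1999) map onto the values reported in Table 17, the following sketch assigns a descriptive label to each AUC value for the ASD vs. No Spectrum Disorder comparison. Because the bands as listed share their boundary values, the sketch assumes that each band includes its lower bound; the "Below chance" label for values under .50 and the function name are added for illustration and are not part of the cited criteria.

```python
def simon_label(auc: float) -> str:
    """Return the descriptive band for an AUC value.

    Assumption: each band includes its lower bound (e.g., .75 is labeled Good).
    """
    if auc < 0.50:
        return "Below chance"  # illustrative label, not part of the cited criteria
    if auc < 0.75:
        return "Fair"
    if auc < 0.92:
        return "Good"
    if auc < 0.97:
        return "Very Good"
    return "Excellent"


# AUC values from Table 17, ASD vs. No Spectrum Disorder comparison.
table_17_asd_vs_no_spectrum = {
    "Module 1-OSA": 0.733,
    "Module 1-RSA": 0.849,
    "Module 2-OSA": 0.839,
    "Module 2-RSA": 0.806,
    "Module 3-OSA": 0.787,
    "Module 3-RSA": 0.799,
}

labels = {name: simon_label(auc) for name, auc in table_17_asd_vs_no_spectrum.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```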

Substantial differences were not observed between the sensitivity, specificity, positive

predictive values, negative predictive values, and hit rates obtained from applying the Original

Table 17

AUC Values, Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores on the Original and Revised Scoring Algorithms

            AUC Values      Sensitivity     Specificity     PPV             NPV             Hit Rate
            OSA     RSA     OSA     RSA     OSA     RSA     OSA     RSA     OSA     RSA     OSA     RSA

Autism Spectrum Disorder vs. No Spectrum Disorder (N = 400)
Module 1    .733    .849    1.00    1.00    .75     .75     .92     .93     1.00    1.00    .93     .94
Module 2    .839    .806    .89     .89     .62     .69     .78     .81     .78     .80     .78     .81
Module 3    .787    .799    .96     .91     .44     .49     .77     .78     .85     .78     .79     .77

Non-Autism ASD^a vs. Autistic Disorder (N = 248)
Module 1    .671    .332    .90     N/A^b   .17     N/A^b   .65     N/A^b   .50     N/A^b   .63     N/A^b
Module 2    .674    .697    .95     1.00    .05     0       .49     .49     .50     0       .49     .49
Module 3    .675    .690    .96     .98     .23     .09     .33     .31     .93     .91     .44     .35

Note. Sens = sensitivity. Spec = specificity. PPV = positive predictive value. NPV = negative predictive value. HR = hit rate. AUC = Area under the curve. OSA = Original Scoring Algorithm. RSA = Revised Scoring Algorithm. ASD = Autism Spectrum Disorder. Autism Spectrum Disorders include diagnoses of Autistic Disorder, Pervasive Developmental Disorder, Not Otherwise Specified (PDD NOS), and Asperger’s Disorder. Due to an insufficient sample size, the Module 1, No Words and Module 2, Less Than 5 Years Revised Scoring Algorithms were excluded from the analyses of diagnostic accuracy. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only. In addition, references to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only. ^a Non-Autism ASD = PDD NOS and Asperger’s Disorder. ^b Values were not calculated because the AUC value indicated that the Module 1-RSA is not more effective than chance at differentiating between Non-Autism ASDs and Autistic Disorder.

and the Revised Scoring Algorithms for each of the three modules. Sensitivity values remained

consistent across scoring algorithms for Modules 1 and 2, and decreased slightly for Module 3

when the RSA was applied. Specificity values also remained consistent across scoring

algorithms for Module 1, and increased slightly for Modules 2 and 3 with the RSA. Positive

predictive values using the RSA were also slightly higher for each of the three modules than

those obtained from the OSA. Further, application of the RSA resulted in higher negative

predictive values and hit rates than those obtained from the OSA for Modules 1 and 2. However,

the negative predictive values and hit rates obtained from applying the OSA to Module 3 were

higher than those obtained from the RSA.

Non-Autism ASD vs. Autistic Disorder comparisons. AUC values for the Module 1-

OSA and for the OSA and RSA for Modules 2 and 3 are greater than .50, suggesting that the

sensitivity and specificity values obtained are not simply the result of random assignment.

However, the AUC value for the Module 1-RSA is less than .50, which suggests that scores from

this algorithm do not differentiate diagnoses more accurately than would be expected by

chance. Based on Simon’s (1999) interpretation criteria, the overall diagnostic accuracy of the

Module 1-OSA, Module 2-OSA and RSA, and Module 3-OSA and RSA is Fair. Because of the

inadequate AUC value for the Module 1-RSA, further comparisons were not made between the differential diagnostic

accuracy of the Module 1-OSA and RSA. For Modules 2 and 3, the AUC values are higher for

the RSA than the OSA.

Use of the RSA, as compared to the OSA, results in higher levels of sensitivity and

negative predictive values across Modules 2 and 3. However, the RSA also consistently results in

lower specificity values across modules. In addition, positive predictive values and hit rates are

equivalent across algorithms for Module 2, and lower with the RSA in Module 3.

Updated Scoring Algorithms and Optimal Cut-Score Comparisons. ROC analyses

were also conducted to determine the overall diagnostic accuracy of ADOS-G total scores using

the Updated Scoring Algorithms (i.e., the scoring algorithms based on the item structure

identified from the factor analyses conducted in the current sample with recommended item

deletions for Module 1-OSA, Module 2-RSA, and Module 3-OSA and RSA) and to identify

appropriate cut-scores for the Updated Scoring Algorithms. In addition, ROC plots were

reviewed for the retained algorithms (i.e., Module 1-RSA and Module 2-OSA retained consistent

with authors’ recommendations) to identify optimal cut-scores (i.e., those maximizing sensitivity

and specificity). AUC values for the Updated Scoring Algorithms and optimal cut-scores for the

Updated and Retained Scoring Algorithms are presented in Table 18. Table 19 presents the

specificity, sensitivity, positive predictive values, negative predictive values, and hit rates of

ADOS-G scores from the Updated and Retained Scoring Algorithms.
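
One common way to operationalize a cut-score that maximizes sensitivity and specificity is to select the score with the largest Youden index (sensitivity + specificity - 1). The sketch below follows that approach as an assumption; the text does not state which rule was applied to the ROC plots. It also assumes that a score at or above the cut-score counts as a positive classification and that both diagnostic groups are present. The function names and data are hypothetical.

```python
def sensitivity_specificity(scores, has_diagnosis, cut_score):
    """Sensitivity and specificity when scores >= cut_score are classified positive."""
    pairs = list(zip(scores, has_diagnosis))
    tp = sum(1 for s, d in pairs if d and s >= cut_score)
    fn = sum(1 for s, d in pairs if d and s < cut_score)
    tn = sum(1 for s, d in pairs if not d and s < cut_score)
    fp = sum(1 for s, d in pairs if not d and s >= cut_score)
    return tp / (tp + fn), tn / (tn + fp)


def optimal_cut_score(scores, has_diagnosis):
    """Return the observed score with the largest Youden index, and that index.

    Ties are resolved in favor of the lowest candidate cut-score.
    """
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, has_diagnosis, cut)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical total scores and clinical diagnoses for ten participants.
scores = [2, 4, 5, 6, 7, 8, 9, 10, 12, 14]
has_diagnosis = [False, False, False, True, False, True, True, True, True, True]

cut, youden_j = optimal_cut_score(scores, has_diagnosis)
print(cut, round(youden_j, 3))
```

Other rules (for example, requiring a minimum sensitivity before maximizing specificity) would yield different cut-scores from the same ROC plot, so the choice of rule should be reported alongside the resulting values.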

AUC values for each of the Updated Scoring Algorithms across ADOS-G classification

comparisons (i.e., ASD vs. No Spectrum Disorder and Non-Autism ASD vs. Autistic Disorder)

are greater than .50, suggesting that the Updated Algorithms are better than chance at

differentiating participants. Based on Simon’s (1999) interpretation criteria, when differentiating

participants with ASDs from those without, the Updated Algorithms demonstrate Fair (Module 1

Updated OSA) to Good (Module 2 Updated RSA and Module 3 Updated OSA and RSA)

diagnostic accuracy. When used to differentiate between participants with Non-

Autism ASDs and those with Autistic Disorder, the Updated Algorithms demonstrate Fair

overall diagnostic accuracy.

Sensitivity values for each of the Updated and Retained Scoring Algorithms exceed .75

and were determined to be adequate based on the standards recommended (i.e., sensitivity > .70)

Table 18

Updated Algorithm AUC Values and Optimal Cut-Scores for the ADOS-G Updated and Retained Scoring Algorithms

            Updated Algorithm AUC Values    Optimal Cut-Scores
            OSA             RSA             OSA             RSA

ASD vs. No Spectrum Disorder (N = 400)
Module 1    .720^b          N/A             6^b             7^c
Module 2    N/A             .814^b          9^c             9^b
Module 3    .773^b          .799^b          8^b             8^b

Non-Autism ASD^a vs. Autistic Disorder (N = 248)
Module 1    .683^b          N/A             15^b            N/A^a
Module 2    N/A             .700^b          14^c            12^b
Module 3    .661            .689            12^b            13^b

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic; AUC = Area under the curve value. OSA = Original Scoring Algorithm. RSA = Revised Scoring Algorithm. Due to an insufficient sample size, the Module 1, No Words and Module 2, Less Than 5 Years Revised Scoring Algorithms were excluded from the analyses of diagnostic accuracy. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only. In addition, references to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only. ^a Scores were not calculated for these conditions because the AUC value indicated that the sensitivity and specificity values were not significantly different from the Null Hypothesis. ^b Values obtained for the Updated Scoring Algorithm. ^c Values are from retained scoring algorithms (i.e., consistent with authors’ [Lord et al., 1999; Gotham et al., 2007] recommendations).

Table 19

Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores on the Updated and Retained Scoring Algorithms

            Sensitivity     Specificity     PPV             NPV             Hit Rate
            OSA     RSA     OSA     RSA     OSA     RSA     OSA     RSA     OSA     RSA

Autism Spectrum Disorder vs. No Spectrum Disorder (N = 400)
Module 1    1.00^b  1.00^c  .52^b   .75^c   .84^b   .93^c   1.00^b  1.00^c  .87^b   .94^c
Module 2    .80^c   .89^b   .66^c   .77^b   .80^c   .85^b   .79^c   .82^b   .79^c   .84^b
Module 3    .87^b   .88^b   .53^b   .64^b   .79^b   .83^b   .67^b   .74^b   .76^b   .80^b

Non-Autism ASD^a vs. Autistic Disorder (N = 248)
Module 1    .76^b   N/A^d   .44^b   N/A^d   .79^b   N/A^d   .44^b   N/A^d   .66^b   N/A^d
Module 2    .84^c   .94^b   .40^c   .19^b   .57^c   .54^b   .73^c   .75^b   .62^c   .56^b
Module 3    .81^b   .89^b   .33^b   .33^b   .35^b   .37^b   .79^b   .88^b   .48^b   .50^b

Note. PPV = positive predictive value. NPV = negative predictive value. HR = hit rate. OSA = Original Scoring Algorithm. RSA = Revised Scoring Algorithm. Due to an insufficient sample size, the Module 1, No Words and Module 2, Less Than 5 Years Revised Scoring Algorithms were excluded from the analyses of diagnostic accuracy. References to the Module 1 Revised Scoring Algorithm refer to the Module 1, Some Words Revised Scoring Algorithm only, and references to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only. ^a Non-Autism ASD = PDD NOS and Asperger’s Disorder. ^b Values obtained for the Updated Scoring Algorithm. ^c Values are from retained scoring algorithms (i.e., consistent with authors’ [Lord et al., 1999; Gotham et al., 2007] recommendations) using optimal cut-scores. ^d Values were not calculated because the AUC value indicated that the Module 1-RSA is not more effective than chance at differentiating between Non-Autism ASDs and Autistic Disorder.

by Matthey and Petrovski (2002). However, specificity values are consistently lower than

recommended (i.e., specificity > .80; Matthey & Petrovski, 2002).

Hypothesis 4: Diagnostic Accuracy of Independent Clinical Diagnoses

Measures of diagnostic accuracy (i.e., sensitivity, specificity, positive predictive values,

negative predictive values, and hit rates) were also calculated to determine the overall diagnostic

accuracy of ADOS-G scores (obtained using both the Original Scoring Algorithm, and the

Updated OSA and Retained OSA with optimal cut-score) when compared to practitioners’

clinical diagnoses made with and without results from participants’ performance on the ADOS-

G. It was hypothesized that greater diagnostic accuracy of ADOS-G scores would be observed

when scores are compared to clinical diagnoses made with knowledge of participants’ ADOS-G

classification (i.e., No Spectrum Disorder, Non-Autism ASD, or Autistic Disorder) as compared

to those made without information regarding ADOS-G performance.

Inter-rater agreement for final dichotomous diagnostic classifications (i.e., the

participant does or does not appear to exhibit an Autism Spectrum Disorder [ASD], including

Autistic Disorder) and for differential diagnosis determinations (Non-Autism ASD vs. Autistic

Disorder) was initially calculated using kappa coefficients for the 100 participants included in

the analyses. Excellent inter-rater agreement (k = .77; Cicchetti, 1994) was demonstrated

between the diagnostic classifications of initial clinicians and independent reviewers; whereas

fair (k = .49) inter-rater agreement was demonstrated between differential diagnosis

determinations.
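
For readers unfamiliar with the kappa coefficient, the following sketch shows how chance-corrected agreement between two raters can be computed and then labeled. The descriptive bands (below .40 Poor, .40 to .59 Fair, .60 to .74 Good, .75 and above Excellent) are those commonly attributed to Cicchetti (1994) and are consistent with the labels applied to the coefficients reported above. The function names and ratings are hypothetical, and the sketch assumes that the two raters do not agree perfectly by chance (i.e., expected agreement is below 1).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same participants."""
    n = len(rater_a)
    # Proportion of participants on whom the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal category counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


def cicchetti_label(kappa):
    """Descriptive band for a kappa value (bands as commonly attributed to Cicchetti, 1994)."""
    if kappa < 0.40:
        return "Poor"
    if kappa < 0.60:
        return "Fair"
    if kappa < 0.75:
        return "Good"
    return "Excellent"


# Hypothetical classifications for ten participants.
initial_clinician = ["ASD"] * 6 + ["No ASD"] * 4
independent_reviewer = ["ASD"] * 5 + ["No ASD"] * 5

kappa = cohens_kappa(initial_clinician, independent_reviewer)
print(round(kappa, 2), cicchetti_label(kappa))
```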

Table 20 presents indicators of diagnostic accuracy for ADOS-G scores from the Original

Scoring Algorithm against clinical diagnoses made with and without participants’ classification

determinations on the ADOS-G. In general, inconsistent with hypotheses, ADOS-G scores

Table 20

Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores From the Original Scoring Algorithm Compared to Clinical Diagnoses Made With and Without the ADOS-G (N = 100)

            Sensitivity     Specificity     PPV             NPV             Hit Rate
            With    W/O     With    W/O     With    W/O     With    W/O     With    W/O

Autism Spectrum Disorder vs. No Spectrum Disorder (N = 100)
Module 1    1.00    1.00    .75     .60     .92     .83     1.00    1.00    .93     .87
Module 2    1.00    1.00    .77     .77     .82     .82     1.00    1.00    .89     .89
Module 3    .88     .94     .46     .54     .70     .74     .73     .87     .71     .78

Non-Autism ASD^a vs. Autistic Disorder (N = 58)
Module 1    1.00    1.00    0       0       .64     .90     0       0       .64     .90
Module 2    1.00    1.00    0       0       .50     .57     0       0       .50     .57
Module 3    1.00    1.00    .33     .27     .27     .24     1.00    1.00    .47     .41

Note. With = Clinical diagnosis made with knowledge of the participants’ performance on the ADOS-G. W/O = Clinical diagnosis made without knowledge of the participants’ performance on the ADOS-G. PPV = positive predictive value. NPV = negative predictive value. ASD = Autism Spectrum Disorder. Autism Spectrum Disorders include diagnoses of Autistic Disorder, Pervasive Developmental Disorder, Not Otherwise Specified (PDD NOS), and Asperger’s Disorder. ^a Non-Autism ASD = PDD NOS and Asperger’s Disorder.

across modules and classification comparisons (i.e., ASD vs. No Spectrum Disorder and Non-

Autism ASD vs. Autistic Disorder) demonstrate similar levels of diagnostic accuracy when

compared to clinical diagnoses made with and without information regarding ADOS-G

performance.

Table 21 presents indicators of diagnostic accuracy for ADOS-G scores from the Updated

and Retained Scoring Algorithms against clinical diagnoses made with and without participants’

classification determinations on the ADOS-G. Consistent with expectations, variability is

observed between indicators of diagnostic accuracy across clinical diagnoses made with and

without ADOS-G performance. However, inconsistent with predictions, diagnoses made with

information regarding participants’ performance on the ADOS-G do not consistently result in

better diagnostic accuracy than do those diagnoses made without ADOS-G information.

Table 21

Sensitivities, Specificities, Positive Predictive Values, Negative Predictive Values, and Hit Rates of ADOS-G Scores From the Updated and Retained Original Scoring Algorithms Compared to Clinical Diagnoses Made With and Without the ADOS-G (N = 100)

            Sensitivity     Specificity     PPV             NPV             Hit Rate
            With    W/O     With    W/O     With    W/O     With    W/O     With    W/O

Autism Spectrum Disorder vs. No Spectrum Disorder (N = 100)
Module 1^b  1.00    1.00    .92     .60     .92     .83     1.00    1.00    .93     .87
Module 2^c  1.00    1.00    .77     .77     .82     .82     1.00    1.00    .89     .89
Module 3^b  .71     .82     .58     .75     .71     .82     .58     .75     .66     .79

Non-Autism ASD^a vs. Autistic Disorder (N = 58)
Module 1^b  .29     .44     .25     1.00    .40     1.00    .17     .17     .27     .45
Module 2^c  1.00    .88     .43     .17     .64     .58     1.00    .50     .71     .57
Module 3^b  .50     1.00    .64     .55     .21     .38     .70     1.00    .42     .64

Note. With = Clinical diagnosis made with knowledge of the participants’ performance on the ADOS-G. W/O = Clinical diagnosis made without knowledge of the participants’ performance on the ADOS-G. PPV = positive predictive value. NPV = negative predictive value. ASD = Autism Spectrum Disorder. Autism Spectrum Disorders include diagnoses of Autistic Disorder, Pervasive Developmental Disorder, Not Otherwise Specified (PDD NOS), and Asperger’s Disorder. ^a Non-Autism ASD = PDD NOS and Asperger’s Disorder. ^b Values obtained for the Updated Scoring Algorithm. ^c Values are from Retained Scoring Algorithm (i.e., consistent with authors’ [Lord et al., 1999; Gotham et al., 2007] recommendations) using optimal cut-scores.

Chapter 4. Discussion

Although currently considered a “gold-standard” (Klein-Tasman, Risi, & Lord, 2007) in

the diagnostic assessment of Autism and widely used across clinical and educational settings,

few independent studies have been conducted to date regarding the psychometric properties of

the Autism Diagnostic Observation Schedule-Generic (ADOS-G; Lord et al., 1999). As such, the

purpose of this study was to examine several lines of validity evidence (internal structure,

relationships with other variables, and diagnostic accuracy) for scores obtained from the ADOS-

G.

Hypothesis 1 predicted that, across modules, items included in the Original Scoring

Algorithm would reflect a uni-dimensional construct, whereas items included in the Revised

Scoring Algorithm would reflect two constructs across modules. Exploratory Factor Analysis

(EFA) was conducted to examine the structural validity of ADOS-G Modules 1, 2, and 3 using

both the Original Scoring Algorithm (OSA) and the Revised Scoring Algorithm (RSA). Hypothesis 1

was supported for the OSA across modules, but was not supported for the RSA across modules.

Hypothesis 2 predicted that scores on the ADOS-G would demonstrate moderate to

strong relationships with scores from other measures of autistic behavior and weaker

relationships with other measures of behavioral functioning. Correlational analyses were

conducted to examine the relations between participants’ total scores on the ADOS-G, across

scoring algorithms, and parent and teacher ratings of participants’ behavior on the GARS-2 and

on select subscales of the BASC-2. Hypothesis 2 was not consistently supported across

modules and scoring algorithms.

Hypothesis 3 predicted that, across modules, scores obtained from the Revised Scoring

Algorithm would demonstrate greater diagnostic accuracy than scores obtained from the Original

Scoring Algorithm. AUC values and indicators of diagnostic accuracy were reviewed across

modules and scoring algorithms to determine if differences exist between the Original and the

Revised Scoring Algorithms. Hypothesis 3 was partially supported across modules.

Hypothesis 4 predicted that greater diagnostic accuracy of ADOS-G scores would be

observed when scores were compared to clinical diagnoses made with participants’ results on the

ADOS-G as compared to those made without ADOS-G information. Hypothesis 4 was not

consistently supported across modules.

Structural Validity Evidence

Module 1. EFA confirmed a one-factor structure for items included within the OSA.

Items that loaded saliently on the one-factor solution primarily assess aspects of nonverbal social

communication (e.g., use of gesturing, eye contact, and the directing of facial expressions towards

others), which examinees may or may not pair with vocalizations. Only one item (A-2) distinctly

reflects a participant’s attempts at verbal communication. Based on item content, the extracted

factor was labeled Social Communication. Item A-5 did not load on the one-factor solution. In

addition, item-total statistics indicate that the same item is modestly detracting from overall scale

reliability for Module 1. As such, this item should not be included in the OSA for Module 1. The

resulting 11-item scale has been entitled the Module 1 Updated OSA.
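
The reasoning behind removing an item that detracts from scale reliability can be sketched with an "alpha if item deleted" calculation. The sketch assumes that scale reliability is indexed by Cronbach's alpha, which this section does not state explicitly; the function names and item scores are hypothetical and do not correspond to ADOS-G items.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item, aligned by participant."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each participant's total score
    item_variance_sum = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item removed in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical scores for five participants on four items.
# The first three items rise together; the fourth does not track them.
items = [
    [0, 1, 1, 2, 2],
    [0, 1, 2, 2, 2],
    [0, 0, 1, 1, 2],
    [2, 0, 1, 0, 1],
]

full_alpha = cronbach_alpha(items)
deleted = alpha_if_item_deleted(items)
print(round(full_alpha, 3), [round(a, 3) for a in deleted])
```

An item whose removal raises alpha above the full-scale value is detracting from internal consistency, which is the pattern described for Item A-5.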

Because Lord et al. (1999) did not provide specific information regarding the

factorability of ADOS-G items in the original sample, it is unclear if a similar pattern was

observed by the authors. However, Lord et al. reported that “almost all items” (p.116) loaded on

the one-factor solution for each module, suggesting that one or more items also failed to load

saliently on the extracted factors in the original sample.

A two-factor solution was retained for the RSA. Examination of salient item loadings on

each of the factors in the two-factor solution revealed that the factors obtained in this study

retained similar items as the factors obtained by Gotham et al. (2007). Items retained on Factor 1

are similar to the items included in the Module 1-OSA and assess aspects of verbal and

nonverbal social communication (Social Communication factor); whereas the items retained on

Factor 2 reflect the participants’ engagement in stereotyped and repetitive behaviors (e.g.,

stereotyped use of words and hand/finger mannerisms; Stereotyped/Repetitive Behavior [SRB]

factor).

Module 2. EFA confirmed a one-factor structure for items included within the OSA.

Items that loaded saliently on the Module 2-OSA one-factor solution assess aspects of nonverbal

(e.g., use of gesturing, pointing, eye contact, and the direction of facial expressions towards

others) and verbal (i.e., engagement in reciprocal social communication and conversation) social

communication. Thus, the extracted factor was also labeled Social Communication. All items

loaded saliently on the retained factor and are contributing to the overall scale reliability, which

suggests that each item should be retained in the scale.

A one-factor solution also was retained for the Module 2-RSA. Retained items assess a

combination of social, communication, and stereotyped repetitive behaviors. However, all items

reflect aspects of the Autistic Disorder diagnostic criteria set forth by the DSM-IV-TR

(American Psychiatric Association, 2004). As such, the factor was labeled Autistic

Characteristics. Although inconsistent with hypotheses and Gotham et al.’s (2007) results,

current results are consistent with the results of Gotham et al.’s (2008) reexamination of the

RSA, in which the authors questioned the suitability of a two-factor solution for the Module 2

RSA.

One item (Item D-2, which assesses hand/finger mannerisms) did not load saliently on

the one-factor solution for the Module 2-RSA. Because item-total statistics also suggest that the

inclusion of Item D-2 in the factor solution is resulting in a mild decrease in overall scale

reliability, it is recommended that the item be excluded from the Module 2-RSA. The resulting

13-item scale (Module 2 Updated RSA) was used to test subsequent hypotheses.

Module 3. EFA confirmed a one-factor structure for items included within the OSA. The

items that loaded saliently on the one-factor solution primarily assess aspects of nonverbal social

communication (e.g., use of gesturing, eye contact, and the directing of facial expressions towards

others) and verbal social communication (e.g., reporting of events, conversation, reciprocal

social interactions). As such, the extracted factor was labeled Social Communication. Item A-4

did not load on the one-factor solution. In addition, item-total statistics indicated that the same

item is modestly detracting from overall scale reliability for Module 3. As such, this item should

not be included in the OSA for Module 3. The resulting 10-item scale was entitled the Module 3

Updated OSA.

A one-factor solution also was retained for the Module 3-RSA. A review of the items

retained on the one-factor solution indicates that the retained items assess a combination of

social, communication, and stereotyped repetitive behaviors, and all items reflect aspects of the

Autistic Disorder diagnostic criteria set forth by the DSM-IV-TR (American Psychiatric

Association, 2004). As such, the factor was labeled Autistic Characteristics.

Two items (Items D-1 and D-2) did not load saliently on the one-factor solution for the

Module 3-RSA, and item-total statistics indicate that the deletion of these items from the scale

would increase scale reliability. Based on this information, items D-1 and D-2 should be

removed from the Module 3-RSA. The resulting scale (Module 3 Updated RSA) was used to test

subsequent hypotheses.

Convergent and Discriminant Validity Evidence

To examine evidence of the convergent and discriminant validity of ADOS-G scores,

participants’ total scores obtained on the ADOS-G, using both the OSA and RSA, were

correlated with parent and teacher ratings of participants’ behavior on the GARS-2 and BASC-2.

Relationships also were measured between total scores on the Updated Scoring Algorithms and

respondents’ ratings on the GARS-2 (for Module 3) and the BASC-2. Inconsistent with

expectations, moderate to strong relationships were not consistently observed between ADOS-G

scores (OSA or RSA) and other measures of autistic behavior. Use of the Updated Scoring

Algorithms did not yield stronger relationships. In fact, although some moderate correlations

were observed between parent and teacher ratings on the Atypicality and Withdrawal subscales

on the BASC-2 and total scores on the Module 1 and 2 OSA and RSA, only weak relationships

were observed using the Updated Scoring Algorithms.

The ADOS-G is a unique instrument and, to date, is the only direct assessment of

Autism Spectrum Disorders in widespread use. As such, it is difficult to obtain appropriate

instruments against which to consistently compare the ADOS-G in order to obtain evidence of

convergent validity. Convenience selections used for this study (i.e., the GARS-2 and Atypicality

and Withdrawal subscales from the BASC-2) have some evidence to support their use as

measures of autistic functioning (Gilliam, 2006; Reynolds & Kamphaus, 2004). However, scores

from the ADOS-G, across scoring algorithms and modules, did not consistently demonstrate

moderate to strong relationships with scores on the GARS-2 or the BASC-2. This inconsistency

could exist for several possible reasons. First, based on the questionable evidence of the

structural validity of the GARS-2 (Pandolfi et al., 2010), the GARS-2 subscales and the resulting

Autism Index may not consistently measure the intended constructs. In addition, the small

sample size available for the Module 1 and 2 comparisons may have produced inaccurate

estimates of these relationships.

Another, and perhaps more plausible, explanation for inconsistencies between ADOS-G

total scores and parent and teacher ratings on the GARS-2 and BASC-2, however, is related to

differences between the instruments. The GARS-2 and BASC-2 are behavioral rating scales and

ratings are based on parents’ and teachers’ perceptions of a child’s typical behavioral

functioning. In contrast, the ADOS-G is a standardized direct assessment of behavior and is

scored by a trained observer based on a participant’s engagement or lack of engagement in

specific behaviors observed only during the ADOS-G administration. Although each reportedly

assesses autistic behavior, differences in scores may be related to differences in raters’

knowledge and awareness of autistic behaviors and/or the differences in length of opportunity in

which to observe behaviors. Further, parent and teacher ratings on the GARS-2 and BASC-2

may be influenced by desire for a specific outcome (i.e., over-reporting concerns due to a desire

for special education eligibility, or under-reporting concerns, due to the undesirability of a

special education eligibility), whereas scores on the ADOS-G are theoretically objective in

nature. However, total scores on the ADOS-G may also be influenced by examiner bias.

Particularly in the case of re-evaluations, in which examiners are aware of an examinee’s current

ASD diagnosis, administrators may be biased towards or against “observing” the behavioral

characteristics under investigation.

Consistently weak relationships observed between parent and teacher ratings on the

Anxiety subscale of the BASC-2 and participants’ total scores on the ADOS-G across modules

and all scoring algorithms provide evidence of the discriminant validity of scores on the ADOS-G.
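
The convergent and discriminant comparisons discussed above rest on correlation coefficients. The following is a minimal illustrative sketch in Python. It assumes Pearson's r as the coefficient, and all scores are hypothetical values rather than data from the study sample. Under these assumptions, convergent evidence would appear as a larger coefficient between ADOS-G totals and an autism-related rating, and discriminant evidence as a coefficient near zero for an unrelated scale such as Anxiety.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for five examinees (illustration only).
ados_totals = [8, 12, 15, 10, 18]
atypicality = [60, 55, 70, 66, 68]   # autism-related rating scale
anxiety = [49, 52, 50, 51, 50]       # unrelated rating scale

r_convergent = pearson_r(ados_totals, atypicality)
r_discriminant = pearson_r(ados_totals, anxiety)
print(round(r_convergent, 2), round(r_discriminant, 2))
```

With so few cases, coefficients of this kind are unstable, which mirrors the concern about small Module 1 and 2 samples raised above.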

Evidence of Diagnostic Accuracy

Module 1. To compare diagnostic accuracy across the OSA (Lord et al., 1999) and RSA

(Gotham et al., 2007), AUC values were first interpreted. When making determinations between

ASDs and No Spectrum Disorder, AUC values indicate that the RSA results in greater overall

diagnostic accuracy than the OSA. When making determinations between Non-Autism ASDs

and Autistic Disorder, failure to reject the null hypothesis for the RSA indicates that scores

from the RSA do not differentiate diagnoses better than would be expected by

chance, which suggests that its use is uninformative. However, similar concerns were not

observed for the OSA. Thus, the RSA is not resulting in better differential diagnostic accuracy

for Module 1 than the OSA.
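
The AUC values interpreted above can be read as the probability that a randomly selected participant with the target diagnosis obtains a higher algorithm total than a randomly selected participant without it, so that a value of .50 corresponds to chance. The following Python sketch illustrates that interpretation with a direct pairwise count. The totals are hypothetical and are not data from the current sample, and the function name is chosen only for illustration.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve computed as the proportion of
    (positive, negative) pairs in which the positive case scores
    higher; tied pairs count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical algorithm totals (illustration only).
asd_totals = [14, 11, 17, 9, 12]
non_spectrum_totals = [6, 10, 4, 12, 7]

print(auc(asd_totals, non_spectrum_totals))
```

An algorithm whose totals are distributed identically in both groups would yield a value near .50 under this calculation, which is the situation described for the Module 1-RSA in the differential diagnosis condition.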

In addition to the AUC values, specific indicators of diagnostic accuracy were compared

across scoring algorithms and were measured to be virtually identical across the OSA and RSA

when differentiating between all Autism Spectrum Disorders (ASD) and No Spectrum Disorder

for Module 1. Because results of the ROC analysis indicate that the RSA is not differentiating

participants with Non-Autism ASDs from those with Autistic Disorder beyond chance, further

comparative interpretations were not conducted.

The diagnostic accuracy of the Module 1 Updated OSA also was examined and compared

to that of the standard Module 1-OSA. The AUC value for the Module 1 Updated OSA is

slightly lower than the AUC value for the standard Module 1-OSA when differentiating between

ASDs and No Spectrum Disorders, suggesting that the Module 1 Updated OSA demonstrates

slightly lower overall diagnostic accuracy for this condition. However, when differentiating

between Non-Autism ASDs and Autistic Disorder, the AUC value for the Updated OSA suggests

slightly higher overall diagnostic accuracy than that of the standard OSA.

Other specific indicators of diagnostic accuracy also were reviewed. When

differentiating between participants with an ASD and those with No Spectrum Disorder, the

Module 1 Updated OSA demonstrates consistent levels of sensitivity and NPV, but lower

specificity, PPV, and hit rates than the standard OSA. When differentially diagnosing

participants with Non-Autism ASDs from those with Autistic Disorder, the Module 1 Updated

OSA demonstrates moderately lower levels of sensitivity and NPV, but moderately higher levels

of specificity, PPV, and hit rates than the standard Module 1-OSA. In fact, the Updated OSA for

this condition presents a better balance between sensitivity and specificity than does the standard

OSA.
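
The specific indicators compared above (sensitivity, specificity, PPV, NPV, and hit rate) all derive from the same two-by-two classification table. As a brief sketch in Python, using hypothetical counts rather than values from Appendix E, they can be computed as follows. The function and key names are illustrative.

```python
def accuracy_indicators(tp, fp, fn, tn):
    """Diagnostic accuracy indicators from a 2x2 classification table.

    tp: test positive, diagnosis positive   fp: test positive, diagnosis negative
    fn: test negative, diagnosis positive   tn: test negative, diagnosis negative
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),    # positives correctly identified
        "specificity": tn / (tn + fp),    # negatives correctly identified
        "ppv": tp / (tp + fp),            # positive test results that are correct
        "npv": tn / (tn + fn),            # negative test results that are correct
        "hit_rate": (tp + tn) / total,    # overall proportion classified correctly
    }


# Hypothetical counts (illustration only).
result = accuracy_indicators(tp=45, fp=20, fn=5, tn=30)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

In this hypothetical table sensitivity is high while specificity is low, which is the general pattern reported for the ADOS-G algorithms in the current sample.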

Comparisons between the Retained Module 1-RSA (i.e., updated by applying optimal cut-scores

to maximize sensitivity and specificity) and the standard Module 1-RSA were also made

for the ASD vs. No Spectrum Disorder condition. Sensitivities, specificities, PPVs, NPVs, and

hit rates are virtually identical across the two scoring algorithms, which suggests that there is no

significant difference between the Retained and the standard Module 1-RSA.

Indicators of diagnostic accuracy also were reviewed to determine if values obtained

from the current sample using the OSA and RSA are consistent with the sensitivity and

specificity values originally obtained by the authors of the OSA (Lord et al., 1999) and the RSA

(Gotham et al., 2007). Sensitivity values for the Module 1 OSA from the current sample are

relatively consistent (see Table E2 in Appendix E) with those reported by Lord et al. (1999)

and adequate for diagnostic tests (Matthey & Petrovski, 2002). Specificity values for Module 1

from the current sample, however, are substantially lower than those reported by Lord et al. and

inadequate for diagnostic tests (Matthey & Petrovski). Consistent with the pattern of results for

the OSA, sensitivity values (see Table E3 in Appendix E) obtained from the current sample for

the RSA are adequate for diagnostic tests and slightly higher than those reported by Gotham et

al. However, also consistent with the data obtained for the OSA, specificity values obtained for

the current sample are inadequate for diagnostic tests and substantially lower than those reported

by test authors.

Module 2. AUC values also were interpreted for Module 2 across the OSA, RSA, and

classification determinations (i.e., ASD vs. No Spectrum Disorder, Non-Autism ASD vs.

Autistic Disorder). Based on the AUC values, scores from the OSA result in slightly better

diagnostic accuracy than scores from the RSA when differentiating participants with ASDs from

those with No Spectrum Disorder. However, scores from the RSA are resulting in slightly better

diagnostic accuracy than are scores from the OSA when differentiating participants with Non-

Autism ASDs from those with Autistic Disorder.

Specific indicators of diagnostic accuracy also were compared across the OSA and RSA

for Module 2. The specificity value, PPV, NPV, and hit rates are slightly higher for the RSA than

the OSA when differentiating between Autism Spectrum Disorders (ASD) and No Spectrum

Disorder. However, when engaging in differential diagnosis, use of the RSA results in a higher

level of sensitivity, consistent PPVs and hit rates, and lower levels of specificity and NPVs than

the OSA.

The diagnostic accuracy of the Module 2 Updated RSA also was examined and compared

to that of the standard Module 2-RSA. The AUC value for the Module 2 Updated RSA is slightly

higher than the AUC value for the standard RSA when differentiating between ASDs and No

Spectrum Disorders, and Non-Autism ASDs and Autistic Disorder, suggesting that the Module 2

Updated RSA demonstrates slightly higher overall diagnostic accuracy. When differentiating

between participants with an ASD and those with No Spectrum Disorder, the Module 2 Updated

RSA demonstrates consistent levels of sensitivity and higher levels of specificity, PPV, NPV,

and hit rates than the standard RSA. When differentially diagnosing participants with Non-Autism

ASDs from those with Autistic Disorder, the Module 2 Updated RSA demonstrates a

slightly lower level of sensitivity, but modest to moderate improvements in specificity, PPV,

NPV, and hit rates relative to the standard Module 2-RSA.

Comparisons also were made between the Retained Module 2-OSA (i.e., updated by

applying optimal cut-scores to maximize sensitivity and specificity) and the standard Module 2-

OSA. For the ASD vs. No Spectrum Disorder condition, the Retained OSA exhibits a slightly

lower level of sensitivity but slightly higher levels of specificity, PPVs, NPVs, and hit rates than

does the standard Module 2-OSA. When differentially diagnosing participants with Non-Autism

ASDs from those with Autistic Disorder, the Retained Module 2-OSA also demonstrates a

moderate decrease in sensitivity but moderate increases in specificity, PPV, NPV, and hit rates as

compared to the standard OSA. In general, the Retained Module 2-OSA for this condition

presents a better balance between sensitivity and specificity than does the standard OSA.
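
The Retained algorithms described above apply cut-scores chosen to balance sensitivity and specificity. One common way to operationalize such a choice is to scan candidate cut-scores and keep the one with the largest value of sensitivity + specificity - 1 (Youden's J). The Python sketch below assumes that criterion, which may differ from the exact procedure used in this study, and it uses hypothetical totals.

```python
def best_cut_score(scores_pos, scores_neg):
    """Return (cut, sensitivity, specificity) for the cut-score that
    maximizes Youden's J, treating a total >= cut as a positive
    classification. Ties in J keep the lowest cut-score."""
    best = None
    for cut in sorted(set(scores_pos) | set(scores_neg)):
        sens = sum(s >= cut for s in scores_pos) / len(scores_pos)
        spec = sum(s < cut for s in scores_neg) / len(scores_neg)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical algorithm totals (illustration only).
asd_totals = [8, 11, 12, 14, 17]
non_spectrum_totals = [3, 4, 6, 7, 12]

cut, sens, spec = best_cut_score(asd_totals, non_spectrum_totals)
print(cut, sens, spec)
```

Because the cut-score is tuned to the same cases on which it is evaluated, estimates obtained this way tend to be optimistic and would ordinarily need confirmation in an independent sample.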

Indicators of diagnostic accuracy for Module 2 also were reviewed to determine if values

obtained from the current sample using the OSA and RSA are consistent with the sensitivity and

specificity values originally obtained by the authors of the OSA (Lord et al., 1999) and the RSA

(Gotham et al., 2007). Sensitivity values for the Module 2-OSA from the current sample (see

Table E2) are adequate for diagnostic tests (Matthey & Petrovski, 2002) and relatively consistent

with those reported by the test authors (Lord et al.). However, specificity values for Module 2

from the current sample are substantially lower than those reported by test authors (Lord et al.)

and inadequate for diagnostic tests (Matthey & Petrovski). Consistent with the pattern of results

for the OSA, sensitivity values (Table E3) obtained from the current sample for the RSA are

adequate for diagnostic tests and consistent with those reported by Gotham et al. However, also

consistent with the data obtained for the OSA, specificity values obtained for the current sample

are inadequate for diagnostic tests and substantially lower than those reported by test authors

(Gotham et al.).

Module 3. AUC values were also interpreted for Module 3 across the OSA and RSA and

diagnostic comparisons (i.e., ASD vs. No Spectrum Disorder, Non-Autism ASD vs. Autistic

Disorder). Based on the AUC values, scores from the RSA are resulting in slightly better

diagnostic accuracy than are scores from the OSA across comparisons.

Indicators of diagnostic accuracy for Module 3 also were compared across the OSA and

RSA. When differentiating between an Autism Spectrum Disorder (ASD) and No Spectrum

Disorder, use of the RSA results in modest increases in specificity and PPV, but modest

decreases in sensitivity, NPV, and hit rate. Similarly, when engaging in differential diagnosis

(i.e., differentiating between Non-Autism ASDs and Autistic Disorder), use of the RSA results in

modest increases in sensitivity but also modest decreases in PPV, NPV, and hit rate, and

moderate decreases in specificity as compared to the OSA.

The diagnostic accuracy of the Module 3 Updated OSA and RSA was also examined

and compared to that of the standard Module 3-OSA and RSA. The AUC values for the Module

3 Updated OSA are slightly lower than the AUC value for the standard OSA when

differentiating between ASDs and No Spectrum Disorders and Non-Autism ASDs and Autistic

Disorder, suggesting that the standard Module 3-OSA demonstrates slightly higher overall

diagnostic accuracy. The AUC value for the Module 3 Updated RSA was consistent with that of

the standard RSA when differentiating individuals with ASDs from those with No Spectrum

Disorders, and slightly lower than that of the standard RSA when differentially diagnosing

participants with Non-Autism ASDs and those with Autistic Disorder.

When differentiating between participants with an ASD and those with No Spectrum

Disorder, the Updated Module 3-OSA and RSA both demonstrate modest decreases in sensitivity

and NPVs, but modest to moderate increases in specificity and PPV when compared to the

standard OSA and RSA. When differentially diagnosing participants with Non-Autism ASDs

from those with Autistic Disorder, the Updated Module 3-OSA and RSA again demonstrate

mildly to moderately lower levels of sensitivity and NPV, but modest to moderate

improvements in specificity, PPV, and hit rates, relative to the standard Module 3-OSA and RSA.

Indicators of diagnostic accuracy for the standard OSA and RSA Module 3 were

reviewed to determine if values obtained from the current sample are consistent with the

sensitivity and specificity values originally obtained by the authors of the OSA and the RSA.

Sensitivity values for the Module 3-OSA from the current sample are adequate for diagnostic

tests (Matthey & Petrovski, 2002) and consistently higher than those reported by the test authors

(Lord et al.). However, specificity values for the standard Module 3-OSA from the current sample

are substantially lower than those reported by test authors (Lord et al.) and determined to be

inadequate for diagnostic tests (Matthey & Petrovski). Consistent with the pattern of results for

the OSA, sensitivity values (Table E3) obtained from the current sample for the RSA are

adequate for diagnostic tests and higher than those reported by Gotham et al. However, also consistent

with the data obtained for the OSA, specificity values obtained for the current sample are

inadequate for diagnostic tests and substantially lower than those reported by test authors

(Gotham et al.).

Independent Clinical Diagnoses

A limitation consistently identified by many of the previous examinations of the

diagnostic accuracy of ADOS-G scores (de Bildt et al., 2009; Gotham et al., 2007; Gray et al.,

2008) is that the determination of diagnostic accuracy (comparing a participant’s ADOS-G

classification to their resulting clinical diagnosis) often has been confounded by the fact that the

two classifications were not independent because participants’ performance on the ADOS-G

was included as part of the data used to make the clinical diagnosis. In order to address this

limitation, indicators of diagnostic accuracy were calculated to determine the overall diagnostic

accuracy of ADOS-G scores when compared to practitioners’ clinical diagnoses made without

knowledge of participants’ performance on the ADOS-G. It was hypothesized that greater

diagnostic accuracy of ADOS-G scores would be observed when scores are compared to clinical

diagnoses made with participants’ performance information on the ADOS-G as compared to

those made without information regarding ADOS-G performance.

Diagnostic accuracy was calculated between clinical diagnoses made with and without

the ADOS-G and total scores on the ADOS-G obtained from applying the Module 1-OSA,

Module 2-OSA, and Module 3-OSA. A second set of comparisons was made between clinical

diagnoses made with and without ADOS-G information and total scores on the ADOS-G

obtained from applying the Module 1 Updated OSA, the Retained Module 2-OSA (i.e., updated

based on optimal cut-scores designed to maximize the balance between sensitivity and

specificity), and the Module 3 Updated OSA. Comparisons were not made for the Revised

Scoring Algorithm or Updated RSA due to sample restrictions.

In general, consistency is observed between the diagnostic accuracy of decisions made

with and without participants’ ADOS-G scores on the Module 1, 2, and 3-OSAs. Sensitivity

values are identical for the ASD vs. No Spectrum Disorder comparisons for Modules 1 and 2-OSA,

and across all modules for the Non-Autism ASD vs. Autistic Disorders comparisons.

Across diagnostic comparisons, the following identical specificity values are also observed:

Module 2-OSA (ASD vs. No Spectrum Disorder), and Modules 1 and 2-OSA (Non-Autism ASD

vs. Autistic Disorder). Where differences between sensitivity and specificity do exist, clinical

diagnoses made with information from the ADOS-G did not consistently demonstrate higher

levels of diagnostic accuracy.

Across the Module 1 Updated OSA, Retained Module 2 OSA, and Module 3 Updated

OSA indicators of diagnostic accuracy, less consistency was observed. However, contrary to the

hypothesis, diagnostic accuracy was not consistently higher for decisions made with knowledge

of ADOS-G performance. For example, the Module 3 Updated OSA demonstrated higher levels

of sensitivity and consistent or higher levels of specificity for decisions made without ADOS-G

data across classification determinations (i.e., ASD vs. No Spectrum Disorder, and Non-Autism

ASD vs. Autistic Disorder).

In general, these results provide initial evidence that use of a participant’s performance

on the ADOS-G in clinical decision-making does not substantially inflate the reported

diagnostic accuracy of the instrument and, instead, suggest that the methods used in previous

studies likely yielded valid estimates of the diagnostic accuracy of ADOS-G scores.

Summary of Evidence by Module and Scoring Algorithm

Module 1. Results from the current study provide evidence in support of the structural

validity of the Module 1-OSA; however, exploratory factor analysis recommended the deletion

of one item that did not load saliently on the one-factor solution and that was decreasing overall

scale reliability. Based on this recommendation, the Module 1 Updated OSA was created and

considered in subsequent analyses. Correlational analyses indicated that total scores on the

Module 1-OSA and Module 1 Updated OSA did not consistently demonstrate moderate to strong

relationships with other measures of autistic behavior. In fact, results from the Module 1 Updated

OSA were less consistent with predictions than were results from the standard OSA. Total scores

from both the standard and the Updated algorithms consistently demonstrated weak relationships

with measures of other behavioral functioning, providing evidence of the discriminant validity of

ADOS-G scores.

Examinations of diagnostic accuracy evidence suggested that the overall diagnostic

accuracy of the standard Module 1-OSA is Fair. Although sensitivity values were high across

comparisons (i.e., differentiating participants with ASDs from those with No Spectrum Disorders,

and those with Non-Autism ASDs from those with Autistic Disorder), specificity values were

consistently lower than recommended standards. Use of the Module 1 Updated OSA did not

improve the diagnostic accuracy of the Module 1-OSA for determining individuals with and

without ASDs. However, when engaging in differential diagnosis, use of the Updated OSA

resulted in greater overall diagnostic accuracy and a better balance between sensitivity and

specificity.

Exploratory factor analysis also provided evidence of the structural validity of the

Module 1-RSA. A two-factor structure was retained, and all items loaded saliently on one of the

two factors, suggesting that updates to the scoring algorithm are not needed at this time.

Correlational analysis failed to provide consistent evidence for the convergent validity of ADOS-

G scores, but consistent evidence of the discriminant validity of ADOS-G scores was obtained.

The diagnostic accuracy of the Module 1-RSA also was investigated. In general, the

overall diagnostic accuracy of the RSA was determined to be Good and the RSA demonstrated

higher levels of overall diagnostic accuracy compared to the standard Module 1-OSA and

Module 1 Updated OSA for differentiating between participants with and without ASDs.

Similar to the OSA, the Module 1-RSA demonstrated high levels of sensitivity but lower than adequate

levels of specificity. Optimal cut-scores were identified for the Retained Module 1-RSA and

applied in an attempt to provide a better balance between sensitivity and specificity. The

accuracy of the resulting Retained Module 1-RSA was not significantly different from that of the

standard RSA at differentiating between those with and without ASDs. However, the Module 1-

RSA was not found to accurately predict the differential diagnosis (i.e., determining if a

participant on the autism spectrum has a Non-Autism ASD or Autistic Disorder) of participants

at a rate higher than would be expected by chance, suggesting that it should not be used for this

purpose.

Module 2. Exploratory factor analysis conducted on the Module 2-OSA provided support

for the structural validity of the module’s one-factor structure. All items loaded saliently on the

one-factor structure, and the scale demonstrated adequate reliability. As such, no updates to the

Module 2-OSA were recommended. Correlational analyses provided evidence for the

discriminant validity of the Module 2-OSA total scores, but consistent evidence of the

convergent validity of ADOS-G scores was not obtained.

The diagnostic accuracy of total scores from the OSA was also examined by

classification comparisons (i.e., ASD vs. No Spectrum Disorder, and Non-Autism ASD vs.

Autistic Disorder). For the ASD vs. No Spectrum Disorder comparisons, the Module 2-OSA

demonstrated Good overall diagnostic accuracy and adequate sensitivity, PPV, NPV, and hit

rates. However, the specificity of the standard OSA was lower than the recommended level for

diagnostic assessment. When differentiating between participants with Non-Autism ASDs and

those with Autistic Disorder, the Module 2-OSA demonstrated Fair overall diagnostic accuracy.

Although the sensitivity of the standard OSA was high, inadequate levels of specificity, PPV,

NPV, and hit rates were observed. In an attempt to find a better balance between

sensitivity and specificity, optimal cut-scores were identified for the Module 2-OSA and were

applied, creating the Retained Module 2-OSA. Across diagnostic comparisons, the Retained

OSA resulted in modest decreases in sensitivity and moderate gains in specificity,

PPVs, NPVs, and hit rates relative to the standard Module 2-OSA. In general, when compared to

the standard OSA, the Retained Module 2-OSA presented a better balance between sensitivity

and specificity.

Structural validity evidence for the Module 2-RSA was also obtained through factor

analysis. A one-factor solution was determined to best fit the data, although one item did not load

saliently on the one-factor solution. As a result, the Module 2 Updated RSA was created and

considered in subsequent analyses. Correlational analyses indicated that total scores on the

Module 2-RSA and Module 2 Updated RSA did not consistently demonstrate moderate to strong

relationships with other measures of autistic behavior, and results from the Module 2 Updated

RSA were less consistent with predictions than were results from the Module 2-RSA. Total

scores from both the standard and the Updated algorithms consistently demonstrated weak

relationships with measures of other behavioral functioning, providing evidence of the

discriminant validity of ADOS-G scores.

The overall diagnostic accuracy of the Module 2-RSA was determined to be Good for

making ASD vs. No Spectrum Disorder comparisons, and Fair for differential diagnosis on the

autism spectrum. Across diagnostic comparisons, the RSA demonstrated high levels of

sensitivity but inadequate levels of specificity. Use of the Module 2 Updated RSA resulted in

slight improvements to overall levels of diagnostic accuracy across diagnostic comparisons. In

addition, use of the Updated RSA consistently resulted in modest to moderate increases in levels

of specificity, PPV, NPV, and hit rates as compared to the standard RSA.

Module 3. Results from the current study provided evidence in support of the structural

validity of the Module 3-OSA; however, exploratory factor analysis recommended the deletion

of one item that did not load saliently on the one-factor solution and that was decreasing overall

scale reliability. Based on this recommendation, the Module 3 Updated OSA was created and

considered in subsequent analyses. Consistent with predictions, correlational analyses indicated

that total scores on the Module 3 OSA consistently demonstrated weak relationships with

measures of other behavioral functioning, providing evidence of the discriminant validity of

ADOS-G scores. However, inconsistent with predictions, total scores on the OSA did not

consistently demonstrate moderate to strong relationships with other measures of autistic

behavior. Use of total scores from the Updated OSA did not result in greater consistency with

predictions.

The diagnostic accuracy of total scores from the OSA was also examined by

classification comparisons (i.e., ASD vs. No Spectrum Disorder, and Non-Autism ASD vs.

Autistic Disorder). For the ASD vs. No Spectrum Disorder comparisons, the Module 3-OSA

demonstrated Good overall diagnostic accuracy and adequate sensitivity, PPV, NPV, and hit

rates. However, the diagnostic accuracy of the Module 3 OSA was Fair for differential diagnosis

of participants on the autism spectrum, and specificity, PPV, and hit rates were inadequate based

on standards for diagnostic assessment (Matthey & Petrovski, 2002). Use of the Module 3

Updated OSA consistently resulted in modest decreases to overall diagnostic accuracy, but

increases to specificity over the standard Module 3 OSA and RSA.

Structural validity evidence was also provided for the Module 3-RSA based on factor

analysis. EFA recommended the deletion of two items that did not load saliently on the one-

factor solution and that were decreasing overall scale reliability. As a result of these

recommendations, the Module 3 Updated RSA was created and considered in subsequent

analyses. Inconsistent with predictions, correlational analyses indicated that total scores on the

Module 3-RSA and Module 3 Updated RSA did not consistently demonstrate moderate to strong

relationships with other measures of autistic behavior. Instead of improving the consistency of

measured relationships, results from the Module 3 Updated RSA were less consistent with

predictions than were results from the standard RSA. Total scores from both the standard and the

Updated algorithms consistently demonstrated weak relationships with measures of other

behavioral functioning, providing evidence of the discriminant validity of ADOS-G scores.

The diagnostic accuracy of total scores from the RSA was also examined by

classification comparisons (i.e., ASD vs. No Spectrum Disorder, and Non-Autism ASD vs.

Autistic Disorder). For the ASD vs. No Spectrum Disorder comparisons, the Module 3-RSA

demonstrated Good overall diagnostic accuracy; however, the diagnostic accuracy of the Module

3-RSA was Fair for the differential diagnosis of participants on the autism spectrum. Across

classification comparisons, the overall diagnostic accuracy of the RSA was slightly better than

that of the OSA. Use of the Module 3 Updated RSA resulted in consistent to slightly decreased

overall diagnostic accuracy as compared to the standard RSA. However, use of the Updated RSA

consistently resulted in higher levels of specificity and a better balance between sensitivity and

specificity than did the standard RSA.

Clinical Implications

In general, results of the current study confirm the structural validity and overall

diagnostic accuracy of the ADOS-G. However, the current research also highlights some of the

limitations of the ADOS-G. Low measured specificity values across modules and scoring

algorithms indicate that the ADOS-G is systematically overidentifying participants (i.e.,

indicating that students without ASDs are on the Autism Spectrum, and that students with Non-

Autism ASDs have Autistic Disorder). In addition, correlational analyses indicate that scores on

the ADOS-G do not demonstrate expected relationships with other quantitative measures of

autistic behavior. Further, some ADOS-G modules have stronger cumulative evidence to support

their use than others. Specifically, data from the current study suggest that, across scoring

algorithms and differential classifications, Module 3 consistently demonstrates greater

diagnostic accuracy than does Module 1. This result may be due to limitations with the Module 1

sample size or discrepancies between the utility of the activities designed to elicit the behaviors

under investigation across modules. However, differences may also be related to the

characteristics of the examinees for whom Module 1 was designed (i.e., young, nonverbal

children, whose limited verbal abilities may give the appearance of an ASD, even though they do

not have the disorder), as compared to the characteristics of examinees who are administered

Module 3 (i.e., older children and adolescents with fluent expressive language abilities). The

functioning of older individuals is more stable than is the functioning of young children, and, in

general, younger children, who often are not yet enrolled in formal schooling, have had far less

exposure to social situations outside of the home than older children and adolescents. This lack

of exposure to other children and adults may be responsible for measured social atypicalities

assessed on the ADOS-G.

Despite some limitations, the psychometric strengths of the ADOS-G provide support for

its continued use in school-based psychoeducational evaluations for the diagnosis of students

with Autism Spectrum Disorders. In addition, the qualitative insights obtained through the

administration of the ADOS-G (which were not examined as a part of the current study) are

valuable to clinicians and, at times, are a critical factor when making diagnostic decisions in

daily practice.

Clinicians, however, need to recognize the limitations of the instrument and respond

accordingly. For example, consistent with authors’ (Lord et al., 1999) recommendations, the

ADOS-G should always be administered as part of a multimodal autism assessment and never

used as the sole criterion for making a clinical diagnosis. Use of other measures and

assessment techniques (e.g., direct observations in a variety of different settings, and completing

structured interviews with parents and teachers) will allow clinicians to determine if performance

on the ADOS-G is consistent with, or discrepant from, a student’s typical functioning at home, at

school, and in the community. The timing of the ADOS-G administration in relation to the

completion of other assessment activities also should be considered, as it could influence a

clinician’s objectivity in their scoring of the ADOS-G. Further, age of the child and the module

administered also need to be considered when weighing the relative importance of a participant’s

performance on the ADOS-G in clinical diagnostic decision-making, especially if the

information obtained on the ADOS-G is discrepant with other data.

Based on the accumulated evidence, it is recommended that clinicians utilize the Updated

Original Scoring Algorithm when scoring an administration of Module 1. The Updated OSA

provides the best balance between sensitivity and specificity across diagnostic comparisons and,

unlike the standard RSA, can be used in the classification of ASDs and in the differential

diagnosis between ASDs. Despite its strengths, the Updated OSA still produces a lower than

adequate level of specificity. As such, clinicians should be mindful of this limitation and rely on

evidence from the ADOS in conjunction with other evaluation evidence when making a

clinical diagnosis.
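
The balance between sensitivity and specificity referred to above can be illustrated with a minimal Python sketch. The counts below are hypothetical and chosen only for illustration; they are not data from the current study, and the function name is an assumption of this sketch.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # proportion of true cases the algorithm flags
    specificity = tn / (tn + fp)  # proportion of non-cases the algorithm clears
    return sensitivity, specificity


# Hypothetical counts, NOT taken from the study.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=30, fp=20)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this invented example the algorithm identifies most true cases (high sensitivity) but also flags many students without the disorder (low specificity), which is the pattern of concern described in this section.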

Based on the current study, it also is recommended that clinicians utilize the Updated

Revised Scoring Algorithm when scoring an administration of Module 2. Although the Module

2-OSA demonstrates slightly higher levels of overall diagnostic accuracy than the standard and

Updated RSA, particularly for making ASD vs. No Spectrum Disorder comparisons, use of the

Updated RSA provides the best balance between sensitivity and specificity across diagnostic

comparisons. Despite the better balance observed with the Updated RSA, specificity values are

still lower than recommended for diagnostic tests.

Further, evidence from the current study supports the use of the Module 3 Updated

Revised Scoring Algorithm when scoring an administration of the ADOS-G Module 3. The

Updated RSA provides better overall diagnostic accuracy than the standard and Updated OSA,

and a better balance between sensitivity and specificity than the standard RSA. Low specificity,

particularly when used for differential diagnosis, is a consistent limitation of all ADOS-G scoring

algorithms in the current sample, the Module 3 Updated RSA included. As reported above,

clinicians need to recognize the limitations of the ADOS and use it as one component of a

multimodal assessment battery.

Limitations

Results must be considered in the context of several limitations. First, the current study

features a convenience sample drawn from one large southwestern school district. In addition,

fewer participants were administered Modules 1 and 2 as compared to Module 3. The small

number of participants with data for Modules 1 and 2 became problematic when examining the

Revised Scoring Algorithm, which required further division of participants in each module into

two developmental cells. Once divided, the very small sample sizes for Module 1, No Words (N

= 16), and Module 2, Less Than 5 Years (N = 45), precluded analysis of these cells. In

general, small sample sizes for Modules 1 and 2 may have influenced the results of all of the

analyses conducted and led to less stable results. Of the analyses conducted, the small

samples were most problematic for the factor analyses of the Revised Scoring Algorithm. For

example, the RSA Module 1 (N = 66) and Module 2 (N = 73) sample sizes were slightly below

Mundfrom, Shaw, and Ke’s (2005) sample size recommendation (N > 90) for factor analysis

with the extraction of 2 factors. However, module sample sizes for the current study were

consistent with those used by Lord et al. (1999) during their examination of the structural

validity of the Original Scoring Algorithm.
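
The sample size comparison above can be made explicit with a short Python sketch. The sample sizes and the N > 90 threshold are those reported in this section; the variable and function names are assumptions of the sketch rather than part of the study's analyses.

```python
# Minimum N cited in the text (Mundfrom, Shaw, & Ke, 2005): N > 90 for 2 factors.
RECOMMENDED_MIN_N = 90

# Sample sizes reported in this section for the Revised Scoring Algorithm cells.
sample_sizes = {
    "Module 1, No Words": 16,
    "Module 2, Less Than 5 Years": 45,
    "Module 1 (analyzed)": 66,
    "Module 2 (analyzed)": 73,
}


def shortfall(n, minimum=RECOMMENDED_MIN_N):
    """Additional participants needed for N to exceed the minimum (0 if met)."""
    return max(0, minimum + 1 - n)


shortfalls = {cell: shortfall(n) for cell, n in sample_sizes.items()}
for cell, gap in shortfalls.items():
    print(f"{cell}: N = {sample_sizes[cell]}, short by {gap}")
```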

Limitations also resulted from the item-scoring method of the ADOS-G. EFA is best

suited to interval data with a 5-point (Dawis, 1987) or a 7-point (Gorsuch, 1997) response

scale. However, scores on the ADOS-G are ordinal in nature and, after the systematic recoding

of scores of 3 to 2, only 3 response options remained per item. The truncated response scale may

restrict the range of inter-item correlations and result in underrepresentation of the

actual relationships between scale items. Despite these limitations, each of the six item

correlation matrices (i.e., Module 1 OSA, Module 1 RSA, Module 2 OSA, Module 2 RSA,

Module 3 OSA, and Module 3 RSA) was adequate for factorability based on criteria set a priori,

so analyses were conducted and interpreted.
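
The concern that a truncated response scale can understate inter-item relationships can be illustrated with a small simulation. The following Python sketch is illustrative only: the latent correlation of .60, the cut points, and the sample size are assumptions of the sketch, and simulated normal scores stand in for item responses; none of it reproduces the study's data or analyses.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def to_three_points(value, low=-0.5, high=0.5):
    """Collapse a continuous score onto a 0/1/2 scale (hypothetical cut points)."""
    if value < low:
        return 0
    if value < high:
        return 1
    return 2


random.seed(0)
rho = 0.6  # assumed correlation between the continuous scores
x, y = [], []
for _ in range(5000):
    a = random.gauss(0, 1)
    b = rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    x.append(a)
    y.append(b)

r_continuous = pearson(x, y)
r_coarse = pearson([to_three_points(v) for v in x],
                   [to_three_points(v) for v in y])
print(f"continuous r = {r_continuous:.2f}, 3-point r = {r_coarse:.2f}")
```

Under these assumptions the correlation computed from the 3-point scores is smaller than the correlation computed from the continuous scores, which is the kind of attenuation described above.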

As reported in the general discussion of the correlational analyses, a third limitation of

this study involves the inconsistencies between the ADOS-G and the other measures to which

ADOS-G scores were compared, especially the GARS-2. Although both the ADOS-G and the

GARS-2 are purported to measure the behavioral characteristics of Autism Spectrum Disorders, it

is questionable whether direct observations of behavior made by trained clinicians during a specified

period of time and parent and teacher perceptions of a child’s “typical” functioning, as

assessed by the GARS-2, should be considered equivalent. In order to obtain accurate evidence

of the convergent and discriminant validity of ADOS-G scores, ADOS-G scores need to be

compared to other direct assessments of autistic behavior, also made by trained professionals.

As identified in other studies completed on the diagnostic accuracy of the ADOS (de

Bildt et al., 2009; Gotham et al., 2007; Gray et al., 2008; Overton et al., 2008), the determination

of diagnostic accuracy of the ADOS (as derived by comparing a participant’s ADOS-G

classification to their resulting clinical diagnosis) is confounded by the fact that the two

classifications are not independent: participants’ performance on the ADOS-G was one of the

assessment tools used to make the resulting clinical diagnosis. The obtained estimates of

diagnostic accuracy for the current study may be inflated as a result of this confound.

However, based on the results of the diagnostic comparisons made with and without data from

the ADOS-G, it appears that this may not be the case.

Finally, there were limitations with the way data were collected for the independent

diagnostic comparisons. Specifically, clinicians were provided with a copy of each participant’s

comprehensive evaluation report (minus ADOS-G scores and diagnostic decisions) and asked to

use the remainder of the report data to determine if the student did or did not meet the criteria for

an Autism Spectrum Disorder. Although ADOS-G data (scores, test session observations) and

final diagnostic determinations were removed from the report, the way in which the evaluation

report was originally written may have been influenced by a participant’s performance on the

ADOS, which, in turn, may have influenced the clinical diagnostic decision made “independent”

of the ADOS.

Future Research

Current findings provide directions and questions for future research. First, it is

recommended that the current study be replicated to assess the generalizability of results. An

overall larger sample size, with more consistency between module sample sizes for each of the

scoring algorithms, is also recommended. Future research should use a sample

composition similar to the one used in this study (i.e., students referred for a school-based

psychoeducational diagnostic evaluation due to the suspicion of Autism/Autism Spectrum

Disorder). However, it would be wise to include participants from multiple school districts in

order to minimize the threat of any systematic differences reflected within participants or

assessment practices in a given school district.

Further analyses of the structural validity of the ADOS-G modules across scoring

algorithms are also recommended. Specifically, confirmatory factor analysis should be

conducted on the retained standard and updated factor structures to provide additional evidence

of the structural validity of the ADOS-G modules. In addition, more research should be

conducted on the Updated Scoring Algorithms identified in this study to determine if they

consistently result in greater levels of specificity and improved balance between sensitivity and

specificity across different samples.

Finally, although the results of the current study provide some evidence to suggest that

the diagnostic accuracy of decisions made with and without ADOS-G scores obtained from the

Original Scoring Algorithm are relatively consistent, the current study is the first to conduct this

comparison. As such, further research in this area is warranted. Use of a larger sample size, to

allow for the comparison of the independent diagnostic accuracy of both the Original and

Revised Scoring Algorithms, is recommended. Further, to allow for truly independent

comparisons, it is recommended that two clinicians participate in all aspects of a comprehensive

autism evaluation (with the exception of the ADOS-G administration) and then the clinician who

did not participate in the ADOS-G administration make the final determination regarding clinical

diagnosis. Scores on the ADOS-G can then be compared to the final clinical diagnosis made

independent of the ADOS-G.

Conclusions

Overall, the findings of this study add to the current body of evidence regarding the

validity and diagnostic accuracy of scores from the ADOS-G. Exploratory factor analysis

supported the structural validity of the Original and Revised Scoring Algorithms across the three

modules under investigation. However, item deletion was suggested for the majority of modules

and scoring algorithms to increase overall scale reliabilities. Item deletions from the Module 1-

OSA, Module 2-RSA, and Module 3 OSA and RSA resulted in the creation of four Updated

Scoring Algorithms, which were considered in subsequent analyses. Although evidence of

convergent validity was not obtained for ADOS-G scores in the current study, evidence of the

discriminant validity of total scores on the ADOS-G was obtained across modules and scoring

algorithms. A review of the indicators of diagnostic accuracy also indicated that scores from the

ADOS-G consistently demonstrate high levels of sensitivity across modules and scoring

algorithms, but inadequate levels of specificity. Although use of the Updated Scoring Algorithms

consistently improved the balance between sensitivity and specificity across modules,

“improved” levels of specificity were still lower than recommended for diagnostic tests. Low

levels of specificity may be related to the overlap in behavioral symptoms of ASDs and other

disorders (e.g., ADHD and Generalized Anxiety Disorder) that students who do not receive a

diagnosis of Autism often receive. Given the low specificity values observed in this study, it is

imperative that the ADOS-G be used as one part of a multimodal evaluation and not be the

sole criterion against which a diagnosis of ASD is made. Based on the accumulated evidence,

use of the Module 1 Updated Original Scoring Algorithm, Module 2 Updated Revised Scoring

Algorithm, and Module 3 Updated Revised Scoring Algorithm is recommended to clinicians.

References

Achenbach, T. M., & Rescorla, L. A. (2000). Manual for the ASEBA preschool forms and

profiles. Burlington, VT: University of Vermont, Research Center for Children, Youth,

and Families.

Allen, D. A. (1988). Autistic spectrum disorders: clinical presentation in preschool children.

Journal of Child Neurology, 3 (Suppl.), S48-56.

Allen, R. A., Robins, D. L., & Decker, S. L. (2008). Autism Spectrum Disorders: Neurobiology

and current assessment practices. Psychology in the Schools, 45, 905-917. doi:

10.1002/pits.20341

American Educational Research Association (1999). Standards for educational and

psychological testing. Washington, DC: Author.

American Psychiatric Association (1952). Diagnostic and statistical manual of mental disorders.

Washington, DC: Author.

American Psychiatric Association (1968). Diagnostic and statistical manual of mental disorders

(2nd ed.). Washington, DC: Author.

American Psychiatric Association (1980). Diagnostic and statistical manual of mental disorders

(3rd ed.). Washington, DC: Author.

American Psychiatric Association (1987). Diagnostic and statistical manual of mental disorders

(3rd ed., rev.). Washington, DC: Author.

American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders

(4th ed.). Washington, DC: Author.

American Psychiatric Association (2000). Diagnostic and statistical manual of mental disorders

(4th ed., Text Revision). Washington, DC: Author.

Autism (n.d.). Retrieved October 6, 2010 from http://www.apa.org/topics/autism/index.aspx.

Bartlett, M. S. (1950). Tests of significance in factor analysis. British Journal of Psychology

(Statistical Section), 3, 77-85. Retrieved from

http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)2044-8295

Bishop, D. V., & Norbury, C. F. (2002). Exploring the borderlands of autistic disorder and

specific language impairment: a study using standardized diagnostic instruments. Journal

of Child Psychology and Psychiatry & Allied Disciplines, 43, 917-929. doi:

10.1111/1469-7610.00114

Briggs, N. E., & MacCallum, R. C. (2003). Recovery of the weak common factors by maximum

likelihood and ordinary least squares estimation. Multivariate Behavioral Research, 38,

25-56. doi: 10.1207/S15327906MBR3801_2

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral

Research, 1, 245-276. doi: 10.1207/s15327906mbr0102_10

Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and

standardized assessment instruments in psychology. Psychological Assessment, 6, 284-

290. doi: 10.1037/1040-3590.6.4.284

Cicchetti, D. V., Lord, C., Koenig, K., Klin, A., & Volkmar, F. R. (2008). Reliability of the ADI-

R: multiple examiners evaluate a single case. Journal of Autism and Developmental

Disorders, 38, 764-770. doi: 10.1007/s10803-007-0448-3

Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four

recommendations for getting the most from your analysis. Practical Assessment,

Research, & Evaluation, 10 (7). Retrieved from http://pareonline.net/pdf/v10n7.pdf.

Dawis, R. V. (1987). Scale construction. Journal of Counseling Psychology, 34, 481-489. doi:

10.1037/0022-0167.34.4.481

de Bildt, A., Sytema, S., Ketelaars, C., Kraijer, D., Mulder, E., Volkmar, F., & Minderaa, R.

(2004). Interrelationship between autism diagnostic observation schedule-generic

(ADOS-G), autism diagnostic interview-revised (ADI-R), and the diagnostic and

statistical manual of mental disorders (DSM-IV-TR) classification in children and

adolescents with mental retardation. Journal of Autism and Developmental Disorders, 34,

129-137. doi: 10.1007/s10803-009-0749-9

DiLavore, P., Lord, C., & Rutter, M. (1995). Pre-Linguistic Autism Diagnostic Observation

Schedule (PL-ADOS). Journal of Autism and Developmental Disorders, 25, 355-379.

doi: 10.1007/BF02179373

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use

of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-

299. doi: 10.1037/1082-989X.4.3.272

Filipek, P. A., Accardo, P. J., Baranek, G. T., Cook, E. H., Dawson, G., Gordon, B. et al. (1999).

The screening and diagnosis of Autism Spectrum Disorders. Journal of Autism and

Developmental Disorders, 29, 439-484.

Ghaziuddin, M. (2005). Mental health aspects of autism and Asperger syndrome. Philadelphia,

PA: Jessica Kingsley Publishers.

Gilliam, J. E. (1995). Gilliam Autism Rating Scale. Austin, TX: Pro-Ed.

Gilliam, J. E. (2006). Gilliam Autism Rating Scale (2nd ed.). Austin, TX: Pro-Ed.

Gioia, G. A., Isquith, P. K., Guy, S. C., & Kenworthy, L. (2000). Behavior Rating Inventory of

Executive Function Professional Manual. Lutz, FL: Psychological Assessment

Resources, Inc.

Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item analysis. Journal of

Personality Assessment, 68, 532-560. doi: 10.1207/s15327752jpa6803_5

Gotham, K., Risi, S., Dawson, G., Tager-Flusberg, H., Joseph, R., Carter, A. et al. (2008).

A replication of the Autism Diagnostic Observation Schedule revised algorithms. Journal

of the American Academy of Child and Adolescent Psychiatry, 47, 642-651. doi:

10.1097/CHI.0b013e31816bffb7

Gotham, K., Risi, S., Pickles, A., & Lord, C. (2006). The Autism Diagnostic Observation

Schedule: Revised algorithms for improved diagnostic validity. Journal of Autism and

Developmental Disorders, 37, 613-627. doi: 10.1007/s10803-006-0280-1

Gray, K. M., Tonge, B. J., & Sweeney, D. J. (2008). Using the autism diagnostic interview-

revised and the autism diagnostic observation schedule with young children with

developmental delays: evaluating diagnostic validity. Journal of Autism and

Developmental Disorders, 38, 657-667. doi: 10.1007/s10803-007-0432-y

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika,

30, 179-185. doi: 10.1007/BF02289447

Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39, 31-36. doi:

10.1007/BF02291575

Kanner, L. (1943). Autistic disturbances of affective contact. Nervous Child, 2, 217-250.

Klein-Tasman, B. P., Risi, S., & Lord, C. E. (2007). Effect of language and task demands on the

diagnostic effectiveness of the Autism Diagnostic Observation Schedule: the impact of

module choice. Journal of Autism and Developmental Disorders, 37, 1224-1234. doi:

10.1007/s10803-006-0266-z

Krug, D. A., Arick, J. R., & Almond, P. J. (1993). The Autism Screening Instrument. Austin,

TX: Pro-Ed.

Le Couteur, A., Haden, G., Hammal, D., & McConachie, H. (2008). Diagnosing autism spectrum

disorders on pre-school children using two standardized assessment instruments: the

ADI-R and the ADOS. Journal of Autism and Developmental Disorders, 38, 362-372.

doi: 10.1007/s10803-007-0403-3

Lord, C. E. (2010). Autism: from research to practice. American Psychologist, 65, 815-826. doi:

10.1111/j.1469-7610.1992.tb00887.x

Lord, C. & Risi, S. (1998). Frameworks and methods in diagnosing Autism Spectrum Disorders.

Mental Retardation and Developmental Disabilities Research Reviews, 4, 90-96. doi:

10.1002/(SICI)1098-2779(1998)4:2<90::AID-MRDD5>3.0.CO;2-0

Lord, C., Risi, S., DiLavore, P., Shulman, C., Thurm, A., & Pickles, A. (2006). Autism from two

to nine. Archives of General Psychiatry, 63, 694-701.

Lord, C., Risi, S., Lambrecht, L., Cook, E. H., Leventhal, B. L., DiLavore, P. C., et al. (2000).

The Autism Diagnostic Observation Schedule-Generic: A standard measure of social and

communication deficits associated with the spectrum of autism. Journal of Autism and

Developmental Disorders, 30, 205-223.

Lord, C., Rutter, M., DiLavore, P. C., & Risi, S. (1999). Autism Diagnostic Observation

Schedule. Los Angeles: Western Psychological Services.

Lord, C., Rutter, M., Goode, S., Heemsbergen, J., Jordan, H., Mawhood, L. et al. (1989).

Autism Diagnostic Observation Schedule: A standardized observation of communicative

and social behavior. Journal of Autism and Developmental Disorders, 19, 185-212. doi:

10.1007/BF02211841

Lord, C., Rutter, M., & LeCouteur, A. (1994). Autism Diagnostic Interview-Revised: A revised

version of a diagnostic interview for caregivers of individuals with possible pervasive

developmental disorders. Journal of Autism and Developmental Disorders, 24, 659-685.

doi: 10.1007/BF02172145

Lord, C. & Volkmar, F. (2002). Genetics of childhood disorders: XLII. Autism, part 1: Diagnosis

and assessment in Autism Spectrum Disorders. Journal of the American Academy of

Child and Adolescent Psychiatry, 41, 1-5. doi: 10.1097/00004583-200209000-00015

Matson, J. L., & Gonzalez, M. L. (2007). Autism Spectrum Disorder-Diagnostic for Children.

Baton Rouge, LA: Disability Consultants, LLC.

Matson, J. L., Gonzalez, M., Wilkins, J., & Rivet, T. T. (2008). Reliability of the Autism

Spectrum Disorder-Diagnostic for Children in spectrum disorders in children: an

overview. Research in Autism Spectrum Disorders, 2, 533-545. doi:

10.1016/j.rasd.2007.11.001

Matson, J. L., Gonzalez, M., & Wilkins, J. (2009). Validity study of the Autism

Spectrum Disorder-Diagnostic for Children (ASD-DC). Research in Autism Spectrum

Disorders, 3, 196-206. doi: 10.1016/j.rasd.2008.05.005

Matthey, S., & Petrovsky, P. (2002). The Children’s Depression Inventory: Error in cutoff scores

for screening purposes. Psychological Assessment, 14, 146-149. doi:

10.1037//1040-3590.14.2.146

Mazefsky, C. A., & Oswald, D. P. (2006). The discriminative ability and diagnostic utility of the

ADOS-G, ADI-R, and GARS for children in a clinical setting. Autism, 10, 533-549. doi:

10.1177/1362361306068505

McClure, I., Mackay, T., Mamdani, H., & McCaughey, R. (2010). A comparison of a specialist

autism spectrum disorder assessment team with local assessment teams. Autism, 14, 589-

603. doi: 10.1177/1362361310373369

Molloy, C. A., Murray, D. S., Akers, R., Mitchell, T., & Manning-Courtney, P. (2011). Use of

the Autism Diagnostic Observation Schedule (ADOS) in a clinical setting. Autism,

15, 143-162. doi: 10.1177/1362361310379241

Montgomery, J. M., Newton, B., & Smith, C. (2008). Test review: GARS-2: Gilliam Autism

Rating Scale Second Edition. Journal of Psychoeducational Assessment, 26, 395-401.

doi: 10.1177/0734282908317116

Mundfrom, D. J., Shaw, D. G., & Ke, T. L. (2005). Minimum sample size recommendations for

conducting factor analysis. International Journal of Testing, 5, 159-168. doi:

10.1207/s15327574ijt0502_4

Noterdaeme, M., Mildenberger, K., Sitter, S., & Amorosa, H. (2002). Parent information and

direct observation in the diagnosis of pervasive and specific developmental disorders.

Autism, 6, 159-168. doi: 10.1177/1362361302006002003

Oosterling, I., Roos, S., de Bildt, A., Rommelse, N., de Jonge, M., Visser, J.,…Buitelaar, J.

(2010). Improved diagnostic validity of the ADOS revised algorithms: a replication study

in an independent sample. Journal of Autism and Developmental Disorders, 40, 689-703.

doi: 10.1007/s10803-009-0915-0

Overton, T., Fielding, C., & de Alba, R. G. (2007). Brief report: Exploratory analysis of

the ADOS revised algorithm: Specificity and predictive value with Hispanic children

referred for autism spectrum disorders. Journal of Autism and Developmental Disorders,

38, 1166-1169. doi: 10.1007/s10803-007-0488-8

Pandolfi, V., Magyar, C. I., & Dill, C. A. (2010). Constructs assessed by the GARS-2: Factor

analysis data of the standardization sample. Journal of Autism and Developmental

Disorders, 40, 1118-1130. doi: 10.1007/s10803-010-0967-1

Papanikolaou, K., Paliokosta, E., Houliaras, G., Vgenopoulou, S., Giouroukou, E., Pehlivanidis,

A., …Tsiantis, I. (2009). Using the Autism Diagnostic Interview-Revised and the Autism

Diagnostic Observation Schedule-Generic for the diagnosis of autism spectrum disorders

in a Greek sample with a wide range of intellectual abilities. Journal of Autism and

Developmental Disorders, 39, 414-420. doi: 10.1007/s10803-008-0639-6

Reaven, J. A., Hepburn, S. L., & Ross, R. G. (2008). Use of the ADOS and the ADI-R in children

with psychosis: importance of clinical judgment. Clinical Child Psychology and

Psychiatry, 13, 81-94. doi: 10.1177/1359104507086343

Reynolds, C. R. & Kamphaus, R. W. (2004). Behavior Assessment System for Children (2nd ed.).

Circle Pines, MN: AGS Publishing.

Risi, S., Lord, C., Gotham, K., Corsello, C., Chrysler, C., Szatmari, P., …Pickles, A. (2006).

Combining information from multiple sources in the diagnosis of Autism Spectrum

Disorders. Journal of the American Academy of Child and Adolescent Psychiatry, 45,

1094-1103. doi: 10.1097/01.chi.0000227880.42780.0e

Rutter, M., LeCouteur, A., & Lord, C. (2003). Autism Diagnostic Interview-Revised. Los

Angeles, CA: Western Psychological Services.

Salvia, J., & Ysseldyke, J. E. (2004). Assessment in Special and Inclusive Education: Ninth

Edition. Boston: Houghton Mifflin.

Schopler, E., Reichler, R. J., & Rochen Renner, B. R. (1988). The Childhood Autism Rating

Scale. Los Angeles, CA: Western Psychological Services.

Sikora, D. M., Hall, T. A., Hartley, S. L., Gerrard-Morris, A. E., & Cagle, S. (2008). Does parent

report of behavior differ across ADOS-G classifications: analysis of scores from the

CBCL and GARS. Journal of Autism and Developmental Disorders, 38, 440-448. doi:

10.1007/s10803-007-0407-z

Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling. Boca Raton,

FL: Chapman & Hall/CRC Press.

South, M., Williams, B. J., McMahon, W. M., Owley, T., Filipek, P. A., Shernoff, E.,

…Ozonoff, S. (2002). Utility of the Gilliam Autism Rating Scale in research and clinical

populations. Journal of Autism and Developmental Disorders, 32, 593-599.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed). Boston: Allyn

and Bacon.

Tataryn, D. J., Wood, J. M., & Gorsuch, R. L. (1999). Setting the value of k in promax: A Monte

Carlo study. Educational and Psychological Measurement, 59, 384-391. doi:

10.1177/00131649921969938

Tomanik, S. S., Pearson, D. A., Loveland, K. A., Lane, D. M., & Shaw, J. B. (2007). Improving

the reliability of autism diagnoses: Examining the utility of adaptive behavior. Journal of

Autism and Developmental Disorders, 37, 921-928. doi: 10.1007/s10803-006-0227-6

Tsai, L. (1992). Diagnostic issues in high-functioning autism. In E. Schopler, & G. Mesibov

(Eds.), High functioning individuals with autism (pp. 11-40). New York: Plenum.

Velicer, W. F. (1976). Determining the number of components from the matrix of partial

correlations. Psychometrika, 41, 321-327. doi: 10.1007/BF02293557

Ventola, P. E., Kleinman, J., Pandey, J., Barton, M., Allen, S., Green, J.,…Fein, D. (2006).

Agreement among four diagnostic instruments for autism spectrum disorders in toddlers.

Journal of Autism and Developmental Disorders, 36, 839-847. doi:

10.1007/s10803-006-0128-8

Volkmar, F. R., Lord, C., Bailey, A., Schultz, R. T., & Klin, A. (2004). Autism and pervasive

developmental disorders. Journal of Child Psychology and Psychiatry, 45, 135-170. doi:

10.1046/j.0021-9630-2003.00317.x

Wegener, D. T., & Fabrigar, L. R. (2000). Analysis and design for nonexperimental data. In H.

T. Reis & C. M. Judd (Eds.) Handbook of research methods in social and personality

psychology (pp. 412-450). New York: Cambridge University Press.

Widaman, K. F. (1993). Common factor analysis versus principal component analysis:

Differential bias in representing model parameters? Multivariate Behavioral Research,

28, 263-311. doi: 10.1207/s15327906mbr2803_1

Wood, J. M., Tataryn, D. J., & Gorsuch, R. L. (1996). Effects of under- and overextraction on

principal factor analysis with Varimax rotation. Psychological Methods, 1, 354-365. doi:

10.1037/1082-989X.1.4.354

Footnotes

1Due to an insufficient sample size (N = 16), the Module 1, No Words Revised Scoring

Algorithm was excluded from analyses of factor structure, correlational analyses, and analysis

of diagnostic accuracy. Subsequent references to the Module 1 Revised Scoring Algorithm

(Module 1-RSA) refer to the Module 1, Some Words Revised Scoring Algorithm only.

2Due to insufficient sample size (N = 45), the Module 2, Less Than 5 Years Revised

Scoring Algorithm was excluded from analyses of factor structure, correlational analyses, and

analysis of diagnostic accuracy. Subsequent references to the Module 2 Revised Scoring

Algorithm (Module 2-RSA) refer to the Module 2, Greater Than or Equal to 5 Years Revised

Scoring Algorithm only.

Appendix A

Diagnostic Criteria for Autism Spectrum Disorders as defined by the Diagnostic and Statistical

Manual of Mental Disorders, Fourth Edition-Text Revision (DSM-IV-TR; American Psychiatric

Association, 2000)

Diagnostic Criteria for Autistic Disorder

A total of six or more items from 1, 2, and 3, with at least two from 1, and one each from 2

and 3:

1. Qualitative impairment in social interaction, as manifested by at least two of the following:

a. Marked impairment in the use of multiple nonverbal behaviors such as eye-to-eye gaze,

facial expression, body postures, and gestures to regulate social interaction

b. Failure to develop peer relationships appropriate to developmental level

c. A lack of spontaneous seeking to share enjoyment, interests, or achievements with other

people

d. A lack of social or emotional reciprocity

2. Qualitative impairments in communication as manifested by at least one of the following:

a. Delay in, or total lack of, the development of spoken language (not accompanied by

attempts to communicate nonverbally)

b. In individuals with adequate speech, marked impairment in the ability to initiate or

sustain a conversation with others

c. Stereotyped and repetitive use of language or idiosyncratic language

d. Lack of varied, make-believe play or social imaginative play appropriate to

developmental level

3. Restricted repetitive and stereotyped patterns of behavior, interests, and activities, as

manifested by at least one of the following:

a. Encompassing preoccupation with one or more stereotyped and restricted patterns of

interest that is abnormal in intensity or focus

b. Apparently inflexible adherence to specific nonfunctional routines or rituals

c. Stereotyped and repetitive motor mannerisms

d. Persistent preoccupation with parts of objects

Delays or abnormal functioning in at least one of the following areas must be present prior to

age 3 years: social interaction, language as used in social communication, and/or symbolic or imaginative play.
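
The item-count rule stated at the top of these criteria (six or more items in total, with at least two from area 1 and at least one each from areas 2 and 3) can be expressed as a short Python sketch. This is an illustration of the counting logic only, not a diagnostic tool; the function name and arguments are hypothetical, and the age-of-onset requirement is not checked.

```python
def meets_item_count_rule(social, communication, restricted):
    """Check the item-count rule for Autistic Disorder as listed above.

    Each argument is the number of items endorsed in that area (0 to 4):
    area 1 (social interaction), area 2 (communication), and
    area 3 (restricted, repetitive, and stereotyped patterns).
    """
    total = social + communication + restricted
    return (
        total >= 6
        and social >= 2
        and communication >= 1
        and restricted >= 1
    )


print(meets_item_count_rule(3, 2, 1))  # six items, all area minimums met
print(meets_item_count_rule(4, 2, 0))  # six items, but none from area 3
```

The first call satisfies the rule and the second does not, because a total of six items is not sufficient when one area has no endorsed items.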

Diagnostic Criteria for Asperger’s Disorder

1. Qualitative impairment in social interaction, as manifested by at least two of the following:

a. Marked impairment in the use of multiple nonverbal behaviors such as eye-to-eye gaze,

facial expression, body postures, and gestures to regulate social interaction

b. Failure to develop peer relationships appropriate to developmental level

c. A lack of spontaneous seeking to share enjoyment, interests, or achievements with other

people

d. A lack of social or emotional reciprocity

2. Restricted repetitive and stereotyped patterns of behavior, interests, and activities, as

manifested by at least one of the following:

a. Encompassing preoccupation with one or more stereotyped and restricted patterns of

interest that is abnormal in intensity or focus

b. Apparently inflexible adherence to specific nonfunctional routines or rituals

c. Stereotyped and repetitive motor mannerisms

d. Persistent preoccupation with parts of objects

3. The disturbance causes clinically significant impairment in social, occupational, or other

important areas of functioning.

4. There is no clinically significant general delay in language (e.g., single words used by age 2

years, communicative phrases used by age 3 years).

5. There is no clinically significant delay in cognitive development or in the development of

age-appropriate self-help skills, adaptive behavior (other than social interaction), and

curiosity about the environment in childhood.

6. Criteria are not met for another specific Pervasive Developmental Disorder or Schizophrenia.
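The thresholds listed above for Asperger’s Disorder amount to a counting rule: at least two items under Criterion 1, at least one item under Criterion 2, and Criteria 3 through 6 all satisfied. The Python sketch below illustrates only that rule. The class and field names are hypothetical, and the sketch is not a diagnostic tool or a substitute for clinical judgment.

```python
from dataclasses import dataclass


@dataclass
class AspergerCriteria:
    """Hypothetical record of which listed criteria a clinician endorsed."""
    social_items_met: int              # Criterion 1: items a-d endorsed (0-4)
    restricted_items_met: int          # Criterion 2: items a-d endorsed (0-4)
    significant_impairment: bool       # Criterion 3 must be True
    language_delay: bool               # Criterion 4 requires this to be False
    cognitive_or_adaptive_delay: bool  # Criterion 5 requires this to be False
    other_pdd_or_schizophrenia: bool   # Criterion 6 requires this to be False


def meets_listed_thresholds(c: AspergerCriteria) -> bool:
    """Return True only if every listed threshold is satisfied."""
    return (
        c.social_items_met >= 2
        and c.restricted_items_met >= 1
        and c.significant_impairment
        and not c.language_delay
        and not c.cognitive_or_adaptive_delay
        and not c.other_pdd_or_schizophrenia
    )
```

For example, a record with two Criterion 1 items and one Criterion 2 item, clinically significant impairment, and no exclusions returns True; the same record with a language delay returns False.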

Diagnostic Criteria for Rett’s Disorder

1. All of the following are observed:

a. Apparently normal prenatal and perinatal development

b. Apparently normal psychomotor development through the first 5 months after birth

c. Normal head circumference at birth

2. Onset of all of the following after the period of normal development:

a. Deceleration of head growth between ages 5 and 48 months

b. Loss of previously acquired purposeful hand skills between ages 5 and 30 months

with the subsequent development of stereotyped hand movements

c. Loss of social engagement early in the course (although often social interaction develops

later)

d. Appearance of poorly coordinated gait or trunk movements

e. Severely impaired expressive and receptive language development with severe

psychomotor retardation

Diagnostic Criteria for Childhood Disintegrative Disorder

1. Apparently normal development for at least the first 2 years after birth as manifested by the

presence of age-appropriate verbal and nonverbal communication, social relationships, play,

and adaptive behavior.

2. Clinically significant loss of previously acquired skills (before age 10 years) in at least two of

the following areas:

a. Expressive or receptive language

b. Social skills or adaptive behavior

c. Bowel or bladder control

d. Play

e. Motor skills

3. Abnormalities of functioning in at least two of the following areas:

a. Qualitative impairment in social interaction

b. Qualitative impairments in communication

c. Restricted, repetitive, and stereotyped patterns of behavior, interests, and activities,

including motor stereotypies and mannerisms

4. The disturbance is not better accounted for by another Pervasive Developmental Disorder or

by Schizophrenia.

Diagnostic Criteria for Pervasive Developmental Disorder Not Otherwise Specified

1. This category should be used when there is a severe and pervasive impairment in the

development of reciprocal social interaction associated with impairment in either verbal or

nonverbal communication skills or with the presence of stereotyped behavior, interests, and

activities, but the criteria are not met for a specific Pervasive Developmental Disorder,

Schizophrenia, Schizotypal Personality Disorder, or Avoidant Personality Disorder. For

example, this category includes “atypical autism,” that is, presentations that do not meet the criteria

for Autistic Disorder because of late age of onset, atypical symptomatology, or subthreshold

symptomatology, or all of these.

Appendix B

Table B1

Activities on the Autism Diagnostic Observation Schedule and their Purpose by Module (Lord, Rutter, DiLavore, & Risi, 1999)

Module 1 Module 2 Module 3 Module 4

Free Play – used as a warm-up period in which the child can adjust to the testing environment and examiners, and to assess the child’s independent use of toys, engagement with parent/caregiver, and determine the presence or absence of repetitive behaviors

Construction Task – used as a warm-up activity, an opportunity to observe the child’s interactive behavior during a structured task, and allows for the observation of whether and how the child asks for help within the context of a structured task

Construction Task – used as a warm-up activity, an opportunity to observe the participant’s interactive behavior during a structured task, and allows for the observation of whether and how the participant asks for help within the context of a structured task

Construction Task – used as a warm-up activity, an opportunity to observe the participant’s interactive behavior during a structured task, and allows for the observation of whether and how the participant asks for help within the context of a structured task *optional activity for Module 4

Response to Name – used to assess the child’s response to his/her name when it is purposefully called to gain his/her attention

Response to Name – used to assess the child’s response to his/her name when it is purposefully called to gain his/her attention

Make-Believe Play – used to observe the participant’s creative or imaginative use of miniature objects in an unstructured task

Telling a Story From a Book – used to assess the participant’s ability to follow and comment on a sequential story in a picture book and to generate spoken language

Response to Joint Attention – used to assess the child’s response to the examiner’s use of eye contact coordinated with facial orientation, verbalization, and pointing, in order to draw his/her attention from a distant object

Make-Believe Play – used to observe the child’s creative or imaginative use of miniature objects in an unstructured task

Joint Interactive Play – used to assess the degree and quality of the participant’s coordination of behavior and affect with the examiner in joint interactive play

Description of a Picture – used to generate a sample of language and/or other communicative behaviors *optional activity for Module 4

Bubble Play – used to elicit eye contact and vocalization from the child in coordination with his/her pointing or reaching in order to direct the attention of his/her parent/caregiver or the examiner to a distant object

Joint Interactive Play – used to assess the degree and quality of the child’s coordination of behavior and affect with the examiner in joint interactive play

Demonstration Task – used to assess the participant’s ability to communicate about a familiar series of actions using gesture or mime with accompanying language, and to report on a familiar event

Conversation and Reporting – used to assess the participant’s ability to engage in a conversation with to-and-fro interchange, to describe an event or situation for which there are no visual cues, to gain a language sample in less structured circumstances than the picture task, and to evaluate the participant’s ability to recount a nonroutine event

Anticipation of a Routine with Objects – used to assess the child’s anticipation and initiation of the repetition of an action routine with objects

Conversation – used to assess the child’s ability to carry out a minimal conversation with back-and-forth interchange, and to generate a language sample in less structured circumstances than the other tasks

Description of a Picture – used to generate a sample of language and/or other communicative behaviors

Current Work or School – used to evaluate how the participant describes his/her current situation, and whether he/she understands his/her role in determining what will happen in the future *optional activity for Module 4

Responsive Social Smile – used to assess the child’s smiling in response to a purely social overture from an adult

Response to Joint Attention – used to assess the child’s response to the examiner’s use of eye contact coordinated with facial orientation, verbalization, and pointing, in order to draw his/her attention to a distant object

Telling a Story From a Book – used to assess the participant’s ability to follow and comment on a sequential story in a picture book and to generate spoken language

Social Difficulties and Annoyance – used to assess the participant’s insight into personal social difficulties and sense of responsibility for his/her own actions

Anticipation of a Social Routine – used to assess the child’s anticipation of, request for, and participation in a social routine

Demonstration Task – used to assess the child’s ability to communicate about a familiar series of actions using gesture or mime with accompanying language, and to report on a familiar event

Cartoons – used to observe the way in which the participant narrates a story, uses gestures to enact events, and integrates gesture with gaze and language

Emotions – used to probe the participant’s understanding of emotions, the contexts in which they arise, and his/her individual experience of emotions

Functional and Symbolic Imitation – used to observe the child’s imitation of simple actions with real objects and with nonmeaningful placeholders for the same objects

Description of a Picture – used to generate a sample of language and/or other communicative behaviors

Conversation and Reporting – used to assess the participant’s ability to engage in a conversation with to-and-fro interchange, to describe an event or situation for which there are no visual cues, to gain a language sample in less structured circumstances than the picture task, and to evaluate the participant’s ability to recount a nonroutine event

Demonstration Task – used to assess the participant’s ability to communicate about a familiar series of actions using gesture or mime with accompanying language, and to report on a familiar event

Birthday Party – used to create an opportunity for the child to engage in functional and symbolic play

Telling a Story From a Book – used to assess the child’s ability to follow and comment on a sequential story in a picture book and to generate spoken language

Emotions – used to probe the participant’s understanding of emotions, the contexts in which they arise, and his/her individual experience of emotions

Cartoons – used to observe the way in which the participant narrates a story, uses gestures to enact events, and integrates gesture with gaze and language *optional activity for Module 4

Snack – used to give the child an opportunity to make requests in a familiar context

Free Play – used to create a relaxed situation with no demands or intrusions, in which the child can have a break from the demands of the evaluation, and to assess the child’s independent use of toys and his/her engagement with an adult during free play in a new environment

Social Difficulties and Annoyance – used to assess the participant’s insight into personal social difficulties and sense of responsibility for his/her own actions

Break – used to give the participant a break from the social demands of the assessment and to provide an opportunity to observe his/her behavior in less structured circumstances

Birthday Party – used to create an opportunity for the child to engage in functional and symbolic play

Break – used to give the participant a break from the social demands of the assessment and to provide an opportunity to observe his/her behavior in less structured circumstances

Daily Living – used to obtain factual information and background for the socioemotional questions, and to evaluate the participant’s understanding and views regarding money, residential arrangements, and leisure activities

Snack – used to give the child an opportunity to make requests in a familiar context

Friends and Marriage – used to obtain a detailed description of one or more relationships that the participant would describe as friendship, and also to obtain a general description of his/her understanding of the concept of friendship and the idea of establishing a family or building a long-term relationship as a couple

Friends and Marriage – used to obtain a detailed description of one or more relationships that the participant would describe as friendship, and also to obtain a general description of his/her understanding of the concept of friendship and the idea of establishing a family or building a long-term relationship as a couple

Anticipation of a Routine with Objects – used to assess the child’s anticipation and initiation of the repetition of an action routine with objects

Loneliness – used to provide another opportunity to assess the participant’s insight into his/her social situation, and ability to describe his/her emotional reaction to it

Loneliness – used to provide another opportunity to assess the participant’s insight into his/her social situation, and ability to describe his/her emotional reaction to it

Bubble Play – used to elicit eye contact and vocalization from the child in coordination with his/her pointing or reaching in order to direct the attention of his/her parent/caregiver or the examiner to a distant object

Creating a Story – used to observe creativity in a play-like situation that is appropriate for older children, adolescents, and adults

Plans and Hopes – used to give the participant an opportunity to describe any goals or aspirations that he/she may have

Creating a Story – used to observe creativity in a play-like situation that is appropriate for older children, adolescents, and adults

Table B2

Items Rated on the Autism Diagnostic Observation Schedule by Subdomain and Module (Lord, Rutter, DiLavore, & Risi, 1999)

Module

Subscale Module 1 Module 2 Module 3 Module 4

Language/Communication Overall level of non-echoed language

Overall level of non-echoed language

Overall level of non-echoed language

Overall level of non-echoed language

Frequency of vocalizations directed to othersᵃ

Amount of social overtures/maintenance of attentionᵃ

Speech abnormalities associated with autism (intonation/volume/rate)

Speech abnormalities associated with autism (intonation/volume/rate)

Intonation of vocalizations or verbalizations

Speech abnormalities associated with autism (intonation/volume/rate)

Stereotyped/idiosyncratic use of wordsᵃ

Stereotyped/idiosyncratic use of wordsᵃ

Immediate echolalia Immediate echolalia Immediate echolalia Immediate echolalia

Stereotyped/idiosyncratic use of wordsᵃ Stereotyped/idiosyncratic use of wordsᵃ

Offers information Offers information

Use of other’s body to communicateᵃ

Conversationᵃ Asks for information Asks for information

Pointingᵃ Pointingᵃ Reporting of eventsᵃ Reporting of events

Language/Communication Gesturesᵃ Descriptive, conventional, instrumental, or informational gesturesᵃ

Conversationᵃ Conversationᵃ

Descriptive, conventional, instrumental, or informational gesturesᵃ

Descriptive, conventional, instrumental, or informational gesturesᵃ

Reciprocal Social Interaction

Shared enjoyment in interactionᵃ

Showing Empathy/comments on others’ emotions

Communication of own affect

Responsive social smile Facial expressions directed to othersᵃ

Facial expressions directed to othersᵃ

Facial expressions directed to othersᵃ

Facial expressions directed to othersᵃ

Shared enjoyment in interaction

Language linked to nonverbal communication

Language linked to nonverbal communication

Requesting Response to joint attention Quality of social overtureᵃ Insight

Integration of gaze and other behaviors during social overtures

Response to name Shared enjoyment in interaction

Shared enjoyment in interaction

Unusual eye contactᵃ Unusual eye contactᵃ Unusual eye contactᵃ Unusual eye contactᵃ

Reciprocal Social Interaction

Response to name Spontaneous initiation of joint attentionᵃ

Insightᵃ Empathy/comments on others’ emotionsᵃ

Giving Quality of social overtureᵃ Quality of social responseᵃ Responsibilityᵃ

Showingᵃ Quality of social responseᵃ Amount of reciprocal social communication

Quality of social overtureᵃ

Spontaneous initiation of joint attentionᵃ

Amount of reciprocal social communicationᵃ

Overall quality of rapportᵃ Quality of social response

Response to joint attentionᵃ

Overall quality of rapportᵃ Amount of reciprocal social communicationᵃ

Quality of social overturesᵃ

Overall quality of rapport

Play/Imagination Functional play with objects

Functional play with objects

Imagination/creativity Imagination/creativity

Imagination/creativity Imagination/creativity

Stereotyped Behaviors and Restricted Interests

Unusual sensory interest in play material/person

Unusual sensory interest in play material/person

Unusual sensory interest in play material/person

Unusual sensory interest in play material/person

Hand/finger and other complex mannerisms

Hand/finger and other complex mannerisms

Hand/finger and other complex mannerisms

Hand/finger and other complex mannerisms

Self-injurious behavior Self-injurious behavior Self-injurious behavior Self-injurious behavior

Unusually repetitive interests or stereotyped behaviors

Unusually repetitive interests or stereotyped behaviors

Excessive interest in/reference to unusual/highly specified topics, objects or repetitive behaviors

Excessive interest in/reference to unusual/highly specified topics, objects or repetitive behaviors

Compulsions or rituals Compulsions or rituals

Other Abnormal Behavior Overactivity Overactivity Overactivity Overactivity

Tantrums, aggression, or disruptive behavior

Tantrums, aggression, or disruptive behavior

Tantrums, aggression, or disruptive behavior

Tantrums, aggression, or disruptive behavior

Anxiety Anxiety Anxiety Anxiety

Note. a = Item included in the original ADOS-G scoring algorithm.

Table B3

Items Included in the Revised Scoring Algorithm on the Autism Diagnostic Observation Schedule-Generic by Developmental Cell

Module

Factor

Module 1 No Words Module 1 Some Words

Module 2 Younger than 5

Module 2 Greater than or Equal To 5

Module 3

Social Affect Unusual eye contact Unusual eye contact Unusual eye contact Unusual eye contact Unusual eye contact

Integration of gaze and other behaviors during social overtures

Integration of gaze and other behaviors during social overtures

Amount of reciprocal social communication

Amount of reciprocal social communication

Amount of reciprocal social communication

Facial expressions directed to others

Facial expressions directed to others

Facial expressions directed to others

Facial expressions directed to others

Facial expressions directed to others

Frequency of vocalizations directed to others

Frequency of vocalizations directed to others

Overall quality of rapport

Overall quality of rapport

Overall quality of rapport

Shared enjoyment in interaction

Shared enjoyment in interaction

Shared enjoyment in interaction

Shared enjoyment in interaction

Shared enjoyment in interaction

Quality of social overtures

Quality of social overtures

Quality of social overtures

Quality of social overtures

Quality of social overtures

Social Affect Gestures Gestures Descriptive, conventional, or informational gestures

Descriptive, conventional, or informational gestures

Descriptive, conventional, or informational gestures

Showing Showing Showing Showing Quality of social response

Initiation of joint attention

Spontaneous initiation of joint attention

Spontaneous initiation of joint attention

Spontaneous initiation of joint attention

Reporting of events

Response to joint attention

Pointing Pointing Pointing Pointing

Restricted Repetitive Behaviors

Unusual sensory interest in play material/person

Unusual sensory interest in play material/person

Unusual sensory interest in play material/person

Unusual sensory interest in play material/person

Unusual sensory interest in play material/person

Intonation of vocalizations or verbalizations

Stereotyped use of words or phrases

Stereotyped use of words or phrases

Stereotyped use of words or phrases

Stereotyped use of words or phrases

Restricted Repetitive Behaviors

Unusually repetitive interests or stereotyped behaviors

Unusually repetitive interests or stereotyped behaviors

Unusually repetitive interests or stereotyped behaviors

Unusually repetitive interests or stereotyped behaviors

Excessive interest in/reference to unusual/highly specified topics, objects or repetitive behaviors

Note. Revised algorithms obtained from Gotham, Risi, Pickles, & Lord (2007).

Appendix C

Table C1

Correlation Matrix of Items Included in the ADOS-G Module 1, Original Scoring Algorithm (N = 82)

Item A-2 A-5 A-6 A-7 A-8 B-1 B-3 B-5 B-9 B-10 B-11 B-12

A-2 -

A-5 .06 -

A-6 .36 -.01 -

A-7 .64 .08 .26 -

A-8 .52 .16 .25 .65 -

B-1 .69 .15 .26 .52 .38 -

B-3 .72 .12 .22 .57 .46 .67 -

B-5 .61 .05 .25 .54 .63 .51 .62 -

B-9 .74 .01 .28 .54 .45 .69 .63 .49 -

B-10 .66 -.01 .24 .63 .49 .43 .71 .58 .54 -

B-11 .54 -.05 .28 .47 .52 .44 .47 .52 .42 .49 -

B-12 .81 .09 .30 .58 .44 .63 .71 .52 .78 .63 .46 -

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.
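As a rough illustration of how the inter-item correlations in Table C1 relate to internal consistency, the Python sketch below computes the mean inter-item correlation and the standardized (Spearman-Brown form) alpha from the lower triangle of the matrix. This is an assumption made for illustration: standardized alpha is computed from correlations only and is not the raw-score Cronbach’s alpha reported in Appendix D, and the function names are hypothetical.

```python
# Lower triangle of Table C1 (Module 1, Original Scoring Algorithm),
# copied row by row from A-5 through B-12.
LOWER_TRIANGLE = [
    [.06],
    [.36, -.01],
    [.64, .08, .26],
    [.52, .16, .25, .65],
    [.69, .15, .26, .52, .38],
    [.72, .12, .22, .57, .46, .67],
    [.61, .05, .25, .54, .63, .51, .62],
    [.74, .01, .28, .54, .45, .69, .63, .49],
    [.66, -.01, .24, .63, .49, .43, .71, .58, .54],
    [.54, -.05, .28, .47, .52, .44, .47, .52, .42, .49],
    [.81, .09, .30, .58, .44, .63, .71, .52, .78, .63, .46],
]


def mean_inter_item_r(rows):
    """Average of all unique off-diagonal correlations."""
    values = [r for row in rows for r in row]
    return sum(values) / len(values)


def standardized_alpha(rows):
    """Standardized alpha: k * r_bar / (1 + (k - 1) * r_bar)."""
    k = len(rows) + 1  # the first item has no row in the lower triangle
    r_bar = mean_inter_item_r(rows)
    return k * r_bar / (1 + (k - 1) * r_bar)


print(round(mean_inter_item_r(LOWER_TRIANGLE), 3))
print(round(standardized_alpha(LOWER_TRIANGLE), 3))
```

Because item A-5 correlates near zero with every other item, removing its entries from the triangle raises the mean inter-item correlation, which is consistent with the decision to delete A-5 in Table E1.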

Table C2

Correlation Matrix of Items Included in the ADOS-G Module 1, Revised Scoring Algorithm (N = 66)

Item A-2 A-5 A-7 A-8 B-1 B-3 B-4 B-5 B-9 B-10 B-12 D-1 D-2 D-4

A-2 -

A-5 .18 -

A-7 .59 .22 -

A-8 .52 .26 .67 -

B-1 .67 .27 .47 .37 -

B-3 .71 .24 .54 .45 .66 -

B-4 .62 .14 .57 .41 .68 .71 -

B-5 .56 .19 .48 .65 .49 .59 .47 -

B-9 .72 .12 .49 .44 .66 .62 .61 .45 -

B-10 .62 .09 .60 .49 .41 .69 .55 .51 .53 -

B-12 .80 .21 .54 .44 .60 .71 .56 .50 .76 .63 -

D-1 .35 .29 .32 .42 .41 .40 .25 .38 .36 .34 .32 -

D-2 .19 .16 .06 .27 .31 .26 .10 .23 .23 .17 .17 .48 -

D-4 .34 .37 .37 .35 .43 .43 .24 .36 .32 .31 .39 .48 .39 -

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.

Table C3

Correlation Matrix of Items Included in the ADOS-G Module 2, Original Scoring Algorithm (N = 118)

Item A-2 A-5 A-6 A-7 A-8 B-1 B-2 B-6 B-8 B-9 B-10 B-11

A-2 -

A-5 .52 -

A-6 .72 .61 -

A-7 .62 .57 .54 -

A-8 .69 .64 .65 .61 -

B-1 .56 .56 .60 .42 .59 -

B-2 .68 .45 .60 .56 .66 .62 -

B-6 .61 .37 .50 .59 .43 .30 .56 -

B-8 .81 .56 .73 .58 .67 .66 .70 .66 -

B-9 .71 .58 .69 .59 .69 .65 .72 .52 .76 -

B-10 .74 .60 .86 .55 .70 .63 .65 .57 .81 .78 -

B-11 .69 .58 .65 .62 .73 .56 .70 .54 .71 .79 .74 -

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.

Table C4

Correlation Matrix of Items Included in the ADOS-G Module 2, Revised Scoring Algorithm (N = 73)

Item A-5 A-7 A-8 B-1 B-2 B-3 B-5 B-6 B-8 B-10 B-11 D-1 D-2 D-4

A-5 -

A-7 .59 -

A-8 .60 .52 -

B-1 .56 .42 .58 -

B-2 .43 .55 .59 .59 -

B-3 .42 .55 .65 .40 .69 -

B-5 .54 .59 .60 .49 .57 .63 -

B-6 .35 .62 .39 .30 .58 .52 .72 -

B-8 .53 .56 .63 .64 .66 .64 .73 .66 -

B-10 .60 .56 .65 .62 .60 .60 .71 .62 .85 -

B-11 .53 .60 .73 .52 .67 .77 .66 .53 .69 .75 -

D-1 .25 .20 .38 .23 .43 .50 .30 .27 .30 .28 .35 -

D-2 .19 .09 .16 .21 .19 .06 -.03 .01 .18 .24 .05 .32 -

D-4 .56 .51 .62 .36 .54 .49 .46 .39 .43 .43 .52 .50 .26 -

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.

Table C5

Correlation Matrix of Items Included in the ADOS-G Module 3, Original Scoring Algorithm (N = 261)

Item A-4 A-7 A-8 A-9 B-1 B-2 B-6 B-7 B-8 B-9 B-10

A-4 -

A-7 .16 -

A-8 .22 .53 -

A-9 .09 .52 .49 -

B-1 .18 .26 .42 .38 -

B-2 .29 .43 .58 .52 .53 -

B-6 .22 .48 .52 .36 .33 .44 -

B-7 .27 .48 .62 .52 .49 .62 .55 -

B-8 .36 .48 .62 .50 .45 .61 .59 .72 -

B-9 .21 .49 .74 .53 .44 .58 .48 .68 .64 -

B-10 .17 .47 .60 .53 .41 .58 .52 .64 .61 .62 -

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.

Table C6

Correlation Matrix of Items Included in the ADOS-G Module 3, Revised Scoring Algorithm (N = 262)

Item A-4 A-7 A-8 A-9 B-1 B-2 B-4 B-7 B-8 B-9 B-10 D-1 D-2 D-4

A-4 -

A-7 .16 -

A-8 .22 .53 -

A-9 .09 .52 .49 -

B-1 .18 .27 .42 .38 -

B-2 .29 .43 .58 .52 .53 -

B-4 .26 .50 .53 .56 .36 .65 -

B-7 .27 .48 .62 .52 .49 .62 .57 -

B-8 .36 .48 .62 .50 .45 .61 .57 .72 -

B-9 .21 .49 .74 .53 .44 .58 .59 .68 .64 -

B-10 .17 .47 .60 .53 .41 .58 .61 .64 .61 .62 -

D-1 .10 .17 .14 .10 .11 .21 .27 .21 .18 .18 .16 -

D-2 .05 .06 .11 .15 .12 .03 .09 .15 .15 .14 .07 .07 -

D-4 .44 .24 .30 .22 .17 .31 .32 .32 .37 .28 .23 .07 -.02 -

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.

Appendix D

Table D1

Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Original Scoring Algorithm One-Factor Solutions

Module 1 (N = 82) Module 2 (N = 118) Module 3 (N = 262)

Item

Corrected Item-Total Correlations

Cronbach’s α if Item Deleted

Corrected Item-Total Correlations

Cronbach’s α if Item Deleted

Corrected Item-Total Correlations

Cronbach’s α if Item Deleted

A-2 .835 .879 .824 .946 N/A N/A

A-4 N/A N/A N/A N/A .283 .912

A-5 .081 .917 .673 .951 N/A N/A

A-6 .335 .905 .806 .947 N/A N/A

A-7 .719 .884 .692 .950 .588 .897

A-8 .651 .888 .791 .947 .750 .888

A-9 N/A N/A N/A N/A .613 .896

B-1 .695 .885 .684 .951 .525 .903

B-2 N/A N/A .773 .948 .726 .890

B-3 .774 .882 N/A N/A N/A N/A

B-5 .698 .885 N/A N/A N/A N/A

B-6 N/A N/A .619 .952 .617 .895

B-7 N/A N/A N/A N/A .788 .887

B-8 N/A N/A .865 .945 .780 .887

B-9 .719 .886 .842 .946 .757 .887

B-10 .701 .886 .862 .945 .716 .889

B-11 .589 .891 .822 .946 N/A N/A

B-12 .775 .884 N/A N/A N/A N/A

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic
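The two statistics reported in Table D1 can be sketched as follows. The corrected item-total correlation is the Pearson correlation between an item and the sum of the remaining items, and the alpha-if-deleted value is Cronbach’s alpha recomputed without that item. This is a minimal Python sketch under stated assumptions: the function names and the small data set are hypothetical and are not the study data or the software used for the reported analyses.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """items: one list of scores per item, all for the same examinees."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def item_statistics(items):
    """Return (corrected item-total r, alpha if item deleted) for each item."""
    results = []
    for i, item in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_totals = [sum(scores) for scores in zip(*rest)]
        results.append((pearson_r(item, rest_totals), cronbach_alpha(rest)))
    return results


# Hypothetical 0-2 item codes for six examinees on four items (not study data).
SCORES = [
    [0, 1, 2, 2, 1, 0],
    [0, 1, 2, 1, 1, 0],
    [1, 1, 2, 2, 0, 0],
    [2, 0, 1, 0, 2, 1],  # deliberately inconsistent with the other items
]

for number, (r, alpha) in enumerate(item_statistics(SCORES), start=1):
    print(f"item {number}: corrected r = {r:.3f}, alpha if deleted = {alpha:.3f}")
```

In this hypothetical data set the fourth item correlates negatively with the sum of the others, so alpha rises when it is removed. Item A-5 in Module 1 shows the same pattern in Table D1: a low corrected item-total correlation together with the highest alpha-if-deleted value in its column.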

Table D2

Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Revised Scoring Algorithm One-Factor Solutions

Module 1 (N = 66) Module 2 (N = 73) Module 3 (N = 262)

Item

Corrected Item-Total Correlations

Cronbach’s α if Item Deleted

Corrected Item-Total Correlations

Cronbach’s α if Item Deleted

Corrected Item-Total Correlations

Cronbach’s α if Item Deleted

A-2 .775 .897 N/A N/A N/A N/A

A-4 N/A N/A N/A N/A .324 .893

A-5 .299 .916 .661 .925 N/A N/A

A-7 .662 .901 .686 .925 .578 .883

A-8 .647 .902 .767 .922 .732 .876

A-9 N/A N/A N/A N/A .622 .881

B-1 .720 .899 .634 .927 .513 .888

B-2 N/A N/A .764 .922 .734 .876

B-3 .794 .896 .742 .923 N/A N/A

B-4 .663 .902 N/A N/A .718 .876

B-5 .660 .901 .758 .923 N/A N/A

B-6 N/A N/A .636 .926 N/A N/A

B-7 N/A N/A N/A N/A .778 .875

B-8 N/A N/A .819 .920 .772 .875

B-9 .704 .900 N/A N/A .753 .874

B-10 .661 .901 .820 .920 .701 .877

B-11 N/A N/A .801 .921 N/A N/A

B-12 .743 .899 N/A N/A N/A N/A

D-1 .538 .906 .441 .931 .225 .895

D-2 .330 .914 .192 .937 .128 .898

D-4 .536 .906 .640 .926 .385 .892

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic.

Table D3

Corrected Item-Total Correlations and Cronbach’s Alpha if Item Deleted Values for ADOS-G Revised Scoring Algorithm Two-Factor Solutions

Module 1 (N = 66) Module 2 (N = 73)

Corrected Item-Total Correlations

Cronbach’s α if Item Deleted

Corrected Item-Total Correlations

Cronbach’s α if Item Deleted

Item Factor 1 Factor 2 Factor 1 Factor 2 Factor 1 Factor 2 Factor 1 Factor 2

A-2 .826 .914 N/A N/A N/A N/A

A-5 .339 .710 .646 .937

A-7 .697 .921 .696 .935

A-8 .623 .925 .750 .518 .933 .625

B-1 .698 .922 .639 .938

B-2 N/A N/A N/A N/A .747 .933

B-3 .804 .915 .736 .933

B-4 .732 .920 N/A N/A N/A N/A

B-5 .662 .923 .789 .931

B-6 N/A N/A N/A N/A .656 .936

B-8 N/A N/A N/A N/A .841 .929

B-9 .741 .919 N/A N/A N/A N/A

B-10 .703 .920 .836 .929

B-11 N/A N/A N/A N/A .818 .930

B-12 .778 .918 N/A N/A N/A N/A

D-1 .563 .563 .526 .622

D-2 .438 .640 .290 .744

D-4 .561 .562 .656 .525

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic

Appendix E

Table E1

Structure Coefficients and Communalities for the ADOS-G Module 1 (Original Scoring Algorithm) Items with Deletion of Item A-5 (N = 82)

Item Structure Coefficient Communality

A-2: Frequency of vocalizations to others .892 .796

A-6: Use of other’s body to communicate .355 .126

A-7: Pointing .749 .560

A-8: Gestures .652 .425

B-1: Unusual eye contact .730 .533

B-3: Facial expressions directed to others .822 .676

B-5: Shared enjoyment in interaction .727 .529

B-9: Showing .781 .610

B-10: Spontaneous initiation of joint attention .756 .572

B-11: Response to joint attention .623 .388

B-12: Quality of social overtures .830 .689

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.
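In a one-factor solution, each item’s communality equals the square of its structure coefficient, so the two columns of Table E1 can be checked against each other. The Python sketch below performs that check on the values copied from the table. The names are hypothetical, and small discrepancies in the third decimal place are expected because the published coefficients are rounded.

```python
# (structure coefficient, reported communality) pairs copied from Table E1.
TABLE_E1 = {
    "A-2": (.892, .796),
    "A-6": (.355, .126),
    "A-7": (.749, .560),
    "A-8": (.652, .425),
    "B-1": (.730, .533),
    "B-3": (.822, .676),
    "B-5": (.727, .529),
    "B-9": (.781, .610),
    "B-10": (.756, .572),
    "B-11": (.623, .388),
    "B-12": (.830, .689),
}


def implied_communality(structure_coefficient):
    """Communality implied by a single-factor structure coefficient."""
    return structure_coefficient ** 2


for item, (coefficient, reported) in TABLE_E1.items():
    implied = implied_communality(coefficient)
    print(f"{item}: {coefficient:.3f} squared = {implied:.3f} (reported {reported:.3f})")
```

Read this way, item A-6 (Use of other’s body to communicate) shares only about 13% of its variance with the single factor even though its structure coefficient exceeds the .32 salience threshold.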

Table E2

Structure Coefficients and Communalities for the ADOS-G Module 2 (Revised Scoring Algorithm) Items with Deletion of Item D-2 (N = 73)

Item Structure Coefficient Communality

A-5: Stereotyped use of words .672 .451

A-7: Pointing .716 .512

A-8: Gestures .792 .627

B-1: Unusual eye contact .650 .422

B-2: Facial expressions directed to others .783 .614

B-3: Shared enjoyment in interactions .782 .612

B-5: Showing .807 .652

B-6: Spontaneous initiation of joint attention .682 .465

B-8: Quality of social overtures .850 .723

B-10: Amount of reciprocal social communication .847 .717

B-11: Overall quality of rapport .848 .719

D-1: Unusual sensory interests in person/play materials .436 .190

D-4: Repetitive interests/stereotyped behaviors .640 .410

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold. Due to an insufficient sample size, the Module 2, Less Than 5 Years Revised Scoring Algorithm was excluded from analyses of factor structure. References to the Module 2 Revised Scoring Algorithm refer to the Module 2, Greater Than or Equal to 5 Years Revised Scoring Algorithm only.


Table E3

Structure Coefficients and Communalities for the ADOS-G Module 3 (Original Scoring Algorithm) Items with Deletion of Item A-4 (N = 262)

Item                                              Structure Coefficient   Communality
A-7: Reporting of events                          .623                    .389
A-8: Conversation                                 .796                    .634
A-9: Gestures                                     .654                    .427
B-1: Unusual eye contact                          .555                    .308
B-2: Facial expressions directed to others        .748                    .560
B-6: Shared enjoyment in interactions             .649                    .422
B-7: Quality of social overtures                  .831                    .691
B-8: Quality of social response                   .811                    .658
B-9: Amount of reciprocal social communication    .808                    .653
B-10: Overall quality of rapport                  .768                    .589

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.


Table E4

Structure Coefficients and Communalities for the ADOS-G Module 3 (Revised Scoring Algorithm) Items with Deletion of Items D-1 and D-2 (N = 261)

Item                                                          Structure Coefficient   Communality
A-4: Stereotyped use of words/phrases                         .327                    .107
A-7: Reporting of events                                      .617                    .380
A-8: Conversation                                             .781                    .610
A-9: Gestures                                                 .663                    .440
B-1: Unusual eye contact                                      .549                    .301
B-2: Facial expressions directed to others                    .775                    .600
B-4: Shared enjoyment in interaction                          .751                    .564
B-7: Quality of social overtures                              .822                    .675
B-8: Quality of social response                               .808                    .653
B-9: Amount of reciprocal social communication                .806                    .650
B-10: Overall quality of rapport                              .759                    .577
D-4: Excessive interest in specific topics/repetitive behav   .395                    .156

Note. Table presents the extraction of a one-factor solution using Principal Axis Extraction. Salient structure coefficients (> .32) are identified in bold.


Appendix F

Table F1

Cut Scores Used for ADOS-G Classification Determinations by Module and Scoring Algorithm

                            Cut Scores
Module and Classification   Original Scoring Algorithm   Revised Scoring Algorithm
                            (C + SI Total Score)         (SA + RRB Total Score)

Module 1
  Non-Autism ASDᵃ           7                            8
  Autistic Disorder         12                           12

Module 2
  Non-Autism ASDᵃ           8                            8
  Autistic Disorder         12                           9

Module 3
  Non-Autism ASDᵃ           7                            7
  Autistic Disorder         10                           9

Note. ADOS-G = Autism Diagnostic Observation Schedule-Generic; C + SI = Communications + Social Interaction Total Score; SA + RRB = Social Affect + Restricted Repetitive Behavior Total Score. ᵃNon-Autism ASD = PDD NOS and Asperger’s Disorder.
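As an illustration of how the cut scores in Table F1 could be applied, the sketch below maps a total score to a classification. It rests on two assumptions that the table itself does not state: that a total score at or above a cut score meets that classification, and that only the total score is considered (any additional domain-level criteria are ignored). The names `CUT_SCORES` and `classify` are hypothetical.

```python
# Cut scores copied from Table F1, keyed by (module, scoring algorithm).
# "asd" is the Non-Autism ASD cut score; "autism" is the Autistic Disorder one.
CUT_SCORES = {
    ("Module 1", "original"): {"asd": 7, "autism": 12},
    ("Module 1", "revised"): {"asd": 8, "autism": 12},
    ("Module 2", "original"): {"asd": 8, "autism": 12},
    ("Module 2", "revised"): {"asd": 8, "autism": 9},
    ("Module 3", "original"): {"asd": 7, "autism": 10},
    ("Module 3", "revised"): {"asd": 7, "autism": 9},
}


def classify(module, algorithm, total_score):
    """Return a classification label for a total score.

    Assumption: scores at or above a cut score meet that classification.
    """
    cuts = CUT_SCORES[(module, algorithm)]
    if total_score >= cuts["autism"]:
        return "Autistic Disorder"
    if total_score >= cuts["asd"]:
        return "Non-Autism ASD"
    return "No Spectrum Disorder"


if __name__ == "__main__":
    # The same total score can fall in different categories by algorithm.
    print(classify("Module 2", "original", 9))
    print(classify("Module 2", "revised", 9))
```

Under these assumptions, a Module 2 total score of 9 falls in the Non-Autism ASD range on the Original Scoring Algorithm but reaches the Autistic Disorder cut score on the Revised Scoring Algorithm. The two totals are built from different item sets, so a single child would not necessarily obtain the same total on both.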


Table F2

Sensitivity and Specificity Values of Scores on the Original Scoring Algorithm from the Current Sample and Lord et al.’s (1999) Original Sample

            Sensitivity                         Specificity
            Current Sample    Original Sample   Current Sample    Original Sample

Non-Autism ASD vs. No Spectrum Disorder (N = 294)
  Module 1  1.00              .94               .75               .94
  Module 2  .83               .89               .62               .88
  Module 3  .95               .80               .44               .94

Autistic Disorder vs. No Spectrum Disorder (N = 233)
  Module 1  .90               1.00              .75               1.00
  Module 2  .90               .95               .79               .94
  Module 3  .94               .90               .66               1.00

Autistic Disorder + ASD vs. No Spectrum Disorder (N = 400)
  Module 1  1.00              .97               .52               .94
  Module 2  .90               .95               .67               .87
  Module 3  .96               .90               .44               .94
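For readers less familiar with the two indices in Table F2, sensitivity is the proportion of children with a spectrum diagnosis whom the instrument classifies as such, and specificity is the proportion of children without a spectrum diagnosis whom it classifies as non-spectrum. The sketch below shows the arithmetic. The counts are hypothetical and chosen only for illustration; they are not taken from this study.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of diagnosed cases that the instrument flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of non-cases that the instrument correctly clears."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical counts, NOT data from the dissertation.
    tp, fn = 45, 5    # diagnosed children: flagged vs. missed
    tn, fp = 30, 20   # non-spectrum children: cleared vs. flagged
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")
    print(f"specificity = {specificity(tn, fp):.2f}")
```

With these hypothetical counts, the instrument identifies most diagnosed children but also flags many children without a diagnosis. That is the same qualitative pattern as high sensitivity paired with lower specificity.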


Table F3

Sensitivity and Specificity Values of Scores on the Revised Scoring Algorithm from the Current Sample and Previous Studies

            Sensitivity                                                     Specificity
            Current         Gotham et al.   Gotham et al.   Molloy et al.   Current         Gotham et al.   Gotham et al.   Molloy et al.
            Sample          (2007)          (2008)          (2011)          Sample          (2007)          (2008)          (2011)

Non-Autism ASD vs. NSD (N = 294)
  Module 1ᵃ 1.00            .77             .95             1.0             .75             .82             .75             .46
  Module 2ᵇ .83             .83             NA              .85             .70             .83             NA              .60
  Module 3  .90             .72             .60             .87             .49             .76             .88             .35

Autistic Disorder vs. NSD (N = 234)
  Module 1ᵃ .97             .97             .89             .93             .75             .91             .91             .69
  Module 2ᵇ .95             .98             NA              .94             .76             .90             NA              .65
  Module 3  .94             .91             .82             .92             .65             .84             .92             .55

Note. NSD = No Spectrum Disorder. Studies presented for direct comparison are those with sample sizes consistent with the current sample (N > 300) and conducted with American children and adolescents. ᵃDescribes scores from the Module 1, Some Words Revised Scoring Algorithm only. ᵇDescribes scores from the Module 2, ≥ 5 Years Revised Scoring Algorithm only.


Appendix G

Curriculum Vitae

Melissa A. Reid, M.Ed.

203-530-9567

EDUCATION

The Pennsylvania State University, University Park, PA
Doctor of Philosophy, School Psychology, August 2012 (Anticipated)
GPA: 3.91/4.0
Master of Education, School Psychology, December 2006

Southern Connecticut State University, New Haven, CT
Bachelor of Science, Psychology, May 2003
GPA: 3.97/4.0

PROFESSIONAL LICENSURE/CERTIFICATION

Licensed Specialist in School Psychology, Texas State Board of Examiners of Psychologists, August 2009 – present
Certified School Psychologist, Pennsylvania Department of Education, January 2008 – present

EMPLOYMENT EXPERIENCES

Lewisville Independent School District, Lewisville, TX
Licensed Specialist in School Psychology, 8/2009 – present
Pre-doctoral Psychology Intern (APA Accredited Internship), 8/2008 – 8/2009

The Pennsylvania State University, University Park, PA
Graduate Assistant, Penn State Outreach Market Research, 7/2007 – 7/2008
Teaching Assistant, SPSY 559 (Cognitive Assessment), 1/2007 – 5/2007
Graduate Assistant, CEDAR Clinic Staff, 8/2005 – 5/2007
Research Assistant, Dr. Richard Carlson, 5/2005 – 8/2005
Teaching Assistant, IST 210 (Database Management Systems), 1/2005 – 5/2005

The Second Mile, State College, PA
Freelance Program Evaluator, 2/2006 – 8/2007

Yale University School of Medicine, New Haven, CT
Research Assistant, Substance Abuse Research Center, 9/2003 – 8/2004

PROFESSIONAL MEMBERSHIPS

National Association of School Psychologists, 5/2005 – present
American Psychological Association, Student Affiliate, 8/2008 – present
Texas Association of School Psychologists, Student Member, 9/2008 – present
Dallas/Fort Worth Regional Association of School Psychologists, 9/2008 – present