

Measurement in Physical Education and Exercise Science. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/hmpe20

An Evaluation of the Psychometric Properties of the Coaching Efficacy Scale for Coaches From the United States of America. Nicholas D. Myers, Edward W. Wolfe & Deborah L. Feltz. Published online: 18 Nov 2009.

To cite this article: Nicholas D. Myers, Edward W. Wolfe & Deborah L. Feltz (2005) An Evaluation of the Psychometric Properties of the Coaching Efficacy Scale for Coaches From the United States of America, Measurement in Physical Education and Exercise Science, 9:3, 135-160, DOI: 10.1207/s15327841mpee0903_1

To link to this article: http://dx.doi.org/10.1207/s15327841mpee0903_1


An Evaluation of the Psychometric Properties of the Coaching Efficacy Scale for Coaches From the United States of America

Nicholas D. Myers
Department of Educational and Psychological Studies, University of Miami

Edward W. Wolfe
Department of Educational Leadership and Policy Studies, Virginia Polytechnic Institute and State University

Deborah L. Feltz
Department of Kinesiology, Michigan State University

This study extends validity evidence for the Coaching Efficacy Scale (CES; Feltz, Chase, Moritz, & Sullivan, 1999) by providing an evaluation of the psychometric properties of the instrument from previously collected data on high school and college coaches from the United States. Data were fitted to a multidimensional item response theory model. Results offered some supporting evidence concerning validity based on the fit of a multidimensional conceptualization of coaching efficacy (i.e., motivation, game strategy, technique, and character building) as compared to a unidimensional conceptualization of coaching efficacy (i.e., total coaching efficacy), the fit of the majority of items to the measurement model, the internal consistency of coaching efficacy estimates, and the precision of total coaching efficacy estimates. However, concerns exist relating to the rating scale structure, the precision of multidimensional coaching efficacy estimates, and misfit of a couple of items to the measurement model. Practical recommendations both for future research with the CES and for the development of a revised instrument are forwarded.


Requests for reprints should be sent to Nicholas D. Myers, Department of Educational and Psychological Studies, University of Miami, Coral Gables, FL 33124-2040. E-mail: [email protected]



Key words: coaching efficacy, self-efficacy, Rasch model, multidimensional item response theory

Over the past few decades, much of the research in sport leadership has been directed toward identifying particular coaching styles that elicit successful performance and positive psychological responses from athletes (Horn, 2002). The two most prominent models of leadership effectiveness in sport, the Multidimensional Model of Leadership (Chelladurai, 1978) and the Mediational Model of Leadership (Smoll & Smith, 1989), have served as frameworks for much of the related research. Recently, Horn combined elements of both models to form a working model of coaching effectiveness.

Horn’s (2002) model of coaching effectiveness is founded on at least three assumptions. First, both antecedent factors (e.g., coach’s personal characteristics, organizational climate within which one coaches, etc.) and personal characteristics of the athlete influence a coach’s behavior indirectly through a coach’s expectancies, beliefs, and goals. Coaching efficacy is the extent to which a coach believes he or she has the capacity to affect the learning and performance of his or her athletes (Feltz et al., 1999) and is a belief that Horn identifies as affecting coaching behavior. Second, a coach’s behavior affects both an athlete’s evaluation of the coach and an athlete’s performance. Third, the effectiveness of various coaching interventions is influenced by situational factors and individual differences. Much work remains to be done in clarifying the specific relations that exist within these broad assumptions.

The Coaching Efficacy Scale (CES; Feltz et al., 1999) is the only published instrument purported to measure coaching efficacy. To date, investigations of the psychometric properties of the CES (Feltz et al., 1999; Lee, Malete, & Feltz, 2002) have focused exclusively on evaluating the fit of the proposed internal model (i.e., how the proposed components of coaching efficacy exert influence on responses to the CES items and how the proposed components of coaching efficacy are related to one another) and have produced fit indexes in confirmatory factor analytic studies that do not meet generally accepted values (Kline, 1998). Because coaching efficacy has assumed a role in models of coaching effectiveness, a more comprehensive evaluation of the psychometric properties of the instrument is warranted.


EXISTING VALIDITY EVIDENCE

“Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999, p. 9). Or, as Messick (1989) more fully explained: “Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (p. 13). Important initial steps in the validation process include (a) the explication of a conceptual framework for scores derived from the instrument, (b) revealing the instrument’s development process, (c) explaining how the construct is operationalized within the instrument (i.e., instrumentation), (d) generating an internal model of the construct, and (e) predicting how measures of the construct relate to other variables (i.e., an external model). Due to page limitations, we provide a review of only the select aspects of the conceptual framework of the CES relevant to this study, before explaining the need for this study and why a multidimensional item response theory (MIRT) model is especially well suited to address that need.

Instrument Development

The CES was developed during a 5-week seminar involving 11 coaches who had varying levels of experience in coaching and were graduate students in sport psychology. The National Standards for Athletic Coaches (National Association for Sport and Physical Education, 1995), preliminary work on a coaching efficacy scale (Park, 1992), and a review of the coaching education literature provided a framework for group discussions on the key components of coaching efficacy. Themes that emerged from the group’s discussions were reduced to teaching technique, implementing game strategies, motivating athletes, and developing athletes’ character.

The dimensions of coaching efficacy that emerged from the seminar led to the generation of 41 items. Items were written by the participants during the seminar and included the stem: “How confident are you in your ability to.” The rating scale employed was a 10-point Likert scale with categories ranging from 0 (not at all confident) to 9 (extremely confident).¹ Nine collegiate and scholastic coaches evaluated the relevance of the items on a scale that ranged from 1 (essential) to 3 (not essential).


¹ Self-efficacy can be considered to be a situation-specific self-confidence (Feltz & Chase, 1998). Thus, in the remainder of this article we use the terms efficacy and confidence interchangeably, except when considering a particular construct.


Feedback from the content experts led the research team to conclude that all items were potentially important indicators of coaching efficacy; however, 17 of the original items were later dropped after considering the results of factor analyses. Because the decision to drop over 40% of the initial items appeared to be empirically driven, it is possible that the removal of these items decreased the degree to which the current scale adequately represents the domain of coaching efficacy. (See the Appendix for the retained 24 items.)

Instrumentation

A substantive aspect of how coaching efficacy is operationalized within the CES is the degree to which coaches employed the rating scale structure in the way the authors intended, or “systematically.” Previous research on an ordered response self-efficacy scale (Zhu, Updyke, & Lewandowski, 1997) and long-standing recommendations for measuring attitudes (Likert, 1932) suggest that the given rating scale structure likely contains too many categories and that collapsing categories appeared appropriate. Thus, one aspect of the initial validity framework for the CES that needed to be defined is the optimal rating scale structure. Criteria for determining this are presented in the Method section.

Internal Model

We use the term internal model to allow us to describe three possible depictions of the internal structure of the CES. Our subsequent analyses of the internal structure of the instrument are intended to clarify which of these depictions is most consistent with the relations among CES items as one piece of validation evidence (AERA, APA, & NCME, 1999).

Self-efficacy judgments do not measure an omnibus trait but rather domain-specific beliefs that individuals hold about their ability to successfully execute differing levels of performance given certain situational demands (Bandura, 1997). Bandura also noted, however, that a broader sense of efficacy can be important depending on the nature of the domain of interest and the generality of the performance variable that one wishes to predict. Consistent with Bandura’s theory, Feltz et al. (1999) put forth both a multidimensional and a unidimensional conceptualization of coaching efficacy.

The multidimensional model for the CES is illustrated in Figure 1. This model posits that four specific efficacies (i.e., motivation [ME], game strategy [GS], technique [TE], and character building [CB]) are related to one another and define coaching efficacy. ME is specified to influence responses to 7 items and is defined as the confidence coaches have in their ability to affect the psychological mood and psychological skills of their athletes. GS is specified to influence responses to 7 items and is defined as the confidence coaches have in their abilities to lead during competition.


TE is specified to influence responses to 6 items and is defined as the belief coaches have in their instructional and diagnostic skills. CB is specified to influence responses to 4 items and is defined as the confidence coaches have in their abilities to influence the personal development and positive attitude toward sport in their athletes. In the unidimensional model, total coaching efficacy (TCE) is specified to influence responses to all 24 items.

In previous studies involving the CES, structural equation models fitting these internal models to data from various coaches have produced fit indexes that fall short of generally accepted values (Feltz et al., 1999; Lee et al., 2002). Feltz et al. subjected data from a heterogeneous sample of high school coaches to confirmatory factor analysis with maximum likelihood procedures (Jöreskog & Sörbom, 1995). Fit indexes for the multidimensional model were χ² = 790, p < .001, Non-Normed Fit Index (NNFI) = .88, Comparative Fit Index (CFI) = .89, and Root Mean Square Error of Approximation (RMSEA) = .08. Lee et al. subjected data from Chinese youth coaches to the same type of confirmatory factor analysis and reported fit indexes that were similar to those found by Feltz et al.


FIGURE 1 Multidimensional model of the Coaching Efficacy Scale.


Across both studies, coefficient alphas for the scores from the multiple dimensions ranged from .88 to .94.

Feltz et al. (1999) also tested the fit of the unidimensional model. Slightly greater misfit was observed for this model (χ² = 844, NNFI = .87, CFI = .88, and RMSEA = .09) as compared to the multidimensional model; however, closely approximating fit of a less complex model typically provides support for adoption of the simpler model (Marsh & Hocevar, 1985). Coefficient alpha for TCE was .95. Thus, although the CES has demonstrated good internal consistency, the fit of both of these internal models has fallen short of generally accepted values. Recommendations to address the misfit of the internal models have yet to be forwarded.

Feltz et al. (1999) and Lee et al. (2002) also utilized the multidimensional model to estimate correlations among the four specific efficacies. Disattenuated correlations among subscales ranged from .46 (ME and CB) to .73 (TE and GS) and from .79 (GS and CB) to .95 (TE and GS) in the Feltz et al. and Lee et al. data, respectively.² Such strong levels of association among some of the domain-specific efficacies suggest limited discriminant validity among some of the subscales, particularly between GS and TE. The fit of a three-dimensional internal model, in which TE and GS items are specified to measure a common dimension, has yet to be explored.

Need for This Study

A more comprehensive evaluation of the psychometric properties of the CES is needed because there is evidence of misfit for the proposed internal models and because important psychometric characteristics of the CES have yet to be examined. Unfortunately, to date, validation studies in general have been rather limited in scope, with many focusing only on the internal or external structure (Lee et al., 2002; Malete & Feltz, 2000; Myers, Vargas-Tonsing, & Feltz, 2005; Sullivan & Kent, 2003; Vargas-Tonsing, Warners, & Feltz, 2003) of the obtained measures. An example of a more comprehensive validation study is needed to focus on important characteristics of the measures beyond issues relating to internal and external structures. Such additional characteristics include the utility of the rating scale structure (what Messick, 1995, would refer to as the substantive and structural aspects of validity), the degree of item-level fit (an aspect of content validity), and the degree of precision in resultant measures (what Messick would refer to as the generalizability aspect of validity).


² Explaining why scores from the Feltz et al. (1999) data appeared to be greater than the corresponding scores from the Lee et al. (2002) data and why the first-order factors appeared to be more correlated in the Lee et al. data is difficult given the differences between the two samples with respect to nationality (United States vs. Chinese), level coached (high school vs. youth), and years of experience (M = 10.12 vs. M = 5.37).


Revealing these unknown characteristics is necessary to more fully evaluate the degree of validity for the broader internal structure of the CES for the intended purposes (AERA, APA, & NCME, 1999). Thus, this study extends validity evidence for the CES by examining the following questions:

1. Do coaches perceive the rating scale structure in the manner that the authors intended?
2. To what degree do various internal models fit the data?
3. To what degree do the observed ratings for items fit model-based expected values?
4. How precise are coaching efficacy estimates?
5. How reliable are the rank orderings of coaching efficacy estimates?

MIRT Model

Item response theory (IRT) specifies a measurement model that is an alternative to true score test theory and is well suited to address the research questions in this study. Advantages of IRT include (a) select diagnostic statistics that have proven useful in determining the optimal categorization of rating scale structures (Zhu & Kang, 1998; Zhu et al., 1997), (b) powerful diagnostic indexes that are available to assess both item- and model-level fit to the data (Tenenbaum & Fogarty, 1998), and (c) conditional standard errors that are routinely estimated and allow the precision of estimates to be explored at different levels of ability (Smith, 2001). Accordingly, IRT has been used to address research questions similar to those posed in this study in areas such as functional movement for infants (Campbell, Wright, & Linacre, 2002), attitudes and behaviors toward studying and learning (Waugh, 2003), nutrition self-efficacy (Shulman & Wolfe, 2000), and competencies in athletic training (Wolfe & Nogle, 2002).

MIRT can be viewed as an extension of IRT (Reckase, 1997). Conceptually, MIRT models more clearly indicate each of the following compared to IRT models: what dimensions are being measured, how accurately dimensions are measured, and where the instrument may need to be revised when the construct of interest is multidimensional (Ackerman, Gierl, & Walker, 2003). Empirically, MIRT models are both technically appropriate and substantively advantageous when the data do not meet the assumption of unidimensionality (Briggs & Wilson, 2003). Because the developers of the CES conceptualized coaching efficacy as a multidimensional construct and subsequent data have repeatedly violated the assumption of unidimensionality, MIRT was deemed the most appropriate framework for this study. Specifically, the Multidimensional Random Coefficients Multinomial Logit (MRCML; Adams, Wilson, & Wang, 1997) model was selected. The MRCML model is a multidimensional extension of the Rasch model.


Rasch models are a family of one-parameter IRT models that allow for conjoint measurement of the objects of measurement (e.g., coaches) and the instruments of measurement (e.g., CES items).

METHOD

Sample

Data for this study were collated from all of the published studies with sample sizes greater than 50 that have employed the CES on coaches from the United States (Feltz et al., 1999; Malete & Feltz, 2000; Myers et al., 2005).³ When demographics were available, cases were coded based on level coached, coach’s gender, and coach’s race. These data, as well as sample size within studies, are illustrated in Table 1. Sport coached and special concerns within data sets were also noted and are summarized. Data sets were then combined across studies (N = 665).

The Feltz et al. (1999) data provided two independent samples. Sample 1 coaches represented the sports of basketball (29%), track (13%), volleyball (11%), cross-country (7%), baseball (7%), tennis (7%), and other sports.⁴


³ Data from Chinese coaches (Lee et al., 2002) were excluded because of the focus of this article, concerns with translation of the CES into Chinese, and because of difficulty explaining why those data produced first-order scores that appeared to be lower and more correlated than data on coaches from the United States of America.

TABLE 1
Demographic Information Within and Across Studies

                     Feltz et al. (1999)          Malete & Feltz (2000)   Myers et al. (2005)   Total Sample
                     N = 188       N = 291a       N = 60                  N = 126               N = 665
Level coached
  NA                 —             —              24 (40%)                —                     24 (4%)
  Youth              —             —              21 (35%)                —                     21 (3%)
  High school        188 (100%)    291 (100%)     15 (25%)                —                     494 (74%)
  Collegiate         —             —              —                       126 (100%)            126 (19%)
Gender
  Male               109 (58%)     163 (56%)      34 (57%)                84 (67%)              227 (61%)b
  Female             79 (42%)      128 (44%)      26 (43%)                42 (33%)              147 (39%)b
Race
  NA                 9 (5%)        47 (16%)       4 (6%)                  12 (10%)              72 (11%)
  Black              —             —              10 (17%)                4 (3%)                14 (2%)
  White              179 (95%)     244 (84%)      46 (77%)                110 (87%)             579 (87%)

Note. NA = not available.
a Demographic data were not provided in the dataset forwarded. Information listed is based on statistics reported in the relevant manuscript. b Demographic data are based on demographic data attributable to specific cases (N = 374).


Sample 2 participants coached basketball (26%), volleyball (13%), track (11%), football (11%), softball (6%), and other sports. The Malete and Feltz (2000) data were collected within a coaching education program and included pre and post measures. Only preprogram data were retained in this study to avoid problems with dependency. Participants coached basketball (18%), football (12%), cheerleading (12%), soccer (7%), softball (7%), baseball (7%), and other sports. The Myers et al. (2005) data consisted of participants who coached softball (26%), baseball (20%), soccer (34%), and basketball (21%).

Analyses

Data were calibrated to the MRCML model as implemented in ConQuest (Wu, Adams, & Wilson, 1998). The model assumes that a set of D traits underlies a coach’s response to each item. A coach’s (n) position on the D-dimensional latent space is represented by a vector of latent traits defined according to the internal model: θn = [θn1, θn2, …, θnD], where the D dimensions may be nonorthogonal. These vectors can be appended across coaches to create an N × D matrix of positions in the latent space, θ. An item difficulty index, δik, depicts the relative difficulty of surpassing threshold k of item i (i.e., responding with category k rather than category k − 1 on the rating scale, where there are K − 1 thresholds), and item difficulties can be appended to create a vector of item difficulties, δ. The intended dimensional structure of the model is depicted using two matrices composed of vectors that relate each item to the underlying dimensions. Item scores are mapped to their intended dimensions by specifying a scoring vector for that item, bik = [bik1, bik2, …, bikD], and linear combinations of item difficulties are specified via a design vector, aik (i = 1, …, I; k = 1, …, Ki). Thus, the probability of a response in category k for item i is modeled as

$$P(X_{ik}=1;\,\mathbf{A},\mathbf{B},\boldsymbol{\xi}\mid\boldsymbol{\theta})=\frac{\exp\left(\mathbf{b}_{ik}^{\prime}\boldsymbol{\theta}+\mathbf{a}_{ik}^{\prime}\boldsymbol{\xi}\right)}{1+\sum_{k=1}^{K-1}\exp\left(\mathbf{b}_{ik}^{\prime}\boldsymbol{\theta}+\mathbf{a}_{ik}^{\prime}\boldsymbol{\xi}\right)}\qquad(1)$$
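To make Equation 1 concrete, the sketch below computes the response-category probabilities for a single item given a coach's latent trait vector. The function name, variable names, and numeric values are illustrative assumptions, not output from the CES calibration; category 0 is treated as the reference category with an exponent of zero.

```python
import numpy as np

def category_probabilities(theta, b, a, xi):
    """Response-category probabilities for one item under Equation 1.

    theta : (D,) latent trait vector for one coach
    b     : (K-1, D) scoring vectors, one row per non-reference category
    a     : (K-1, P) design vectors mapping item parameters to categories
    xi    : (P,) item parameter vector
    """
    z = b @ theta + a @ xi                     # b'theta + a'xi for categories 1..K-1
    expz = np.exp(np.concatenate(([0.0], z)))  # prepend reference category (exponent 0)
    return expz / expz.sum()                   # normalize so probabilities sum to 1

# Hypothetical item with 4 categories, loading on dimension 1 of a 2-dimensional model
theta = np.array([0.5, -0.2])                                              # coach's position
b = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])                         # cumulative category scores
a = np.array([[-1.0, 0.0, 0.0], [-1.0, -1.0, 0.0], [-1.0, -1.0, -1.0]])    # cumulative threshold design
xi = np.array([-0.3, 0.1, 0.6])                                            # threshold difficulties (logits)
print(category_probabilities(theta, b, a, xi))
```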

Data were calibrated to the MRCML model via a Monte Carlo implementation of the EM algorithm (Wu et al., 1998). For identification purposes, mean δs were constrained to equal zero. Thus, negative δ estimates indicated that an item was easy to endorse, whereas positive δ estimates indicated that an item was difficult to endorse. Expected a posteriori estimates and standard errors were produced for each element of the measurement model, with all estimates reported in logistic odds units (logits).


⁴ Each of the “other” sports comprised ≤ 5% of participants.


A logit is the natural logarithm of the odds of an event. Because the data in this study were polytomous, odds were defined by the likelihood of assigning a rating in one category versus the likelihood of assigning a rating in the next lower category. Diagnostic indexes were selected based on the research questions and are described in the sections that follow.
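As a worked illustration of this adjacent-category logit (a standard Rasch-family relation stated here for clarity, not a formula reproduced from the article), the log-odds of responding in category k rather than category k − 1 of item i reduces to the distance between a coach's efficacy and the corresponding threshold difficulty:

$$\ln\!\left(\frac{P(X_{i}=k)}{P(X_{i}=k-1)}\right)=\theta_{n}-\delta_{ik}$$

so a coach whose efficacy exceeds a threshold by 1 logit has odds of about $e^{1}\approx 2.7$ to 1 of choosing the higher of the two adjacent categories.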

Rating Scale

As an initial evaluation of coaches’ uses of the rating scale, data were calibrated to the unidimensional Rasch Rating Scale Model (RSM; Wright & Masters, 1982) using Winsteps (Wright & Linacre, 1998). Data were calibrated to the RSM because each item shared a common rating scale structure (i.e., δik = δjk for all items, i and j, across the various dimensions). In these analyses, unidimensionality was assumed due to a limited number of indicators within subscales (n = 4–7), and because total coaching efficacy was specified in the internal model and TCE scores were common in the literature (Feltz et al., 1999; Malete & Feltz, 2000; Myers et al., 2005).

The degree to which respondents perceived the rating scale in the manner that the authors intended was evaluated according to guidelines suggested by Linacre (2002). These guidelines can be summarized as: (a) all categories should have at least 10 observations, (b) distributions of ratings for each category should be unimodal, (c) average measures should increase with the categories, (d) outfit mean square fit statistics should be less than 2.0 for each threshold, (e) category thresholds should increase with the categories, (f) ratings imply measures (coherence > 39%), (g) measures imply ratings (coherence > 39%), (h) category thresholds should increase by at least 1.2 logits, and (i) category thresholds should increase by no more than 5 logits.
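The sketch below shows how a few of these guideline checks might be automated once category statistics have been exported from the Rasch software; the function is a hypothetical helper, and the example values are taken from the 6–9 CATS column of Table 2.

```python
def check_rating_scale(counts, avg_measures, outfit_mnsq, thresholds):
    """Flag violations of selected Linacre (2002) guidelines for one rating scale structure."""
    issues = []
    if any(c < 10 for c in counts):                                                     # guideline (a)
        issues.append("category with fewer than 10 observations")
    if any(later < earlier for earlier, later in zip(avg_measures, avg_measures[1:])):  # guideline (c)
        issues.append("average measures do not advance with categories")
    if any(m >= 2.0 for m in outfit_mnsq):                                              # guideline (d)
        issues.append("outfit MNSQ of 2.0 or more for a category")
    steps = [later - earlier for earlier, later in zip(thresholds, thresholds[1:])]
    if any(s <= 0 for s in steps):                                                      # guideline (e)
        issues.append("disordered thresholds")
    if any(s < 1.2 or s > 5.0 for s in steps):                                          # guidelines (h) and (i)
        issues.append("threshold advance outside 1.2 to 5.0 logits")
    return issues

# Category statistics for the adopted 6-9 CATS structure (values from Table 2)
print(check_rating_scale(counts=[2370, 3938, 5571, 3899],
                         avg_measures=[-1.08, -0.31, 0.50, 1.73],
                         outfit_mnsq=[1.12, 1.06, 0.90, 0.99],
                         thresholds=[-1.19, -0.25, 1.44]))
```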

Because a number of Linacre’s (2002) guidelines were not realized in the original rating scale structure, a post hoc approach was applied to arrive at an improved rating scale structure. To determine a post hoc structure, categories were collapsed based on general principles (Linacre, 1995; Wright & Linacre, 1992) and statistical indicators (Zhu et al., 1997). General principles for collapsing categories state that collapsed categories (a) should be explainable and (b) should balance observed frequencies as much as possible. Statistical indicators of improved fit for a post hoc structure should include (a) improved model-data fit statistics, (b) category and parameter estimates that come closer to satisfying Linacre’s guidelines, and (c) increased separation indexes, as compared to the original rating scale structure.

Five post hoc categorizations were evaluated in accordance with the previously mentioned criteria (see Table 2). In each post hoc categorization (e.g., 6–9 CATS), responses that preceded the first listed category (i.e., responses in categories 0–5 for 6–9 CATS) were collapsed into that category (i.e., category 6 for 6–9 CATS) because the lower half of the scale was seldom used and because responses to efficacy items in athletics typically indicate at least moderate confidence (Feltz & Chase, 1998).


TABLE 2
Rating Scale Analyses for Original and Post Hoc Structures According to Linacre’s Guidelines (2002)

Linacre’s Guideline     0–9 CATS      4–9 CATS      5–9 CATS      6–9 CATS      7–9 CATS      8–9 CATS

(a & b) Observations
  0                     1 (0%)        —             —             —             —             —
  1                     2 (0%)        —             —             —             —             —
  2                     12 (0%)       —             —             —             —             —
  3                     58 (0%)       —             —             —             —             —
  4                     140 (1%)      213 (1%)      —             —             —             —
  5                     562 (4%)      562 (4%)      775 (5%)      —             —             —
  6                     1,619 (10%)   1,619 (10%)   1,619 (10%)   2,370 (15%)   —             —
  7                     3,938 (25%)   3,938 (25%)   3,938 (25%)   3,938 (25%)   6,140 (39%)   —
  8                     5,571 (35%)   5,571 (35%)   5,571 (35%)   5,571 (35%)   5,571 (35%)   7,833 (67%)
  9                     3,899 (25%)   3,899 (25%)   3,899 (25%)   3,899 (25%)   3,899 (25%)   3,899 (33%)

(c) Observed average
  0                     1.68          —             —             —             —             —
  1                     0.21          —             —             —             —             —
  2                     0.55          —             —             —             —             —
  3                     0.49          —             —             —             —             —
  4                     0.68          –0.36         —             —             —             —
  5                     0.88          –0.13         –0.63         —             —             —
  6                     1.11          0.12          –0.32         –1.08         —             —
  7                     1.63          0.66          0.25          –0.31         –1.54         —
  8                     2.36          1.38          0.99          0.50          –0.27         –1.79
  9                     3.48          2.52          2.15          1.73          1.17          0.45

(d) Outfit MNSQ
  0                     5.51          —             —             —             —             —
  1                     1.39          —             —             —             —             —
  2                     1.94          —             —             —             —             —
  3                     1.43          —             —             —             —             —
  4                     1.23          1.34          —             —             —             —
  5                     1.19          1.26          1.26          —             —             —
  6                     0.95          1.01          1.04          1.12          —             —
  7                     0.96          1.01          1.02          1.06          1.12          —
  8                     0.90          0.88          0.88          0.90          0.90          1.09
  9                     1.01          0.98          0.98          0.99          0.96          0.92

(e, h, i) Threshold (SE)
  0                     —             —             —             —             —             —
  1                     –0.84 (.99)   —             —             —             —             —
  2                     –1.80 (.58)   —             —             —             —             —
  3                     –1.41 (.26)   —             —             —             —             —
  4                     –0.50 (.12)   —             —             —             —             —
  5                     –0.74 (.07)   –1.37 (.07)   —             —             —             —
  6                     –0.08 (.04)   –1.08 (.04)   –1.26 (.04)   —             —             —
  7                     0.51 (.03)    –0.47 (.03)   –0.90 (.03)   –1.19 (.03)   —             —
  8                     1.63 (.02)    0.66 (.02)    0.27 (.02)    –0.25 (.02)   –0.79 (.02)   —
  9                     3.23 (.02)    2.26 (.02)    1.88 (.02)    1.44 (.02)    0.79 (.02)    —

(f) Category –> Measure
  0                     0%            —             —             —             —             —
  1                     0%            —             —             —             —             —
  2                     0%            —             —             —             —             —
  3                     0%            —             —             —             —             —
  4                     1%            0%            —             —             —             —
  5                     10%           9%            4%            —             —             —
  6                     26%           26%           28%           21%           —             —
  7                     54%           54%           55%           59%           62%           —
  8                     63%           63%           63%           63%           69%           88%
  9                     45%           45%           46%           47%           50%           61%

(g) Measure –> Category
  0                     0%            —             —             —             —             —
  1                     0%            —             —             —             —             —
  2                     0%            —             —             —             —             —
  3                     0%            —             —             —             —             —
  4                     7%            0%            —             —             —             —
  5                     34%           36%           64%           —             —             —
  6                     35%           36%           39%           74%           —             —
  7                     41%           42%           41%           41%           75%           —
  8                     50%           50%           50%           50%           49%           82%
  9                     76%           76%           76%           76%           76%           72%

Person reliability      0.92          0.92          0.92          0.92          0.91          0.75
Person separation       3.39          3.39          3.42          3.45          3.12          1.73
Item separation         8.54          8.59          8.68          8.79          8.59          6.91

Note. CATS = post hoc categorization. MNSQ = unweighted mean square.


Collapsing these categories assumed that responses from not at all confident to moderately confident were not substantively different. Although this assumption was questionable, the liberties taken were likely minimized due to a dearth of responses in the collapsed categories.
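A minimal sketch of this recoding, assuming the raw 0–9 CES ratings are held in a NumPy array; the relabeling of the retained categories as 0–3 is a convenience for calibration software and an assumption rather than a step reported in the article.

```python
import numpy as np

def collapse_ratings(ratings, floor=6):
    """Collapse CES ratings below `floor` into `floor` (the 6-9 CATS structure)."""
    ratings = np.asarray(ratings)
    collapsed = np.clip(ratings, floor, 9)  # categories 0-5 become 6; 6-9 are unchanged
    return collapsed - floor                # relabel 6-9 as 0-3

print(collapse_ratings([0, 3, 5, 6, 7, 8, 9]))  # -> [0 0 0 0 1 2 3]
```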

Internal Models

A MRCML model approach to confirmatory factor analysis was employed to evaluate the fit of (a) a unidimensional model (TCE), (b) a three-dimensional model (ME, GS and TE merged, and CB), and (c) a four-dimensional model (ME, GS, TE, and CB) to the observed ratings. Relative fit of each model to the data was evaluated using a likelihood ratio chi-squared statistic (χ²LR = G²Simple − G²Complex), where G² is the deviance statistic for the model in question. The χ²LR statistic is distributed with degrees of freedom equal to the difference between the number of parameters in the complex and simple models (McCullagh & Nelder, 1990). Because the χ²LR statistic is sensitive to sample size, the Consistent Akaike Information Criterion (CAIC) proportionality constant (CAICPC) was also considered. The CAICPC (CAICPC = CAIC/df, where CAIC = [−2 × Ln(L)] + p[1 + Ln(n)], L = the likelihood of the data given the model, p = the number of parameters in the model, and n = the number of observations in the data set) depicts the fit of the model in question relative to the number of parameters that it contains (Wicherts & Dolan, 2004). Disattenuated correlations among latent factors were also examined to depict the degree of redundancy in the multidimensional models.
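The sketch below implements these relative-fit indices as defined above. The deviance, parameter count, and sample size used in the example are taken from Table 3 and the Method section; treating the reported deviance as −2 × Ln(L) is an assumption, and the helper names are hypothetical.

```python
import math

def caic(log_likelihood, n_params, n_obs):
    """Consistent Akaike Information Criterion: -2*ln(L) + p*(1 + ln(n))."""
    return -2.0 * log_likelihood + n_params * (1.0 + math.log(n_obs))

def caic_pc(log_likelihood, n_params, n_obs, df):
    """CAIC proportionality constant: CAIC divided by the model's degrees of freedom."""
    return caic(log_likelihood, n_params, n_obs) / df

def lr_chi_square(deviance_simple, deviance_complex):
    """Likelihood ratio chi-square; df equals the difference in parameter counts."""
    return deviance_simple - deviance_complex

# Deviances for the unidimensional and three-dimensional models (Table 3), n = 665 coaches
print(lr_chi_square(deviance_simple=34916.96, deviance_complex=32979.57))                 # ~1937.39
print(round(caic_pc(log_likelihood=-34916.96 / 2.0, n_params=28, n_obs=665, df=272), 2))  # ~129.14
```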

Item Fit

Unweighted mean square (MNSQ) fit statistics (Wu, 1997) and point-measure correlations (rpm) were inspected to evaluate the degree to which items fit the specifications of the MRCML model. The rpm depicts the correlation between the raw responses to a particular item and the efficacy estimates of the coaches who responded to that item. Items with an rpm greater than .30 can be considered to demonstrate an acceptable level of discrimination (Nunnally & Bernstein, 1994). Wu’s MNSQ fit statistic depicts the degree to which the observed ratings are consistent with expected values, as derived from the MRCML model in this study. Wu’s index was developed to extend the Wright and Masters (1982) statistic in two ways: (a) by depicting fit at the parameter level rather than the item-response level and (b) by generalizing the index for marginal maximum likelihood estimation (rather than the original unconditional maximum likelihood estimation).
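For illustration, the point-measure correlation is simply a Pearson correlation between the responses to one item and the corresponding person measures; the sketch below uses hypothetical arrays rather than CES data.

```python
import numpy as np

def point_measure_correlation(item_responses, efficacy_estimates):
    """Pearson correlation between raw responses to one item and person measures (r_pm)."""
    return float(np.corrcoef(item_responses, efficacy_estimates)[0, 1])

# Hypothetical ratings for one item and the efficacy estimates of the same coaches
ratings = np.array([3, 2, 3, 1, 0, 2, 3, 1])
thetas = np.array([1.2, 0.4, 1.8, -0.3, -1.1, 0.2, 0.9, -0.6])
print(f"r_pm = {point_measure_correlation(ratings, thetas):.2f}")  # > .30 suggests acceptable discrimination
```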


The MNSQ fit statistic is sensitive to large residuals from pairings of item difficulty and efficacy estimates that are far apart on the underlying scale. A fit statistic is reported as a chi-square divided by its degrees of freedom, resulting in an expected value of 1.00 and a range from 0.00 to ∞. Elements with MNSQ fit statistics from 0.80 to 1.40 illustrate adequate fit to the measurement model (Wright & Linacre, 1994). Values less than 0.80 indicate less variability between observed and expected scores than the model predicts and are ignored in this study.

The suitability of these cut-off MNSQ fit values may be questionable in this study because those values were established based on experiences with unidimensional models. These values assume, perhaps incorrectly, that multidimensional fit indexes have a null distribution similar to those observed in unidimensional applications.

Coaching Efficacy Precision

Precision refers to the consistency of a coach’s estimated efficacy measure from one context to another and was depicted with the corresponding conditional standard error. Coaching efficacy precision is difficult to illustrate in the multidimensional model because the standard errors do not necessarily follow the same pattern as observed in unidimensional models (i.e., smaller at the mean difficulty of the instrument and larger in the extremes). This occurs because statistical information is “borrowed” from other correlated factors in the estimation of a coach’s location on any single dimension. Thus, for the sake of simplicity of interpretation, only the mean, standard deviation, and range of standard errors within each subscale are provided in this article. Standard errors for the unidimensional model are also depicted using 95% confidence intervals around estimated TCE scores: CI₉₅%(θ) = θ̂ ± (1.96 × SEθ̂).
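As a worked illustration of this interval, using the mean TCE estimate of 0.38 logits and the mean standard error of .08 logits reported later in Table 4 (the pairing of these two means is illustrative, not a result reported by the authors):

$$CI_{95\%} = 0.38 \pm (1.96 \times 0.08) \approx 0.38 \pm 0.16,$$

that is, roughly 0.22 to 0.54 logits.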

Reliability

The consistency of rank orderings of efficacy estimates across measurement contexts was examined with reliability of separation coefficients (Smith, 2001). The reliability of separation coefficient is analogous to coefficient alpha, but it is based on estimates of true and error variance derived from the MRCML model. Specifically, the reliability of separation for coaches’ efficacy estimates is Rel = [V(θ̂) − MSE(θ̂)] / V(θ̂), where V(θ̂) is the variance of the efficacy estimates and MSE(θ̂) is the mean error variance of the coaches’ efficacy estimates. This equation is comparable to the true score test theory definition of reliability as the ratio of true variance to observed variance.
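A minimal sketch of this computation, assuming the efficacy estimates and their conditional standard errors are available as arrays; the values shown are hypothetical.

```python
import numpy as np

def separation_reliability(estimates, standard_errors):
    """Reliability of separation: (observed variance - mean error variance) / observed variance."""
    estimates = np.asarray(estimates, dtype=float)
    standard_errors = np.asarray(standard_errors, dtype=float)
    observed_var = estimates.var(ddof=1)             # V(theta-hat)
    mean_error_var = np.mean(standard_errors ** 2)   # MSE(theta-hat)
    return (observed_var - mean_error_var) / observed_var

# Hypothetical efficacy estimates (logits) and their conditional standard errors
thetas = [0.4, 1.1, -0.2, 0.8, 2.0, -1.5, 0.3, 1.6]
ses = [0.30, 0.28, 0.35, 0.31, 0.40, 0.38, 0.29, 0.33]
print(round(separation_reliability(thetas, ses), 2))  # ratio of "true" to observed variance
```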


RESULTS

Rating Scale

Coaches did not perceive the rating scale structure in the manner that the authors intended (see Table 2). Specific problems included (a) a paucity of observations in the zero through four categories (1% of all responses), (b) average efficacy estimates advanced unevenly across categories zero through three, (c) the MNSQ fit statistic was greater than 2.0 for Category 0, (d) thresholds were disordered between Categories 1 and 2 and Categories 4 and 5, (e) coherence values were below 40% for Categories 0 through 6, and (f) nearly all of the threshold estimates increased by less than 1.2 logits.

A number of the empirical guidelines that were unmet in the original rating scale structure were improved in the post hoc categorizations. Also, three of the four empirical guidelines for post hoc categorizations were met in the new structures. First, model-data fit statistics improved, as no category produced a MNSQ fit index greater than 1.34. Second, average efficacy measures increased monotonically in the new categorizations. Third, thresholds were ordered in the new categorizations. Last, person and item separation statistics changed little across most of the new categorizations.

The structure that collapsed responses from Categories 0 through 5 into Category 6 was adopted for the remaining analyses. Although some of the more condensed structures had more evenly distributed frequencies among categories and slightly higher coherences, the four-category structure was preferred because it retained more of the original information, produced the greatest separation indexes, and established an interpretable number of categories.

Internal Models

Multidimensional models of the CES exhibited the best fit to the data (see Table 3). Deviancediff values indicate that the three-dimensional model fit the data better than did the unidimensional model, χ²(5) = 1937.39, and that the four-dimensional model fit the data better than did the three-dimensional model, χ²(4) = 240.86. However, because the Deviancediff may not be distributed as a chi-square when the number of observations is less than the number of covariance patterns (N < J), the distribution of the Deviancediff values is unknown (Allison, 1999; Hosmer & Lemeshow, 2000). Thus, we also considered other fit indexes (i.e., CAICPC) when evaluating the relative fit of the various models.

Penalizing for unnecessary parameterization, the three-dimensional model (CAICPC = 124.45) appears to fit the data better than the unidimensional model (CAICPC = 129.14), and the three-dimensional model appears to fit the data at least as well as the four-dimensional model (CAICPC = 125.54).


TABLE 3
Model-Data Fit Statistics for Latent Trait Confirmatory Factor Analyses

Model               Factors  Parameters  df   DFdiff  Deviance   Deviancediff  Ln(n)  CAIC       CAICdiff  DevPC   CAICPC
TCE                 1        28          272  —       34,916.96  —             6.50   35,126.96  —         128.37  129.14
ME / GS & TE / CB   3        33          267  5       32,979.57  1,937.39      6.50   33,337.07  1,899.89  123.52  124.45
ME / GS / TE / CB   4        37          263  4       32,738.71  240.86        6.50   33,016.21  320.86    124.48  125.54


However, the four-dimensional model was retained because this model was most defensible on theoretical grounds, because evidence existed for the diagnostic value of both game strategy and technique efficacies (Feltz et al., 1999; Myers et al., 2005), and because the difference between the fit of the multidimensional models was small. Although the unidimensional model exhibited the poorest fit, psychometric evidence for this model is also reported in the remaining analyses because TCE scores are common in the literature (Feltz et al., 1999) and theoretically justifiable (Bandura, 1997).

The retained four-dimensional model had factors that were moderately to highly correlated with one another, with the strongest association occurring between GS and TE (r = .86) and the weakest association occurring between CB and TE (r = .54). Thus, a fair amount of redundancy was observed among all of the subscales, but particularly between GS and TE.

Item Fit

The observed values varied from model-based expected values by an amount that can be attributed to chance variation for the majority of the items in both the multidimensional (96%) and unidimensional (96%) models. In each model, residuals for only one item varied more than was expected. Those items were instill an attitude of good moral character (CB5; MNSQ = 1.45) and demonstrate the skills of your sport (TE7; MNSQ = 1.57) from the multidimensional and unidimensional models, respectively. Both of the identified items were of average difficulty to endorse (δCB5 = .09 and δTE7 = −.20). Both the multidimensional and unidimensional models were respecified with the flagged item removed to ensure that none of the retained items exhibited misfit after deleting CB5 and TE7, respectively. In both cases, none of the retained items exhibited misfit in the respecified model.

The mean point-measure correlations (rpm) and the difficulty estimates (δ) for both models are summarized in Table 4. A strong mean rpm was observed within each subscale (range = .77 to .82) and for TCE (.66). That the mean rpm within each subscale was greater than the mean rpm within the unidimensional model supported the retention of a multidimensional model. For identification purposes, mean difficulties (δs) were constrained to be zero. Within subscales, the range of δ estimates was somewhat narrow, with none of the scales spanning more than 1.08 logits (CB) and one of the scales spanning only .46 logits (GS). Conversely, the range of δ estimates within the unidimensional model was twice as broad as the most dispersed set of δ estimates within the multidimensional model.

Coaching Efficacy Precision

Multidimensional coaching efficacy estimates (θs) were not very precise for most coaches. The degree of imprecision varied, as the means and standard deviations of the standard errors ranged from .30 (GS) to .73 (CB) and from .14 (ME) to .41 (CB), respectively (see Table 4).


Additionally, although all of the subscales had quite precise θs for some coaches (minimum standard errors ranged from .01 to .04), maximum standard errors ranged broadly, from 1.02 (ME) to 3.25 (CB). Precision for multidimensional θs varied as a function of the dispersions of item difficulties (δs) and θs within each subscale. That is, ME had a dispersion of δs that was somewhat matched to the corresponding θs and, thus, provided more precise measures as compared to the other subscales. Conversely, CB had the fewest items and a dispersion of δs not at all matched to the corresponding θs and, thus, provided rather imprecise measures as compared to the other subscales.

TCE estimates (θTCEs) were somewhat precise for most coaches because θTCEs were well matched to the item difficulties (δs). As a result, this model produced a relatively small mean standard error (.08), a relatively tight standard deviation of standard errors (.04), and a relatively modest maximum standard error (.38). The most precise measures were for coaches with θTCEs ranging from −.77 to .53. Coaches with low θTCEs (i.e., ≤ −1.99) appeared to be measured more precisely than were coaches with high θTCEs (i.e., ≥ 2.75). Increased precision for low θTCEs is explained by a group of easy to endorse items. The easy to endorse items included all of the character building indicators: CB13 (−.56), CB24 (−.73), CB5 (−.76), and CB19 (−1.22).

Reliability

The descriptive statistics of coaching efficacy estimates (θs) and the reliability indexes for both models are displayed in Table 4. Reliability coefficients ranged from .83 to .91 across the subscales, and the reliability coefficient for TCE was .94. These coefficients suggest good levels of internal consistency for both multidimensional and unidimensional coaching efficacy estimates.

DISCUSSION

An evaluation of the psychometric properties of the CES revealed mixed evidence for the validity of measures from the instrument. Results offered some support for the proposed multidimensionality, the fit of observed values to expected values for the majority of items, the internal consistency of coaching efficacy estimates, and the precision of TCE estimates. Validity concerns were observed for the rating scale structure, the precision of multidimensional coaching efficacy estimates, and the misfit of a couple of items to the measurement model. Results are interpreted to guide future research with the CES and to provide recommendations for the development of a revised instrument.


TABLE 4
Descriptive Statistics From the Four-Dimensional and Unidimensional Models

                                      ME             GS             TE             CB             TCE
Mean point-measure correlation (rpm)  .80            .79            .77            .82            .66
SD of rpms                            .04            .03            .05            .01            .07
Range of rpms                         .75 to .85     .74 to .83     .69 to .81     .81 to .83     .52 to .76
SD of δs                              0.38           0.16           0.42           0.42           0.49
Range of δs                           –.49 to .45    –.28 to .18    –.51 to .50    –.69 to .38    –1.22 to .77
Mean θ (SD)                           –0.08 (1.84)   0.26 (1.87)    0.75 (1.80)    1.78 (2.10)    0.38 (1.21)
Range of θs                           –5.24 to 4.54  –5.49 to 4.87  –4.77 to 5.16  –4.86 to 5.91  –3.19 to 3.93
Mean SE of θ (SD)                     0.31 (.14)     0.30 (.17)     0.33 (.17)     0.73 (.41)     0.08 (.04)
Range of SEs                          0.04 to 1.02   0.01 to 1.58   0.01 to 2.14   0.03 to 3.25   0.06 to 0.38
Reliability of θ                      0.91           0.91           0.90           0.83           0.94

Note. ME = motivation efficacy (7 items); GS = game strategy (7 items); TE = technique efficacy (6 items); CB = character building (4 items); TCE = total coaching efficacy (24 items).


Analysis of the original rating scale structure indicated that coaches were being asked to distinguish among too many levels of coaching efficacy. This finding is congruent with previous findings for the optimal structure of an ordered response efficacy scale (Zhu et al., 1997) and long-standing recommendations for Likert (1932) scales. Although post hoc analysis identified an improved four-category rating scale structure, it is unknown whether the modified scale would prove optimal on a cross-validation sample or with coaches of youth sports⁵; however, there is reason for confidence in the potential utility of the modified scale, as Rasch-based optimal categorizations have been confirmed in follow-up applications (Zhu, 2002). Users of the CES and developers of a revised instrument are encouraged to assess the utility of the proposed four-category structure (i.e., low, moderate, high, and complete confidence). Although low may not be selected frequently, such a category is in line with guidelines for constructing efficacy scales (Bandura, 1997) and would likely attract at least the minimum number (10) of observations necessary for minimal precision of threshold estimates (Linacre, 2002).

Confirmatory factor analyses offered some support for the structural validity of measures from the CES; however, both the fit of the three-dimensional model and the correlations among the factors in the four-dimensional model suggest limited discriminant validity among subscales, particularly between GS and TE. Because evidence exists for the diagnostic value of the subscales in both high school and college coaches (Feltz et al., 1999; Myers et al., 2005), refining the definitions of the factors and then modifying selected items to lessen the overlap among the subscales is recommended in a revised version of the instrument. For example, altering the definition of TE to focus on beliefs in one’s instructional and diagnostic skills during practice may help to distinguish this belief from confidence in one’s ability to lead during competition. Until a revised instrument is available, in most instances, the proposed internal model should be utilized to produce multidimensional measures of coaching efficacy from the existing items.

The unidimensional model exhibited the poorest fit to the data. Thus, a sound rationale for the use of TCE scores, as opposed to subscale scores, should be provided whenever such a model is employed. Examples of when one may opt for a unidimensional measurement model include instances when coaching efficacy is but one of a host of variables used to predict a general outcome (e.g., performance) or when subscale scores are highly related and likely to cause problems associated with multicollinearity within the specified data analysis. In either case, bivariate correlations between the dependent variable(s) and subscale scores should be provided to better explain the nature of the influence of TCE.

Within the multidimensional model, only the observed values for instill an attitude of good moral character (CB5) exhibited misfit. Other items that define the CB subscale in the CES target fair play, good sportsmanship, and respect for others.


⁵ Of the participants in this study, 97% coached at the high school or collegiate level.


From a content perspective, moral development, fair play, sportsmanship, and respect for others may mean different things within sport perspectives of moral development (Weiss & Smith, 2002). Within the context of the CES and with these data, coaches’ responses to CB5 varied in unexpected ways, whereas responses to the other character-building items varied in ways that were predicted within the measurement model. Because the content of the identified item may be ambiguous and because of the mentioned empirical problems, users of the CES should consider dropping this item. Developers of a revised instrument should consider the literature on moral development in sport (Shields & Bredemeier, 1995; Weiss & Smith, 2002) in deciding whether to revise or drop CB5.

Despite good fit for most items to the multidimensional model, mismatches were observed between the distributions of item difficulties and multidimensional coaching efficacy estimates. These mismatches resulted in some imprecision for all multidimensional measures, but particularly for CB estimates. Measures for CB were especially imprecise because CB scores were quite high while the range of difficulties for the CB items was restricted.⁶ Despite some imprecision in the multidimensional coaching efficacy measures, evidence exists for the utility of subscale scores (Feltz et al., 1999; Malete & Feltz, 2000; Myers et al., 2005; Sullivan & Kent, 2003; Vargas-Tonsing et al., 2003). That rather imprecise measures produce evidence of external validity probably speaks to the robustness of the coaching efficacy model. Although there is no substitute for a good theory, modifications are available to increase the precision of measures. Specifically, developers of a revised instrument should consider adding additional items or revising existing items to create wider ranges of difficulties within all of the subscales. Users of the CES are encouraged to be cognizant of the imprecision in multidimensional measures of coaching efficacy produced by the instrument.

6 It should be noted that the problem of imprecision within character-building measures is likely to be exacerbated if future users of the current version of the CES drop the item that was previously identified as exhibiting misfit within the multidimensional model.

In the unidimensional model, only the observed values for demonstrate the skills of your sport (TE7) exhibited misfit. Exploration of the TE items revealed that TE7 was the only indicator that required modeling by the coach. Thus, TE7 may tap an unintended source of variance (e.g., age, fitness level). Although demonstrating skills is a common instructional technique in coaching, one’s ability to do so may not be a relevant source of information that experienced coaches consider when rating their instructional and diagnostic abilities. Thus, users of the CES who intend to measure TCE are encouraged to consider the age and fitness level of their population of interest when deciding whether to retain TE7. Because a clear case can be made for the relevance of TE7 to TE, developers of a revised instrument are encouraged to consider options for rephrasing TE7 (e.g., walk through the technical skills of your sport) to lessen the degree to which construct-irrelevant variance may be present in responses to the item.
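One simple way a user could probe the suspected construct-irrelevant variance is to check whether responses to TE7 relate more strongly to coach age (or another fitness proxy) than the remaining TE items do. The sketch below is illustrative only; the data values, column names, and use of a Spearman correlation are assumptions, not analyses from this study.

import pandas as pd
from scipy.stats import spearmanr

# Hypothetical data: one row per coach, selected TE item responses plus age in years.
df = pd.DataFrame({
    "age":  [28, 34, 41, 52, 45, 38, 60, 29, 47, 55],
    "TE7":  [9, 9, 8, 6, 7, 8, 5, 9, 7, 6],
    "TE14": [8, 9, 8, 8, 7, 9, 8, 9, 8, 7],
    "TE16": [8, 8, 9, 8, 8, 9, 8, 8, 9, 8],
})

# If TE7 correlates with age noticeably more than the other TE items do,
# that pattern is consistent with TE7 tapping an unintended source of variance.
for item in ["TE7", "TE14", "TE16"]:
    rho, p = spearmanr(df["age"], df[item])
    print(f"{item}: rho = {rho:.2f} (p = {p:.2f})")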

In addition to good fit for most items to the unidimensional model, the distributions of unidimensional item difficulties and TCE estimates were well matched. This match produced estimates that appeared to be much more precise than were multidimensional measures of coaching efficacy; however, the unidimensional internal model did not fit the data as well as the multidimensional internal model did. Thus, although TCE estimates appear to offer more empirical precision than do multidimensional measures of coaching efficacy, what TCE estimates are actually measuring is probably less clear conceptually. Still, as previously detailed in this article, there are practical, empirical, and theoretical justifications for when TCE measures may be appropriate. Users of the CES are encouraged to weigh the advantages and weaknesses of both unidimensional and multidimensional coaching efficacy measures within the context of their specific research questions.

In summary, given the burgeoning role of coaching efficacy in coaching education research, the fact that the CES is the only instrument purported to measure the construct, and the validity evidence forwarded in this study, continued use of the CES for its intended purposes appears reasonable. These purposes include obtaining measures to determine sources of coaching efficacy, to examine the influence of coaching efficacy on athlete and team variables, and to assess the ability of education programs to alter coaching efficacy. Users of the CES are encouraged to note the validity concerns highlighted in this article. Developers of a revised instrument should directly address the validity concerns identified in this study.

ACKNOWLEDGMENTS

This research was supported in part by a William Wohlgamuth Memorial Scholarship for the Study of Youth in Sports at Michigan State University.

REFERENCES

Ackerman, T. A., Gierl, M. J., & Walker, C. M. (2003). Using multidimensional item response theory to evaluate educational and psychological tests. Educational Measurement: Issues and Practice, 22, 37–53.

Adams, R. J., Wilson, M., & Wang, W. C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.

Allison, P. D. (1999). Logistic regression using the SAS system: Theory and application. Cary, NC: SAS Institute.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.


Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.

Briggs, D. C., & Wilson, M. (2003). An introduction to multidimensional measurement using Rasch models. Journal of Applied Measurement, 4, 87–100.

Campbell, S. K., Wright, B. D., & Linacre, J. M. (2002). Development of a functional movement scale for infants. Journal of Applied Measurement, 3, 190–204.

Chelladurai, P. (1978). A contingency model of leadership in athletics. Unpublished doctoral dissertation, University of Waterloo, Canada.

Feltz, D. L., & Chase, M. A. (1998). The measurement of self-efficacy and confidence in sport. In J. L. Duda (Ed.), Advancements in sport and exercise psychology measurement (pp. 65–80). Morgantown, WV: Fitness Information Technology.

Feltz, D. L., Chase, M. A., Moritz, S. E., & Sullivan, P. J. (1999). A conceptual model of coaching efficacy: Preliminary investigation and instrument development. Journal of Educational Psychology, 91, 765–776.

Horn, T. S. (2002). Coaching effectiveness in the sports domain. In T. S. Horn (Ed.), Advances in sport psychology (pp. 309–354). Champaign, IL: Human Kinetics.

Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York: Wiley.

Jöreskog, K., & Sörbom, D. (1995). LISREL 8.14: Structural equation modeling with the SIMPLIS command language. Chicago: Scientific Software International.

Kline, R. B. (1998). Principles and practice of structural equation modeling. New York: Guilford.

Lee, K. S., Malete, L., & Feltz, D. L. (2002). The effect of a coaching education program on coaching efficacy. International Journal of Applied Sport Sciences, 14, 55–67.

Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1–55.

Linacre, J. M. (1995). Categorical misfit statistics. Rasch Measurement Transactions, 9, 450–451.

Linacre, J. M. (2002). Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3, 85–106.

Malete, L., & Feltz, D. L. (2000). The effect of a coaching education program on coaching efficacy. The Sport Psychologist, 14, 410–417.

Marsh, H. W., & Hocevar, D. (1985). The application of confirmatory factor analysis to the study of self-concept: First and higher order factor structures and their invariance across age groups. Psychological Bulletin, 97, 562–582.

McCullagh, P., & Nelder, J. A. (1990). Generalized linear models (2nd ed.). Boca Raton, FL: CRC Press.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: Macmillan.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.

Myers, N. D., Vargas-Tonsing, T. M., & Feltz, D. L. (2005). Coaching efficacy in intercollegiate coaches: Sources, coaching behavior, and team variables. Psychology of Sport & Exercise, 6, 129–143.

National Association for Sport and Physical Education. (1995). Quality coaches, quality sports: National standards for athletic coaches. Dubuque, IA: Kendall/Hunt.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Park, J. K. (1992). Construction of the Coaching Confidence Scale. Unpublished doctoral dissertation, Michigan State University, East Lansing.

Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.

Shields, D. L. L., & Bredemeier, B. J. L. (1995). Character development and physical activity. Champaign, IL: Human Kinetics.

Shulman, J., & Wolfe, E. W. (2000). The Nutrition Self-Efficacy Scale for Prospective Physicians: Evaluating reliability and validity through Rasch modeling. Journal of Applied Measurement, 1, 107–130.


Smith, E. V. (2001). Evidence for the reliability of measures and validity of measure interpretation: A Rasch measurement perspective. Journal of Applied Measurement, 2, 281–311.

Smoll, F. L., & Smith, R. E. (1989). Leadership behaviors in sport: A theoretical model and research paradigm. Journal of Applied Social Psychology, 19, 1522–1551.

Sullivan, P. J., & Kent, A. (2003). Coaching efficacy as a predictor of leadership style in intercollegiate athletics. Journal of Applied Sport Psychology, 15, 1–11.

Tenenbaum, G., & Fogarty, G. (1998). Application of the Rasch analysis to sport and exercise psychology measurement. In J. L. Duda (Ed.), Advancements in sport and exercise psychology measurement (pp. 409–421). Morgantown, WV: Fitness Information Technology.

Vargas-Tonsing, T. M., Warners, A. L., & Feltz, D. L. (2003). The predictability of coaching efficacy on team efficacy and player efficacy in volleyball. Journal of Sport Behavior, 26, 396–407.

Waugh, R. F. (2003). Measuring attitudes and behaviors to studying and learning for university students: A Rasch measurement model analysis. Journal of Applied Measurement, 4, 164–180.

Weiss, M. R., & Smith, A. L. (2002). Moral development in sport and physical activity: Theory, research, and intervention. In T. S. Horn (Ed.), Advances in sport psychology (pp. 243–280). Champaign, IL: Human Kinetics.

Wicherts, J. M., & Dolan, C. V. (2004). A cautionary note on the use of information fit indexes in covariance structure modeling with means. Structural Equation Modeling, 11, 45–50.

Wolfe, E. W., & Nogle, S. (2002). Development of measurability and importance scales for the NATA athletic training educational competencies. Journal of Applied Measurement, 4, 429–452.

Wright, B. D., & Linacre, J. M. (1992). Combining and splitting categories. Rasch Measurement Transactions, 6, 233–235.

Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8, 370.

Wright, B. D., & Linacre, J. M. (1998). WINSTEPS: A Rasch model computer program. Chicago: MESA Press.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA Press.

Wu, M. L. (1997). The development and application of a fit test for use with marginal maximum likelihood estimation and generalized item response models. Unpublished master’s thesis, University of Melbourne.

Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ACER ConQuest: Generalized item response modeling software (Version 1.0) [Computer program]. Melbourne, Victoria, Australia: Australian Council for Educational Research.

Zhu, W. (2002). A confirmatory study of Rasch-based optimal categorization of a rating scale. Journal of Applied Measurement, 3, 1–15.

Zhu, W., & Kang, S. J. (1998). Cross-cultural stability of the optimal categorization of a self-efficacy scale: A Rasch analysis. Measurement in Physical Education and Exercise Science, 2, 225–241.

Zhu, W., Updyke, W. F., & Lewandowski, C. (1997). Post-hoc Rasch analysis of optimal categorization of an ordered-response scale. Journal of Outcome Measurement, 1, 286–304.

APPENDIX

How confident are you in your ability to…

1. maintain confidence in your athletes? (ME1)
2. recognize opposing team’s strengths during competition? (GS2)


3. mentally prepare athletes for game/meet strategies? (ME3)
4. understand competitive strategies? (GS4)
5. instill an attitude of good moral character? (CB5)
6. build the self-esteem of your athletes? (ME6)
7. demonstrate the skills of your sport? (TE7)
8. adapt to different game/meet situations? (GS8)
9. recognize opposing team’s weakness during competition? (GS9)
10. motivate your athletes? (ME10)
11. make critical decisions during competition? (GS11)
12. build team cohesion? (ME12)
13. instill an attitude of fair play among your athletes? (CB13)
14. coach individual athletes on technique? (TE14)
15. build the self-confidence of your athletes? (ME15)
16. develop athletes’ abilities? (TE16)
17. maximize your team’s strengths during competition? (GS17)
18. recognize talent in athletes? (TE18)
19. promote good sportsmanship? (CB19)
21. adjust your game/meet strategy to fit your team’s talent? (GS21)
22. teach the skills of your sport? (TE22)
23. build team confidence? (ME23)
24. instill an attitude of respect for others? (CB24)
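For readers who want to compute scores from these items, the sketch below groups responses by the subscale codes listed above (ME, GS, TE, CB) and averages them, along with a simple mean across all items as a TCE score. The dictionary of responses and the rating values are hypothetical, and the simple-mean scoring rule is an illustrative assumption rather than prescribed scoring instructions.

from collections import defaultdict

# Hypothetical responses keyed by the item codes in the Appendix (rating values assumed).
responses = {
    "ME1": 8, "GS2": 7, "ME3": 8, "GS4": 7, "CB5": 9, "ME6": 8, "TE7": 6,
    "GS8": 7, "GS9": 7, "ME10": 8, "GS11": 7, "ME12": 8, "CB13": 9, "TE14": 7,
    "ME15": 8, "TE16": 7, "GS17": 7, "TE18": 7, "CB19": 9, "GS21": 7,
    "TE22": 7, "ME23": 8, "CB24": 9,
}

# Group item responses by their subscale prefix (ME, GS, TE, CB).
by_subscale = defaultdict(list)
for code, value in responses.items():
    prefix = code.rstrip("0123456789")   # e.g., "ME1" -> "ME"
    by_subscale[prefix].append(value)

subscale_scores = {s: sum(v) / len(v) for s, v in by_subscale.items()}
tce = sum(responses.values()) / len(responses)   # unidimensional total coaching efficacy

print(subscale_scores)
print(f"TCE = {tce:.2f}")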
