Validating the Conceptions of Assessment-III Scale in Canadian Preservice Teachers
Educational Assessment, 19:139–158, 2014
Copyright © Taylor & Francis Group, LLC
ISSN: 1062-7197 print/1532-6977 online
DOI: 10.1080/10627197.2014.903654
Validating the Conceptions of Assessment–IIIScale in Canadian Preservice Teachers
Lia M. Daniels, Cheryl Poth, Chiara Papile, and Marnie Hutchison
University of Alberta
The purpose of this study was to test the validity of the Teachers’ Conceptions of Assessment
Scale III-Abridged Version (CoA-IIIA; Brown, 2006), a measure created, validated, and applied
outside of North America, in a sample of Canadian preservice teachers (n = 436). This work is
important because although we have long known that teachers’ beliefs influence the way they teach,
research also suggests that teachers’ beliefs related to assessment influence the way they assess
students. Confirmatory factor analysis of the Canadian data led to a solution slightly different
from the New Zealand factor structure, with nine latent first-order variables, each with two or
three measured indicators. Although the preservice teachers endorsed the purposes of assessment
similarly to several other countries, the factor measuring assessment as “inaccurate” was endorsed
more strongly than in previous research. Our discussion highlights the differences and similarities
between the current results and previous studies with the CoA-IIIA.
Teachers spend more than one third of their time assessing student learning (Stiggins & Conklin, 1992) and enter their teacher education programs with preconceptions about assessment.
It is highly possible that these early conceptualizations of assessment can influence actual
assessment behaviors because behavior is strongly predicted by intention, which is largely
influenced by beliefs or attitudes (Ajzen, 1991). For example, preservice teachers who view
assessment as useful for supporting learning may be more likely to use assessments to enhance
their students’ learning. Alternatively, preservice teachers who believe that assessments ensure
that students complete work may limit their assessment practices to assignments involving formal
grades. The extent to which preservice teachers reflect these two examples largely depends on
the educational context they have experienced.
Teachers’ beliefs have been linked to developmentally appropriate practices (Parker &
Neuharth-Pritchett, 2006; K. E. Smith & Croom, 2000), preferences for constructivist or
traditional approaches to teaching and learning (Woolley, Benjamin, & Woolley, 2004), approaches to large-scale assessment and diagnostic information (Leighton, Gokiert, Cor, & Heffernan, 2010), and intended instructional practices (Papile, Daniels, Poth, & Hutchison, 2012). These highly influential beliefs may be accumulated during teacher education programs, or may result from years of studenthood. Evidence pointing toward the latter suggests that the educational context in which teachers were themselves students shapes the way they think about teaching (Lortie, 1975). Giving structure to this educational context, Brown (2004b) created the Conceptions of Assessment scale to reflect the sorts of beliefs preservice and practicing teachers have regarding assessment. The purpose of this study was to examine Brown's scale in a sample of Canadian preservice teachers, thus representing its first testing in a North American context. Specifically, we tested its factor structure, level of endorsement, and mean differences between genders and teaching levels.

Correspondence should be sent to Lia M. Daniels, Department of Educational Psychology, Faculty of Education, University of Alberta, Edmonton, Alberta, Canada, T6G 2G5. E-mail: [email protected]
THE CANADIAN CONTEXT OF ASSESSMENT
Much of the landscape of assessment in Canada suggests that Brown’s (2004b) scale may be
a suitable measurement tool. In 1993 the Principles for Fair Student Assessment Practices
for Education in Canada was released and specified that assessment not only measure but
also support students’ learning. The Principles (1993) document addressed both teacher-made
assessments and standardized tests, both of which play an established role in Canadian assessment practices (Klinger, DeLuca, & Miller, 2008). All Canadian provinces but one require students
to take standardized tests in a variety of grades and subject areas (Zwaagstra, 2011), and
Canadian adolescents regularly participate in international assessments such as the Programme
for International Student Assessment (PISA; Volante & Ben Jaafar, 2008). From this perspective
Canadian students are no strangers to formal classroom or standardized assessments with the
purpose of measuring progress toward learner outcomes.
Taken a step further, Canadian students may also be aware that assessments can be used
as a reflection of school achievement. For example, the Fraser Institute has emerged as an
independent Canadian public policy research and educational organization that produces School
Report Cards (e.g., Cowley & Easton, 2013). These reports compare the academic performance
of individual schools in Alberta, British Columbia, Ontario, and Quebec for use by teachers,
parents, school administrators, students, and taxpayers. This information influences parents’
choice of school for their children and teachers’ and administrators’ program improvements.
Although Canadian assessment practices appear to focus on summative assessments (Ungerleider, 2006), accountability in Canada is not solely determined by test performance and is
instead viewed as “the process through which individuals or organizations take responsibility for
their actions and report on these actions to those who are entitled to the information” (Canadian
Teachers’ Federation, 2003, p. 2). This applies equally to elementary school teachers, who are
trained as generalists, and secondary school teachers, who are trained as specialists. Regardless
of their training, however, Canadian teachers generally hold a teaching license for all levels of
mandatory schooling (Grades 1–12). All provinces directly involve teachers in the construction
of standardized tests (Ungerleider, 2003). Moreover, the Canadian Teachers’ Federation (2003)
and many provincial teachers’ associations have turned their attention toward providing teachers
with professional learning opportunities in formative assessments (British Columbia Teachers'
Federation, 2012; Alberta Teachers’ Association, 2009). In this light, Canada is committed
to using assessment to improve education and support teaching and learning. Indeed, current
preservice teachers report having been assessed during their mandatory (K-12) schooling by
comments, peer assessment, rubrics (Lejeune, Poth, & Daniels, 2010), and other nonsummative
methods.
Despite familiarity with both traditional and innovative assessment practices, preservice
teachers feel highly anxious (Daniels, Mandzuk, Perry, & Moore, 2011) and lack efficacy in the area of assessment (Bachor & Baer, 2001; Campbell & Evans, 2000; Volante & Fazio, 2007).
DeLuca and Klinger (2010) reported that in addition to basic assessment tasks like constructing
items, preservice teachers want instruction on how to modify assessments to meet the demands
of inclusive classrooms and how to increase the validity and reliability of assessments. To help
preservice teachers gain these skills, we must first understand the conceptions of assessment
they bring into education programs. Brown’s (2004b) scale represents one measurement tool
to meet this goal.
THE CONCEPTIONS OF ASSESSMENT SCALE III–ABRIDGED VERSION
The Teachers’ Conceptions of Assessment Scale III–Abridged Version (CoA-IIIA; Brown,
2002, 2004b, 2006) examines four main purposes of educational assessment. The first main purpose is that assessment informs the improvement of education (Brown, 2008). This overarching
purpose is operationalized by four first-order factors each measured by three items: assessment
describes abilities (e.g., determines how much students have learned from teaching), improves
learning (e.g., provides feedback to students about their performance), improves teaching (e.g.,
is integrated with teaching practice), and is valid (e.g., results are trustworthy). These factors
depict assessment as a process that involves both the teacher and student; therefore, to support
this conception, preservice teachers need to see the value of assessment for both students and
teachers.
The second purpose, school accountability, refers to assessment as a representation of how
the school as a whole is performing. It uses assessment results to publicly demonstrate that
schools, and by extension teachers, are efficiently and effectively using society's resources
(Brown, 2008). Assessment is viewed as a means of ensuring that schools and teachers not
only deliver quality instruction but also strive to improve the quality of their instruction. In line
with this, assessment results are used to invoke consequences for schools not reaching required
standards. Three items (e.g., assessment provides information on how well schools are doing)
combine to form this single factor.
The third purpose is that assessment holds students accountable for their learning. Teachers
provide grades or scores that are passed on to parents, future employers, and educators
and can have significant educational implications, such as students’ placement into advanced
classes, chances of winning scholarships, or graduation based on performance (Brown, 2008).
Three items around these ideas (e.g., assessment is checking off progress against achievement
objectives) combine to form this single factor.
The final purpose that Brown identified was that assessment can be irrelevant, reflecting
the view that no evaluation process is flawless (Brown, 2011). Taken at its most extreme, the
inherent error in evaluation leads some to claim that it has no legitimate place within the
education system (Brown, 2008). This conception has three first-order factors: assessment is
bad for teachers and students (e.g., forces teachers to teach in a way against their beliefs),
assessment is ignored (e.g., teachers conduct assessments but make little use of the results),
and assessment is inaccurate (e.g., results should be treated cautiously because of measurement
error).
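For reference, the hypothesized structure just described can be summarized compactly. The sketch below is a minimal Python representation, not part of the original instrument; the factor labels are paraphrased from the descriptions above, and the full item wordings appear in Brown (2006) and, for the retained items, in Table 2.

```python
# Hypothesized CoA-IIIA structure (Brown, 2002, 2004b, 2006) as described
# above: four main purposes, nine first-order factors, three items each.
# Labels are paraphrased from the text, not quoted from the scale itself.
COA_IIIA_PURPOSES = {
    "improvement_of_education": [
        "describes_ability", "improves_learning",
        "improves_teaching", "is_valid",
    ],
    "school_accountability": ["school_accountability"],
    "student_accountability": ["student_accountability"],
    "irrelevance": ["is_bad", "is_ignored", "is_inaccurate"],
}
ITEMS_PER_FACTOR = 3  # 9 first-order factors x 3 items = 27 items
```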
Structure of the CoA-IIIA Across Cultures
The CoA-IIIA was originally validated in New Zealand with practicing teachers (Brown,
2004b, 2006, 2011). Since then, validation evidence has become available for samples from
Queensland, Australia (Brown, Lake, & Matters, 2011), Cyprus (Brown & Michaelides, 2011),
Hong Kong (Brown, Kennedy, Fok, Chan, & Yu, 2009), Spain (Brown & Remesal, 2012),
and the Netherlands (Segers & Tillema, 2011). With the exception of Hong Kong, these
countries reflect an assessment atmosphere in which low-stakes classroom-based assessments
are balanced with standardized examinations to ensure curricular outcomes are met (e.g., Choi,
1999; Crooks, 2002; Marhuenda, 2006; Scheerens, Ehren, Sleegers, & de Leeuw, 2012; Volante
& Ben Jaafar, 2008). In addition, and distinguishing these countries from the United States,
teachers in these countries are not explicitly evaluated on the basis of their students’ test scores
(Rothstein et al., 2010). Furthermore, all these countries except Cyprus participate in PISA.
According to PISA results, there are no statistically significant differences among New Zealand, Australia, Hong Kong, the Netherlands, and Canada; however, Spain scored below the Organisation for
Economic Co-operation and Development (2010) average. Although these statistics do not speak
to the specific assessment climate in each country, they do provide evidence of some level of
standardized testing and similar performance levels that may be important in understanding the
factor structure of the CoA-IIIA. Next, we highlight some variability and consistency in terms
of structure and strength of endorsement across these studies.
Data from Queensland teachers (Brown et al., 2011) fit the same nine factors as the original New Zealand model, with two additional paths. These paths highlighted
relationships between the factors measuring improves learning and assessment is inaccurate,
and assessment holds students accountable to the extent that it accurately describes their
learning. Brown and colleagues (2009) hypothesized that Hong Kong teachers would respond
to the inventory in a similar manner as New Zealand and Queensland teachers even though
Hong Kong has a much stronger focus on student accountability. Although all 27 items were
retained, two first-order factors were removed (assessment describes ability and assessment is
inaccurate), and their items were given paths directly to their respective second-order factors.
In addition, for Hong Kong teachers the correlation between the student accountability factor
and the student improvement factor was higher than in any other sample, perhaps reflecting
the strong cultural association between accountability and improvement in Hong Kong. The
CoA-IIIA was also translated into Greek and administered in Cyprus, a nation with low-stakes
assessment during the elementary years similar to New Zealand (Brown & Michaelides, 2011).
The factor structure confirmed in New Zealand did not fit the Cyprus data; thus, two alternative models were tested, neither with adequate fit. The final model subsumed the four conceptions into
a positive and negative orientation toward assessment and was proposed to be a better specified
and more universal model than the original (Brown et al., 2011).
Two studies relied on exploratory rather than confirmatory factor analysis in producing
their factor structures. First, Segers and Tillema (2011) revealed a four-factor solution with
Dutch secondary school teachers. The first factor, which accounted for 19.5% of the variance,
subsumed items related to both formative and summative assessment. The three other factors
were school accountability, bad quality, and good quality. These four factors seem to reflect
the dichotomous nature of the structure that emerged for Cyprus but also separated out
school accountability from student assessment, be it formative or summative. Finally, Brown
and Remesal (2012) divided the original 27 items into five factors reflecting conceptions of
assessment for Spanish preservice teachers. Confirmatory factor analysis was used following
exploratory factor analysis to confirm the following factors: improves student learning and
teaching, is ignored and inaccurate, is bad, measures school quality validly, and assigns grades.
Again, this factor structure seems to divide formative from summative assessment purposes
and then identify other purposes such as school quality and the reality that assessment can be
invalid.
Overall, it seems that some differences in factor structure have been noted and may be
reflections of both current and past contextual factors. For our sample of Canadian preservice
teachers we suggest three contextual factors that may shape their conceptions: The first is that as
preservice teachers they may not yet understand the complex nature of assessment. The second
is the fact that Canadian teachers are not held exclusively accountable for student achievement,
and thus we expected small to moderate positive correlations between the accountability factors
and improvement factors. Third, although Canadian teachers are trained as elementary school
generalists or secondary school specialists, they are licensed to teach all grade levels. As such,
we did not expect differences to emerge between teaching levels.
Level of Endorsement
In terms of strength of endorsement of the factors, teachers from New Zealand, Queensland,
Hong Kong, and Cyprus consistently endorsed the improvement of teaching and learning as
their dominant purpose for assessment (Brown, 2011). When differences between teaching
levels have been found it seems that elementary school teachers were more likely to endorse
assessment for the purpose of improving teaching and learning than secondary school teachers
(Brown et al., 2011). In a different study, secondary teachers appeared to agree more strongly
with assessment for reasons related to student accountability or evaluation than elementary
(Brown, 2011). On the occasion that elementary teachers did favor the idea that assessment
makes students accountable, they also tended to support the notion that assessment is irrelevant
(Brown et al., 2011). This difference may reflect the trend that across the aforementioned
educational contexts, secondary school teachers appear to have more formal or higher stakes
testing imposed on them than do their elementary school counterparts. Segers and Tillema
(2011) did not provide mean scores, and Spanish preservice teachers had a very narrow range in their mean endorsements of the scale (Brown & Remesal, 2012). None of the empirical studies
just reviewed tested for gender differences. However, in his dissertation Brown (2002) found
no gender differences in response to the scale.
Building on this body of knowledge we asked three specific research questions:
1. Does the factor structure of the CoA-IIIA fit Canadian preservice teachers? Based on
similarities between the systems just reviewed, we hypothesized few differences in the
factor structure.
2. What is the mean level of endorsement of each conception? Based on their educational
experiences as students and their exposure to a range of assessment methods, we expected
preservice teachers to most strongly endorse items related to improving teaching and
learning.
3. Are there differences by gender or teaching level? Because preservice teachers will be
licensed to teach any level and because standardized testing is common in elementary
and secondary schools across Canada, we did not expect any differences between teaching
levels.
METHOD
Participants
During the 1st week of class, the survey was offered to a convenience sample of approximately
700 students enrolled in various sections of a required assessment course at a midwestern
research-intensive Canadian university. A total of 594 students completed the survey. This
sample was further reduced to 436 participants in order to meet the criterion of full data on all items, as required for the calculation of certain fit indices in AMOS (Arbuckle,
2011).1 Although students may have received instruction in assessment practices during some
of their other coursework, this course represented their first formal instruction related to
assessment practices. The course is situated in students' Introductory Professional Term, meaning that they were preparing for their first practicum placement in the schools. Fifty-three
percent of the sample was enrolled in the secondary school teacher program, 76% of the
sample was female, and 44.8% of students reported being born in 1988 or 1989, making them roughly 20 to 22 years old at the time the data were collected (birth years ranged from 1960 to 1990).
Measures
Aside from the demographic information just presented, the main questionnaire in this study
was the 27-item Teachers’ Conceptions of Assessment–III Abridged Version (Brown, 2006). We
chose the Teacher scale rather than the Student scale because we were interested in how these
soon-to-be teachers thought about assessment in a professional capacity rather than as students
themselves. This was particularly relevant in the context of their required assessment course,
which challenged them to consider assessment practices as emerging professionals rather than as
students. Preservice teachers indicated the extent to which they agreed with statements related to
four assessment purposes (improves education, students accountable, schools accountable, and
irrelevant) on a positively packed rating scale with two negative responses and four positive
responses. This scale was chosen by Brown (2004a) in an effort to maximize the range of
responses to items that participants are inclined to rate highly.
1According to independent sample t tests, the 436 participants with full data did not differ significantly from the
158 participants who did not have full data on any measure of conceptions of assessment (|t| values ranged from .2 to 1.8). Visual inspection of the data suggested that there was no systematic reason for the incomplete data, and thus
we chose to exclude the 158 from the main analyses in order to obtain the standardized root mean square residual.
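To make concrete how responses on this scale become the factor scores analyzed later (Table 3 reports within-factor sums of two or three items), here is a minimal scoring sketch in Python. The six response labels are an assumption based on Brown's (2004a) positively packed format; only the two-negative/four-positive structure is stated in the text, and the item numbers follow Table 2.

```python
import pandas as pd

# Positively packed 6-point agreement scale: two negative and four positive
# options (per the text). The exact labels are an assumption based on
# Brown (2004a), not quoted from this article.
RESPONSE_CODES = {
    "strongly disagree": 1, "mostly disagree": 2, "slightly agree": 3,
    "moderately agree": 4, "mostly agree": 5, "strongly agree": 6,
}

# Hypothetical item column names; item numbers and factors follow Table 2.
FACTOR_ITEMS = {
    "improves_learning": ["item10", "item11", "item12"],
    "is_inaccurate": ["item25", "item26"],
}

def factor_sums(responses: pd.DataFrame) -> pd.DataFrame:
    """Sum coded item responses within each factor (the scores in Table 3)."""
    return pd.DataFrame({
        factor: responses[items].sum(axis=1)
        for factor, items in FACTOR_ITEMS.items()
    })
```

Under this coding, a respondent choosing "moderately agree" (4) on all three improves learning items would receive a factor sum of 12, close to the sample mean reported in Table 3.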
Procedure
The link to the survey, which was approved by the university ethics review board, was posted on
the course web page so that students could complete the survey on their own time. Students who
chose to click the link were redirected to SurveyMonkey. To ensure that students understood the
research had no relation to their course performance or grades, a research assistant monitored
all data collection and data were not released to the instructor/researcher until after the course
was completed. Although participation was not anonymous, no identifying information was
collected aside from a code that the students created for purposes of connecting their data
to follow-up surveys, should they be so inclined. The survey consisted of several existing
questionnaires and researcher-created items intended to measure preservice teachers’ beliefs,
knowledge, skills, and behaviors related to assessment (e.g., Elliot & Murayama, 2008; Midgley
et al., 2000). We examined the measures of conceptions of assessment only as part of the current
validation study.
Rationale for Analyses
Models were estimated in AMOS version 20.0 (Arbuckle, 2011) using maximum likelihood
estimation. The first step in the analyses was to test the theorized structure set out by Brown in
a confirmatory factor analysis and then, if required by poor fit, test a series of alternative models
that may provide a better fit to the Canadian data. A model was determined to have adequate
fit when the following indices were met: comparative fit index (CFI) > .90, Tucker–Lewis index (TLI) > .86, root mean square error of approximation (RMSEA) < .06, and standardized root mean square residual (SRMR) < .08 (Hu & Bentler, 1999). These criteria are comparable to the fit statistics for the abridged version originally reported by Brown (2006): χ2(111) = 841.02, RMSEA = .057, TLI = .87. Once a factor structure was
decided upon, descriptive statistics, scale reliability, and correlations were calculated in SPSS
20.0. Finally, we tested for mean differences on each scale between gender and teaching level
using independent sample t tests.2
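The analyses themselves were run in AMOS, whose graphical model specification cannot be reproduced here. As an illustration of the same confirmatory step, the sketch below uses the open-source Python package semopy (an assumption on our part: any SEM software with maximum likelihood estimation would serve). Only two of the nine first-order factors are specified, item column names are hypothetical, and fit is screened against the thresholds just listed.

```python
import pandas as pd
import semopy

# Two of the nine first-order factors in lavaan-style syntax; a full
# specification would list all nine. Item names are placeholders.
MODEL_DESC = """
improves_learning =~ item10 + item11 + item12
is_valid =~ item16 + item17 + item18
improves_learning ~~ is_valid
"""

def fit_cfa(data: pd.DataFrame):
    """Fit the CFA by maximum likelihood and screen global fit indices."""
    model = semopy.Model(MODEL_DESC)
    model.fit(data, obj="MLW")        # Wishart maximum likelihood
    stats = semopy.calc_stats(model)  # one-row DataFrame of fit indices
    adequate = (stats["CFI"].iloc[0] > .90
                and stats["TLI"].iloc[0] > .86
                and stats["RMSEA"].iloc[0] < .06)
    # SRMR is not among the statistics semopy reports and would need to
    # be computed separately from the residual correlation matrix.
    return stats, adequate
```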
RESULTS
Confirmatory Factor Analysis
We tested six models in total. Only the fifth model demonstrated adequate fit and was thus
retained and further described (see Figure 1 for a comparison of the various models).
Description of the competing models. Model 1 was a direct replication of the factor
structure confirmed by Brown (2006). As previously described, this model had two higher order
latent constructs with three and four first-order constructs. Examinations of global fit indices
and residuals found the model to be inadmissible; thus, theoretically derived modifications to the model were pursued. Model fit statistics for all models are reported in Table 1.

[FIGURE 1 Schematic path models of the Teachers' Conceptions of Assessment–III Abridged Version tested in the Canadian sample: (Model 1) replication of the original New Zealand model; (Model 2) higher order only New Zealand model; (Model 3) first order only New Zealand model; (Model 4) positive/negative Cyprus model; (Model 5) first order model with select items deleted; (Model 6) modified structure. Path diagrams omitted.]

2We tested for latent mean differences between male and female participants and elementary and secondary preservice teachers by comparing an unconstrained model to one in which the factor loadings were constrained to be equal between the two groups in AMOS 20.0. Neither of the models provided an admissible solution, possibly because the sample size for each group became too small (Jöreskog & Sörbom, 1984).
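The comparison described in footnote 2 is a standard nested-model (chi-square difference) test between the unconstrained and loading-constrained multigroup models. Had both solutions been admissible, the decision rule could have been screened as in this minimal sketch; the function and its inputs are illustrative placeholders, not the authors' code.

```python
from scipy.stats import chi2

def loadings_differ(chisq_constrained: float, df_constrained: int,
                    chisq_free: float, df_free: int,
                    alpha: float = .05) -> bool:
    """Chi-square difference test for nested models: returns True when
    constraining the factor loadings to equality significantly worsens
    fit, i.e., when the two groups' loadings differ."""
    delta_chisq = chisq_constrained - chisq_free
    delta_df = df_constrained - df_free
    return chi2.sf(delta_chisq, delta_df) < alpha
```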
Following Brown and Michaelides (2011), Models 2 and 3 were variants of Model 1.
Specifically, Model 2 removed the first-order factors that loaded onto the two higher order
factors and instead allowed the respective nine and 12 measured items to load directly on a
first-order factor equivalent to the original second-order factor. As such, the 27 items continued
to represent four main purposes: assessment is irrelevant, assessment improves education,
assessment makes schools accountable, and assessment makes students accountable. None of
the statistics were in the acceptable range, indicating that our data do not fit this model well. Model 3 tested the fit of the nine first-order factors, each with three items loading
directly to the latent variable and no second-order factors. As such, the 27 items represented
the original nine first-order factors. This model provided goodness-of-fit values comparable
to Brown’s (2006) fit; however, because more stringent fit indices are commonly expected
and because additional theoretically based models remained to be tested, we pursued three
additional models. In Model 4 we divided the items into two higher order factors representing
positive conceptions (four items, improves teaching; eight items, supports student learning;
seven items, schools accountable) and negative conceptions (five items, bad; two items, ignored).
The model did not fit.
Returning to Model 3, which approached an acceptable fit, we investigated individual factor
loadings and the wording of specific items. Four items were deemed problematic empirically
and/or conceptually. First, we examined Item 27, "assessment is an imprecise process," on the
TABLE 1
Goodness-of-Fit Statistics for Six Competing Models
Model χ2/df p CFI TLI RMSEA SRMR
1 3.25 <.001 .803 .778 .072 .0887
2 3.70 <.001 .759 .734 .079 .0886
3 2.79 <.001 .855 .824 .064 .0777
4 3.82 <.001 .788 .763 .081 .0899
5 2.40 <.001 .906 .879 .057 .0540
6 2.68 <.001 .896 .872 .062 .0611
Note. CFI D comparative fit index; TLI D Tucker–Lewis index; RMSEA D root mean square error of
approximation; SRMR D standardized root mean square residual.
invalid factor (factor loading = .30). We chose to delete this item because of its poor loading
and because conceptually it may present an uncomfortable notion to students whose academic
careers have been advanced by the assessment process. Second, we examined Item 4, "assessment places students into categories," on the student accountability factor (factor loading = .33). We
chose to delete this item as well because it does not reflect current assessment practices in
Canadian schools, where the focus is on inclusive education. Third, Item 5, "determines if students have met qualification standards," also from the student accountability factor, had a less than desirable loading of .45. However, we kept this item in order to retain the latent factor of student accountability in the analysis. Fourth, Item 9, "assessment measures students'
higher order thinking skills," on the describes ability factor was considered (factor loading = .50). We chose to delete this item because it was conceptually different from the other two
items in the factor by making specific reference to some type of ability level rather than a
generalized indication of learning. The result of these modifications was an adequately fitting
fifth model with nine latent first-order factors, three of which had only two measured indicators.
Because models in which any latent variable consists of only two measured indicators are
subject to an array of problems including specification error (Kline, 2005) and because internal
consistencies may be low for variables measured by few items (Nunnally, 1978), we tested
one additional model. In Model 6, the latent variable inaccurate was completely removed and
the remaining two items on each of the students accountable and describes ability factors
were moved onto a single latent factor, which we labeled assessment is reporting achievement,
because of the traditional emphasis on summative assessment for those items (for similar
restructuring see Brown & Remesal, 2012). The resulting model included seven latent first-
order variables, each with at least three measured indicators; however, the goodness of fit was
not acceptable. Thus, we retained Model 5, which is presented in full in Table 2.
Additional Analyses
None of the identified factors had good internal reliability (Table 3). In fact, according to
Nunnally (1978), our range of reliabilities from .60 to .73 is on the lower end of acceptable or
may even be considered questionable. Because Cronbach’s alpha is (a) related to the number
of items (Nunnally, 1978), (b) a lower bound estimate of reliability (Sijtsma, 2009), and (c)
because of the exploratory nature of this research, we suggest these reliabilities are acceptable
at this time. There is, however, one exception: The two-item factor student accountability had
an alpha reliability of .44. In Model 6 these two items were combined with the two items from
the describes ability factor to create a single variable with four measured items and a reliability
of .67. Model 6, however, did not meet the necessary goodness-of-fit criteria to be retained. Thus, the low measures of internal reliability documented in these results foreshadow the
need to either generate additional items for these first-order factors or test additional models
in which factors are combined.
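For transparency about how these reliabilities behave, the standard Cronbach's alpha computation is sketched below; the formula makes explicit why few-item factors, such as the two-item student accountability factor, tend to yield low values.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total).
    Alpha generally rises with the number of items, so two-item factors
    are easily depressed, as observed above."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)
```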
The rank order of endorsement of each conception is listed in Table 3. Preservice teachers
agreed most strongly with the factor measuring assessment improves learning. Surprisingly, and contrary to existing research, participants rated the factor measuring assessment is inaccurate next most strongly. After this, students returned to endorsing positive assessment factors more strongly
than the negative factors, with improve teaching and describes ability as third and fourth,
respectively. Preservice teachers least strongly agreed with the factors describing assessment
TABLE 2
Item Wording and Standardized Factor Loadings for Final Model (No. 5)
Item No. Factor Item Wording Factor Loading
1 School accountability provides information on how well schools are doing .59
2 School accountability is an accurate indicator of a school’s quality .75
3 School accountability is a good way to evaluate a school .78
5 Student accountability is assigning a grade or level to student work .41
6 Student accountability determines if students meet qualification standards .69
7 Describes ability is a way to determine how much students have learned from teaching .78
8 Describes ability establishes what students have learned .73
10 Improves learning provides feedback to students about their performance .66
11 Improves learning feeds back to students their learning needs .69
12 Improves learning helps students improve their learning .68
13 Improves teaching is integrated with teaching practice .65
14 Improves teaching information modifies ongoing teaching of students .50
15 Improves teaching allows different students to get different instruction .70
16 Is valid results are trustworthy .82
17 Is valid results are consistent .63
18 Is valid results can be depended on .62
19 Is bad forces teachers to teach in a way against their beliefs .53
20 Is bad is unfair to students .68
21 Is bad interferes with teaching .75
22 Is ignored conduct assessments but make little use of the results .48
23 Is ignored results are filed away and ignored .65
24 Is ignored has little impact on teaching .61
25 Is inaccurate results should be treated cautiously given measurement error .99
26 Is inaccurate should take into account the error and imprecision in all assessment .44
TABLE 3
Descriptive Statistics and Correlations for Each Variable
Variable N Items α M SD Rank 1 2 3 4 5 6 7 8
1. Schools account 3 .73 8.83 2.53 7
2. Students account 2 .44 7.70 1.65 5 .39*
3. Describes ability 2 .73 8.03 1.92 4 .46* .41*
4. Improves learning 3 .71 13.39 2.38 1 .22* .25* .61*
5. Improves teaching 3 .63 12.28 2.51 3 .26* .25* .58* .64*
6. Is valid 3 .71 9.20 2.46 6 .53* .37* .49* .41* .41*
7. Is bad 3 .68 6.91 2.21 8 .07 .03 −.26* −.35* −.27* −.08
8. Is ignored 3 .60 6.59 2.07 9 .18* .00 −.19* −.31* −.28* .04 .56*
9. Is inaccurate 2 .61 8.32 1.97 2 .05 .04 .05 .07 .12 −.09 .16* .09
*p < .01.
TABLE 4
Mean Differences Between Genders and Teaching Level (t Tests)
Variable
Female
M (SD)
Male
M (SD) t(430)
Elementary
M (SD)
Secondary
M (SD) t(429)
1. Schools account 8.91 (2.55) 8.51 (2.41) 1.43 9.04 (2.74) 8.66 (2.33) 1.54
2. Students account 7.70 (1.61) 7.67 (1.74) .14 7.71 (1.55) 7.70 (1.70) −.04
3. Describes ability 8.08 (2.01) 7.86 (1.60) .99 8.24 (1.96) 7.85 (1.87) 2.12
4. Improves learning 13.41 (2.43) 13.30 (2.23) .39 13.44 (2.43) 13.34 (2.34) .42
5. Improves teaching 12.36 (2.57) 12.02 (2.35) 1.21 12.53 (2.54) 12.07 (2.50) 1.88
6. Is valid 9.08 (2.42) 9.51 (2.54) −1.58 9.24 (2.32) 8.16 (2.57) .35
7. Is bad 6.97 (2.22) 6.76 (2.17) .85 6.93 (2.18) 6.90 (2.22) .14
8. Is ignored 6.56 (2.06) 6.66 (2.09) −.42 6.55 (2.17) 6.61 (1.96) −.30
9. Is inaccurate 8.39 (1.99) 8.17 (1.91) 1.01 8.48 (1.95) 8.20 (1.99) 1.48
N 329 301 — 200 231 —
as bad or ignored, thus supporting much of the previous research (Brown, 2011). These scores
were below the midpoint of the scale.
The correlations between variables supported these mean distinctions: Preservice teachers
who strongly endorsed the positive assessment factors of improving learning and improving
teaching tended to disagree with the negative assessment factors of assessment is bad and
assessment is ignored (rs range = −.19 to −.35, ps < .01). Of interest, this was not the case
for either of the two accountability factors, which had only one significant relationship with
negative factors, and that was in a positive direction (schools accountable and assessment is
ignored; r = .18, p < .01). The highest correlations emerged between the describes ability
factor and the improves learning factor, and between improves learning and improves teaching,
both of which exceeded .60.
Finally, there were no significant differences between male and female participants or
elementary and secondary students on any of the factors (Table 4). These results suggest that at
this point men and women and future elementary and secondary school teachers conceptualize
assessment in similar ways.
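The group comparisons reported in Table 4 are standard independent sample t tests on each factor score. A minimal sketch of the same computation follows; column names are hypothetical, and scipy's ttest_ind is assumed with its default equal-variance (classical) form.

```python
import pandas as pd
from scipy.stats import ttest_ind

FACTORS = ["schools_account", "students_account", "describes_ability",
           "improves_learning", "improves_teaching", "is_valid",
           "is_bad", "is_ignored", "is_inaccurate"]

def group_comparisons(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Independent sample t test on each factor score for a two-level
    grouping column (e.g., gender, or elementary vs. secondary)."""
    levels = df[group_col].dropna().unique()
    g1 = df.loc[df[group_col] == levels[0]]
    g2 = df.loc[df[group_col] == levels[1]]
    rows = []
    for factor in FACTORS:
        t, p = ttest_ind(g1[factor].dropna(), g2[factor].dropna())
        rows.append({"factor": factor, "t": round(t, 2), "p": round(p, 3)})
    return pd.DataFrame(rows)
```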
DISCUSSION
Three findings emerged that highlight the differences and similarities between the Canadian
sample and previous studies with the CoA-IIIA. First, Canadian preservice teachers’ concep-
tions of assessment aligned moderately with the factor structure originally determined in New
Zealand. In particular, we discuss the finding that no higher order factors were identified.
Second, a slight variation in level of endorsement emerged, such that this sample of Canadian
preservice teachers endorsed the factor measuring assessment as inaccurate more strongly than
some of the positive conceptions of assessment. Third, no differences emerged between men
and women or elementary and secondary preservice teachers.
Factor Structure
Brown’s original model suggested that conceptions of assessment are best represented as four
second-order factors with nine contributing first-order factors. Second-order models suggest
that some number of distinct but related constructs can be subsumed under one higher order
construct, resulting in a more parsimonious model. Aside from sacrificing some parsimony,
what does it imply that the best fitting model for this group of Canadian preservice teachers
could not be conceptualized in terms of higher order factors? Why does it seem that Canadian
preservice teachers conceptualize the purposes of assessment relatively discretely? One possible
explanation is that because this sample consisted of preservice rather than practicing teachers
they have not yet begun to view assessment as a broad and multifaceted concept. Instead,
they are focused on each potential purpose as a discrete function of assessment. In other
professions such as medicine and law (e.g., Eva, 2004; Mitchell, 1989), educational researchers
have identified that experts often make decisions using nonanalytical approaches (e.g., pattern
recognition, experiential knowledge), whereas novices do not have the cumulative experiences
to use this type of reasoning and approach tasks analytically. The same may be true for
preservice teachers’ conceptions of assessment. Indeed, Brown and Remesal (2012) produced
a first-order only model to fit their samples of preservice teachers. Although Brown (2002)
compared preservice and practicing teachers, more studies are needed in this area and should be
paired with an examination of how expertise in assessment practices is established over time.
The correlations between factors largely reflect the Canadian assessment climate. Unlike
Hong Kong, where student accountability is almost synonymous with improvement, our data
revealed a more moderate relationship between these factors. Specifically, the factors measuring
student accountability, improves learning, and improves teaching correlated significantly with r
values between .25 and .41. The school accountability factor had similar correlations, perhaps
suggesting that Canadian preservice teachers consider both student and school accountability
important in terms of improving education.
Levels of Endorsement
Canadian preservice teachers endorsed most conceptions at a similar level as samples from
other countries (Brown, 2011). In general, improvement factors (i.e., improves learning and
improves teaching) were rated most highly, and negative factors (i.e., assessment is bad and
assessment is ignored) were rated lowest. One exception emerged: Canadian preservice teachers
had relatively high levels of endorsement of the assessment is inaccurate factor. One reason
for this might be that these preservice teachers, who are enrolled in a mandatory assessment
course, are learning about the types of inaccuracies and errors that exist in assessments and thus
may be hypersensitive to these issues. Indeed, DeLuca and Klinger (2010) found that many
preservice teachers desired more instruction about the validity and reliability of assessment
practices. A second explanation is that perhaps other samples also responded strongly to this
subscale but because it is subsumed into the higher order factor of irrelevance in other research
the higher responses may be masked. In fact, almost all studies that involved the identification
of higher order factors provided no information on the characteristics of the first-order factors
(e.g., Brown et al., 2011; for an exception, see Brown & Michaelides, 2011). A third option is
that because preservice teachers are still students themselves they may experience assessment as
inaccurate. For example, imagine a preservice teacher who is confident in his or her teaching
abilities but scores poorly on an exam in a curriculum course. This student may interpret
the assessment as inaccurate because it does not match his or her beliefs. In this way the
idea that assessment is inaccurate may reflect a mechanism by which preservice teachers can
protect their self-worth (Covington, 2000). Finally, this level of endorsement might reflect a true
concern about assessment practices in Canada. Although some inaccuracies are always possible,
Alberta has two major mechanisms in place to minimize this risk. First, Alberta Education
(http://education.alberta.ca/home.aspx) takes great care to link its standardized achievement
tests to the provincial program of studies, involve practicing teachers in exam creation and
scoring, and provide students with sample exam questions. In fact, Canadian teachers develop
and grade all criterion-referenced standardized exams (Volante, 2006). Second, the Alberta
Assessment Consortium (http://www.aac.ab.ca/) is an independent and not-for-profit agency that
advocates for sound assessment practices and creates a wide range of classroom assessments
for teachers in addition to providing professional development opportunities.
It is also interesting to note that this factor was unrelated to all other conceptions, except a
small positive correlation with the assessment is bad factor. Taken together, these results suggest
that although preservice teachers indeed believe that assessment is inaccurate, this inaccuracy
does not appear to systematically influence any other conceptions. This notion requires further
research because inaccuracies should negatively relate to all other conceptions. For example,
if assessment is inaccurate, how can it be valid, describe ability, or improve learning? Indeed,
other research has had to modify the model to better acknowledge that assessment is only able
to improve learning or hold students accountable to the extent that it is not inaccurate (Brown
et al., 2011).
Gender and Teaching Level
Respondents did not differ systematically based on gender or level of teacher training (i.e.,
elementary or secondary). Recently, Brown (2011) compared primary and secondary teachers’
conceptions of assessment and found his original model to be statistically invariant with good
fit across both groups. Like Brown (2011), our data suggest that at the outset of their training,
Canadian preservice teachers are relatively similar in their conceptions of assessment. Again,
this may be because at this point their conceptions may be based on a shared student experience
and have not been tailored to the assessment realities associated with either elementary or
secondary school teaching. We may expect to see some differences with practicing teachers,
but this is a question for future research.
Limitations
Three limitations should be kept in mind. First, although Brown (2011) called for research
examining the CoA-IIIA in high-stakes assessment jurisdictions such as the United States,
Canada may not meet this call. The province of Alberta in which the data were collected has
more standardized testing than any other Canadian province, but this rate is still much lower than in the United States; thus, generalizability to the rest of Canada or North America is tentative.
This leads to the second limitation, the homogeneity of our sample. In developing the survey
Brown tested the impact of a wide range of individual difference variables including years of
training, years of experience, and role in school on responses to the scale—none of which
emerged as significant (Brown, 2004b). Because our sample consisted of preservice teachers, some of these variables are not yet relevant; however, we provide further evidence that gender and teaching level do not appear to make a difference even at the
preservice level. In the same vein, like other researchers (e.g., Segers & Tillema, 2011) we
used the teachers’ version of the Conceptions of Assessment inventory rather than the student
version. Although this was a conscious decision, it allows for some ambiguity in our results
because the sample could have answered the questions as students. Third, it is possible that
responses to these items may have been influenced by other questionnaires on the larger survey
that were not analyzed in this study.
Implications and Directions for Future Research in Preservice Education
Having validated the conceptions of assessment scale with Canadian preservice teachers, we are
poised to look at ways in which teachers’ conceptions of assessment influence their intended
practice. One option is to follow these preservice teachers into practice to determine whether
their preservice conceptions influence their practices. Specifically, are preservice teachers who
view assessment as a tool for improvement able to withstand the pressures of standardized
testing and remain committed to assessment for this purpose? Another area for future research
is to examine how conceptions of assessment change for preservice teachers after receiving
instruction in classroom assessment practices. This is a crucial shift from student to teacher
and may refine their conceptions and move them toward a better understanding of the complex possibilities of assessment.
Preservice teachers’ assessment conceptions are likely based on years of studenthood (Lortie,
1975), so a process is necessary to move their conceptions closer to current views of assessment.
Teacher education has a central role in this process, but there is little consistency in the
ways that assessment education is delivered across Canadian teacher education programs
(Russell, McPherson, & Martin, 2001). At a minimum, teacher education programs should
help preservice teachers become aware of their conceptions of assessment and how these
might influence assessment decisions they make as practicing teachers. After helping preservice
teachers become aware of their preconceptions, a course-based intervention could be designed
to help them conceive of assessment in ways that match current educational policies. Such an intervention could be implemented in education programs, thereby offering the best chance
at successful reform (Brown, 2004b; Brown & Harris, 2009). Although research consistently
shows it is difficult to change teachers’ beliefs (Pajares, 1992), C. D. Smith, Worsfold, Davies,
Fisher, and McPhail (2013) recently undertook such an endeavor and showed that indeed
assessment literacy can be improved through a relatively brief in-class intervention. Thus,
more research is needed in this area and can begin now that a measurement tool has been
validated.
FUNDING
This work was supported by two grants awarded to the first and second authors: a Social
Sciences and Humanities Research Council of Canada Standard Grant (410-2011-0095) and a
University of Alberta Teaching and Learning Enhancement Fund Grant (RES0004915).
REFERENCES
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179–
211.
Alberta Teachers’ Association. (2009). Real learning first: The teaching profession’s view of student assessment, evalu-
ation and accountability for the 21st century. Issues in Education, 7, 1–34. Retrieved from http://www.teachers.ab.ca
Arbuckle, J. L. (2011). IBM® SPSS® Amos™ 20 user’s guide. Retrieved from ftp://129.35.224.12/software/analytics/
spss/documentation/amos/20.0/en/Manuals/IBM_SPSS_Amos_User_Guide.pdf
Bachor, D. G., & Baer, M. R. (2001). An examination of preservice teachers’ simulated classroom assessment practices.
Alberta Journal of Educational Research, 47, 244–258.
British Columbia Teachers’ Federation. (2012). Better schools for BC: A plan for quality public education. Retrieved
from http://www.betterschools.bc.ca
Brown, G. T. L. (2002). Teachers’ conceptions of assessment (Doctoral dissertation, University of Auckland, Auckland,
New Zealand). Retrieved from http://hdl.handle.net/2292/63
Brown, G. T. L. (2004a). Measuring attitude with positively packed self-report ratings: Comparison of agreement and
frequency scales. Psychological Reports, 94, 1015–1024.
Brown, G. T. L. (2004b). Teachers’ conceptions of assessment: Implications for policy and professional development.
Assessment in Education, 11, 301–318. doi:10.1080/0969594042000304609
Brown, G. T. L. (2006). Teachers’ conceptions of assessment: Validation of an abridged version. Psychological Reports,
99, 166–170. doi:10.2466/PRO.99.1.166-170
Brown, G. T. L. (2008). Conceptions of assessment: Understanding what assessment means to teachers and students.
New York, NY: Nova Science.
Brown, G. T. L. (2011). Teachers’ conceptions of assessment: Comparing primary and secondary teachers in New
Zealand. Assessment Matters, 3, 45–70.
Brown, G. T. L., & Harris, L. R. (2009). Unintended consequences of using tests to improve learning: How improvement-
oriented resources heighten conceptions of assessment as school accountability. Journal of MultiDisciplinary Eval-
uation, 6(12), 68–91.
Brown, G. T. L., Kennedy, K. J., Fok, P. K., Chan, J. K. S., & Yu, W. M. (2009). Assessment for student improvement:
Understanding Hong Kong teachers’ conceptions and practices of assessment, Assessment in Education: Principles,
Policy & Practice, 16, 347–363. doi:10.1080/09695940903319737
Brown, G. T. L., Lake, R., & Matters, G. (2011). Queensland teachers’ conceptions of assessment: The impact of
policy priorities on teacher attitudes. Teaching and Teacher Education, 27, 210–220. doi:10.1016/j.tate.2010.08.003
Brown, G. T. L., & Michaelides, M. P. (2011). Ecological rationality in teachers’ conceptions of assessment across
samples from Cyprus and New Zealand. European Journal of Psychology of Education, 26, 319–337. doi:10.1007/
s10212-010-0052-3
Brown, G. T. L., & Remesal, A. (2012). Prospective teachers’ conceptions of assessment: A cross-cultural comparison.
Spanish Journal of Psychology, 15, 75–89. doi:10.5209/rev_SJOP.2012.v15.n1.37286
Campbell, C., & Evans, J. A. (2000). Investigation of preservice teachers' classroom assessment practices during student
teaching. The Journal of Educational Research, 93, 350–355. doi:10.1080/00220670009598729
Canadian Teachers’ Federation. (2003). Moving from the cult of testing to a culture of professional accountability.
Perspectives, 3, 1–9.
Choi, C. (1999). Public examinations in Hong Kong. Assessment in Education: Principles, Policy & Practice, 6,
405–417.
Covington, M. V. (2000). Goal theory, motivation, and school achievement: An integrative review. Annual Review of
Psychology, 51, 171–200.
Cowley, P., & Easton, S. (2013, February). Studies in education policy: Report card on Alberta's elementary schools
2013. Retrieved from http://alberta.compareschoolrankings.org
Crooks, T. J. (2002). Educational assessment in New Zealand schools. Assessment in Education: Principles, Policy, &
Practice, 9, 237–253. doi:10.1080/0969594022000001959
Daniels, L. M., Mandzuk, D., Perry, R. P., & Moore, C. (2011). The impact of teacher candidates’ perceptions of their
initial teacher education program on teaching anxiety, efficacy, and commitment. Alberta Journal of Educational
Research, 57, 88–106.
DeLuca, C., & Klinger, D. A. (2010). Assessment literacy development: Identifying gaps in teacher candidates’ learning.
Assessment in Education: Principles, Policy & Practice, 17, 419–438. doi:10.1080/0969594X.2010.516643
Elliot, A. J., & Murayama, K. (2008). On the measurements of achievement goals: Critique, illustration, and application.
Journal of Educational Psychology, 100, 613–628. doi:10.1037/0022-0663.100.3.613
Eva, K. W. (2004). What every teacher needs to know about clinical reasoning. Medical Education, 39, 98–106.
doi:10.1111/j.1365-2929.2004.01972.x
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria
versus new alternatives. Structural Equation Modeling, 6, 1–55. doi:10.1080/10705519909540118
Jöreskog, K. G., & Sörbom, D. (1984). LISREL VI: Analysis of linear structural relationships by the method of maximum likelihood. Chicago, IL: National Educational Resources.
Kline, R. B. (2005). Principles and practices of structural equation modeling (2nd ed.). New York, NY: Guilford.
Klinger, D., DeLuca, C., & Miller, T. (2008). The evolving culture of large-scale assessments in Canadian education.
Canadian Journal of Educational Administration and Policy, 76, 1–34.
Leighton, J. P., Gokiert, R. J., Cor, M. K., & Heffernan, C. (2010). Teacher beliefs about the cognitive diagnostic
information of classroom versus large-scale tests: Implications for assessment literacy. Assessment in Education:
Principles, Policy & Practice, 17, 7–21. doi:10.1080/09695940903565362
Lejeune, A., Poth, C., & Daniels, L. M. (2010, May). Examining pre-service teachers’ knowledge, skills, attributes,
experiences, and goals related to formative and summative classroom assessment. Paper presented at the Canadian
Society for the Study of Education, Montreal, Canada.
Lortie, D. (1975). Schoolteacher: A sociological study. Chicago, IL: University of Chicago Press.
Marhuenda, F. (2006). Assessment in the Spanish educational system. Assessment in Education: Principles, Policy, &
Practice, 4, 413–429. doi:10.1080/0969594970040307
Midgley, C., Maehr, M., Hruda, L., Anderman, E., Anderman, L., Freeman, K., … Urdan, T. (2000). Manual for the Patterns of Adaptive Learning Scales. Ann Arbor: University of Michigan.
Mitchell, J. B. (1989). Current theories on expert and novice thinking: A full faculty considers the implications for
legal education. Journal of Legal Education, 39, 275–289.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.
Organisation for Economic Co-operation and Development. (2010). PISA 2009 results: Executive summary. Retrieved
from http://www.oecd.org/pisa/pisaproducts/46619703.pdf
Pajares, M. F. (1992). Teachers’ beliefs and educational research: Cleaning up a messy construct. Review of Educational
Research, 62, 307–332.
Papile, C., Daniels, L. M., Poth, C., & Hutchison, M. A. (2012, April). Mastery and performance: Giving structure
to instructional and assessment practices. Poster presented at the annual meeting of the Western Psychological
Association, San Francisco, CA.
Parker, A., & Neuharth-Pritchett, S. (2006). Developmentally appropriate practice in kindergarten: Factors shaping
teacher beliefs and practice. Journal of Research in Childhood Education, 21, 65–78. doi:10.1080/02568540609
594579
Principles for Fair Student Assessment Practices for Education in Canada. (1993). Edmonton, Alberta: Joint Advisory
Committee. (Available from Joint Advisory Committee, Centre for Research in Applied Measurement and Evaluation,
3–104 Education Building North, University of Alberta, Edmonton, Alberta, T6G 2G5)
Rothstein, R., Ladd, H. F., Ravitch, D., Baker, E. L., Barton, P. E., Darling-Hammond, L., … Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers (Economic Policy Institute Briefing Paper #278).
Retrieved from http://www.epi.org/publication/bp278/
Russell, T., McPherson, S., & Martin, A. K. (2001). Coherence and collaboration in teacher education reform. Canadian
Journal of Education, 26, 37–55. Retrieved from http://www.jstor.org/stable/1602144
Scheerens, J., Ehren, M., Sleegers, P., & de Leeuw, R. (2012). OECD Review on Evaluation and Assessment Frameworks
for Improving School Outcomes: Country Background Report for the Netherlands. Retrieved from www.oecd.org/
edu/school/NLD_CBR_Evaluation_and_Assessment.pdf
Segers, M., & Tillema, H. (2011). How do Dutch secondary teachers and students conceive the purpose of assessment?
Studies in Educational Evaluation, 37, 49–54. doi:10.1016/j.stueduc.2011.03.008
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74,
107–120. doi:10.1007/s11336-008-9101-0.
Smith, C. D., Worsfold, K., Davies, L., Fisher, R., & McPhail, R. (2013). Assessment literacy and student learning:
The case for explicitly developing students’ ‘assessment literacy.’ Assessment and Evaluation in Higher Education,
38, 44–60. doi:10.1080/02602938.2011.598636
Smith, K. E., & Croom, L. (2000). Multidimensional self-concepts of children and teacher beliefs about developmentally
appropriate practices. Journal of Educational Research, 93, 312–321. Retrieved from http://www.jstor.org/stable/
27542281
Stiggins, R. J., & Conklin, N. F. (1992). In teachers’ hands: Investigating the practices of classroom assessment.
Albany: SUNY Press.
Ungerleider, C. (2003). Large-scale student assessment: Guidelines for policymakers. International Journal of Testing,
3, 119–128. doi:10.1207/S15327574IJT0302_2
Ungerleider, C. (2006). Government, neo-liberal media, and education in Canada. Canadian Journal of Education, 29,
70–90. doi:10.2307/20054147
Volante, L. (2006). An alternative vision for large-scale assessment in Canada. Journal of Teaching and Learning, 4,
1–14.
Volante, L., & Ben Jaafar, S. (2008). Educational assessment in Canada. Assessment in Education: Principles, Policy,
& Practice, 15, 201–210. doi:10.1080/09695940802164226
Volante, L., & Fazio, X. (2007). Exploring teacher candidates’ assessment literacy: Implications for teacher education
reform and professional development. Canadian Journal of Education, 30, 749–770. doi:10.2307/20466661
Woolley, S. L., Benjamin, W. J., & Woolley, A. W. (2004). Construct validity of a self-report measure of teacher beliefs related to constructivist and traditional approaches to teaching and learning.
Educational and Psychological Measurement, 64, 319–331. doi:10.1177/0013164403261189
Zwaagstra, M. (2011). Standardized testing is a good thing. Policy Series: Frontier Centre for Public Policy, 119,
1–15. Retrieved from http://www.fcpp.org/files/1/PS119StandardizedTesting.pdf