Validating the Conceptions of Assessment-III Scale in Canadian Preservice Teachers
Educational Assessment, 19:139–158, 2014
Copyright © Taylor & Francis Group, LLC
ISSN: 1062-7197 print/1532-6977 online
DOI: 10.1080/10627197.2014.903654
Validating the Conceptions of Assessment–IIIScale in Canadian Preservice Teachers
Lia M. Daniels, Cheryl Poth, Chiara Papile, and Marnie Hutchison
University of Alberta
The purpose of this study was to test the validity of the Teachers’ Conceptions of Assessment
Scale III-Abridged Version (CoA-IIIA; Brown, 2006), a measure created, validated, and applied
outside of North America, in a sample of Canadian preservice teachers (n = 436). This work is
important because although we have long known that teachers’ beliefs influence the way they teach,
research also suggests that teachers’ beliefs related to assessment influence the way they assess
students. Confirmatory factor analysis of the Canadian data led to a solution slightly different
from the New Zealand factor structure, with nine latent first-order variables, each with two or
three measured indicators. Although the preservice teachers endorsed the purposes of assessment
similarly to several other countries, the factor measuring assessment as “inaccurate” was endorsed
more strongly than in previous research. Our discussion highlights the differences and similarities
between the current results and previous studies with the CoA-IIIA.
Teachers spend more than one third of their time assessing student learning (Stiggins & Conklin, 1992) and enter their teacher education programs with preconceptions about assessment.
It is highly possible that these early conceptualizations of assessment can influence actual
assessment behaviors because behavior is strongly predicted by intention, which is largely
influenced by beliefs or attitudes (Ajzen, 1991). For example, preservice teachers who view
assessment as useful for supporting learning may be more likely to use assessments to enhance
their students’ learning. Alternatively, preservice teachers who believe that assessments ensure
that students complete work may limit their assessment practices to assignments involving formal
grades. The extent to which preservice teachers reflect these two examples largely depends on
the educational context they have experienced.
Teachers’ beliefs have been linked to developmentally appropriate practices (Parker &
Neuharth-Pritchett, 2006; K. E. Smith & Croom, 2000), preferences for constructivist or
traditional approaches to teaching and learning (Woolley, Benjamin, & Woolley, 2004), approaches to large-scale assessment and diagnostic information (Leighton, Gokiert, Cor, & Heffernan, 2010), and intended instructional practices (Papile, Daniels, Poth, & Hutchison, 2012). These highly influential beliefs may be accumulated during teacher education programs, or may result from years of studenthood. Evidence pointing toward the latter suggests that the educational context in which teachers were themselves students shapes the way they think about teaching (Lortie, 1975). Giving structure to this educational context, Brown (2004b) created the Conceptions of Assessment scale to reflect the sorts of beliefs preservice and practicing teachers have regarding assessment. The purpose of this study was to examine Brown's scale in a sample of Canadian preservice teachers, thus representing its first testing in a North American context. Specifically, we tested its factor structure, level of endorsement, and mean differences between genders and teaching levels.

Correspondence should be sent to Lia M. Daniels, Department of Educational Psychology, Faculty of Education, University of Alberta, Edmonton, Alberta, Canada, T6G 2G5. E-mail: [email protected]
THE CANADIAN CONTEXT OF ASSESSMENT
Much of the landscape of assessment in Canada suggests that Brown’s (2004b) scale may be
a suitable measurement tool. In 1993 the Principles for Fair Student Assessment Practices
for Education in Canada was released and specified that assessment not only measure but
also support students’ learning. The Principles (1993) document addressed both teacher-made
assessments and standardized tests, both of which play an established role in Canadian assessment practices (Klinger, DeLuca, & Miller, 2008). All Canadian provinces but one require students
to take standardized tests in a variety of grades and subject areas (Zwaagstra, 2011), and
Canadian adolescents regularly participate in international assessments such as the Programme
for International Student Assessment (PISA; Volante & Ben Jaafar, 2008). From this perspective
Canadian students are no strangers to formal classroom or standardized assessments with the
purpose of measuring progress toward learner outcomes.
Taken a step further, Canadian students may also be aware that assessments can be used
as a reflection of school achievement. For example, the Fraser Institute has emerged as an
independent Canadian public policy research and educational organization that produces School
Report Cards (e.g., Cowley & Easton, 2013). These reports compare the academic performance
of individual schools in Alberta, British Columbia, Ontario, and Quebec for use by teachers,
parents, school administrators, students, and taxpayers. This information influences parents’
choice of school for their children and teachers’ and administrators’ program improvements.
Although Canadian assessment practices appear to focus on summative assessments (Ungerleider, 2006), accountability in Canada is not solely determined by test performance and is
instead viewed as “the process through which individuals or organizations take responsibility for
their actions and report on these actions to those who are entitled to the information” (Canadian
Teachers’ Federation, 2003, p. 2). This applies equally to elementary school teachers, who are
trained as generalists, and secondary school teachers, who are trained as specialists. Regardless
of their training, however, Canadian teachers generally hold a teaching license for all levels of
mandatory schooling (Grades 1–12). All provinces directly involve teachers in the construction
of standardized tests (Ungerleider, 2003). Moreover, the Canadian Teachers’ Federation (2003)
and many provincial teachers’ associations have turned their attention toward providing teachers
with professional learning opportunities in formative assessments (British Columbia Teachers'
Federation, 2012; Alberta Teachers’ Association, 2009). In this light, Canada is committed
to using assessment to improve education and support teaching and learning. Indeed, current
preservice teachers report having been assessed during their mandatory (K-12) schooling by
comments, peer assessment, rubrics (Lejeune, Poth, & Daniels, 2010), and other nonsummative
methods.
Despite familiarity with both traditional and innovative assessment practices, preservice
teachers feel highly anxious (Daniels, Mandzuk, Perry, & Moore, 2011) and lack efficacy in the area of assessment (Bachor & Baer, 2001; Campbell & Evans, 2000; Volante & Fazio, 2007).
DeLuca and Klinger (2010) reported that in addition to basic assessment tasks like constructing
items, preservice teachers want instruction on how to modify assessments to meet the demands
of inclusive classrooms and how to increase the validity and reliability of assessments. To help
preservice teachers gain these skills, we must first understand the conceptions of assessment
they bring into education programs. Brown’s (2004b) scale represents one measurement tool
to meet this goal.
THE CONCEPTIONS OF ASSESSMENT SCALE III–ABRIDGED VERSION
The Teachers’ Conceptions of Assessment Scale III–Abridged Version (CoA-IIIA; Brown,
2002, 2004b, 2006) examines four main purposes of educational assessment. The first main purpose is that assessment informs the improvement of education (Brown, 2008). This overarching
purpose is operationalized by four first-order factors each measured by three items: assessment
describes abilities (e.g., determines how much students have learned from teaching), improves
learning (e.g., provides feedback to students about their performance), improves teaching (e.g.,
is integrated with teaching practice), and is valid (e.g., results are trustworthy). These factors
depict assessment as a process that involves both the teacher and student; therefore, to support
this conception, preservice teachers need to see the value of assessment for both students and
teachers.
The second purpose, school accountability, refers to assessment as a representation of how
the school as a whole is performing. It uses assessment results to publicly demonstrate that
schools, and by extension teachers, are efficiently and effectively using society's resources
(Brown, 2008). Assessment is viewed as a means of ensuring that schools and teachers not
only deliver quality instruction but also strive to improve the quality of their instruction. In line
with this, assessment results are used to invoke consequences for schools not reaching required
standards. Three items (e.g., assessment provides information on how well schools are doing)
combine to form this single factor.
The third purpose is that assessment holds students accountable for their learning. Teachers
provide grades or scores that are passed on to parents, future employers, and educators
and can have significant educational implications, such as students’ placement into advanced
classes, chances of winning scholarships, or graduation based on performance (Brown, 2008).
Three items around these ideas (e.g., assessment is checking off progress against achievement
objectives) combine to form this single factor.
The final purpose that Brown identified was that assessment can be irrelevant, reflecting
the view that no evaluation process is flawless (Brown, 2011). Taken at its most extreme, the
inherent error in evaluation leads some to claim that it has no legitimate place within the
education system (Brown, 2008). This conception has three first-order factors: assessment is
bad for teachers and students (e.g., forces teachers to teach in a way against their beliefs),
assessment is ignored (e.g., teachers conduct assessments but make little use of the results),
and assessment is inaccurate (e.g., results should be treated cautiously because of measurement
error).
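For reference, the hypothesized structure just described can be summarized compactly. The sketch below is a minimal Python representation, not part of the original instrument; the factor labels are paraphrased from the descriptions above, and the full item wordings appear in Brown (2006) and, for the retained items, in Table 2.

```python
# Hypothesized CoA-IIIA structure (Brown, 2002, 2004b, 2006) as described
# above: four main purposes, nine first-order factors, three items each.
# Labels are paraphrased from the text, not quoted from the scale itself.
COA_IIIA_PURPOSES = {
    "improvement_of_education": [
        "describes_ability", "improves_learning",
        "improves_teaching", "is_valid",
    ],
    "school_accountability": ["school_accountability"],
    "student_accountability": ["student_accountability"],
    "irrelevance": ["is_bad", "is_ignored", "is_inaccurate"],
}
ITEMS_PER_FACTOR = 3  # 9 first-order factors x 3 items = 27 items
```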
Structure of the CoA-IIIA Across Cultures
The CoA-IIIA was originally validated in New Zealand with practicing teachers (Brown,
2004b, 2006, 2011). Since then, validation evidence has become available for samples from
Queensland, Australia (Brown, Lake, & Matters, 2011), Cyprus (Brown & Michaelides, 2011),
Hong Kong (Brown, Kennedy, Fok, Chan, & Yu, 2009), Spain (Brown & Remesal, 2012),
and the Netherlands (Segers & Tillema, 2011). With the exception of Hong Kong, these
countries reflect an assessment atmosphere in which low-stakes classroom-based assessments
are balanced with standardized examinations to ensure curricular outcomes are met (e.g., Choi,
1999; Crooks, 2002; Marhuenda, 2006; Scheerens, Ehren, Sleegers, & de Leeuw, 2012; Volante
& Ben Jaafar, 2008). In addition, and distinguishing these countries from the United States,
teachers in these countries are not explicitly evaluated on the basis of their students’ test scores
(Rothstein et al., 2010). Furthermore, all these countries except Cyprus participate in PISA.
According to PISA results, there are no statistically significant differences among New Zealand, Australia, Hong Kong, the Netherlands, and Canada; however, Spain scored below the Organisation for
Economic Co-operation and Development (2010) average. Although these statistics do not speak
to the specific assessment climate in each country, they do provide evidence of some level of
standardized testing and similar performance levels that may be important in understanding the
factor structure of the CoA-IIIA. Next, we highlight some variability and consistency in terms
of structure and strength of endorsement across these studies.
Data from Queensland teachers (Brown et al., 2011) fit the same nine factors as the original New Zealand model, with two additional paths. These paths highlighted
relationships between the factors measuring improves learning and assessment is inaccurate,
and assessment holds students accountable to the extent that it accurately describes their
learning. Brown and colleagues (2009) hypothesized that Hong Kong teachers would respond
to the inventory in a similar manner as New Zealand and Queensland teachers even though
Hong Kong has a much stronger focus on student accountability. Although all 27 items were
retained, two first-order factors were removed (assessment describes ability and assessment is
inaccurate), and their items were given paths directly to their respective second-order factors.
In addition, for Hong Kong teachers the correlation between the student accountability factor
and the student improvement factor was higher than in any other sample, perhaps reflecting
the strong cultural association between accountability and improvement in Hong Kong. The
CoA-IIIA was also translated into Greek and administered in Cyprus, a nation with low-stakes
assessment during the elementary years similar to New Zealand (Brown & Michaelides, 2011).
The factor structure confirmed in New Zealand did not fit the Cyprus data; thus, two alternative models were tested, neither with adequate fit. The final model subsumed the four conceptions into
a positive and negative orientation toward assessment and was proposed to be a better specified
and more universal model than the original (Brown et al., 2011).
Two studies relied on exploratory rather than confirmatory factor analysis in producing
their factor structures. First, Segers and Tillema (2011) revealed a four-factor solution with
Dutch secondary school teachers. The first factor, which accounted for 19.5% of the variance,
subsumed items related to both formative and summative assessment. The three other factors
were school accountability, bad quality, and good quality. These four factors seem to reflect
the dichotomous nature of the structure that emerged for Cyprus but also separated out
school accountability from student assessment, be it formative or summative. Finally, Brown
and Remesal (2012) divided the original 27 items into five factors reflecting conceptions of
assessment for Spanish preservice teachers. Confirmatory factor analysis was used following
exploratory factor analysis to confirm the following factors: improves student learning and
teaching, is ignored and inaccurate, is bad, measures school quality validly, and assigns grades.
Again, this factor structure seems to divide formative from summative assessment purposes
and then identify other purposes such as school quality and the reality that assessment can be
invalid.
Overall, it seems that some differences in factor structure have been noted and may be
reflections of both current and past contextual factors. For our sample of Canadian preservice
teachers we suggest three contextual factors that may shape their conceptions: The first is that as
preservice teachers they may not yet understand the complex nature of assessment. The second
is the fact that Canadian teachers are not held exclusively accountable for student achievement,
and thus we expected small to moderate positive correlations between the accountability factors
and improvement factors. Third, although Canadian teachers are trained as elementary school
generalists or secondary school specialists, they are licensed to teach all grade levels. As such,
we did not expect differences to emerge between teaching levels.
Level of Endorsement
In terms of strength of endorsement of the factors, teachers from New Zealand, Queensland,
Hong Kong, and Cyprus consistently endorsed the improvement of teaching and learning as
their dominant purpose for assessment (Brown, 2011). When differences between teaching
levels have been found it seems that elementary school teachers were more likely to endorse
assessment for the purpose of improving teaching and learning than secondary school teachers
(Brown et al., 2011). In a different study, secondary teachers appeared to agree more strongly
with assessment for reasons related to student accountability or evaluation than elementary
(Brown, 2011). On the occasion that elementary teachers did favor the idea that assessment
makes students accountable, they also tended to support the notion that assessment is irrelevant
(Brown et al., 2011). This difference may reflect the trend that across the aforementioned
educational contexts, secondary school teachers appear to have more formal or higher stakes
testing imposed on them than do their elementary school counterparts. Segers and Tillema
(2011) did not provide mean scores, and Spanish preservice teachers had a very narrow range in their mean endorsements of the scale (Brown & Remesal, 2012). None of the empirical studies
just reviewed tested for gender differences. However, in his dissertation Brown (2002) found
no gender differences in response to the scale.
Building on this body of knowledge we asked three specific research questions:
1. Does the factor structure of the CoA-IIIA fit Canadian preservice teachers? Based on
similarities between the systems just reviewed, we hypothesized few differences in the
factor structure.
2. What is the mean level of endorsement of each conception? Based on their educational
experiences as students and their exposure to a range of assessment methods, we expected
preservice teachers to most strongly endorse items related to improving teaching and
learning.
3. Are there differences by gender or teaching level? Because preservice teachers will be
licensed to teach any level and because standardized testing is common in elementary
and secondary schools across Canada, we did not expect any differences between teaching
levels.
METHOD
Participants
During the 1st week of class, the survey was offered to a convenience sample of approximately
700 students enrolled in various sections of a required assessment course at a midwestern
research-intensive Canadian university. A total of 594 students completed the survey. This
sample was further reduced to 436 participants in order to meet the criterion of full data on all items, as required for the calculation of certain fit indices in AMOS (Arbuckle,
2011).1 Although students may have received instruction in assessment practices during some
of their other coursework, this course represented their first formal instruction related to
assessment practices. The course is situated in students' Introductory Professional Term, meaning that they were preparing for their first practicum placement in the schools. Fifty-three
percent of the sample was enrolled in the secondary school teacher program, 76% of the
sample was female, and 44.8% of students reported being born in 1988 or 1989, making them roughly 20 to 22 years old at the time the data were collected (birth years ranged from 1960 to 1990).
Measures
Aside from the demographic information just presented, the main questionnaire in this study
was the 27-item Teachers’ Conceptions of Assessment–III Abridged Version (Brown, 2006). We
chose the Teacher scale rather than the Student scale because we were interested in how these
soon-to-be teachers thought about assessment in a professional capacity rather than as students
themselves. This was particularly relevant in the context of their required assessment course,
which challenged them to consider assessment practices as emerging professionals rather than as
students. Preservice teachers indicated the extent to which they agreed with statements related to
four assessment purposes (improves education, students accountable, schools accountable, and
irrelevant) on a positively packed rating scale with two negative responses and four positive
responses. This scale was chosen by Brown (2004a) in an effort to maximize the range of
responses to items that participants are inclined to rate highly.
1According to independent sample t tests, the 436 participants with full data did not differ significantly from the
158 participants who did not have full data on any measure of conceptions of assessment (|t| values ranged from .2 to 1.8). Visual inspection of the data suggested that there was no systematic reason for the incomplete data, and thus
we chose to exclude the 158 from the main analyses in order to obtain the standardized root mean square residual.
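To make concrete how responses on this scale become the factor scores analyzed later (Table 3 reports within-factor sums of two or three items), here is a minimal scoring sketch in Python. The six response labels are an assumption based on Brown's (2004a) positively packed format; only the two-negative/four-positive structure is stated in the text, and the item numbers follow Table 2.

```python
import pandas as pd

# Positively packed 6-point agreement scale: two negative and four positive
# options (per the text). The exact labels are an assumption based on
# Brown (2004a), not quoted from this article.
RESPONSE_CODES = {
    "strongly disagree": 1, "mostly disagree": 2, "slightly agree": 3,
    "moderately agree": 4, "mostly agree": 5, "strongly agree": 6,
}

# Hypothetical item column names; item numbers and factors follow Table 2.
FACTOR_ITEMS = {
    "improves_learning": ["item10", "item11", "item12"],
    "is_inaccurate": ["item25", "item26"],
}

def factor_sums(responses: pd.DataFrame) -> pd.DataFrame:
    """Sum coded item responses within each factor (the scores in Table 3)."""
    return pd.DataFrame({
        factor: responses[items].sum(axis=1)
        for factor, items in FACTOR_ITEMS.items()
    })
```

Under this coding, a respondent choosing "moderately agree" (4) on all three improves learning items would receive a factor sum of 12, close to the sample mean reported in Table 3.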
Procedure
The link to the survey, which was approved by the university ethics review board, was posted on
the course web page so that students could complete the survey on their own time. Students who
chose to click the link were redirected to SurveyMonkey. To ensure that students understood the
research had no relation to their course performance or grades, a research assistant monitored
all data collection and data were not released to the instructor/researcher until after the course
was completed. Although participation was not anonymous, no identifying information was
collected aside from a code that the students created for purposes of connecting their data
to follow-up surveys, should they be so inclined. The survey consisted of several existing
questionnaires and researcher-created items intended to measure preservice teachers’ beliefs,
knowledge, skills, and behaviors related to assessment (e.g., Elliot & Murayama, 2008; Midgley
et al., 2000). We examined the measures of conceptions of assessment only as part of the current
validation study.
Rationale for Analyses
Models were estimated in AMOS version 20.0 (Arbuckle, 2011) using maximum likelihood
estimation. The first step in the analyses was to test the theorized structure set out by Brown in
a confirmatory factor analysis and then, if required by poor fit, test a series of alternative models
that may provide a better fit to the Canadian data. A model was determined to have adequate
fit when the following indices were met: comparative fit index (CFI) > .90, Tucker–Lewis index (TLI) > .86, root mean square error of approximation (RMSEA) < .06, and standardized root mean square residual (SRMR) < .08 (Hu & Bentler, 1999). These criteria are comparable to the fit statistics for the abridged version originally reported by Brown (2006): χ2(111) = 841.02, RMSEA = .057, TLI = .87. Once a factor structure was
decided upon, descriptive statistics, scale reliability, and correlations were calculated in SPSS
20.0. Finally, we tested for mean differences on each scale between gender and teaching level
using independent sample t tests.2
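The analyses themselves were run in AMOS, whose graphical model specification cannot be reproduced here. As an illustration of the same confirmatory step, the sketch below uses the open-source Python package semopy (an assumption on our part: any SEM software with maximum likelihood estimation would serve). Only two of the nine first-order factors are specified, item column names are hypothetical, and fit is screened against the thresholds just listed.

```python
import pandas as pd
import semopy

# Two of the nine first-order factors in lavaan-style syntax; a full
# specification would list all nine. Item names are placeholders.
MODEL_DESC = """
improves_learning =~ item10 + item11 + item12
is_valid =~ item16 + item17 + item18
improves_learning ~~ is_valid
"""

def fit_cfa(data: pd.DataFrame):
    """Fit the CFA by maximum likelihood and screen global fit indices."""
    model = semopy.Model(MODEL_DESC)
    model.fit(data, obj="MLW")        # Wishart maximum likelihood
    stats = semopy.calc_stats(model)  # one-row DataFrame of fit indices
    adequate = (stats["CFI"].iloc[0] > .90
                and stats["TLI"].iloc[0] > .86
                and stats["RMSEA"].iloc[0] < .06)
    # SRMR is not among the statistics semopy reports and would need to
    # be computed separately from the residual correlation matrix.
    return stats, adequate
```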
RESULTS
Confirmatory Factor Analysis
We tested six models in total. Only the fifth model demonstrated adequate fit and was thus
retained and further described (see Figure 1 for a comparison of the various models).
Description of the competing models. Model 1 was a direct replication of the factor
structure confirmed by Brown (2006). As previously described, this model had two higher order
latent constructs with three and four first-order constructs. Examinations of global fit indices
and residuals found the model to be inadmissible; thus, theoretically derived modifications to the model were pursued. Model fit statistics for all models are reported in Table 1.

[FIGURE 1 Schematic path models of the Teachers' Conceptions of Assessment–III Abridged Version tested in the Canadian sample: (Model 1) replication of the original New Zealand model; (Model 2) higher order only New Zealand model; (Model 3) first order only New Zealand model; (Model 4) positive/negative Cyprus model; (Model 5) first order model with select items deleted; (Model 6) modified structure. Path diagrams omitted.]

2We tested for latent mean differences between male and female participants and elementary and secondary preservice teachers by comparing an unconstrained model to one in which the factor loadings were constrained to be equal between the two groups in AMOS 20.0. Neither of the models provided an admissible solution, possibly because the sample size for each group became too small (Jöreskog & Sörbom, 1984).
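The comparison described in footnote 2 is a standard nested-model (chi-square difference) test between the unconstrained and loading-constrained multigroup models. Had both solutions been admissible, the decision rule could have been screened as in this minimal sketch; the function and its inputs are illustrative placeholders, not the authors' code.

```python
from scipy.stats import chi2

def loadings_differ(chisq_constrained: float, df_constrained: int,
                    chisq_free: float, df_free: int,
                    alpha: float = .05) -> bool:
    """Chi-square difference test for nested models: returns True when
    constraining the factor loadings to equality significantly worsens
    fit, i.e., when the two groups' loadings differ."""
    delta_chisq = chisq_constrained - chisq_free
    delta_df = df_constrained - df_free
    return chi2.sf(delta_chisq, delta_df) < alpha
```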
Following Brown and Michaelides (2011), Models 2 and 3 were variants of Model 1.
Specifically, Model 2 removed the first-order factors that loaded onto the two higher order
factors and instead allowed the respective nine and 12 measured items to load directly on a
first-order factor equivalent to the original second-order factor. As such, the 27 items continued
to represent four main purposes: assessment is irrelevant, assessment improves education,
assessment makes schools accountable, and assessment makes students accountable. None of
the statistics were in the acceptable range, indicating that our data do not fit this model well. Model 3 tested the fit of the nine first-order factors, each with three items loading
directly to the latent variable and no second-order factors. As such, the 27 items represented
the original nine first-order factors. This model provided goodness-of-fit values comparable
to Brown’s (2006) fit; however, because more stringent fit indices are commonly expected
and because additional theoretically based models remained to be tested, we pursued three
additional models. In Model 4 we divided the items into two higher order factors representing
positive conceptions (four items, improves teaching; eight items, supports student learning;
seven items, schools accountable) and negative conceptions (five items, bad; two items, ignored).
The model did not fit.
Returning to Model 3, which approached an acceptable fit, we investigated individual factor
loadings and the wording of specific items. Four items were deemed problematic empirically
and/or conceptually. First, we examined Item 27, "assessment is an imprecise process," on the
TABLE 1
Goodness-of-Fit Statistics for Six Competing Models
Model χ2/df p CFI TLI RMSEA SRMR
1 3.25 <.001 .803 .778 .072 .0887
2 3.70 <.001 .759 .734 .079 .0886
3 2.79 <.001 .855 .824 .064 .0777
4 3.82 <.001 .788 .763 .081 .0899
5 2.40 <.001 .906 .879 .057 .0540
6 2.68 <.001 .896 .872 .062 .0611
Note. CFI D comparative fit index; TLI D Tucker–Lewis index; RMSEA D root mean square error of
approximation; SRMR D standardized root mean square residual.
invalid factor (factor loading = .30). We chose to delete this item because of its poor loading
and because conceptually it may present an uncomfortable notion to students whose academic
careers have been advanced by the assessment process. Second, we examined Item 4, "assessment places students into categories," on the student accountability factor (factor loading = .33). We
chose to delete this item as well because it does not reflect current assessment practices in
Canadian schools, where the focus is on inclusive education. Third, Item 5, "determines if students have met qualification standards," also from the student accountability factor, had a less than desirable loading of .45. However, we kept this item in order to retain the latent factor of student accountability in the analysis. Fourth, Item 9, "assessment measures students'
higher order thinking skills," on the describes ability factor was considered (factor loading = .50). We chose to delete this item because it was conceptually different from the other two
items in the factor by making specific reference to some type of ability level rather than a
generalized indication of learning. The result of these modifications was an adequately fitting
fifth model with nine latent first-order factors, three of which had only two measured indicators.
Because models in which any latent variable consists of only two measured indicators are
subject to an array of problems including specification error (Kline, 2005) and because internal
consistencies may be low for variables measured by few items (Nunnally, 1978), we tested
one additional model. In Model 6, the latent variable inaccurate was completely removed and
the remaining two items on each of the students accountable and describes ability factors
were moved onto a single latent factor, which we labeled assessment is reporting achievement,
because of the traditional emphasis on summative assessment for those items (for similar
restructuring see Brown & Remesal, 2012). The resulting model included seven latent first-
order variables, each with at least three measured indicators; however, the goodness of fit was
not acceptable. Thus, we retained Model 5, which is presented in full in Table 2.
Additional Analyses
None of the identified factors had good internal reliability (Table 3). In fact, according to
Nunnally (1978), our range of reliabilities from .60 to .73 is on the lower end of acceptable or
may even be considered questionable. Because Cronbach’s alpha is (a) related to the number
of items (Nunnally, 1978), (b) a lower bound estimate of reliability (Sijtsma, 2009), and (c)
because of the exploratory nature of this research, we suggest these reliabilities are acceptable
at this time. There is, however, one exception: The two-item factor student accountability had
an alpha reliability of .44. In Model 6 these two items were combined with the two items from
the describes ability factor to create a single variable with four measured items and a reliability
of .67. Model 6, however, did not meet the necessary goodness-of-fit criteria to be retained. Thus, the low measures of internal reliability documented in these results foreshadow the
need to either generate additional items for these first-order factors or test additional models
in which factors are combined.
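For transparency about how these reliabilities behave, the standard Cronbach's alpha computation is sketched below; the formula makes explicit why few-item factors, such as the two-item student accountability factor, tend to yield low values.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total).
    Alpha generally rises with the number of items, so two-item factors
    are easily depressed, as observed above."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)
```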
The rank order of endorsement of each conception is listed in Table 3. Preservice teachers
agreed most strongly with the factor measuring assessment improves learning. Surprisingly, and contrary to existing research, participants rated the factor measuring assessment is inaccurate next most strongly. After this, students returned to endorsing positive assessment factors more strongly
than the negative factors, with improve teaching and describes ability as third and fourth,
respectively. Preservice teachers least strongly agreed with the factors describing assessment
TABLE 2
Item Wording and Standardized Factor Loadings for Final Model (No. 5)
Item No. Factor Item Wording Factor Loading
1 School accountability provides information on how well schools are doing .59
2 School accountability is an accurate indicator of a school’s quality .75
3 School accountability is a good way to evaluate a school .78
5 Student accountability is assigning a grade or level to student work .41
6 Student accountability determines if students meet qualification standards .69
7 Describes ability is a way to determine how much students have learned from teaching .78
8 Describes ability establishes what students have learned .73
10 Improves learning provides feedback to students about their performance .66
11 Improves learning feeds back to students their learning needs .69
12 Improves learning helps students improve their learning .68
13 Improves teaching is integrated with teaching practice .65
14 Improves teaching information modifies ongoing teaching of students .50
15 Improves teaching allows different students to get different instruction .70
16 Is valid results are trustworthy .82
17 Is valid results are consistent .63
18 Is valid results can be depended on .62
19 Is bad forces teachers to teach in a way against their beliefs .53
20 Is bad is unfair to students .68
21 Is bad interferes with teaching .75
22 Is ignored conduct assessments but make little use of the results .48
23 Is ignored results are filed away and ignored .65
24 Is ignored has little impact on teaching .61
25 Is inaccurate results should be treated cautiously given measurement error .99
26 Is inaccurate should take into account the error and imprecision in all assessment .44
TABLE 3
Descriptive Statistics and Correlations for Each Variable
Variable N Items α M SD Rank 1 2 3 4 5 6 7 8
1. Schools account 3 .73 8.83 2.53 7
2. Students account 2 .44 7.70 1.65 5 .39*
3. Describes ability 2 .73 8.03 1.92 4 .46* .41*
4. Improves learning 3 .71 13.39 2.38 1 .22* .25* .61*
5. Improves teaching 3 .63 12.28 2.51 3 .26* .25* .58* .64*
6. Is valid 3 .71 9.20 2.46 6 .53* .37* .49* .41* .41*
7. Is bad 3 .68 6.91 2.21 8 .07 .03 −.26* −.35* −.27* −.08
8. Is ignored 3 .60 6.59 2.07 9 .18* .00 −.19* −.31* −.28* .04 .56*
9. Is inaccurate 2 .61 8.32 1.97 2 .05 .04 .05 .07 .12 −.09 .16* .09
*p < .01.
TABLE 4
Mean Differences Between Genders and Teaching Level (t Tests)
Variable
Female
M (SD)
Male
M (SD) t(430)
Elementary
M (SD)
Secondary
M (SD) t(429)
1. Schools account 8.91 (2.55) 8.51 (2.41) 1.43 9.04 (2.74) 8.66 (2.33) 1.54
2. Students account 7.70 (1.61) 7.67 (1.74) .14 7.71 (1.55) 7.70 (1.70) −.04
3. Describes ability 8.08 (2.01) 7.86 (1.60) .99 8.24 (1.96) 7.85 (1.87) 2.12
4. Improves learning 13.41 (2.43) 13.30 (2.23) .39 13.44 (2.43) 13.34 (2.34) .42
5. Improves teaching 12.36 (2.57) 12.02 (2.35) 1.21 12.53 (2.54) 12.07 (2.50) 1.88
6. Is valid 9.08 (2.42) 9.51 (2.54) −1.58 9.24 (2.32) 8.16 (2.57) .35
7. Is bad 6.97 (2.22) 6.76 (2.17) .85 6.93 (2.18) 6.90 (2.22) .14
8. Is ignored 6.56 (2.06) 6.66 (2.09) −.42 6.55 (2.17) 6.61 (1.96) −.30
9. Is inaccurate 8.39 (1.99) 8.17 (1.91) 1.01 8.48 (1.95) 8.20 (1.99) 1.48
N 329 301 — 200 231 —
as bad or ignored, thus supporting much of the previous research (Brown, 2011). These scores
were below the midpoint of the scale.
The correlations between variables supported these mean distinctions: Preservice teachers
who strongly endorsed the positive assessment factors of improving learning and improving
teaching tended to disagree with the negative assessment factors of assessment is bad and
assessment is ignored (rs range = −.19 to −.35, ps < .01). Of interest, this was not the case
for either of the two accountability factors, which had only one significant relationship with
negative factors, and that was in a positive direction (schools accountable and assessment is
ignored; r = .18, p < .01). The highest correlations emerged between the describes ability
factor and the improves learning factor, and between improves learning and improves teaching,
both of which exceeded .60.
Finally, there were no significant differences between male and female participants or
elementary and secondary students on any of the factors (Table 4). These results suggest that at
this point men and women and future elementary and secondary school teachers conceptualize
assessment in similar ways.
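The group comparisons reported in Table 4 are standard independent sample t tests on each factor score. A minimal sketch of the same computation follows; column names are hypothetical, and scipy's ttest_ind is assumed with its default equal-variance (classical) form.

```python
import pandas as pd
from scipy.stats import ttest_ind

FACTORS = ["schools_account", "students_account", "describes_ability",
           "improves_learning", "improves_teaching", "is_valid",
           "is_bad", "is_ignored", "is_inaccurate"]

def group_comparisons(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Independent sample t test on each factor score for a two-level
    grouping column (e.g., gender, or elementary vs. secondary)."""
    levels = df[group_col].dropna().unique()
    g1 = df.loc[df[group_col] == levels[0]]
    g2 = df.loc[df[group_col] == levels[1]]
    rows = []
    for factor in FACTORS:
        t, p = ttest_ind(g1[factor].dropna(), g2[factor].dropna())
        rows.append({"factor": factor, "t": round(t, 2), "p": round(p, 3)})
    return pd.DataFrame(rows)
```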
DISCUSSION
Three findings emerged that highlight the differences and similarities between the Canadian
sample and previous studies with the CoA-IIIA. First, Canadian preservice teachers’ concep-
tions of assessment aligned moderately with the factor structure originally determined in New
Zealand. In particular, we discuss the finding that no higher order factors were identified.
Second, a slight variation in level of endorsement emerged, such that this sample of Canadian
preservice teachers endorsed the factor measuring assessment as inaccurate more strongly than
some of the positive conceptions of assessment. Third, no differences emerged between men
and women or elementary and secondary preservice teachers.
Factor Structure
Brown’s original model suggested that conceptions of assessment are best represented as four
second-order factors with nine contributing first-order factors. Second-order models suggest
that some number of distinct but related constructs can be subsumed under one higher order
construct, resulting in a more parsimonious model. Aside from sacrificing some parsimony,
what does it imply that the best fitting model for this group of Canadian preservice teachers
could not be conceptualized in terms of higher order factors? Why does it seem that Canadian
preservice teachers conceptualize the purposes of assessment relatively discretely? One possible
explanation is that because this sample consisted of preservice rather than practicing teachers
they have not yet begun to view assessment as a broad and multifaceted concept. Instead,
they are focused on each potential purpose as a discrete function of assessment. In other
professions such as medicine and law (e.g., Eva, 2004; Mitchell, 1989), educational researchers
have identified that experts often make decisions using nonanalytical approaches (e.g., pattern
recognition, experiential knowledge), whereas novices do not have the cumulative experiences
to use this type of reasoning and approach tasks analytically. The same may be true for
preservice teachers’ conceptions of assessment. Indeed, Brown and Remesal (2012) produced
a first-order only model to fit their samples of preservice teachers. Although Brown (2002)
compared preservice and practicing teachers, more studies are needed in this area and should be
paired with an examination of how expertise in assessment practices is established over time.
The correlations between factors largely reflect the Canadian assessment climate. Unlike
Hong Kong, where student accountability is almost synonymous with improvement, our data
revealed a more moderate relationship between these factors. Specifically, the factors measuring
student accountability, improves learning, and improves teaching correlated significantly with r
values between .25 and .41. The school accountability factor had similar correlations, perhaps
suggesting that Canadian preservice teachers consider both student and school accountability
important in terms of improving education.
Levels of Endorsement
Canadian preservice teachers endorsed most conceptions at a similar level as samples from
other countries (Brown, 2011). In general, improvement factors (i.e., improves learning and
improves teaching) were rated most highly, and negative factors (i.e., assessment is bad and
assessment is ignored) were rated lowest. One exception emerged: Canadian preservice teachers
had relatively high levels of endorsement of the assessment is inaccurate factor. One reason
for this might be that these preservice teachers, who are enrolled in a mandatory assessment
course, are learning about the types of inaccuracies and errors that exist in assessments and thus
may be hypersensitive to these issues. Indeed, DeLuca and Klinger (2010) found that many
preservice teachers desired more instruction about the validity and reliability of assessment
practices. A second explanation is that perhaps other samples also responded strongly to this
subscale but because it is subsumed into the higher order factor of irrelevance in other research
the higher responses may be masked. In fact, almost all studies that involved the identification
of higher order factors provided no information on the characteristics of the first-order factors
(e.g., Brown et al., 2011; for an exception, see Brown & Michaelides, 2011). A third option is
that because preservice teachers are still students themselves they may experience assessment as
inaccurate. For example, imagine a preservice teacher who is confident in his or her teaching
abilities but scores poorly on an exam in a curriculum course. This student may interpret
the assessment as inaccurate because it does not match his or her beliefs. In this way the
idea that assessment is inaccurate may reflect a mechanism by which preservice teachers can
protect their self-worth (Covington, 2000). Finally, this level of endorsement might reflect a true
concern about assessment practices in Canada. Although some inaccuracies are always possible,
Alberta has two major mechanisms in place to minimize this risk. First, Alberta Education
(http://education.alberta.ca/home.aspx) takes great care to link its standardized achievement
tests to the provincial program of studies, involve practicing teachers in exam creation and
scoring, and provide students with sample exam questions. In fact, Canadian teachers develop
and grade all criterion-referenced standardized exams (Volante, 2006). Second, the Alberta
Assessment Consortium (http://www.aac.ab.ca/) is an independent and not-for-profit agency that
advocates for sound assessment practices and creates a wide range of classroom assessments
for teachers in addition to providing professional development opportunities.
It is also interesting to note that this factor was unrelated to all other conceptions, except a
small positive correlation with the assessment is bad factor. Taken together, these results suggest
that although preservice teachers indeed believe that assessment is inaccurate, this inaccuracy
does not appear to systematically influence any other conceptions. This notion requires further
research because inaccuracies should negatively relate to all other conceptions. For example,
if assessment is inaccurate, how can it be valid, describe ability, or improve learning? Indeed,
other research has had to modify the model to better acknowledge that assessment is only able
to improve learning or hold students accountable to the extent that it is not inaccurate (Brown
et al., 2011).
Gender and Teaching Level
Respondents did not differ systematically based on gender or level of teacher training (i.e.,
elementary or secondary). Recently, Brown (2011) compared primary and secondary teachers’
conceptions of assessment and found his original model to be statistically invariant with good
fit across both groups. Like Brown (2011), our data suggest that at the outset of their training,
Canadian preservice teachers are relatively similar in their conceptions of assessment. Again,
this may be because at this point their conceptions may be based on a shared student experience
and have not been tailored to the assessment realities associated with either elementary or
secondary school teaching. We may expect to see some differences with practicing teachers,
but this is a question for future research.
Limitations
Three limitations should be kept in mind. First, although Brown (2011) called for research
examining the CoA-IIIA in high-stakes assessment jurisdictions such as the United States,
Canada may not meet this call. The province of Alberta in which the data were collected has
more standardized testing than any other Canadian province, but this rate is still much lower than in the United States; thus, generalizability to the rest of Canada or North America is tentative.
This leads to the second limitation, the homogeneity of our sample. In developing the survey
Brown tested the impact of a wide range of individual difference variables including years of
training, years of experience, and role in school on responses to the scale—none of which
emerged as significant (Brown, 2004b). Because our sample consisted of preservice teachers, some of these variables are not yet relevant; however, we provide further evidence that gender and teaching level do not appear to make a difference even at the
preservice level. In the same vein, like other researchers (e.g., Segers & Tillema, 2011) we
used the teachers’ version of the Conceptions of Assessment inventory rather than the student
version. Although this was a conscious decision, it allows for some ambiguity in our results
because the sample could have answered the questions as students. Third, it is possible that
responses to these items may have been influenced by other questionnaires on the larger survey
that were not analyzed in this study.
Implications and Directions for Future Research in Preservice Education
Having validated the conceptions of assessment scale with Canadian preservice teachers, we are
poised to look at ways in which teachers’ conceptions of assessment influence their intended
practice. One option is to follow these preservice teachers into practice to determine whether
their preservice conceptions influence their practices. Specifically, are preservice teachers who
view assessment as a tool for improvement able to withstand the pressures of standardized
testing and remain committed to assessment for this purpose? Another area for future research
is to examine how conceptions of assessment change for preservice teachers after receiving
instruction in classroom assessment practices. This is a crucial shift from student to teacher
and may refine their conceptions and move them toward a better understanding of the complex possibilities of assessment.
Preservice teachers’ assessment conceptions are likely based on years of studenthood (Lortie,
1975), so a process is necessary to move their conceptions closer to current views of assessment.
Teacher education has a central role in this process, but there is little consistency in the
ways that assessment education is delivered across Canadian teacher education programs
(Russell, McPherson, & Martin, 2001). At a minimum, teacher education programs should
help preservice teachers become aware of their conceptions of assessment and how these
might influence assessment decisions they make as practicing teachers. After helping preservice
teachers become aware of their preconceptions, a course-based intervention could be designed
to help them conceive of assessment in ways that match current educational policies. Such an intervention could be implemented in education programs, thereby offering the best chance
at successful reform (Brown, 2004b; Brown & Harris, 2009). Although research consistently
shows it is difficult to change teachers’ beliefs (Pajares, 1992), C. D. Smith, Worsfold, Davies,
Fisher, and McPhail (2013) recently undertook such an endeavor and showed that indeed
assessment literacy can be improved through a relatively brief in-class intervention. Thus,
more research is needed in this area and can begin now that a measurement tool has been
validated.
FUNDING
This work was supported by two grants awarded to the first and second authors: a Social
Sciences and Humanities Research Council of Canada Standard Grant (410-2011-0095) and a
University of Alberta Teaching and Learning Enhancement Fund Grant (RES0004915).
REFERENCES
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179–
211.
Alberta Teachers’ Association. (2009). Real learning first: The teaching profession’s view of student assessment, evalu-
ation and accountability for the 21st century. Issues in Education, 7, 1–34. Retrieved from http://www.teachers.ab.ca
Arbuckle, J. L. (2011). IBM® SPSS® Amos™ 20 user’s guide. Retrieved from ftp://129.35.224.12/software/analytics/
spss/documentation/amos/20.0/en/Manuals/IBM_SPSS_Amos_User_Guide.pdf
Bachor, D. G., & Baer, M. R. (2001). An examination of preservice teachers’ simulated classroom assessment practices.
Alberta Journal of Educational Research, 47, 244–258.
British Columbia Teachers’ Federation. (2012). Better schools for BC: A plan for quality public education. Retrieved
from http://www.betterschools.bc.ca
Brown, G. T. L. (2002). Teachers’ conceptions of assessment (Doctoral dissertation, University of Auckland, Auckland,
New Zealand). Retrieved from http://hdl.handle.net/2292/63
Brown, G. T. L. (2004a). Measuring attitude with positively packed self-report ratings: Comparison of agreement and
frequency scales. Psychological Reports, 94, 1015–1024.
Brown, G. T. L. (2004b). Teachers’ conceptions of assessment: Implications for policy and professional development.
Assessment in Education, 11, 301–318. doi:10.1080/0969594042000304609
Brown, G. T. L. (2006). Teachers’ conceptions of assessment: Validation of an abridged version. Psychological Reports,
99, 166–170. doi:10.2466/PRO.99.1.166-170
Brown, G. T. L. (2008). Conceptions of assessment: Understanding what assessment means to teachers and students.
New York, NY: Nova Science.
Brown, G. T. L. (2011). Teachers’ conceptions of assessment: Comparing primary and secondary teachers in New
Zealand. Assessment Matters, 3, 45–70.
Brown, G. T. L., & Harris, L. R. (2009). Unintended consequences of using tests to improve learning: How improvement-
oriented resources heighten conceptions of assessment as school accountability. Journal of MultiDisciplinary Eval-
uation, 6(12), 68–91.
Brown, G. T. L., Kennedy, K. J., Fok, P. K., Chan, J. K. S., & Yu, W. M. (2009). Assessment for student improvement:
Understanding Hong Kong teachers’ conceptions and practices of assessment, Assessment in Education: Principles,
Policy & Practice, 16, 347–363. doi:10.1080/09695940903319737
Brown, G. T. L., Lake, R., & Matters, G. (2011). Queensland teachers’ conceptions of assessment: The impact of
policy priorities on teacher attitudes. Teaching and Teacher Education, 27, 210–220. doi:10.1016/j.tate.2010.08.003
Brown, G. T. L., & Michaelides, M. P. (2011). Ecological rationality in teachers’ conceptions of assessment across
samples from Cyprus and New Zealand. European Journal of Psychology of Education, 26, 319–337. doi:10.1007/
s10212-010-0052-3
Brown, G. T. L., & Remesal, A. (2012). Prospective teachers’ conceptions of assessment: A cross-cultural comparison.
Spanish Journal of Psychology, 15, 75–89. doi:10.5209/rev_SJOP.2012.v15.n1.37286
Campbell, C., & Evans, J. A. (2000). Investigation of preservice teachers' classroom assessment practices during student
teaching. The Journal of Educational Research, 93, 350–355. doi:10.1080/00220670009598729
Canadian Teachers’ Federation. (2003). Moving from the cult of testing to a culture of professional accountability.
Perspectives, 3, 1–9.
Choi, C. (1999). Public examinations in Hong Kong. Assessment in Education: Principles, Policy & Practice, 6,
405–417.
Covington, M. V. (2000). Goal theory, motivation, and school achievement: An integrative review. Annual Review of
Psychology, 51, 171–200.
Cowley, P., & Easton, S. (2013, February). Studies in education policy: Report card on Alberta's elementary schools
2013. Retrieved from http://alberta.compareschoolrankings.org
Crooks, T. J. (2002). Educational assessment in New Zealand schools. Assessment in Education: Principles, Policy, &
Practice, 9, 237–253. doi:10.1080/0969594022000001959
Daniels, L. M., Mandzuk, D., Perry, R. P., & Moore, C. (2011). The impact of teacher candidates’ perceptions of their
initial teacher education program on teaching anxiety, efficacy, and commitment. Alberta Journal of Educational
Research, 57, 88–106.
DeLuca, C., & Klinger, D. A. (2010). Assessment literacy development: Identifying gaps in teacher candidates’ learning.
Assessment in Education: Principles, Policy & Practice, 17, 419–438. doi:10.1080/0969594X.2010.516643
Elliot, A. J., & Murayama, K. (2008). On the measurements of achievement goals: Critique, illustration, and application.
Journal of Educational Psychology, 100, 613–628. doi:10.1037/0022-0663.100.3.613
Eva, K. W. (2004). What every teacher needs to know about clinical reasoning. Medical Education, 39, 98–106.
doi:10.1111/j.1365-2929.2004.01972.x
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria
versus new alternatives. Structural Equation Modeling, 6, 1–55. doi:10.1080/10705519909540118
Jöreskog, K. G., & Sörbom, D. (1984). LISREL VI: Analysis of linear structural relationships by the method of maximum likelihood. Chicago, IL: National Educational Resources.
Kline, R. B. (2005). Principles and practices of structural equation modeling (2nd ed.). New York, NY: Guilford.
Klinger, D., DeLuca, C., & Miller, T. (2008). The evolving culture of large-scale assessments in Canadian education.
Canadian Journal of Educational Administration and Policy, 76, 1–34.
Leighton, J. P., Gokiert, R. J., Cor, M. K., & Heffernan, C. (2010). Teacher beliefs about the cognitive diagnostic
information of classroom versus large-scale tests: Implications for assessment literacy. Assessment in Education:
Principles, Policy & Practice, 17, 7–21. doi:10.1080/09695940903565362
Lejeune, A., Poth, C., & Daniels, L. M. (2010, May). Examining pre-service teachers’ knowledge, skills, attributes,
experiences, and goals related to formative and summative classroom assessment. Paper presented at the Canadian
Society for the Study of Education, Montreal, Canada.
Lortie, D. (1975). Schoolteacher: A sociological study. Chicago, IL: University of Chicago Press.
Marhuenda, F. (2006). Assessment in the Spanish educational system. Assessment in Education: Principles, Policy, &
Practice, 4, 413–429. doi:10.1080/0969594970040307
Midgley, C., Maehr, M., Hruda, L., Anderman, E., Anderman, L., Freeman, K., … Urdan, T. (2000). Manual for the Patterns of Adaptive Learning Scales. Ann Arbor: University of Michigan.
Mitchell, J. B. (1989). Current theories on expert and novice thinking: A full faculty considers the implications for
legal education. Journal of Legal Education, 39, 275–289.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.
Organisation for Economic Co-operation and Development. (2010). PISA 2009 results: Executive summary. Retrieved
from http://www.oecd.org/pisa/pisaproducts/46619703.pdf
Pajares, M. F. (1992). Teachers’ beliefs and educational research: Cleaning up a messy construct. Review of Educational
Research, 62, 307–332.
Papile, C., Daniels, L. M., Poth, C., & Hutchison, M. A. (2012, April). Mastery and performance: Giving structure
to instructional and assessment practices. Poster presented at the annual meeting of the Western Psychological
Association, San Francisco, CA.
Parker, A., & Neuharth-Pritchett, S. (2006). Developmentally appropriate practice in kindergarten: Factors shaping
teacher beliefs and practice. Journal of Research in Childhood Education, 21, 65–78. doi:10.1080/02568540609
594579
Principles for Fair Student Assessment Practices for Education in Canada. (1993). Edmonton, Alberta: Joint Advisory
Committee. (Available from Joint Advisory Committee, Centre for Research in Applied Measurement and Evaluation,
3–104 Education Building North, University of Alberta, Edmonton, Alberta, T6G 2G5)
Rothstein, R., Ladd, H. F., Ravitch, D., Baker, E. L., Barton, P. E., Darling-Hammond, L., … Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers (Economic Policy Institute Briefing Paper #278).
Retrieved from http://www.epi.org/publication/bp278/
Russell, T., McPherson, S., & Martin, A. K. (2001). Coherence and collaboration in teacher education reform. Canadian
Journal of Education, 26, 37–55. Retrieved from http://www.jstor.org/stable/1602144
Scheerens, J., Ehren, M., Sleegers, P., & de Leeuw, R. (2012). OECD Review on Evaluation and Assessment Frameworks
for Improving School Outcomes: Country Background Report for the Netherlands. Retrieved from www.oecd.org/
edu/school/NLD_CBR_Evaluation_and_Assessment.pdf
Segers, M., & Tillema, H. (2011). How do Dutch secondary teachers and students conceive the purpose of assessment?
Studies in Educational Evaluation, 37, 49–54. doi:10.1016/j.stueduc.2011.03.008
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74,
107–120. doi:10.1007/s11336-008-9101-0.
Smith, C. D., Worsfold, K., Davies, L., Fisher, R., & McPhail, R. (2013). Assessment literacy and student learning:
The case for explicitly developing students’ ‘assessment literacy.’ Assessment and Evaluation in Higher Education,
38, 44–60. doi:10.1080/02602938.2011.598636
Smith, K. E., & Croom, L. (2000). Multidimensional self-concepts of children and teacher beliefs about developmentally
appropriate practices. Journal of Educational Research, 93, 312–321. Retrieved from http://www.jstor.org/stable/
27542281
Stiggins, R. J., & Conklin, N. F. (1992). In teachers’ hands: Investigating the practices of classroom assessment.
Albany: SUNY Press.
Ungerleider, C. (2003). Large-scale student assessment: Guidelines for policymakers. International Journal of Testing,
3, 119–128. doi:10.1207/S15327574IJT0302_2
Ungerleider, C. (2006). Government, neo-liberal media, and education in Canada. Canadian Journal of Education, 29,
70–90. doi:10.2307/20054147
Volante, L. (2006). An alternative vision for large-scale assessment in Canada. Journal of Teaching and Learning, 4,
1–14.
Volante, L., & Ben Jaafar, S. (2008). Educational assessment in Canada. Assessment in Education: Principles, Policy,
& Practice, 15, 201–210. doi:10.1080/09695940802164226
Volante, L., & Fazio, X. (2007). Exploring teacher candidates’ assessment literacy: Implications for teacher education
reform and professional development. Canadian Journal of Education, 30, 749–770. doi:10.2307/20466661
Woolley, S. L., Benjamin, W. J., & Woolley, A. W. (2004). Construct validity of a self-report measure of teacher beliefs related to constructivist and traditional approaches to teaching and learning.
Educational and Psychological Measurement, 64, 319–331. doi:10.1177/0013164403261189
Zwaagstra, M. (2011). Standardized testing is a good thing. Policy Series: Frontier Centre for Public Policy, 119,
1–15. Retrieved from http://www.fcpp.org/files/1/PS119StandardizedTesting.pdf