
Teaching and Teacher Education 26 (2010) 1372–1380

Contents lists available at ScienceDirect

Teaching and Teacher Education

journal homepage: www.elsevier.com/locate/tate

Examining the grading practices of teachers

Jennifer Randall a,*, George Engelhard b

a University of Massachusetts, 111 Thatcher Road, Hills South, Room 171, Amherst, MA 01003, USA
b Emory University, 1784 N. Decatur Road, Suite 202, Atlanta, GA 30322, USA

Article info

Article history:
Received 9 June 2008
Received in revised form 9 March 2010
Accepted 18 March 2010

Keywords:
Grading practices
Grades

* Corresponding author. Tel.: +1 413 545 0227 (office); fax: +1 413 545 1523.
E-mail address: [email protected] (J. Randall).

0742-051X/$ – see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.tate.2010.03.008

Abstract

Despite the recommendations of some measurement specialists, teachers do not always assign grades based on achievement only. The primary purpose of this study is to clarify the meaning of grades and to examine some of the factors teachers consider when assigning final grades, with a focus on borderline cases. The sample consisted of 516 American public school teachers, selected via stratified random sampling in a major metropolitan school district in the Southeast. A 53-item survey using Guttman’s mapping sentences, previously piloted in a separate school district, was created and distributed. Teachers were provided with scenarios that described student ability, achievement, behavior, and effort, and were asked to assign both a numerical and a letter grade. A four-way between-subjects ANOVA was conducted with the student characteristics ability, achievement, behavior, and effort as independent variables and final grade as the dependent variable. Findings demonstrate that teachers abided by the official grading policy of the participating school district, assigning grades based primarily on achievement under most circumstances; however, in some borderline cases teachers report considering non-achievement factors. Implications for pre-service and in-service professional development are discussed.

© 2010 Elsevier Ltd. All rights reserved.

Most public school parents in the United States are satisfied with the quality of their community schools and the education their children receive (Rose & Gallup, 2007), despite the nation’s overall concern with low achievement. This unexpected satisfaction may be explained, in part, by the fact that parents rely primarily on teacher-assigned grades when ascertaining the achievement of their children, and, according to these grades, their children are often achieving well. Both international and national standardized assessments, as well as college freshman performance, however, suggest otherwise. Do student grades represent actual student achievement? Most measurement textbooks, designed for both pre-service and in-service teachers, assert that they should. Linn and Miller (2005) write in their measurement textbook that “in the final analysis, letter grades should reflect the extent to which students have achieved the learning outcomes specified in the course objectives, and these should be weighted according to their relative importance” (p. 377). Brookhart contends, in her measurement textbook intended for classroom teachers, that the “primary purpose for grading – for both individual assignment grades and report card grades – should be to communicate with students and parents about their achievement of learning goals” (2004, p. 5). In other words, grades should only represent student achievement.

Despite attempts to explain the appropriate uses of grades to teachers, Stanley and Baines (2001) assert that a student’s final grade does not always simply reflect academic performance. Instead, they argue, grades now serve a potpourri of inappropriate purposes including, but not limited to, self-esteem boosters, public relations, rewards, and vehicles to increase college funding for students.

According to Brookhart (2004), teachers should feel free to assess factors other than achievement, but these factors (such as attitude, participation, and effort) should not be graded. When final grades are composed of some combination of achievement, ability, behavior, and effort, problems may arise over the meaning of the grade. For instance, effort is a difficult construct to measure accurately. It could be demonstrated by homework completion, attendance, alertness, attentiveness, or a myriad of other variables. Moreover, low-achieving students tend to get the benefit of “effort consideration” far more often than high-achieving students. Linn and Miller (2005) also dispute the appropriateness of including other variables such as effort in the final grade. In addition to the difficulty inherent in measuring effort and the lack of fairness to students with higher ability, “it is difficult to distinguish between aptitude and achievement even with the most sophisticated measures, as both are dependent on student learning” (p. 377).


A teacher considers a student’s
    Ability: Low / Average / High
    Achievement: Low / Average / High
    Behavior: Inappropriate / Average / Excellent
    Degree of Effort: Low / Great Deal
and assigns a final letter grade:
    A, A-, B+, B, B-, C+, C, C-, D, or F

Fig. 1. Guttman’s mapping sentence. Ability, Achievement, Behavior, and Effort are the four factors (independent variables) in the model; effort has two levels and each of the other factors has three.


The consideration of ability, or potential, in final grade assignment is also a criticized practice. On the surface, assigning a student a grade based on potential seems fair. Indeed, ability and achievement tests are highly correlated (Thorndike, 2005). One might assume that this relationship indicates that students with lower ability simply have less potential for academic achievement and, therefore, should not be expected to perform at the same levels. Beyond the technical problems associated with such an assumption (the two types of tests are remarkably similar and often differ merely in use and interpretation), this grading practice has other undesirable consequences. In particular, individuals tend to perform at levels congruent with expectations. As a result, if students with lower ability are expected to have lower achievement, they may achieve accordingly. In addition, students with higher ability levels may resent having to achieve at higher levels to receive the same grades as lower ability students (Thorndike, 2005).

Classroom teachers, however, often do not heed expert advice and continue to consider several achievement and non-achievement variables when assigning grades. Effort and ability are the most frequently reported non-achievement factors considered by classroom teachers (Cicmanec, Johanson, & Howley, 2001; Cross & Frary, 1999; Frary, Cross, & Weber, 1992). Brookhart (1993) found that teachers, regardless of measurement training, routinely considered both a student’s level of ability and effort when assigning final grades based on hypothetical scenarios, especially when faced with borderline decisions (passing or failing a course). McMillan and Nash (2000) reported similar results, with teachers admitting consideration of achievement as well as non-achievement factors, like effort and participation, when assigning final grades. Similarly, Bursuck et al. (1996) found that teachers at all grade levels consider a variety of facets when assigning students’ grades, including effort, notebooks, attendance, class participation, and preparedness/organization (all of which could be considered proxies for effort), whereas elementary school teachers were more likely than middle and high school teachers to adjust grades based on ability. Forty-seven percent of elementary school teachers surveyed reported using ability from “quite a bit” to “completely” when assigning final grades (McMillan, Myran, & Workman, 2002). Feldman, Alibrandi, and Kropf (1998) also reported that 16% of high school science teachers surveyed used ability as the basis for grade assignment. High school teachers in a study conducted by Stiggins, Frisbie, and Griswold (1989) reported that, although classroom achievement was the most important factor in grade assignment, effort, measured by homework completion and extra credit, should also receive significant consideration, particularly for low ability students. Likewise, in other studies, secondary teachers have reported raising grades for high effort fairly often (Cross & Frary, 1999).

Other research suggests that, in addition to achievement, ability, and effort, the grade construct includes a behavior component (Cizek, Robert, & Fitzgerald, 1995; Frary et al., 1992; McMillan et al., 2002). Cizek et al. (1995) found that 61% of teachers reported considering non-achievement measures such as behavior and effort. Frary et al. (1992) found that 31% of teachers agreed, or tended to agree, that behavior should affect the grade. In Stiggins et al.’s (1989) study of high school teachers, all participants reported considering attitude when making decisions about borderline cases. Cross and Frary (1999) also reported that 37% of secondary teachers consider conduct and attitude when assigning final grades. McMillan et al. (2002) likewise found that elementary teachers consider, in addition to achievement indicators, other indicators such as disruptive behavior when assigning grades. With such varied interpretations of what grades really mean, or should mean, conflict or confusion is likely to arise among all three stakeholders: teachers, students, and parents.

Grades are a significant component of the American system of education. They are used to determine class placement, scholarships, and college admissions. Previous research indicates that, when assigning final grades, teachers consider many factors other than pure academic achievement, to various extents, including, but not limited to, homework, participation, improvement, ability, effort, and behavior. Overall, the literature suggests that teachers consistently consider four major factors when assigning final grades: student academic achievement, student ability, student effort, and student behavior. The primary purpose of this study is to examine which of these four factors teachers consider when assigning final grades, and to what extent. Although the primary purpose of this study is to examine the grading practices/philosophies of U.S. public school teachers, we recognize that the high value placed on grades is not a phenomenon limited to American school systems. As students become more mobile (e.g., attending universities, earning scholarships, and seeking employment abroad), we believe these findings will have national (U.S.) and international implications.

1. Method

1.1. Instrument

The instrument for this study was developed using Guttman’s mapping sentences (Guttman, 1977). Each mapping sentence is composed of several independent variables which, in combination, predict one dependent variable. Because this study sought to determine the extent to which teachers consider four factors (ability, classroom achievement, behavior, and effort), a mapping sentence was developed based on these four independent variables. Fig. 1 presents the mapping sentence that guides this study and provides a visual representation of how the factors and their levels are related. The ability factor (Cizek, Fitzgerald, & Rachor, 1996; McMillan et al., 2002) has three levels: high ability, average ability, and low ability. The classroom achievement factor (Cicmanec et al., 2001; Cizek et al., 1995) is also composed of three levels: high classroom achievement, average classroom achievement, and low classroom achievement. The behavior factor (Cicmanec et al., 2001; Cross & Frary, 1999; Frary et al., 1992; Kahn, 2000; McMillan & Nash, 2000; Stiggins et al., 1989) has three levels as well: excellent behavior, average behavior, and inappropriate behavior. Finally, the effort factor (Bursuck et al., 1996; Cicmanec et al., 2001; Cizek et al., 1996; Cross & Frary, 1999; Stiggins et al., 1989) is composed of only two levels: a great deal of effort or low effort. Preliminary focus group discussions revealed that teachers most often see effort as a dichotomous characteristic (i.e., students work hard or they do not).


This mapping sentence was then used to create the teacher questionnaire. Every unique combination of factor levels was formed (3 × 3 × 3 × 2), creating 54 questionnaire items/hypothetical scenarios. Table 1 contains sample items/scenarios, with their respective factors and levels identified. In this study we chose to examine very specific grading circumstances that focus on students with achievement levels on the borderline between two grades (A and B, B and C, and D and F). The relative importance of passing or failing a course is obvious to most stakeholders regardless of the academic level of the student being discussed. Additionally, the difference between an A and a B is of great importance to college-bound students in the United States seeking scholarship money. The significance of assigning a B instead of a C may be less obvious to those who live outside the state from which this sample was taken. In this state, any and all students who maintain a B average in high school receive full tuition scholarships to any state college or university. Most participating teachers are aware of this enormous incentive when assigning final grades to students.
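The full crossing of factor levels described above can be sketched in a few lines. This is a minimal illustration assuming the level labels from Fig. 1; the item numbering it produces (the order of `itertools.product`) is a hypothetical scheme and need not match the numbering of the authors' master questionnaire.

```python
from itertools import product

# Factor levels as listed in the mapping sentence (Fig. 1).
FACTORS = {
    "ability": ("low", "average", "high"),
    "achievement": ("low", "average", "high"),
    "behavior": ("inappropriate", "average", "excellent"),
    "effort": ("low", "great deal"),
}

def build_scenarios(factors):
    """Cross every level of every factor and number the cells from 1.

    The numbering follows itertools.product order (last factor varies
    fastest); it is a hypothetical scheme, not the authors' item order.
    """
    names = list(factors)
    return {
        item: dict(zip(names, levels))
        for item, levels in enumerate(product(*factors.values()), start=1)
    }

scenarios = build_scenarios(FACTORS)
print(len(scenarios))  # 54 = 3 x 3 x 3 x 2
print(scenarios[1])
```

A particular profile, such as the high ability, low achievement, excellent behavior, high effort student discussed later, is best located by filtering on level values rather than by item number.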

Because we believed that teachers would become fatigued or distracted if required to complete a long questionnaire, a fractional replication design (Roskam & Broers, 1996) was utilized. Three forms with 36 scenarios each were developed from the complete questionnaire. Each form shares some scenarios with at least one other form.

Using the complete questionnaire as the master, Form A contains items 1–36. Form B contains items 19–54. Form C contains items 1–18 and 36–54. Each teacher received either Form A, Form B, or Form C with the following directions:
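One way to see how the fractional replication design spreads the 54 master items across forms is to count, for each item, how many forms include it. The sketch below is a hypothetical check rather than part of the authors' procedure; it models the forms as Python sets built from the ranges exactly as printed above and ignores the processing error described later (item #13 missing from every form).

```python
# Inclusive item ranges for each form, taken exactly as printed above.
FORMS = {
    "A": set(range(1, 37)),                       # items 1-36
    "B": set(range(19, 55)),                      # items 19-54
    "C": set(range(1, 19)) | set(range(36, 55)),  # items 1-18 and 36-54
}

def forms_per_item(forms, n_items=54):
    """Count how many forms each item of the master questionnaire appears on."""
    return {item: sum(item in f for f in forms.values())
            for item in range(1, n_items + 1)}

sizes = {name: len(items) for name, items in FORMS.items()}
coverage = forms_per_item(FORMS)

print(sizes)                                           # {'A': 36, 'B': 36, 'C': 37}
print(min(coverage.values()), max(coverage.values()))  # 2 3
print([i for i, k in coverage.items() if k == 3])      # [36]
```

Read literally, the printed ranges give Form C 37 items and place item 36 on all three forms; every other item appears on exactly two forms.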

Below are several scenarios that ask you to make decisions about your grading practices. For each one, certain variables have been manipulated. Please read each one carefully as if it is referring to your own class and respond accordingly. The key is listed below and at the bottom of each additional page. For each item, please respond with BOTH a letter grade and a numerical grade. Many of these scenarios may not include a specific numeric representation (or equivalent), however, the numeric grade you assign is left to your interpretation. Thank you for your time.

In addition, teachers were provided with operational definitions at the beginning of the questionnaire describing what would be considered, for the purposes of this study, high/average/low ability, high/average/low achievement, excellent/average/inappropriate behavior, and great deal/low effort.

Table 1
Sample scenarios based on student characteristics and level of each characteristic.

(1) Ability: High | Achievement: High | Behavior: Excellent | Effort: Great Deal
Item (scenario): Jonathan is a student with high ability, based on intelligence tests administered by the school. His behavior in class is always excellent. He rarely talks out of turn and has great manners. He works hard and, based on project, test and quiz scores, you know that he has mastered 89% of the course objectives.

(2) Ability: Average | Achievement: High | Behavior: Inappropriate | Effort: Low
Item (scenario): Glenda is a student with average ability, based on intelligence tests administered by the school. Her behavior is completely inappropriate. She talks out of turn often in class and is often disobedient. She does not work very hard, but based on project, test and quiz scores, you know that she has mastered 89% of the course objectives.

(3) Ability: Average | Achievement: Low | Behavior: Excellent | Effort: Great Deal
Item (scenario): Willie is a student with average ability, based on intelligence tests administered by the school. His behavior in class is always excellent. He rarely talks out of turn and has great manners. He works very hard, but based on project, test and quiz scores, you know that he has mastered 69% of the course objectives.

(4) Ability: Low | Achievement: Average | Behavior: Inappropriate | Effort: Low
Item (scenario): Donna is a student with low ability, based on intelligence tests administered by the school. Her behavior is completely inappropriate. She talks out of turn often in class and is often disobedient. She does not work hard, and based on project, test and quiz scores, you know that she has mastered 79% of the course objectives.

Note. Each teacher responded to 36 scenarios of the kind shown here; the questionnaire presented only the scenario text, not the factor levels listed above each one.

It should be noted that, due to a processing error, one scenario was inadvertently not placed on any form, and instead two forms contained two of the same scenarios. The missing item (#13) described a student with high ability, low achievement, excellent behavior, and high levels of effort. Interestingly, focus group participants had suggested that this item be removed from the questionnaire due to its unlikelihood. Each teacher responded to 36 different items (representing students with specific combinations, or levels, of ability, achievement, behavior, and effort) once. In other words, each scenario/item represented one student, and teachers assigned a final grade to 36 different students.

1.2. Focus groups

Three focus groups, composed entirely of high school, elementary, and middle school teachers, respectively, were formed to consider the comprehensiveness and ease of understanding of the questionnaire and each scenario. The main purpose of these focus groups was to obtain feedback regarding the questionnaire design, including the clarity, ease of understanding, word choice, and length of the questionnaire. A secondary purpose was to allow teachers the opportunity to share their perceptions about appropriate grading practices, particularly in relation to the questionnaire and the student scenarios under review.

Based on the recommendations of Krueger (1994), each focus group was no larger than five teachers (5, 4, and 2 at the elementary, middle, and high school levels, respectively) to allow participants more opportunities to speak and share their views. Focus group participants were selected through the snowball technique. At each grade level, one teacher was chosen who met the selection criteria. That teacher then nominated other colleagues who also met the selection criteria. All focus group participants had at least 5 years of teaching experience and were currently teaching in a public school. Primarily, participants made recommendations that would improve the clarity of the scenarios, including suggestions regarding word choice. Changes in the questionnaire were made if a majority of the teachers present agreed that the change would improve the instrument. In cases where there was a tie, the moderator/primary researcher cast the deciding vote. Teacher-participants also provided more substantive/content-related feedback regarding the initial questionnaire. We considered and integrated all of the feedback they provided in light of the purposes of the study.

Table 2
Demographic information of teachers.

Demographic information                 Elementary  Middle      High        Total
                                        (N = 79)    (N = 155)   (N = 108)
Total years teaching
  Mean                                  9.9         9.5         10.6        9.9
  Standard deviation                    7.0         6.5         6.0         6.5
Measurement course (percentages)
  No course                             57.7        64.3        74.0        65.4
  Course                                38.4        35.0        23.1        32.1
  Missing                               3.9         .7          2.8         2.1
Academic level (percentages)
  General                               50.4        52.7        48.1        50.7
  Special Education                     1.3         2.6         .9          1.8
  AP/Honors/Gifted Only                 1.3         12.5        14.8        10.7
  General and Special Education         1.3         .0          .0          .3
  General and Honors                    .0          16.1        22.2        14.3
  Other                                 .9          4.52        1.84        2.84
  Missing                               44.8        11.6        12.1        19.4
Gender (percentages)
  Women                                 98.3        86.4        89.8        90.2
  Men                                   .4          12.9        10.2        9.2
  Missing                               1.3         .7          0           .6
Subject area, middle & high only (percentages)
  Elementary                            100
  Social Studies                                    14.9        18.5        12.6
  Science                                           18.1        23.1        15.6
  Math                                              29.3        25.0        21.5
  English                                           21.0        17.6        16.0
  Foreign Language                                  8.3         10.2        7.0
  Other                                             1.9         0           1.1
  Missing                                           6.4         5.6         22.4
Race/Ethnicity (percentages)
  White                                 43.6        74.9        65.7        64.8
  African American                      38.5        8.3         14.8        17.3
  Hispanic                              0           .7          5.6         2.1
  Other                                 3.8         1.9         2.8         2.6
  Missing                               14.1        14.2        11.1        13.2

Note. The total number of teachers who responded to the on-line survey was 516. Of these, 174 teachers completed the survey but opted not to provide their demographic information. The responses of all 516 teachers were still included in the data analysis.

1.3. Participants and setting

The final questionnaire was distributed in a large metropolitan school district in the southeastern United States. It is the largest district in the state and consists of 151,903 students in 63 elementary schools (kindergarten–5th grade), 20 middle schools (6th–8th grade), and 16 high schools (9th–12th grade). An abridged version of the school district’s official grading policy, with the most relevant sections, can be found in Appendix A. This policy requires that all teachers assign grades based on student academic progress alone, as defined by the district’s content standards. At both the elementary and middle school levels, separate grades are assigned for conduct and for effort. Assigning final academic grades based on either the conduct or the effort of students is explicitly forbidden. As with most American schools, grades range from A (indicating excellent performance) to F (indicating failing performance). For sample selection, schools were divided into one of three categories (elementary, middle, or high school), and each school within each category was assigned a number. Then, using a table of random numbers, individual schools were randomly selected from within each category to participate, bearing in mind the total number of teachers within each school. The questionnaire was distributed, via email request, to school principals representing approximately 2400 teachers (800 elementary, 800 middle school, and 800 high school). It should be emphasized that the email requests were not sent directly to teachers, but rather to school principals. Assuming the principals forwarded the request to their teachers, a lower bound on the response rate is 21.5% (516 questionnaires were completed on-line).
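The response-rate figure is simple arithmetic on the counts reported above. A minimal sketch, assuming the approximate per-level counts of 800 teachers as the denominator; the function name is hypothetical.

```python
def response_rate_lower_bound(completed, reachable):
    """Completed questionnaires over the number of teachers the request could
    have reached. This is only a lower bound: principals, not teachers,
    received the request, and any request not forwarded shrinks the true
    denominator and raises the true rate."""
    return completed / reachable

reachable = 800 + 800 + 800  # approximate elementary, middle, high school counts
rate = response_rate_lower_bound(516, reachable)
print(f"{rate:.1%}")  # 21.5%
```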

Not surprisingly, the overwhelming majority of teacher-participants in our sample are women (90.2%). Overall, female teachers outnumber male teachers by nearly 10 to 1. Among those teachers who reported their grade level, middle school teachers (30.2%) composed the largest proportion of the sample, followed by high school teachers (20.9%) and elementary school teachers (15.3%). Additional sample demographic information is included in Table 2. Readers should note that, of the 516 teachers who responded to the survey, 174 did not provide demographic information. Although these teachers cannot be included in Table 2, their responses were included in the final data analysis.

1.4. Data analysis

Although this study technically employs a simple survey design, the use of mapping sentences allowed for the manipulation of all four independent variables (ability, achievement, behavior, and effort) in the scenarios. Teachers responded to various scenarios that were systematically manipulated in a manner similar to an experimental design. As such, the SPSS computer software program was used to analyze the data using a between-subjects analysis of variance. Each scenario was composed of four independent variables (factors): ability, achievement, behavior, and effort. Each scenario’s score (dependent variable) was the grade given by the teachers. The reader should note that teachers did not respond to any scenario multiple times; each responded to 36 different scenarios once. This structure suggested that the use of a between-subjects analysis of variance was appropriate. In addition to the between-subjects factorial ANOVA analyses, effect sizes (partial eta squared, ηp²) were computed for both the main effects and interaction effects of the four-way ANOVA model. Green and Salkind (2005) point out that “it is unclear what are small, medium, and large values of ηp². What is a small versus a large η² is dependent on the area of investigation. In all likelihood, the conventional cutoffs of .01, .06, and .14 for small, medium, and large η² are too large for ηp²” (p. 187). Prior to analysis, we determined that a conservative partial effect size of .06 would indicate practical significance for the purposes of this study. Furthermore, we generated confidence intervals for all mean grades across the student characteristics to establish the differences, if any, among the final grades assigned.
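Partial eta squared is the effect's sum of squares divided by the sum of that effect's and the error sums of squares, ηp² = SS_effect / (SS_effect + SS_error). The sketch below applies that standard definition together with the study's a priori cutoff of .06; the sums of squares in the usage line are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Effect sum of squares relative to effect plus error sums of squares."""
    if ss_effect < 0 or ss_error < 0:
        raise ValueError("sums of squares must be non-negative")
    return ss_effect / (ss_effect + ss_error)

def is_practically_significant(eta_p2, cutoff=0.06):
    """Apply the study's a priori practical-significance cutoff of .06."""
    return eta_p2 >= cutoff

# Hypothetical sums of squares, for illustration only.
eta = partial_eta_squared(ss_effect=70.0, ss_error=930.0)
print(eta, is_practically_significant(eta))  # 0.07 True
```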

2. Results

For each scenario, teachers were asked to assign both a numerical and a letter grade. The correlation between the letter grades and the numerical grades was computed to examine how teachers use the different, yet similar, grading scales. In other words, we sought to determine whether an A- to one teacher represented the same numerical value as an A- would to another teacher. The results indicate a strong correlation between letter grades and numerical grades (r = .91). Because the 100-point numerical scale provides more variance than the letter grade scale, numerical grades were used for the primary analysis. Analyses with letter grades, however, were also conducted to ensure the comparability of the two dependent variables. The results of the full four-way between-subjects ANOVA with letter grades were quite similar.

To examine the extent and nature of the multi-factor approach to grading, analysis was conducted using a complete between-subjects four-way ANOVA examining the interaction effects as well as the effect sizes. These findings are presented in Table 3. The results reveal a statistically significant four-way interaction between achievement, ability, behavior, and effort (p < .0001). The practical significance of this interaction is supported by the effect size (ηp² = .07), suggesting that the statistically significant difference is not due to the large sample size alone. The four-way interaction can be interpreted in different ways, but certain patterns emerged that are worthy of comment; the most interesting results are highlighted in this paper. In Figs. 2a–4b, the interactions between achievement and ability are depicted controlling for both behavior and effort. Several interesting trends can be observed. For instance, Fig. 2a and b, where the interaction between achievement level and ability level is displayed within low effort and excellent behavior, show that students with high achievement (B+ or 89%) on average receive a grade of A- (90%) regardless of ability level. This trend, however, does not hold for students with low effort and inappropriate behavior, reported in Fig. 3a and b. Those students would not receive the extra percentage point required for A- status, but rather grades of 88.99% (low ability), 89.15% (average ability), and 88.81% (high ability). In fact, the average assigned grades for students with low and high ability are actually slightly lower (88.99% and 88.81%, respectively) than the reported achievement level of 89.00% in the questionnaire’s scenarios. These data suggest that behavior, even more so than effort, is an important factor to teachers when dealing with borderline students.

Perhaps the most interesting finding is apparent in Fig. 4a and b. With high effort and excellent behavior, a student with low achievement (reported as 69%) and low ability receives, on average, a grade of 76.80% (C+). It appears that teachers reward low-ability students substantially when they both work hard and behave in class. In fact, these figures suggest that, on average, students with both high effort and excellent behavior all receive a grade 'boost' regardless of ability or achievement level. For instance, students with low achievement (a failing average) receive at least a passing grade (70%). Furthermore, students with average achievement (on the C+/B− borderline) receive grades of B−, and high-achieving students (on the B/A borderline) all receive grades of A−.
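The 'boost' discussed above can be expressed as the mean assigned grade minus the achievement level stated in the scenario. The sketch below uses the high-achievement cell means from Figs. 2 and 3 and the low-achievement cell means from Fig. 4, together with the reported levels of 89% and 69% given in the text; the names `REPORTED`, `CELLS`, and `boost` are hypothetical, and the average-achievement cells are left out because their reported percentage is not stated in this section.

```python
from typing import Optional

# Reported achievement levels stated in the text (percent).
REPORTED = {"low": 69.0, "high": 89.0}

# Mean assigned grades by ability level, read from the figures.
# None marks the scenario that was not administered (Fig. 4, high ability).
CELLS = {
    ("Fig. 2", "high"): {"low": 90.00, "average": 89.97, "high": 89.95},
    ("Fig. 3", "high"): {"low": 88.99, "average": 89.15, "high": 88.81},
    ("Fig. 4", "low"): {"low": 76.80, "average": 70.08, "high": None},
}


def boost(assigned: Optional[float], reported: float) -> Optional[float]:
    """Points added (+) or withheld (-) relative to the reported achievement."""
    if assigned is None:
        return None
    return round(assigned - reported, 2)


for (fig, achievement), by_ability in CELLS.items():
    shifts = {ability: boost(grade, REPORTED[achievement])
              for ability, grade in by_ability.items()}
    print(fig, f"{achievement} achievement:", shifts)
```

Under these assumptions, excellent behavior (Fig. 2) adds roughly one point at the B+/A− borderline, inappropriate behavior (Fig. 3) leaves students at or slightly below 89%, and the low-ability, high-effort, excellent-behavior student in Fig. 4 gains 7.8 points.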

The four-way interaction can also be illustrated by controlling for ability and effort and examining the interaction between achievement and behavior, shown in Figs. 5a–8b. An initial glance at these data reveals that, in general, regardless of ability or effort levels, final grades increase as behavior improves. In Fig. 5a and b, the practical effect of behavior on final grades is strikingly clear. Despite low levels of effort, a student with low ability and low achievement will, on average, receive a passing grade provided his/her behavior is average (final grade = 70.01) or excellent (final grade = 70.11). Similarly, a student with average achievement (on the C+/B− borderline) will, on average, receive a B− provided she or he is believed to have excellent behavior. If the same student is reported to have high levels of effort as well, she or he will receive the B− even with average behavior (see Fig. 6a and b). Finally, a student with low ability and low effort who is believed to be on the B+/A− borderline will receive the A− as long as his/her behavior is excellent. Again, if the student is reported to have high rather than low levels of effort, she or he will receive the A− with just average behavior. Under these circumstances, we see the value of good behavior and effort in terms of final grade assignments.

Table 3
Full four-way ANOVA with effect sizes.

Source of variation                           DF       Sum of squares   F-value      p-value   Effect size (ηp²)
Ability                                        2               289.5       165.6     <.0001    .02
Achievement                                    2             855 558   489 411.3     <.0001    .98
Behavior                                       2              4006.3      2291.8     <.0001    .20
Effort                                         1               981.2      1122.6     <.0001    .06
Ability × Achievement                          4               772.5       221.0     <.0001    .05
Ability × Behavior                             4               798.1       228.3     <.0001    .05
Achievement × Behavior                         4              1099.3        31.4     <.0001    .06
Ability × Achievement × Behavior               8              1395.8       199.6     <.0001    .08
Ability × Effort                               2               376.3       215.3     <.0001    .02
Achievement × Effort                           2               573.4       328.0     <.0001    .03
Ability × Achievement × Effort                 4               540.5       154.6     <.0001    .03
Behavior × Effort                              2               784.2       448.6     <.0001    .05
Ability × Behavior × Effort                    4               563.6       161.2     <.0001    .03
Achievement × Behavior × Effort                4              1465.5       419.2     <.0001    .08
Achievement × Ability × Behavior × Effort      7              1267.4       207.1     <.0001    .07
Error                                     18 634              16 287
Total                                     18 687                   …
Corrected total                           18 686                   …

See text for an explanation of the loss of a single degree of freedom in the examination of the four-way interaction. *p < .0001.

Similar trends occur when teachers believe students to be of average ability. As illustrated in Fig. 7a and b, a student with average ability, low levels of effort, and low achievement will receive a passing grade as long as s/he has average or excellent behavior. Under these circumstances, effort apparently does little to help a student's grade: even with reported high levels of effort, a student with low achievement and inappropriate behavior will still receive failing grades (see Fig. 8a and b).

Fig. 2. Low effort/excellent behavior: final grade by achievement level (low, average, high) for each ability level. (a) Mean numerical grade: low ability 70.12, 79.70, 90.00; average ability 69.98, 79.97, 89.97; high ability 70.05, 79.20, 89.95. (b) Letter grade (0 = F, 1 = D, 2 = C, 3 = B, 4 = A): low ability D, B, A; average ability D, B, A; high ability D, C, A.

Fig. 3. Low effort/inappropriate behavior: final grade by achievement level (low, average, high) for each ability level. (a) Mean numerical grade: low ability 69.27, 79.06, 88.99; average ability 69.34, 79.06, 89.15; high ability 69.79, 79.05, 88.81. (b) Letter grade (0 = F, 1 = D, 2 = C, 3 = B, 4 = A): low ability F, C, B; average ability F, C, B; high ability D, C, B.

Fig. 4. High effort/excellent behavior: final grade by achievement level (low, average, high) for each ability level. (a) Mean numerical grade: low ability 76.80, 79.96, 89.91; average ability 70.08, 80.24, 89.74; high ability *, 79.95, 89.99. (b) Letter grade (0 = F, 1 = D, 2 = C, 3 = B, 4 = A): low ability C, B, A; average ability D, B, A; high ability *, B, A. *Scenario not included on any questionnaire (see Section 3.1).

Fig. 5. Low ability/low effort: final grade by achievement level (low, average, high) for each behavior level. (a) Mean numerical grade: inappropriate behavior 69.27, 79.06, 88.99; average behavior 70.01, 79.14, 89.07; excellent behavior 70.11, 79.70, 90.00. (b) Letter grade (0 = F, 1 = D, 2 = C, 3 = B, 4 = A): inappropriate F, C, B; average D, C, B; excellent D, B, A.

Fig. 6. Low ability/high effort: final grade by achievement level (low, average, high) for each behavior level. (a) Mean numerical grade: inappropriate behavior 69.28, 79.05, 89.05; average behavior 69.99, 79.97, 89.72; excellent behavior 76.80, 79.97, 89.91. (b) Letter grade (0 = F, 1 = D, 2 = C, 3 = B, 4 = A): inappropriate F, C, B; average D, B, A; excellent C, B, A.

Fig. 7. Average ability/low effort: final grade by achievement level (low, average, high) for each behavior level. (a) Mean numerical grade: inappropriate behavior 69.34, 79.06, 89.15; average behavior 69.96, 79.09, 89.89; excellent behavior 69.98, 79.97, 89.97. (b) Letter grade (0 = F, 1 = D, 2 = C, 3 = B, 4 = A): inappropriate F, C, B; average D, C, A; excellent D, B, A.

Fig. 8. Average ability/high effort: final grade by achievement level (low, average, high) for each behavior level. (a) Mean numerical grade: inappropriate behavior 69.35, 79.07, 89.06; average behavior 69.96, 79.97, 89.92; excellent behavior 69.98, 80.24, 89.74. (b) Letter grade (0 = F, 1 = D, 2 = C, 3 = B, 4 = A): inappropriate F, C, B; average D, B, A; excellent D, B, A.

3. Discussion

Grades, when assigned appropriately (i.e. based on achievement measures only), (a) enable teachers to compare the knowledge and skills of current students, (b) allow teachers to accurately ascertain the preparedness/readiness of incoming students, and (c) provide parents and students with a clear picture of each child's knowledge and understanding of course content. Despite these benefits, Linn and Miller (2005) describe the grading and reporting of student progress as "one of the more frustrating aspects of teaching" (p. 366). Thorndike (2005) describes the grading process as unpleasant, time-consuming, and anxiety-provoking. Nevertheless, it is the teacher's responsibility to determine which specific factors will be used to determine a student's final grade. Additionally, the classroom teacher must decide to what extent a variety of factors will be considered. This decision-making process can be influenced by one's personal philosophy of teaching and learning (Tomlinson, 2001), measurement training, local and official grading policies, and the perceived, or actual, consequences of assigned grades.

Grades are becoming increasingly important in American public and private schools. Because teachers and students (as well as their parents) are well aware of their consequences, grades can become the focus of a great deal of tension between teachers and students. Many would argue that student knowledge and understanding of course content cannot be summed up in a simple letter or numerical grade. Some readers may recall from their own schooling experiences that an A (or equivalent grade of excellence) in one teacher's class meant something completely different from an A in another teacher's class. While this research confirms that American teachers, to a large extent, tend to favor the use of sound grading practices, it is clear that they also consider other circumstances. In the cases of what are referred to as borderline grades, teachers appear to rely more heavily on other student characteristics. One might argue that these results merely indicate that teachers are aware that even sound measures of achievement are imprecise and contain some error and, as such, give students the benefit of the doubt. The fact that teachers in this study were far less likely to give the benefit of the doubt to some students with poor behavior and limited motivation, however, still calls into question the meaning of grades.

Previous research indicates, however, that teachers are not deliberately fueling this stakeholder conflict or being completely unreasonable or arbitrary in their grading decisions. In most cases, teachers base their decisions on personal beliefs and expectations, which tend to support and promote student success. It is not hard to believe that individuals who have dedicated their lives to helping children would find it difficult to fail a low-ability student who works hard and causes minimal classroom disruption. Such an action certainly has the potential to distress a teacher on both a professional and a personal level. Brookhart writes that even teachers who demonstrate that they know the basic principles of grading still struggle with the actual implementation of those principles. Perhaps, if teachers could be assured that grades would be interpreted only as measures of achievement, they would be more inclined to grade on achievement alone, but "teachers know that grades are not only interpreted, but they are used" (Brookhart, 1991, p. 35). Teachers know that some students are paid by their parents for good grades and punished for bad ones; that students with bad grades cannot participate in the school's ice cream socials while kids with good grades get two scoops; and that so much of a child's self-worth is tied to his/her grades. Because of this, they simply cannot grade on the basis of achievement only, even when they are completing pen-and-paper surveys about abstract, hypothetical students such as those in this study.

Teachers are told to ignore effort when assigning final grades, but most stakeholders know, or suspect, that educators include effort in grades either explicitly (e.g. effort on a rubric) or implicitly (e.g. homework completed, class participation). For many, effort becomes the deciding factor in borderline cases. During the pilot of this questionnaire in a separate school district, one teacher wrote: "Effort matters, but not very much. The main criterion for the grading is how much the course objectives have been mastered. I might give a passing grade instead of a failing one to a student who makes the effort, but I would not change the grade if there is no risk of failing just because of the effort." In the same pilot, another teacher commented: "I reward for effort in class." Continued policies that pretend such practices can be eradicated, or even reduced, seem impractical at best. McMillan (2003) asserts that, because student effort and motivation are such large influences on teachers' assessment practices, a well-understood, operationalized definition or system for assessing these characteristics is needed. In this way, teachers can be trained in what to observe and how to interpret effort and motivation in a way that is both fair and accurate. At a minimum, some coherent system of 'effort evaluation' must be considered, because the current, inevitable, idiosyncratic system of including effort calls into question both the meaning and the quality of grades.

Effort, nevertheless, is not the only non-achievement criterion teachers consider when assigning final grades. The data suggest that classroom teachers weigh student behavior heavily, even more so than effort, when considering borderline cases. A student with excellent behavior and low achievement is far more likely to pass a course than a student with inappropriate behavior and low achievement, and these results tended to be consistent and present across all teachers. Frary, Cross, and Weber (1993) found similar results in their study of secondary teachers: thirty-one percent of teachers surveyed indicated that they 'agreed or tended to agree' that "laudatory or disruptive classroom behavior should be considered in determining final grades."

Perhaps most importantly, the reader should bear in mind that the school district from which these teachers were selected has an official grading policy that stresses achievement as the only factor to be considered in assigning final grades. This policy, coupled with the fact that teachers had no personal relationships with the students described in each item, suggests that the observed interactions may underestimate actual effects, either because teachers were unwilling to respond in a way that goes against district grading policy or because the scenarios lacked the human attachment and emotion that accompany personal relationships with students.

Although this study's research design systematically examined four important factors that teachers consider when assigning final grades, further research using additional, or different, facets is also necessary. The literature suggests the use of improvement (McMillan et al., 2002) as a guiding force when assigning student grades, so including a student-improvement variable may also provide critical knowledge about teacher grading practices. Tomlinson (2001) asserts that teachers should grade for success just as they teach for success, with grades reflecting not only normative achievement but also personal growth. Teachers who assign grades based, in large part, on a student's progress are more concerned with how far students have come, in terms of achievement, than with where they are. The improvement factor should certainly receive further consideration in future studies of teachers' grading practices.

3.1. Limitations

Perhaps the most important limitation in this study's design is that, although the scenarios were independent, the responses of teachers were not. While each teacher responded to a unique combination of independent variables only once, the teachers responded to 36 combinations within the questionnaire, and this seems to have created dependencies among the independent variables represented within each scenario. These dependencies led to singularity problems when examining the four-way interaction, which cost a single degree of freedom (hence seven degrees of freedom instead of eight). In essence, the design matrix representing the four independent variables, or facets, was not of full rank.

One scenario was inadvertently not placed on any questionnaire. Examination of the missing item, however, reveals that its omission is not a major limitation. This item described a student with high levels of ability and effort and excellent behavior, but low levels of achievement. Focus group participants (and, perhaps, common sense) indicated that such a student would not exist in a real classroom situation: they argued that a student with high ability who works hard and behaves would not be failing a course at the end of the semester. One teacher described this particular scenario as 'nonsensical'. Previous studies (Roskam & Broers, 1996) have used mapping sentences to create questionnaires and deliberately removed one or more possible combinations in an effort to maintain practical feasibility, improve linguistic phrasing, or eliminate meaningless items. In short, little valuable 'information' about the grading practices of teachers was lost through the inadvertent omission of this scenario from the data analysis.
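How a design matrix can lose a degree of freedom in the highest-order term can be illustrated with a rank computation. In a fully crossed 3 × 3 × 3 × 2 design (the level counts implied by the main-effect DFs in Table 3), the four-way interaction has (3 − 1)(3 − 1)(3 − 1)(2 − 1) = 8 degrees of freedom: the rank of the saturated design minus the rank of the model containing all lower-order terms. The sketch below assumes effect (sum-to-zero) coding and a hypothetical level ordering and cell index for the omitted high-ability, low-achievement, excellent-behavior, high-effort scenario. It shows that dropping that single cell leaves 53 cells (the same count as the 53-item survey) and reduces the four-way term to 7 degrees of freedom. It demonstrates one mechanism for rank deficiency; it is not a reconstruction of the authors' analysis.

```python
import itertools

import numpy as np

# Hypothetical coding of the four facets; level counts follow the Table 3 DFs.
# Level order assumed: ability/achievement 0 = low, 1 = average, 2 = high;
# behavior 0 = inappropriate, 1 = average, 2 = excellent; effort 0 = low, 1 = high.
LEVELS = {"ability": 3, "achievement": 3, "behavior": 3, "effort": 2}
FACTORS = list(LEVELS)


def effect_code(level: int, n_levels: int) -> np.ndarray:
    """Sum-to-zero (effect) coding: n_levels - 1 columns for one factor."""
    if level == n_levels - 1:
        return -np.ones(n_levels - 1)
    row = np.zeros(n_levels - 1)
    row[level] = 1.0
    return row


def design_row(cell: tuple, max_order: int) -> np.ndarray:
    """Intercept plus every main effect and interaction up to max_order."""
    codes = [effect_code(lv, LEVELS[f]) for lv, f in zip(cell, FACTORS)]
    cols = [1.0]  # intercept
    for order in range(1, max_order + 1):
        for subset in itertools.combinations(range(len(FACTORS)), order):
            # Each interaction column is a product of one coding column per factor.
            for combo in itertools.product(*(codes[i] for i in subset)):
                cols.append(float(np.prod(combo)))
    return np.array(cols)


def four_way_df(cells: list) -> int:
    """df of the four-way term = rank(saturated) - rank(lower-order model)."""
    saturated = np.array([design_row(c, 4) for c in cells])
    lower = np.array([design_row(c, 3) for c in cells])
    return int(np.linalg.matrix_rank(saturated) - np.linalg.matrix_rank(lower))


all_cells = list(itertools.product(*(range(k) for k in LEVELS.values())))
# Hypothetical index of the omitted scenario: high ability, low achievement,
# excellent behavior, high effort.
missing = (2, 0, 2, 1)
observed = [c for c in all_cells if c != missing]

print(len(all_cells), four_way_df(all_cells))  # prints 54 8
print(len(observed), four_way_df(observed))    # prints 53 7
```

With every cell present the saturated design has full rank (54 columns, 54 cells); removing one cell leaves all lower-order effects estimable, so the single lost degree of freedom falls on the four-way interaction.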

Finally, in this study, teachers were asked to read 36 scenarios that, by design, varied only slightly in the student characteristics they were asked to consider. Such a task may have become exhausting or even tedious for many participants. Furthermore, because of the design, participants almost certainly recognized the patterns among these vignettes. These limitations, however, did not appear to be overwhelming, as most teachers who started the survey completed the questionnaire in full and provided answers that made common sense in light of the student being described (i.e. students with high achievement received higher grades). In addition, teachers who participated in the focus group discussions used to validate the vignettes/scenarios before the pilot study did not complain about the questionnaire's length or repetitiveness.

4. Conclusions

Pre-service and in-service teachers spend considerable time learning how to teach. Teacher training programs offer a myriad of courses designed to improve teaching and, as a consequence, student learning. This training typically includes some instruction on the use and benefits of classroom and large-scale assessments. Our results suggest, however, that the issues related to final grade assignments require more attention in U.S. teacher-education programs. This study has several implications for the training and development of classroom teachers. First, it provides teacher-educators with useful information about the grading inclinations of K-12 teachers. Indeed, most classroom assessment/educational measurement textbooks instruct teachers to use achievement measures only when assigning students grades in all cases; yet many teachers, when faced with a specific combination of student attributes, rely on other, less reliable measures such as effort and behavior. Such practices critically diminish the meaning of grades and create conflict and confusion among stakeholders. These results suggest that in-service teachers require additional training in appropriate grading practices for all students and, most importantly, in the ultimate consequences of failing to implement them. Teacher-educators should, of course, begin teaching by example, e.g. removing participation points from syllabi, making achievement-only criteria clear, and assigning final grades that reflect student mastery of those criteria.

Finally, school and school system administrators can use this information to inform choices for in-service professional development. Because school administrators are typically tasked with mediating conflicts between students, teachers, and parents, including issues related to final grades, they are particularly invested in the establishment and maintenance of a uniform grading system that does not bias or disadvantage any student group. By establishing and actively enforcing achievement-only grading policies at the school level, administrators can create a school culture in which all stakeholders (parents, teachers, and students) can trust the meaning of grades for all students.

Appendix. Supplementary material

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.tate.2010.03.008.

References

Brookhart, S. (1991). Grading practices and validity. Educational Measurement: Issues and Practice, 35–36.

Brookhart, S. (1993). Teacher's grading practices: meaning and values. Journal of Educational Measurement, 30(2), 123–142.

Brookhart, S. (2004). Grading. Upper Saddle River, New Jersey: Pearson Education.

Bursuck, W., Polloway, E., Plante, L., Epstein, M., Jayanthi, M., & McConeghy, J. (1996). Report card grading and adaptations: a national survey of classroom practices. Exceptional Children, 62(4), 301–318.

Cicmanec, K., Johanson, G., & Howley, A. (2001, April). High school mathematics teachers: Grading practice and pupil control ideology. Paper presented at the Annual Meeting of the American Educational Research Association, Seattle, Washington.

Cizek, G., Fitzgerald, S., & Rachor, R. (1996). Teachers' assessment practices: preparation, isolation, and the kitchen sink. Educational Assessment, 3(2), 159–179.

Cizek, G., Robert, R., & Fitzgerald, S. (1995, April). Further investigation of teachers' assessment practices. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.

Cross, L., & Frary, R. (1999). Hodgepodge grading: endorsed by students and teachers alike. Applied Measurement in Education, 12(1), 53–72.

Feldman, A., Alibrandi, M., & Kropf, A. (1998). Grading with points: the determination of report card grades by high school science teachers. School Science and Mathematics, 98(3), 140–148.

Frary, R., Cross, L., & Weber, L. (1992, April). Testing and grading practices and opinions in the nineties: 1890s or 1990s? Paper presented at the Annual Meeting of the National Council on Measurement in Education, San Francisco, CA.

Frary, R., Cross, L., & Weber, L. (1993). Testing and grading practices and opinions of secondary teachers of academic subjects: implications for instruction in measurement. Educational Measurement: Issues and Practice, 12(3), 23–30.

Green, S., & Salkind, N. (2005). Using SPSS for Windows and Macintosh: Analyzing and understanding data (4th ed.). Upper Saddle River, NJ: Prentice Hall.

Guttman, L. (1977). The mapping sentence for assessing values. In S. Levy (Ed.), Louis Guttman on theory and methodology: Selected writings (pp. 127–134). Vermont: Dartmouth Publishing Company.

Kahn, E. (2000). A case study of assessment in a grade 10 English course. Journal of Educational Research, 93(5), 276–286.

Krueger, R. (1994). Focus groups: A practical guide for applied research. Thousand Oaks, California: Sage Publications.

Linn, R., & Miller, M. (2005). Measurement and assessment in teaching. Upper Saddle River, NJ: Pearson Prentice Hall.

McMillan, J. (2003). Understanding and improving teachers' classroom assessment decision making: implications for theory and practice. Educational Measurement: Issues and Practice, 34–43.

McMillan, J., Myran, S., & Workman, D. (2002). Elementary teachers' classroom assessment and grading practices. Journal of Educational Research, 95(4), 203–213.

McMillan, J., & Nash, S. (2000, April). Teacher classroom assessment and grading practices decision making. Paper presented at the NCME annual meeting, New Orleans, LA.

Rose, L., & Gallup, A. (2007). The 39th Annual Phi Delta Kappa/Gallup Poll of the Public's Attitude Towards the Public Schools. Retrieved from http://www.pdkintl.org/kappan on November 12, 2009.

Roskam, E. E., & Broers, N. (1996). Constructing questionnaires: an application of facet design and item response theory to the study of lonesomeness. In G. Engelhard, & M. Wilson (Eds.), Objective measurement: Theory into practice, Vol. 3 (pp. 349–385). Norwood: Ablex.

Stanley, G., & Baines, L. (2001). No more shopping for grades at B-mart: re-establishing grades as indicators of academic performance. The Clearing House, 74(4), 227–230.

Stiggins, R., Frisbie, D., & Griswold, P. (1989). Inside high school grading practices: building a research agenda. Educational Measurement: Issues and Practices, 8(2), 5–11.

Thorndike, R. (2005). Measurement and evaluation in psychology and education (7th ed.). Upper Saddle River, New Jersey: Pearson Education, Inc.

Tomlinson, C. (2001). Grading for success. Educational Leadership, 12–15.