How finely grained does summative assessment need to be?

This article was downloaded by: [Swinburne University of Technology] On: 26 August 2014, At: 08:29 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Studies in Higher Education Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cshe20 How finely grained does summative assessment need to be? Mantz Yorke a a Department of Educational Research , Lancaster University , Lancaster, UK Published online: 18 Aug 2010. To cite this article: Mantz Yorke (2010) How finely grained does summative assessment need to be?, Studies in Higher Education, 35:6, 677-689, DOI: 10.1080/03075070903243118 To link to this article: http://dx.doi.org/10.1080/03075070903243118 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. 
Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Studies in Higher Education
Vol. 35, No. 6, September 2010, 677–689

ISSN 0307-5079 print/ISSN 1470-174X online
© 2010 Society for Research into Higher Education
DOI: 10.1080/03075070903243118
http://www.informaworld.com

How finely grained does summative assessment need to be?

Mantz Yorke*

Department of Educational Research, Lancaster University, Lancaster, UK

Assessors in higher education are often faced with the need to grade student work on lengthy scales. Is such fine granularity in assessment really necessary? The question can be addressed at different levels of the assessment system: here the focus is on the difference that would be made to honours degree classifications if so-called percentage grades were replaced by grades on a very much shorter scale. Detailed analysis of the complete assessment records of 144 Law students on a modular scheme in a university in the UK showed that the difference was not large, and derived mainly from the way in which sub-modular grades were combined. Some implications of the findings are discussed, which have relevance beyond higher education in the UK.

Keywords: assessment; grading system; outcomes; degree performance; honours classification

The challenges of assessment

Assessment has for some time been a challenging issue in higher education around the world. In the UK, reports by the Quality Assurance Agency (QAA), and more recently results from the National Student Survey (NSS), have shown consistently that assessment is the aspect of a generally well-regarded higher education experience that stands most in need of development. The thrust of these findings has been focused on formative assessment.

Knight (2002) made a strongly-argued case that summative assessment was in disarray, not least because more expectations were placed on it than the assessment methodology could bear. As he put it:

In the Nicomachean Ethics, Aristotle advises us not to expect more precision than the subject admits of, which is a good precept to apply to summative assessment. In many ways it [summative assessment] cannot deliver the precision and certainty that managerialist discourses and common sense expect. (284)

However, summative assessment has received relatively little attention, despite its importance for students and their futures. More problematic than many are prepared to acknowledge are:

● the grading of students’ work (see, for example, Milton, Pollio, and Eison 1986; Yorke 2008), which is influenced by a variety of factors including normative disciplinary practices;



● the way in which grades are cumulated to produce an overall index of attainment, such as the honours degree classification in the UK, and the grade-point average (GPA) in the USA, when it is used as an overall index of achievement on a programme of study (Yorke 2008); and, less visibly,

● the assessment regulations implemented by autonomous institutions (for evidence of variation in the UK, see Yorke et al. 2008; and in the USA, see Brumfield 2004).

One of the issues facing assessors in the UK, and no doubt elsewhere, is to ensure that grading gives a fair representation of student achievement. This is a complex matter, since it has to take account of, inter alia, variation in the kinds of demand made of students, and the way that varied demands are brought together within a single piece of submitted work. Fairness in grading is an exacting challenge. The challenge is heightened when the concept of fairness has to apply across disciplines with varied grading traditions which give rise to differing profiles of awarded grades.

Where finely-grained scales are used for summative assessment, it is doubtful that the discriminations made by assessors are actually as finely judged as is implied by the scale, and as are understood by the recipients and other interested parties. Much effort is expended on determining grades, particularly with a view to the implications of the grades for overall programme grades (and their consequences for students), but it is questionable whether the level of effort makes conceptual and practical sense. A true-life example of misdirected effort in assessment is the lengthy discussion by assessors of a presentation by a group of students as to whether it merited a mark of 63% or 64%: since the assignment counted for only one-fifth of the marks for the module as a whole, the assessors were haggling over the equivalent of 0.2 of a percentage point per module, or roughly 0.01 of a percentage point for the programme overall (the term ‘percentage scale’ is in common use for grades that run from 0 to 100, and such grades are treated as if they had the mathematical properties of true percentages: however, when one asks the question ‘percentage of what?’, the validity of the grade as a meaningful percentage collapses).
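The arithmetic behind this example can be checked with a short sketch. The figure of 16 equally weighted modules is an assumption borrowed from the classification rules described later in the article, and the function name is purely illustrative.

```python
def mark_impact(mark_difference, component_weight, modules_counted):
    """Return (effect on module mark, effect on programme mean) in percentage points."""
    module_effect = mark_difference * component_weight
    # Assumption: every counted module carries equal weight in the programme mean.
    programme_effect = module_effect / modules_counted
    return module_effect, programme_effect


# One mark of difference (63% versus 64%) on a component worth one-fifth of a module,
# with 16 modules assumed to count equally towards the programme mean.
module_effect, programme_effect = mark_impact(1.0, 0.2, 16)
print(round(module_effect, 2), round(programme_effect, 4))
```

Under these assumptions the disputed mark moves the module result by 0.2 of a percentage point and the programme mean by about 0.0125, which is consistent with the "roughly 0.01" quoted in the text.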

Would the employment of a more coarsely-grained assessment scale deliver roughly the same pattern of results at overall programme level as that derived from more finely-grained assessments?

Methods of determining the honours degree classification in the UK

Before seeking to answer this question, it is appropriate to provide a summary of the way in which the UK honours classification is determined. In the UK (apart from Scotland) it is typically the case that full-time students have merely to pass their first-year studies in order to progress to what, in some institutions, is called ‘Part 2’ of the undergraduate curriculum. The honours degree classification is usually based on results from the second and final year of academic study (i.e. Part 2): for part-time students the same principle applies, though the time-scales involved are necessarily longer. Where the programme involves a year’s placement in an organisation, the achievements typically contribute little or nothing to the actual classification (though they may have advantages for the student when seeking employment).

The majority of institutions in the UK uses grades in the form of (what are typically called) percentage marks. These normally map on to the honours degree classification via mean percentages as follows:


70.0% and above: first class honours
60.0 to 69.9%: upper second class honours
50.0 to 59.9%: lower second class honours
40.0 to 49.9%: third class honours.
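As a minimal sketch, the mapping from a mean percentage to an honours class can be written as a chain of threshold tests. The function name is hypothetical, and the treatment of means below 40% is left open because the article does not specify it at this point.

```python
def classify(mean_percentage):
    """Map a mean percentage mark to an honours class using the boundaries listed above."""
    if mean_percentage >= 70.0:
        return "first class honours"
    if mean_percentage >= 60.0:
        return "upper second class honours"
    if mean_percentage >= 50.0:
        return "lower second class honours"
    if mean_percentage >= 40.0:
        return "third class honours"
    # Below the honours range: the outcome depends on institutional regulations.
    return None


print(classify(69.9))
print(classify(70.0))
```

A mean of 69.9% falls in the upper second class and 70.0% in the first class, which illustrates how much can hang on a tenth of a percentage point at a boundary.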

A minority of institutions use grade-scales considerably shorter than the so-called percentage scale, and determine the classification according to the ‘profile’ of awarded grades. Some institutions from the majority group have a secondary method of determining the class of honours. If the mean percentage falls just below an honours degree classification boundary (and the definition of ‘just below’ varies between institutions – see Yorke et al. 2008), then a ‘profiling’ approach is invoked.

Frugal summative assessment?

The present article was stimulated by a reading of the book Simple heuristics that make us smart (Gigerenzer, Todd, and the ABC Research Group 1999). The general theme of the book could be summed up in the rather hackneyed phrase ‘less is more’, in that the authors point to examples in which an appeal to compendious information delivered a less satisfactory outcome than the use of a small number of relevant parameters. Gigerenzer et al. argue the virtues of stripped-down choice-making processes (heuristics) which they term ‘fast and frugal’. This article explores some aspects of what might be termed the ‘frugal summative assessment’ of student work.

Three main assessment contexts are as follows.

(1) Assessment for learning, in which the primary purpose of the assessment is to encourage the student to a higher level of achievement, whether this be the resubmission of a failed task or the improved tackling of some future task. Here, relatively rough and ready assessment may be all that is needed: in other words, the assessment has to be adequate for the purpose (student learning) without necessarily reaching the level of robustness that is generally taken to be necessary for high-stakes summative assessment. Learning should ideally involve the student developing an appreciation of standards: Sadler (2009) offers an extended argument on this point.

(2) Summative assessment of individual pieces of work, where such assessments are subsequently combined to produce an overall mark – say, for a module in a programme of study.

(3) The combination of assessment components into a grade for the programme as a whole, often via the intermediate step of module marks. Examples of programme grades are the GPA used in the USA, and the honours degree classification used in the UK and Australia.

Contexts 2 and 3 interlock. This article emphasises Context 3, and deals incidentally with some aspects of Context 2. It does not attempt to address Context 1. Specifically, the research question addressed here is as follows:

How much difference would grading by broad categories, in contrast to apparently precise grading, make to a student’s overall programme grade?

The research question is addressed retrospectively, by examining a set of grades built up in three stages: sub-modular assessments; combination of intra-modular assessments to produce an overall module grade; and the combination of overall module grades to produce an overall programme grade. Whereas the study reported in this article was undertaken with reference to the honours degree classification in the UK, it is relevant to other methods of determining an overall programme grade such as the GPA.

How assessors would grade work if they used a fast and frugal approach to grading from the outset is a matter that cannot be addressed through the analysis of retrospective data, and would require a different kind of study. Would they mark in a different way if they only had broad grades to work to? Some might mark in percentages and convert to broad grades, whereas others might find marking to broad grades rather like the first stage of the approach they have traditionally adopted – to assess a piece of work generally and subsequently to refine the grading within a broad band. The choice of grading method could have a significant effect on the results profile of some students, particularly those in the middle of the results distribution (however work is graded, the generally strong student will end up towards the top of the order of merit, and the generally weak towards the bottom).

Method

A data set was made available from a post-1992 university in the UK, which comprised the full results from 152 students for whom the primary subject of study was Law. The data set included records where students had failed assessments (either by not taking them at all or by achieving a failing grade) as well as the records of successes. In the analyses reported here, failures have been ignored wherever the student subsequently redeemed the performance. Eight sets of records were discarded because there were insufficient module results in the data set to compute an honours degree classification: the probability is that these students were admitted with ‘advanced standing’ from another institution and had been allowed credit in respect of previous achievements.

Since the focus of the study was on the honours degree classification achieved by the remaining 144 students, only the 2863 module results that were eligible for consideration when determining their classifications were incorporated in the analysis. The mean number of modules studied per student, for the second and final year combined, is therefore 19.9 – well in excess of the 16 module passes necessary for the award of the degree with honours. Examination of the data indicated that many students had a ‘tail’ of poor – even failing – achievements to their name which were not counted in the honours degree classification process. This amount of ‘tailing’ of performances is probably untypical of UK higher education. It implicitly opens up the question of the extent to which students should be allowed to replace weak performances by taking additional modules.

The vast majority of modules studied were on some aspect of law, but there were a few modules from other subject areas such as business studies and the social sciences. The normal tariff of credits per module was 15, with 360 credits being required to obtain an honours degree. Of these 360 credits, the first 120 merely qualified the student to enter ‘Part 2’ of the degree programme, in which they were required to obtain a minimum of 240 credits for the award of the degree with honours (though the honours class was based on the best 225 credits obtained). Under the assessment regulations current at the time in this university, the student was required to have passed a minimum of 14 modules in Part 2, and to have achieved a mark of at least 25% in the remaining one or two modules required to make up a total of 16 (in other words, some compensation between module performances was permitted).

The provided data were, for the majority of students, percentage marks for each assessment component. The percentages were also converted by the university into grades, since it had a dual method of determining the classification – first with reference to the mean percentage mark, but if the mean fell no more than 3 percentage points below a classification borderline then the classification could be determined according to the profile of grades awarded in respect of the modules passed. The conversion of percentages to grades was untypical of UK higher education in the region just above a passing percentage. Whereas clear passes at 50% and above were graded as A = 70%+; B = 60–69%; C = 50–59% (which is typical), D was assigned to performances between 43 and 49% (but, curiously, also to some in the band 40–42%), and E was assigned to passing performances in the range 40–42%. Failing performances (i.e. below 40%) could be compensated by superior performances elsewhere, or might be redeemed by retaking one or more assessments for the failed module. Retaken assessment passes were not ‘capped’ at 40%: since these data were collected, the university has amended its regulations so as to cap retaken assessment passes at this level.

Thirty-six of the 144 students had begun their programmes in a different institution prior to a merger, under assessment regulations in which performances were graded on a scale running from 0 to 16 rather than in terms of percentages. After the merger, the grade-points were converted into percentages in order to align the performances with the regulations obtaining after the merger had taken place. In addition, across the 144 sets of records there were 24 instances of modules having been passed at postgraduate level, with the pass being accompanied by a percentage mark. For the purposes of the present article, the percentage marks were used, irrespective of their origin, since their origin was irrelevant to the research question being investigated.

The provided percentages at sub-module level (or when the module had only one assessment, the module level) were converted for the purposes of the present study into six bands which would not be untypical of higher education in the UK. The four passing grades, A to D, with their associated percentage ranges will be understood as corresponding to the four classes of honours used in the UK (first; upper second; lower second; third): A 70% and above; B 60–69%; C 50–59%; D 40–49%; FX 35–39% (implying that the narrow failure could be compensated); F below 35%, uncompensatable fail.
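The six-band conversion can be sketched as follows. The function name is hypothetical, and treating each band's lower limit as an inclusive threshold (so that a fractional mark such as 39.5 would count as FX) is an assumption, since the article states the bands in whole percentages only.

```python
def to_band(percentage):
    """Convert a percentage mark to one of the six bands used in the study."""
    # Assumption: each band's lower limit is an inclusive threshold.
    if percentage >= 70:
        return "A"
    if percentage >= 60:
        return "B"
    if percentage >= 50:
        return "C"
    if percentage >= 40:
        return "D"
    if percentage >= 35:
        return "FX"  # narrow, compensatable fail
    return "F"       # uncompensatable fail


print(to_band(63), to_band(64), to_band(37), to_band(20))
```

On this banding the marks of 63% and 64% discussed earlier both become B, so the distinction that assessors debated disappears at the point of conversion.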

Whilst the categories of passing performances used in the present article are akin to those of the honours degree classification, there is no particular reason why the mapping rules should be based upon these particular grade categories. For example, a four-category approach (distinction/merit/pass/fail) could be used, with the classification of honours being determined according to the profile of categories achieved. The European Credit Transfer and Accumulation System (ECTS) uses five bands of pass (European Commission 2009), but can accommodate a variety of national assessment bandings when transnational credit transfer is required. The fewer the number of categories, the less complex the mapping rules need to be (see Dalziel 1998).

Mapping rules used in this study

(a) Module components to overall module

The progressive combination of grades from sub-module level to overall programme level necessitated the implementation of ‘mapping rules’. There is no difficulty in mapping a run of C grades into an overall grade of C, but when – as was evident in the provided data set – a range of grades from outright fail to A has to be mapped, the mapping task becomes obviously challenging.

There is no absolute ‘right way’ of undertaking mapping. Mapping reflects the values of those devising the assessment regulations. Mapping rules have to be defensible for reasons that include transparency and the pre-emption of vexatious student appeals. In the present study the following mapping rules were adopted. Grades from module components were mapped on to a 5-point scale for the module as a whole, with passing grades A to D as at the component level, but with the fifth grade (F) being an unambiguous fail (the dropping of the FX grade for this stage of the analysis signals that compensation between whole-module performances was deemed unacceptable). Marks from retaken assessments were treated as valid, and not ‘capped’ at 40% (for demonstration purposes, the decision is not of the importance that it would have in real situations, where a philosophical rationale regarding capping is needed).

The mapping rules for combining sub-module grades to give a grade for the whole module were as follows.

(1) Compensation was limited to marks in the range 35–39%

(2) – except where an assessment counted for no more than one-tenth of the assessment load for the whole module: in this case, failure in only one such assessment per module was tolerated, and compensation was applied irrespective of the percentage awarded.

(3) An uncompensatable fail in more than 10% of the total module was taken as a failure of the whole module. The redemption of such a failure is outside the scope of this article, and would be laid down in institutional assessment regulations: the student might be expected to retake the whole module, or merely that part of the module in which there had been a failure.

(4) Where the weighting ratio between the (only) two components of a module was 70/30 or higher, and there was only a difference of one grade, the more highly-weighted grade was ‘awarded’.

(5) Where the weighting ratio between the (only) two components of a module was smaller than 70/30, and there was only a difference of one grade, the lower grade was ‘awarded’ on the grounds that the performance as a whole did not merit the higher grade. If the grade combination was D/FX, then the outcome was treated as a fail (on the grounds of insufficiently good performance to compensate for the narrow fail).

(6) Where the weighting ratio between the (only) two components of a module was smaller than 70/30, and there was a ‘gap’ between the component grades (e.g. B and D), the intermediate grade (e.g. C) was ‘awarded’.

The above rules do not cover situations in which a module performance is built up from more than two components. Where the components attracted the same grade, the module weightings were combined, in effect reducing some modules to two or even one component, thus allowing the mapping rules outlined above to be applied. Thus module weightings of 60/20/20, with corresponding grades B/C/C would be treated as 60/40 with respective grades B/C, giving C overall (mapping rule 5 above applies).
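Rules 4 to 6, together with the merging of equally graded components, can be sketched as below. This is one possible reading of the rules rather than the procedure actually used in the study: the function names are hypothetical, a 'gap' is taken to mean exactly two grades, and any combination the rules do not decide (including those needing the compensation rules 1 to 3 or the author's judgement) returns None.

```python
ORDER = ["A", "B", "C", "D", "FX", "F"]  # best to worst
PASSING = {"A", "B", "C", "D"}


def merge_equal_grades(components):
    """Sum the weights of components sharing a grade; the best grade comes first."""
    totals = {}
    for weight, grade in components:
        totals[grade] = totals.get(grade, 0) + weight
    return sorted(((w, g) for g, w in totals.items()),
                  key=lambda pair: ORDER.index(pair[1]))


def module_grade(components):
    """Apply rules 4-6 to (weight, grade) pairs; None means the rules do not decide."""
    merged = merge_equal_grades(components)
    if len(merged) == 1:
        return merged[0][1] if merged[0][1] in PASSING else None
    if len(merged) != 2:
        return None
    (w_better, g_better), (w_worse, g_worse) = merged
    # "70/30 or higher": the heavier component carries at least 70% of the weight.
    dominant = 10 * max(w_better, w_worse) >= 7 * (w_better + w_worse)
    if not dominant and (g_better, g_worse) == ("D", "FX"):
        return "F"  # rule 5: D/FX is treated as a fail
    if g_better not in PASSING or g_worse not in PASSING:
        return None
    gap = ORDER.index(g_worse) - ORDER.index(g_better)
    if gap == 1 and dominant:
        return g_better if w_better > w_worse else g_worse  # rule 4
    if gap == 1:
        return g_worse  # rule 5: the lower grade
    if gap == 2 and not dominant:
        return ORDER[ORDER.index(g_better) + 1]  # rule 6: the intermediate grade
    return None


print(module_grade([(60, "B"), (20, "C"), (20, "C")]))
```

For the worked example in the text (weightings 60/20/20 with grades B/C/C) the sketch merges the two C components into a 60/40 split and returns C under rule 5.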


Awkward combinations in the data set included the examples indicated in Table 1, with overall module grades ‘awarded’ as shown (where it seemed possible to ‘award’ either of two grades, the lower was chosen).

(b) Module grades to overall programme grade

The combination of module grades into an overall programme grade was achieved using a ‘profiling’ approach similar to that employed in a number of UK institutions.

Since the data set related to the award of the degree, with or without honours, the assumption was made that all the students had gained 120 credits from first-year study (or its part-time equivalent). As is customary in the UK, it was taken as necessary for the award of the degree with honours that the student gained 240 credits in the final two years of full-time study (or its equivalent), giving a credit total of 360. As is evident from the ‘results’ that follow, some students did not obtain ‘passes’ totalling 240 credits.

For the determination of the class of honours in this study, 210 of the final 240 credits were counted – i.e. the grades from fourteen of the sixteen 15-credit modules. This allowed the student to avoid being penalised for one or two untypically poor but passing module performances which might be due to unsuccessful risk-taking in respect of their work or to adventitious misfortune (note that in this study no allowance was made as regards compensation for failure to achieve a passing grade at the module level).

The degree was ‘awarded’ without honours if the amount of credit gained in Part 2 of the undergraduate programme fell short of 240 (no students emerged with so little credit that they would have ‘failed’ the degree). With the 14 results arranged in descending order, the grade ‘awarded’ in respect of the eighth module was taken as the principal signifier of the overall programme grade. Where two or more module results were two or more grades away from the signifier grade, the signifier grade was taken as the seventh (when the discrepancy was of higher grades) or the ninth (when the discrepancy was of lower grades). There were no examples of discrepancies in both directions: if there had been such, the signifier grade would have remained the eighth.
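The signifier procedure can be sketched as follows. This is a literal reading of the rule as worded and does not model any further judgement that may have been exercised in the study. The function name is hypothetical, and the input is assumed to be the 14 counting module grades, with any F among them taken to mean that too little credit was gained for honours.

```python
ORDER = ["A", "B", "C", "D"]  # passing module grades, best to worst


def programme_grade(counting):
    """Return the overall programme grade from the 14 counting module grades."""
    if len(counting) != 14:
        raise ValueError("expected exactly 14 counting module grades")
    # Assumption: a failing grade among the 14 means insufficient credit for honours.
    if any(grade not in ORDER for grade in counting):
        return "Degree without honours"
    ranked = sorted(counting, key=ORDER.index)  # descending order of merit
    signifier = ORDER.index(ranked[7])          # the eighth result
    higher = sum(1 for g in ranked if signifier - ORDER.index(g) >= 2)
    lower = sum(1 for g in ranked if ORDER.index(g) - signifier >= 2)
    if higher >= 2 and lower < 2:
        return ranked[6]  # seventh result
    if lower >= 2 and higher < 2:
        return ranked[8]  # ninth result
    return ranked[7]      # no discrepancy, or discrepancies in both directions


print(programme_grade(["A"] * 8 + ["B"] * 6))
print(programme_grade(["A"] * 2 + ["B"] * 6 + ["C"] * 3 + ["D"] * 3))
```

A profile of eight A and six B grades gives A. A profile of two A, six B, three C and three D grades has B as its eighth result, but the three D grades lie two grades below it, so the ninth result (C) is taken instead.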

Table 1. ‘Awards’ from awkward combinations of grades.

Module weightings   Respective grades   Grade ‘awarded’
60/10/30            D/C/B               C
60/30/10            C/D/F               D
60/30/10            C/A/A               B
70/30               D/A                 C
20/25/25/30         A/A/C/B             B
45/45/10            C/C/A               C
90/10               B/FX                C
60/10/30            C/B/A               B
30/70               FX/A                C
25/25/50            A/A/F               F
50/50               A/D                 C
70/30               B/FX                D


Table 2 gives some examples illustrating the mechanism for determining the overall programme grade from 14 ‘counting’ grades (the two lowest of the original 16 having been discarded).

Results

The original data showed that, in a number of cases, the honours degree classification actually awarded was higher than it would have been if the only criterion had been the mean percentage gained over the whole programme (Figure 1). Details of the proceedings of the relevant examination board were not available, but it is likely that many of these upgradings were due to the invocation of the secondary ‘profiling’ approach to determining the classification where the mean percentage was just below a degree classification boundary. There may have been circumstances pertaining to performance of some students which led the examination board to upgrade the classification: the data-point in the region of 57% is one such example since this student’s award was determined on the basis of a previous set of assessment regulations.

Figure 1. Mean programme percentages for 144 students, in ascending order. Data-points represented by black squares indicate that these students were awarded a classification higher than that which the mean percentage alone would justify.

Table 2. Examples of the determination of the overall programme grade.

Frequency of module grade
A   B   C   D   F   ‘Awarded’ grade
2   1   3   8   0   C
0   4   3   6   1   Degree without honours (fewer than 14 modules counting)
0   6   6   0   2   Degree without honours (fewer than 14 modules counting)
8   6   0   0   0   A
1   3   4   6   0   C
2   6   3   3   0   C
3   5   4   2   0   C

As a side-issue, it is worth noting that no mean programme percentage in the ‘real-life’ results (based on the best 15 results out of 16) fell below 45%. Since module percentages had to be no less than 40% if credit were to be awarded (unless compensation were to be invoked, as it was in 45 cases), there would be a good chance that the profile of 15 results counting for classification purposes would contain enough module percentages sufficiently in excess of 40% to bring the mean well above that which would just scrape into the honours category. Considerations such as this may account for the decrease over time in the proportion of third-class honours degrees.

The cross-tabulation of the ‘awards’ made in this study against the actual awards made by the examination board shows that 116 of the 144 outcomes (80%) were the same in both cases (Table 3). However, there was a modest tendency for the ‘awards’ to be lower.

An examination of the way in which the overall programme grade was reached showed that the bulk of the differences resided in the combination of module component marks to produce an overall module mark. The rules adopted for this study turned out to be more stringent than those employed by the institution, particularly in respect of compensation for percentages below 40% (the level of a bare pass). Some students who were officially awarded a pass for the module were denied a pass in this study because they had performed insufficiently well according to the mapping rules adopted here. This failure (in respect of one or more modules) then carried forward into the honours degree classification, where insufficient credit was gained overall to permit the ‘award’ of the degree with honours. As a consequence, the only ‘award’ permissible was the degree without honours.

When the rules used in this study were relaxed in order to allow compensation within modules for marks in the range 30–39% (previously 35–39%, save for the exception noted earlier), and to allow a student to compensate for failure in as much as half of the module’s assessment tariff (instead of requiring passing grades in more than half of the tariff), the correlation between the ‘awards’ and the actual awards was a little greater, with 120 of the 144 outcomes (83%) being the same (Table 4). For the reasons advanced previously, 10 students were ‘awarded’ the degree without honours whereas in reality they had obtained an honours degree.

Table 3. Cross-tabulation of ‘awards’ determined by this study and actual awards (in the provided data set there were no awards of the degree without honours).

                          Classification actually awarded
Classification ‘awarded’  First  Upper second  Lower second  Third  Degree, no honours  Total
First                         2             0             0      0                   0      2
Upper second                  0            28             0      0                   0     28
Lower second                  0             5            74      1                   0     80
Third                         0             0             6     12                   0     18
Degree, no honours            0             2            11      3                   0     16
Total                         2            35            91     16                   0    144
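The agreement figure quoted in the text can be recovered directly from the cells of Table 3. The sketch below is one way of doing so; the function and variable names are illustrative, and the classes are assumed to be ordered from First down to the degree without honours.

```python
# Minimal sketch: agreement and direction of disagreement in Table 3.
# Rows are the 'awards' made in the study; columns are the actual awards.
# Both are ordered from First (index 0) to Degree, no honours (index 4).
TABLE_3 = [
    [2, 0, 0, 0, 0],    # First
    [0, 28, 0, 0, 0],   # Upper second
    [0, 5, 74, 1, 0],   # Lower second
    [0, 0, 6, 12, 0],   # Third
    [0, 2, 11, 3, 0],   # Degree, no honours
]


def summarise(table):
    """Count outcomes on, below and above the diagonal of a square cross-tab."""
    size = len(table)
    total = sum(sum(row) for row in table)
    same = sum(table[i][i] for i in range(size))
    # Row index greater than column index: the study's 'award' is the lower one.
    lower = sum(table[i][j] for i in range(size) for j in range(size) if i > j)
    higher = sum(table[i][j] for i in range(size) for j in range(size) if i < j)
    return {
        "total": total,
        "same": same,
        "lower": lower,
        "higher": higher,
        "agreement": same / total,
    }


summary = summarise(TABLE_3)
print(summary)
```

The diagonal holds 116 of the 144 outcomes. Of the 28 disagreements, 27 lie below the diagonal, where the ‘award’ is lower than the actual award, and one lies above it.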


Descriptive statistics for the modules for which there were 30 or more results in the provided data set show considerable variation within the broad subject area (Table 5). The mean percentage gained per module covers a range of more than 14 percentage points. Very few marks fell below the pass level of 40%, and only one fell below 25% (the lower limit for compensation), and that by a large margin. The upper limit of the modules’ mark ranges was 84%, and only three marks in the selected modules exceeded 80%.
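A minimal sketch of these descriptive statistics shows how a single very low mark can dominate skewness and kurtosis. The article does not say which estimators were used, so the moment-based definitions below are an assumption and would not necessarily reproduce the published figures. The marks are hypothetical, not taken from the data set.

```python
# Minimal sketch: mean, SD, skewness and excess kurtosis for a set of marks.
# ASSUMPTION: skewness and kurtosis use simple population-moment definitions;
# the article does not state which estimator produced its reported values.
import statistics


def describe(marks):
    """Return mean, sample SD, moment skewness and excess kurtosis."""
    n = len(marks)
    mean = statistics.mean(marks)
    m2 = sum((x - mean) ** 2 for x in marks) / n
    m3 = sum((x - mean) ** 3 for x in marks) / n
    m4 = sum((x - mean) ** 4 for x in marks) / n
    return {
        "mean": mean,
        "sd": statistics.stdev(marks),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis: 0 for a normal curve
    }


# Hypothetical module marks, with and without one outlying mark of 6%.
marks = [42, 45, 47, 48, 50, 50, 51, 52, 53, 55, 56, 58, 60, 62]
without_outlier = describe(marks)
with_outlier = describe(marks + [6])

print(without_outlier)
print(with_outlier)
```

Adding the single outlying mark turns a roughly symmetrical, slightly flat distribution into one with strong negative skew and high positive kurtosis, while the mean moves by only about three percentage points.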

The variation in distribution of the marks awarded in three of the more widely taken modules (in which there happened to be no failing grades) is illustrated in Figure 2. There is only a partial overlap between the three sets of student performances because students could make selections from the modules on offer. The marks for Module AF are bunched towards the lower passing grades whereas those for Module BF are concentrated above the 60% level. The mark distribution for Module AK is intermediate between those for Modules AF and BF. The differences between the three module grade profiles reflect the influence of an unknown combination of variables including the students themselves, the degree of difficulty of the subject content, the teaching and the assessment demand.

Answering the research question

The evidence from this empirical study suggests that the answer to the research question ‘How much difference would grading by broad categories, in contrast to apparently precise grading, make to a student’s overall programme grade?’ is ‘not a lot’, and, where there are differences, they seem to be related more to the combination of sub-module grades into a grade for the module as a whole than they are to the determination of an overall grade for the programme. Whether this conclusion would hold true for data sets derived from institutions with different assessment regulations is a matter for further research. The study suggests that one segment of the ‘frugal summative assessment’ suggestion – the use of broad grades in determining an overall grade for a programme – is potentially viable. This is, however, arguably the easier segment. Research is needed to establish the viability of fast and frugal grading practices. If such practices were to gain empirical support, attention would need to be given to the implications of extending them through the higher education system – not least, regarding the development of assessors’ expertise.

Table 4. Cross-tabulation of ‘awards’ determined by this study and actual awards, with some relaxation of the rules for determining the overall module grade.

                          Classification actually awarded
Classification ‘awarded’  First  Upper second  Lower second  Third  Degree, no honours  Total
First                         2             0             0      0                   0      2
Upper second                  0            28             0      0                   0     28
Lower second                  0             5            78      3                   0     86
Third                         0             0             6     12                   0     18
Degree, no honours            0             2             7      1                   0     10
Total                         2            35            91     16                   0    144
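The effect of relaxing the rules can be read off by subtracting the cell counts of Table 3 from those of Table 4. The sketch below does this; the names are illustrative, and the comparison works only at the level of cell counts, since the tables alone cannot identify individual students.

```python
# Minimal sketch: cell-by-cell change between Table 3 (original rules)
# and Table 4 (relaxed rules). Rows: 'awarded' in study; columns: actual award.
LABELS = ["First", "Upper second", "Lower second", "Third", "Degree, no honours"]

TABLE_3 = [
    [2, 0, 0, 0, 0],
    [0, 28, 0, 0, 0],
    [0, 5, 74, 1, 0],
    [0, 0, 6, 12, 0],
    [0, 2, 11, 3, 0],
]

TABLE_4 = [
    [2, 0, 0, 0, 0],
    [0, 28, 0, 0, 0],
    [0, 5, 78, 3, 0],
    [0, 0, 6, 12, 0],
    [0, 2, 7, 1, 0],
]


def cell_changes(before, after, labels):
    """List (awarded, actual, change) for every cell whose count differs."""
    changes = []
    for i, row_label in enumerate(labels):
        for j, column_label in enumerate(labels):
            delta = after[i][j] - before[i][j]
            if delta != 0:
                changes.append((row_label, column_label, delta))
    return changes


changes = cell_changes(TABLE_3, TABLE_4, LABELS)
for awarded, actual, delta in changes:
    print(f"awarded {awarded!r}, actual {actual!r}: {delta:+d}")
```

On these counts, the net movement is six outcomes from ‘Degree, no honours’ to ‘Lower second’. Four of them join the diagonal, which accounts for the rise from 116 to 120 matching outcomes, and two fall in the cell where a lower second is ‘awarded’ against an actual third.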


Yorke et al. (2008) showed that assessment regulations varied between UK institutions in a number of respects, as did Brumfield (2004) for US institutions. The discrepancies between the classifications ‘awarded’ in this study and the real awards illustrate yet again the point that the regulations that are applied to summative assessments can have a powerful influence on the determination of honours degree classifications.

A question that cannot be answered on the evidence available is whether a mark of, say, 60% represents an equivalent level of performance across all modules. The same question applies in principle to the normal practice across institutions of equating 60% to upper second class honours irrespective of the nature of the programme, or in assuming that a GPA of, say, 3.25 carries a universal meaning. It also applies when the marks are converted into a smaller number of letter grades. It may be the case that some modules are intrinsically ‘easier’ than others, with ‘easier’ applying to the tasks set for the students and/or the way in which student work is graded.

Table 5. Descriptive statistics for the modules that were taken by more than 30 students studying Law, sorted by descending mean mark. The dataset does not necessarily include all students who took the modules, since some students may have taken the modules from outside a Law programme. The statistics for Module AQ have been heavily influenced by an outlier mark of 6%.

Module code   Mean     n     SD  Kurtosis  Skewness
BF           62.99    98   8.24     −0.06     −0.39
AR           61.30    70   6.54     −1.00     −0.26
BD           60.09    34   7.21     −0.79      0.54
AC           59.68    38   6.70      0.50      0.05
AY           59.66    64   5.16      0.78     −0.11
BC           57.46    35   8.98     −1.01     −0.12
BE           56.94    31  12.25      0.07     −0.78
AX           56.89    85   7.83     −0.83      0.02
AK           56.65   122   8.41     −0.84     −0.24
AE           56.39    72  10.41     −1.00      0.02
AP           56.22    60   7.86     −0.97     −0.13
AI           56.09    96   8.57     −0.55     −0.01
AN           54.92    59   8.50     −0.51      0.19
AT           54.68    77   5.87     −0.36     −0.11
AJ           54.66    82   8.24     −0.63      0.25
AO           54.54    46  10.22     −1.23      0.11
AU           54.34    53   6.75     −0.26      0.00
AD           54.23    88   9.43     −0.61      0.19
AL           53.85    80   7.02     −0.12      0.04
AA           53.81    31   6.50     −0.31      0.45
AW           53.51    37   8.56      1.61      1.01
AM           52.91    88   8.06     −0.55      0.31
BA           52.82    71   7.62     −0.38     −0.15
AV           52.82    76   9.04     −0.91      0.19
AS           52.47    32   8.03     −1.07     −0.27
AZ           51.03    39   9.11      0.83      0.01
BB           50.70    37   8.09     −0.50      0.64
AF           50.14    72   8.16     −0.04      0.82
AG           50.11    45   7.69     −0.04      0.77
AQ           49.64    59   8.57     10.74     −1.94
AH           49.49    63   8.61     −0.03      0.66
AB           48.85    73   7.34     −0.98      0.36

Figure 2. Contrasting mark distributions for three selected modules.


Realpolitik

A strong, evidence-based case can be made that the production of overall grades in respect of achievements across a whole programme is an exercise with little to commend it (Yorke 2008). In summary, it conflates grades awarded under varying implicit metrics for a range of kinds of achievement. The award of the degree with honours does not need to be overlain with a classification system, since it is feasible to buttress the honours degree instead with a profile of transcripts which can indicate the strengths of performances in the component parts of the programme: this may be qualitative and/or quantitative. A similar argument can be deployed in respect of the grade-point average.

Yet, despite the evidence, the current (and rather timid) position in the UK is to retain the honours classification at least for the time being, whilst introducing alongside it the Higher Education Achievement Report (HEAR) which will record performances on programme components (Universities UK and GuildHE 2007). Eighteen varied institutions are currently subjecting the HEAR to trials.

If the honours classification is to remain, then it is desirable that some of its more obvious inadequacies be mitigated (elimination is probably too demanding an aim). Coarsely-grained summative assessment will not solve all the difficulties, but might help to generate a compromise that can be defended with greater persuasiveness than can the general schemata of current summative assessment practices.

Acknowledgements

I am grateful to Graham Taylor-Russell and Harvey Woolf for their help with this study, though responsibility for the content of this article is mine alone. I am also grateful for the very constructive comments from anonymous referees on an earlier draft.

References

Brumfield, C. 2004. Current trends in grades and grading practices in higher education: Results of the 2004 AACRAO survey. Washington, DC: American Association of Collegiate Registrars and Admissions Officers.

Dalziel, J. 1998. Using marks to assess student performance: Some problems and alternatives. Assessment and Evaluation in Higher Education 23, no. 4: 351–66.

European Commission. 2009. ECTS users’ guide. Brussels: European Commission. http://ec.europa.eu/education/lifelong-learning-policy/doc/ects/guide_en.pdf (accessed July 14, 2009).

Gigerenzer, G., P. Todd, and the ABC Research Group. 1999. Simple heuristics that make us smart. Oxford: Oxford University Press.

Knight, P.T. 2002. Summative assessment in higher education: Practices in disarray. Studies in Higher Education 27, no. 3: 275–86.

Milton, O., H.R. Pollio, and J. Eison. 1986. Making sense of college grades. San Francisco: Jossey-Bass.

Sadler, D.R. 2009. Transforming holistic assessment and grading into a vehicle for learning. In Assessment, learning and judgement in higher education, ed. G. Joughin, 45–63. Dordrecht: Springer.

Universities UK and GuildHE. 2007. Beyond the honours degree classification: The Burgess Group final report. London: Universities UK and GuildHE.

Yorke, M. 2008. Grading student achievement in higher education: Signals and shortcomings. Abingdon: Routledge.

Yorke, M., H. Woolf, M. Stowell, R. Allen, C. Haines, M. Redding, D. Scurry, G. Taylor-Russell, W. Turnbull, and L. Walker. 2008. Enigmatic variations: Honours degree assessment regulations in the UK. Higher Education Quarterly 62, no. 3: 157–80.
