
Original Article

Workplace-Based Assessments in Psychiatry: Evaluation of a Whole Assessment System

Andrew Brittlebank, FRCPsych, Julian Archer, MRCPCH, Ph.D.

Damien Longson, FRCPsych, Amit Malik, M.B.A., MRCPsych

Dinesh K. Bhugra, Ph.D., FRCP, FRCPE, FRCPsych

Objective: Workplace-Based Assessments (WPBAs) were introduced into psychiatry along with the new curriculum in 2005. The Royal College of Psychiatrists decided to pilot several WPBAs to ascertain their suitability.

Method: Eight types of assessments (Case-Based Discussion, Assessment of Clinical Expertise, Mini-Assessed Clinical Encounter, Mini-Peer Assessment Tool, Direct Observations of Procedural Skills, Patient Satisfaction Questionnaires, Case Conference, and Journal Club Presentation) were piloted, either singly or in combination, on 16 sites, with 600 psychiatric trainees.

Results: Consultant psychiatrists carried out most of the assessments. Case-Based Discussion was the most popular, and high levels of correlation were obtained across several assessment tools.

Conclusion: There is evidence that, with suitable training of assessors and trainees, WPBAs can be introduced and are feasible in assessing some competencies.

Academic Psychiatry 2013; 37:301–307

Recently, several educational changes have been introduced to postgraduate training in medicine and its specialties in the United Kingdom (U.K.). A major component of these changes has been the introduction of Workplace-Based Assessments (WPBAs) to assess trainees in their place of work in the different domains and competencies laid out in the approved curriculum. These assessments are in addition to national examinations, but are used for advancing to the next year of training.

The Royal College of Psychiatrists of the U.K. is the main body responsible for training standards and curriculum development, and it works closely with the regulatory body (the General Medical Council) that maintains the register of specialists; the College's approval is required for entry to this register. The College set up this study to evaluate how individual assessment tools can be used to build an efficient assessment program.

The aim of this article is to describe the use of these assessments and the lessons learned in clinical settings.

Method

Workplace-based assessments were first introduced in 2005 for the 2-year Foundation Training program, which doctors must undertake upon completion of medical school training. The tools used in the Foundation Program (1) were then blueprinted against the Specialty Curriculum for Psychiatry Training. However, these tools did not achieve satisfactory coverage of the curriculum (2), and it was therefore decided to modify them and develop others so that adequate coverage of the competencies embedded in the curriculum could be assessed.

Tools

Eight tools were developed for use in the Royal College of Psychiatrists WPBA program: the Case-based Discussion (CbD), Assessment of Clinical Expertise (ACE), mini-Assessed Clinical Encounter (mini-ACE), Direct Observation of Procedural Skills (DOPS), mini-Peer Assessment Tool (mini-PAT), Patient Satisfaction Questionnaire (PSQ), Case Presentation (CP), and Journal Club Presentation (JCP).

Received November 11, 2011; revised February 23 and May 16, 2012, and January 10 and May 7, 2013; accepted May 9, 2013. From the Northumberland Tyne and Wear NHS Foundation Trust, St. Nicholas Hospital, Newcastle-upon-Tyne, U.K. (AB); Peninsula College of Medicine & Dentistry, Plymouth, Devon, U.K. (JA); North Western Deanery, School of Psychiatry (DL); Royal College of Psychiatrists, London, U.K. (AM); and the Institute of Psychiatry, HSRD, London, U.K. (DKB). Send correspondence to Dr. Bhugra; e-mail: [email protected]

Copyright © 2013 Academic Psychiatry


The Case-based Discussion is similar to Chart-Stimulated Recall, in which the assessor conducts an interview of a trainee based on what they have written in a patient's clinical records. There are few studies of the psychometric properties of this tool (1, 3).

The Assessment of Clinical Expertise is the equivalent of the long-case assessment and is similar to the Clinical Evaluation Exercise (CEX), which has low reliability, with a combined reliability coefficient of 0.39 in internal medicine (4), but the approach has a great deal of face validity to psychiatrists (2). It involves observing a patient assessment over a 50-minute period and rating the assessment on a number of areas.

The mini-Assessed Clinical Encounter (mini-ACE) is based on the mini-Clinical Evaluation Exercise (5). This involves an assessor watching the trainee perform specific parts of the clinical encounter, such as history-taking, assessing mental state, gaining consent, or giving information to the patient or the family, and then rating these activities. In assessments of medical students in internal medicine, this tool has shown a reliability score of 0.77 for eight assessments (6). A study of construct validity using the mini-Clinical Evaluation Exercise demonstrated that faculty members could reliably distinguish between three levels of trainee performance on this instrument (7).

The Direct Observation of Procedural Skills (DOPS) was developed as a tool to assess a trainee's performance of practical procedures, such as venipuncture or intubation (8). Early psychometric data in internal medicine suggest that a reliable estimate of the competence of a trainee can be obtained from at least three different assessors, rating two procedures each (9). There are only a few such direct observations in psychiatry.

The mini-Peer Assessment Tool (mini-PAT) is a multisource feedback tool developed for medical trainees to gather assessments from co-workers (10). It is based on the Sheffield Peer Review Assessment Tool (SPRAT), which has been shown to be completed by assessors in an average of less than 6 minutes and to give ratings that can discriminate between senior and junior trainees (11). The mini-Peer Assessment Tool has been shown to produce a reliable rating of trainees when completed by between 8 and 12 assessors (1, 9).

Although a number of tools have been developed to enable patients to give feedback on the performance of their doctor, only two, the Physician Achievement Review (PAR) and the Sheffield Patient Assessment Tool (SHEFFPAT), have been subjected to reasonably rigorous reliability and feasibility studies (12), and these were not specific enough for psychiatrists in training. The Patient Satisfaction Questionnaire (PSQ) was developed as a patient feedback tool to gather assessments of psychiatric trainees' functioning. It was designed to allow patients to rate a trainee's humanistic skills and behaviors, such as politeness, listening skills, and answering questions. Its use was confined to outpatient and community settings. About 25 patient responses were needed to provide reliable data on doctors' performance (13, 14).

The Case Presentation (CP) is a novel tool to assess a trainee's performance in presenting cases at "grand rounds" or other clinical educational meetings (15). The assessor rates the trainee's performance on four dimensions: the clinical assessment of the patient, the trainee's interpretation of the clinical material, the use of investigations, and the trainee's presentation and delivery.

The Journal Club Presentation (JCP) is another new tool. Developed alongside the CP (15), it assesses a trainee's performance in presenting at journal clubs, and it covers domains such as introducing the topic, analysis and critique of evidence, presentation skills and method of delivery, and responding to questions.

To provide uniform standards of marking across all of the new and modified tools, a 6-point scale was used, with a score of 4 indicating that the performance met the standard required for the end of the current year of training.

Sites

These eight assessment tools were administered singly and in various combinations to trainee psychiatrists in 16 pilot sites, taking in approximately 600 psychiatric trainees, all of whom were in the first 3 years of specialist training. The pilot sites included a range of psychiatric training schemes, varying in size from small rotations of 10 specialty trainees to large schemes of up to 100 trainees, across England, Scotland, and Wales. These schemes also included urban and rural settings, as well as teaching and non-teaching hospitals, along with community-based clinical services.

The participation of trainees was entirely voluntary, and the submission of individual assessments into the pilot study was at the discretion of the trainee concerned. We have been informed by the U.K. National Research Ethics Service (NRES) that formal ethical approval was not required for this study. Data collection took place over a 9-month period.

Training

A medical educator (JA) and a psychiatrist (AB) prepared and delivered a 3-hour training package at each pilot site, to both trainers and trainees. The primary aim of the session was to introduce participants to these tools and explain basic techniques and principles. The session also included training in standard setting, following accepted assessor-training practices (16).

Trainees were aware of the tools and of the research methods, but had the option of not participating if they did not wish to. Patients who participated in the pilot studies were not trained; however, they were assured that an observer would be assessing the trainee and that their participation or refusal to participate would not affect their treatment.

Assessment Forms

The assessment forms used for the pilot sites were printed on multi-part, carbonless paper, producing two copies of each assessment. The trainee retained one copy of the assessment form, and the top copy was sent away to be read by document-recognition software. The software then produced summary reports for each assessment tool used in the pilot.

Acceptability and Feasibility

The acceptability and feasibility of the tools were evaluated by asking assessors and trainees to rate their satisfaction, using a 6-point, Likert-type scale. Assessors were also asked to record the time taken to complete the assessment. These data were collected for all instruments except the mini-Peer Assessment Tool and the Patient Satisfaction Questionnaire.

Data Analysis

All data were entered into a Structured Query Language (SQL) database, and statistical analyses were undertaken with SPSS Version 14.0. Data were anonymized before analysis.

Descriptive Analyses. Frequencies, means, and standard deviations (SD) were calculated to describe the ratings of the participants. These were also calculated for participant satisfaction, where collected.

Reliability and Validity. Generalizability Theory is a means of systematically identifying and quantifying errors of measurement in educational tests. Classical theories of reliability (interrater, test-retest, and split-half reliability) assume that the universe, which, in this case, is the totality of assessments about a doctor, is uniform. This assumption is unfounded, and, as Schuwirth and van der Vleuten (17) point out, it leads to assessments that average scores from different domains, such as the ability to diagnose illness and the ability to respond to psychosocial cues. Since the validity of an assessment system is based in part on its blueprint, that is, the mapping of assessments to curriculum competencies, reliability studies that focus on classical theory risk sacrificing validity for reliability (17). Studies based on Generalizability Theory are therefore now considered to be the gold standard for the psychometric evaluation of assessment data (18).

Reliability estimates were based on generalizability theory.

In a well-designed and controlled study, it is possible to include the effects of various factors, such as the raters, clinical setting, or occasions, in the analysis to determine how much each contributes to measurement error. In this naturalistic study, a fully-nested design was adopted (assessors nested within participants).

Total scores were analyzed by VARCOMP (MINQUE) and used to estimate variance components for the trainees being assessed. Measurement error was estimated using the same method. The G coefficient is calculated by dividing the variance attributable to the trainee by the variance attributable to the trainee plus the measurement error. The D study was undertaken in Microsoft Excel 2003, where the reliability can be calculated, dependent on the number of assessors contributing to the mean score; the measurement error is divided by the number of assessors as they increase. Theoretically, 95% confidence intervals (CIs) are equal to the SEM (square root of the variance component for measurement error) multiplied by 1.96, and this can again be calculated for a given number of assessors. Adding or subtracting the SEM from an individual's total score produces the range within which that individual's true score can be expected to fall 95 out of 100 times that the assessment is conducted.
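Restated in equation form (a summary of the calculation described above, writing sigma-squared-p for the trainee variance component and sigma-squared-e for the measurement-error component, with n assessors contributing to the mean score):

```latex
% G coefficient for a single assessment, from the two variance components
G = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{e}}

% D-study projection for the mean of n assessors, with the corresponding
% standard error of measurement and 95% confidence interval
G_{n} = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{e}/n},
\qquad
\mathrm{SEM}_{n} = \sqrt{\sigma^{2}_{e}/n},
\qquad
95\%\ \mathrm{CI} = \pm\, 1.96 \times \mathrm{SEM}_{n}
```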

Conventionally, high-stakes assessments require enough assessors to achieve a D value of at least 0.8 for each instrument; 95% CIs are used as a measure of precision in relation to the cut-score of 4.0.
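As an illustration of the D-study arithmetic, the sketch below projects the D statistic and 95% CI half-width as the number of assessors increases. The study itself used SPSS VARCOMP and Microsoft Excel; this is only a minimal re-implementation of the same calculation, and the variance components shown are hypothetical placeholders, not the estimates obtained in the study.

```python
import math

def d_study(var_trainee, var_error, max_assessors=30):
    """Project the D statistic (reliability) and the 95% CI half-width
    for an increasing number of assessors in a fully-nested design:
    G_n = var_p / (var_p + var_e / n), SEM_n = sqrt(var_e / n)."""
    projections = []
    for n in range(1, max_assessors + 1):
        g_n = var_trainee / (var_trainee + var_error / n)
        sem_n = math.sqrt(var_error / n)
        ci_half_width = 1.96 * sem_n  # 95% confidence interval half-width
        projections.append((n, g_n, ci_half_width))
    return projections

# Hypothetical variance components, for illustration only -- these are
# NOT the estimates obtained in the study.
projections = d_study(var_trainee=0.10, var_error=0.40)

for n, g, ci in projections:
    print(f"assessors={n:2d}  D={g:.2f}  95% CI = +/-{ci:.2f}")

# Smallest number of assessors reaching the conventional 0.8 threshold
needed = next((n for n, g, _ in projections if g >= 0.8), None)
print("Assessors needed for D >= 0.8:", needed)
```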

In order to confirm coverage of the curriculum, information was gathered on the clinical diagnosis in the case of the patient-focused assessments (that is, the Case-based Discussion, the Assessment of Clinical Expertise, the mini-Assessed Clinical Encounter, and the Case Presentation) and on the procedure conducted in the case of the Direct Observation of Procedural Skills.

Intercorrelations

The validity of the various assessment tools was examined in order to inform the future development of the program (19).


Results

The number of trainees completing the different tools varies, as not all sites or trainees used all the instruments. The descriptive statistics for each instrument are shown in Table 1. There are data from a large number of individual episodes of assessment. Assessors and trainees expressed a high degree of satisfaction with the tools.

As can be seen from Table 1, the patient-focused assessments (that is, the Case-based Discussion, the Assessment of Clinical Expertise, the mini-Assessed Clinical Encounter, and the Case Presentation) covered a wide range of diagnostic categories. The use of the Direct Observation of Procedural Skills (DOPS) was limited to a small number of procedures: 40% of the DOPS assessments were of the delivery of electroconvulsive therapy, 20% were of venipuncture, and 7% were of electrocardiogram recording.

A range of team members participated in the mini-Peer Assessment Tool assessments, with the largest group of assessments (35%) performed by nurses, followed by other trainees (25%) and senior specialist psychiatrists (15%). Social workers, occupational therapists, and pharmacists performed only a very small number of mini-PAT assessments.

Reliability Data

The reliability data are presented in Table 2, which illustrates that it is possible to achieve the level of reliability required of assessments from relatively modest amounts of testing with three of the case-focused tools: the Case-based Discussion, the Assessment of Clinical Expertise, and the mini-Assessed Clinical Encounter. In the case of direct observation, 12 episodes of testing are needed to produce an acceptable level of reliability. These results indicate that the two presentation tools (Case Presentation and Journal Club Presentation) do not produce reliable assessments at feasible volumes of assessment; 18 mini-Peer Assessment Tool and 15 Patient Satisfaction Questionnaire assessments each produce a reliability of 0.80.

A total of 149 trainees were assessed by at least one of these two assessments. There was a significant correlation between the scores (r=0.52; p<0.001; N=149). Combining the scores of the two assessment tools produced a more reliable assessment: six combined case and journal club presentation assessments produced a reliability of 0.80.

Intercorrelations

Table 3 shows the intercorrelations between each of the seven instruments studied. The highest correlations were seen between the three patient-based assessments and the two presentation assessments.

TABLE 1. Descriptive Statistics for the Eight Instruments

Instrument | Trainees | Number Completed | Mean Number per Trainee (range) | Mean Aggregate Score (SD) | Assessor Satisfaction (SD) | Trainee Satisfaction (SD) | Time to Complete, min. (SD) | Diagnostic Categories Involved, % (Delirium, Dementia, and Other Cognitive Disorders; Substance-Related Disorders; Schizophrenia and Other Psychotic Disorders; Mood Disorders; Anxiety Disorders; Personality Disorders)
CbD | 220 | 574 | 2 (1–11) | 4.7 (0.64) | 4.54 (0.84) | 4.65 (0.79) | 25 (16) | 13; 9; 18; 32; 10; 7
ACE | 141 | 246 | 2 (1–6) | 4.76 (0.70) | 4.5 (0.94) | 4.67 (0.9) | 37 (28) | 15; 8; 23; 31; 6; 5
Mini-ACE | 190 | 357 | 2 (1–9) | 4.71 (0.66) | 4.38 (0.96) | 4.5 (0.9) | 24 (19) | 15; 6; 23; 34; 6; 5
DOPS | 133 | 299 | 2 (1–6) | 4.9 (0.63) | 4.67 (0.94) | 4.76 (0.81) | 20 (19) | –
Mini-PAT | 114 | 690 | 6 | 4.85 (0.53) | – | – | – | –
PSQ | 195 | 917 | 5 (3–6) | 5.32 (0.50) | – | – | – | –
CP | 147 | 208 | 2 (1–5) | 4.78 (0.62) | 4.90 (0.80) | 4.91 (0.79) | 25 (23) | 14; 5; 33; 29; 2; 5
JCP | 121 | 257 | 2 (1–4) | 4.69 (0.60) | 4.74 (0.73) | 4.80 (0.63) | 25 (23) | –

SD: standard deviation; CbD: Case-based Discussion; ACE: Assessment of Clinical Expertise; Mini-ACE: Mini-Assessed Clinical Encounter; DOPS: Directly-Observed Procedural Skills; Mini-PAT: Mini-Peer Assessment Tool; PSQ: Patient Satisfaction Questionnaire; CP: Case Presentation; JCP: Journal Club Presentation.


There are also high correlations between the Case-based Discussion, peer assessment, and presentational assessments. There are weaker correlations between the directly observed procedures, Case-based Discussion, and clinical encounters. Patient Satisfaction did not correlate with any of the other instruments.

Discussion

Although there have been numerous studies that have looked at the properties of individual workplace-based assessment tools, only two other studies have looked at whole programs of workplace assessment (1, 9). This is the first large-scale quantitative study of a program of workplace-based assessment in psychiatry, and it therefore has major implications for the future development and implementation of the psychiatry assessment system.

A potential limitation of the study is the self-selection of sites. Furthermore, a note of caution is necessary concerning the accuracy of self-reporting of the time taken to complete assessments. A third potential limitation is participant bias, as those who agreed to take part are likely to be more highly motivated (20).

The results of this study provide some evidence for the reliability and feasibility of the instruments. The cohorts, sites, and size of the training schemes participating in the study and evaluation of the Case-based Discussion, mini-Assessed Clinical Encounter, Patient Satisfaction Questionnaire, and mini-Peer Assessment Tool were particularly large, allowing some generalizable conclusions to be drawn. It is clear that the Case Presentation and Journal Club Presentation both measure presenting skills, and thus a high correlation between scores on the tools provides evidence of the validity of both methods. All the instruments that included a measure of user satisfaction were rated highly by both trainees and assessors. High correlations between various instruments, and the use of the instruments to assess trainees working with patients who have a variety of psychiatric diagnoses, indicate the suitability of these tools, but it was not possible in this study to address issues of predictive or construct validity.

The Patient Satisfaction Questionnaire proved problematic, as it is an exception to the above observations about validity and does not correlate with any of the other instruments. It may be that patients have a unique perspective on, and expectations of, their psychiatrists and their performance, and therefore contribute valuable insights to the assessment program. As Schuwirth and van der Vleuten argue (17), it is desirable that assessment systems reflect the diversity of assessment viewpoints.

TABLE 2. Reliability Data

Instrument | Number of Assessors | D Statistic | 95% Confidence Interval
CbD | 4 | 0.8 | 0.5
ACE | 5 | 0.8 | 0.5
Mini-ACE | 8 | 0.8 | 0.4
DOPS | 12 | 0.8 | 0.4
Mini-PAT | 6 | 0.5 | 0.5
Mini-PAT | 18 | 0.8 | 0.3
PSQ | 5 | 0.5 | 0.5
PSQ | 15 | 0.8 | 0.3
CP | 6 | 0.1 | 0.5
CP | 15 | 0.3 | 0.3
JCP | 6 | 0.4 | 0.5
JCP | 19 | 0.7 | 0.3
CP and JCP combined | 6 | 0.8 | 0.5

Number of assessors refers to the number of individual episodes of assessment performed by different trainers to achieve the given reliability (D statistic) and precision of rating (95% confidence interval). CbD: Case-based Discussion; ACE: Assessment of Clinical Expertise; Mini-ACE: Mini-Assessed Clinical Encounter; DOPS: Directly-Observed Procedural Skills; Mini-PAT: Mini-Peer Assessment Tool; PSQ: Patient Satisfaction Questionnaire; CP: Case Presentation; JCP: Journal Club Presentation.

TABLE 3. Intercorrelations Among the Eight Instruments

Instrument | ACE | Mini-ACE | DOPS | Mini-PAT | PSQ | CP | JCP
CbD | 0.64*** | 0.61*** | 0.20* | 0.38*** | 0.06 | 0.52*** | 0.38***
ACE | | 0.51*** | 0.37** | 0.19 | –0.02 | 0.50*** | 0.30**
Mini-ACE | | | 0.17 | 0.34** | –0.01 | 0.39*** | 0.38***
DOPS | | | | 0.37** | –0.10 | 0.07 | 0.17
Mini-PAT | | | | | 0.13 | 0.40** | 0.31**
PSQ | | | | | | 0.06 | 0.06
CP | | | | | | | 0.48***

CbD: Case-based Discussion; ACE: Assessment of Clinical Expertise; Mini-ACE: Mini-Assessed Clinical Encounter; DOPS: Directly Observed Procedural Skills; Mini-PAT: Mini-Peer Assessment Tool; PSQ: Patient Satisfaction Questionnaire; CP: Case Presentation; JCP: Journal Club Presentation.
*p < 0.05; **p < 0.01; ***p < 0.001.


It will be interesting to see whether patient satisfaction tools work differently in different settings, such as the private rather than the public sector.

There is strong evidence for the feasibility of many of these tools. The Case-based Discussion produces a highly reliable assessment after a total of 100 minutes of assessment, gathered over four episodes, each with a different assessor. The Assessment of Clinical Expertise after 185 minutes with five assessors and the mini-Assessed Clinical Encounter after 192 minutes with eight assessors show high reliability. The findings for the Case-based Discussion and the clinical encounters are comparable to the findings using similar instruments in studies of other groups of medical trainees in the U.K. (1, 9).

The peer assessment and patient satisfaction measures produce lower levels of reliability. However, the 95% CIs are such that sufficiently precise scores may be produced for most trainees with six assessments on the mini-Peer Assessment Tool and five on the Patient Satisfaction Questionnaire.

The Direct Observation of Procedural Skills was found to have a much lower reliability than found in other studies (1, 9). It is likely that this reflects the smaller number of different procedures carried out by psychiatrists, and it will be necessary to refine the tool before it can be used to reliably assess competence in delivering, for example, ECT.

In an earlier study, Webb et al. (21) reported that the psychiatry residency in-training examination (PRITE) was moderately correlated with Part One of the Board Examination. Similar correlations need to be followed further with the Royal College examinations. Juul et al. (22) found, among a cohort of graduates of psychiatry residency programs, that recent graduates who attempted American Board examinations were more likely to become board-certified. Thus, the correlation between training and external assessment is an important factor to explore further, using both quantitative and qualitative data.

A key advantage of our study is that it evaluates a whole assessment program to judge the most efficient and effective combinations of assessment tools. Given the high correlation of the Case Presentation and Journal Club Presentation tools, it may be concluded that they assess the same construct, namely, "presentation skills." It may therefore be reasonable to combine the results of these assessments and thereby compensate for each instrument's low reliability, which may produce an acceptable degree of reliability after six episodes of assessment.

Further work is indicated to assess which tools have duplicate components and which can be dropped to save time, effort, and resources.

Conclusions

These findings from an initial pilot study provide early evidence for the feasibility of the psychiatry assessment system for trainees, but modification to maximize feasibility while assuring reliability is needed. Some of the individual assessment instruments can be combined, but further evaluation is necessary to produce an assessment that incorporates the patient's viewpoint.

We gratefully acknowledge the assistance of Sam Abbott, Diana Muramaa, and Angela McMahon (Manchester Mental Health & Social Care Trust) for the design and distribution of the forms, and for all the data input.

Declaration of interests: All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare that 1) AB and JA have support from the Royal College of Psychiatrists for the submitted work; 2) AB, JA, DL, AM, and DB have no relationships with any company that might have an interest in the submitted work in the previous 3 years; and 3) AB, JA, DL, AM, and DB have no non-financial interests that may be relevant to the submitted work.

Ethical approval: We have been informed by the U.K. National Research Ethics Service (NRES) that formal ethical approval was not required for this study.

Funding: This study was funded by the Royal College ofPsychiatrists.

References

1. Davies H, Archer J, Southgate L, et al: Initial evaluation of the first year of the Foundation Assessment Programme. Med Educ 2009; 43:74–81
2. Brittlebank AD: Piloting workplace-based assessment in psychiatry, in Workplace-Based Assessments in Psychiatry. Edited by Bhugra D, Malik A, Brown N. London, U.K., Gaskell, 2007, pp 96–108
3. Fitch C: Assessing psychiatric competencies: what does the literature tell us about methods of workplace-based assessment? Adv Psychiatr Treat 2008; 14:122
4. Norcini JJ: The death of the long case? BMJ 2002; 324:408–409
5. Norcini JJ, Blank LL, Arnold GK, et al: The mini-CEX (Clinical Evaluation Exercise): a preliminary investigation. Ann Intern Med 1995; 123:795–799


6. Kogan JR, Bellini LM, Shea JA: Feasibility, reliability, and validity of the mini-Clinical Evaluation Exercise (mCEX) in a medicine core clerkship. Acad Med 2003; 78(Suppl):S33–S35
7. Holmboe ES, Huot S, Chung J, et al: Construct validity of the mini-Clinical Evaluation Exercise (miniCEX). Acad Med 2003; 78:826–830
8. Wilkinson J, Benjamin A, Wade W: Assessing the performance of doctors in training. BMJ 2003; 327:s91–s92
9. Wilkinson JR, Crossley JG, Wragg A, et al: Implementing workplace-based assessment across the medical specialties in the United Kingdom. Med Educ 2008; 42:364–373
10. Archer JC, Norcini J, Southgate L, et al: mini-PAT (Peer Assessment Tool): a valid component of a national assessment programme in the UK? Adv Health Sci Educ Theory Pract 2008; 13:181–192
11. Archer JC, Norcini J, Davies HA: Use of SPRAT for peer review of paediatricians in training. BMJ 2005; 330:1251–1253
12. Chisolm A, Askham J: What Do You Think of Your Doctor? A review of questionnaires for gathering patients' feedback on their doctor. Oxford, U.K., Picker Institute, 2006
13. Violato C, Lockyer J, Fidler H: Multi-source feedback: a method of assessing surgical practice. BMJ 2003; 326:546–548
14. Crossley J, Eiser C, Davies HA: Children and their parents assessing the doctor–patient interaction: a rating system for doctors' communication skills. Med Educ 2005; 39:820–828
15. Searle G: Evidence-based medicine: case presentation and journal club assessments, in Workplace-Based Assessments in Psychiatry. Edited by Bhugra D, Malik A, Brown N. London, U.K., Gaskell, 2007, pp 76–82
16. Holmboe ES, Hawkins RE, Huot SJ: Effects of training in direct observation of medical residents' clinical competence: a randomized trial. Ann Intern Med 2004; 140:874–881
17. Schuwirth LW, van der Vleuten CPM: A plea for new psychometric models in educational assessment. Med Educ 2006; 40:296–300
18. Crossley J, Russell J, Jolly B, et al: "I'm pickin' up good regressions": the governance of generalisability analyses. Med Educ 2007; 41:926–934
19. van der Vleuten CPM, Schuwirth LWT: Assessing professional competence: from methods to programmes. Med Educ 2005; 39:309–317
20. Lydall GJ, Malik A, Bhugra D: MTAS: mental health of applicants seems to be deteriorating. BMJ 2007; 334:1335
21. Webb LC, Juul D, Reynolds CF III, et al: How well does the psychiatry residency in-training examination predict performance on the American Board of Psychiatry and Neurology Part I Examination? Am J Psychiatry 1996; 153:831–832
22. Juul D, Scully JH Jr, Scheiber SC: Achieving board certification in psychiatry: a cohort study. Am J Psychiatry 2003; 160:563–565
