Tools and Tips for Learner Assessment and Evaluation in the Emergency Department


  • Tools and Tips for Learner Assessment and Evaluation in the Emergency Department. Heather Patterson, PGY-4, April 28, 2010

  • What should we be assessing?

    What is the best method of assessment?

    What factors influence assessment?

    What are the tools available to evaluate learners?

    Tips for delivering feedback.

    Objectives

  • What should we be assessing? Brief review of CanMEDS

    What is the best method of assessment?

    What factors influence assessment?

    What are the tools available to evaluate learners?

    Tips for delivering feedback.

    Objectives

  • What should we assess?

  • What should we be assessing?

    What is the best method of assessment in the ED? Direct observation

    What factors influence assessment?

    What are the tools available to evaluate learners?

    Tips for delivering feedback.

    Objectives

  • Why bother? Sherbino et al 2008; Hobgood et al 2008; Cydulka 1996

    What counts?

    Direct Observation

  • Challenges: Hawthorne effect; ED flow and patient care; teaching responsibilities

    Direct Observation

  • Formalized direct observation program: Pittsburgh EM residency program (Dorfsman et al 2009)

    How did they evaluate resident performance? Standardized direct observation tool (SDOT): Shayne et al 2002 and 2006; LaMantia et al 2002. Reliable?? Valid??

    Direct Observation

  • Take home: Best method for the assessment of true behaviour. It may be worthwhile to do some behind-the-curtain assessments to minimize the Hawthorne effect. Can be used to guide feedback and to give more representative evaluations. Opportunity exists for development of reliable and valid checklist tools to assess resident performance in the ED.

    Direct Observation

  • What should we be assessing?

    What is the best method of assessment?

    What factors influence assessment? Pitfalls of learner assessment

    What are the tools available to evaluate learners?

    Tips for delivering feedback.

    Objectives

  • Evaluation: Formal assessment of how the learner has performed.

    Evaluation vs Feedback

  • Feedback: Designed to make a learner aware and accepting of strengths and weaknesses and to help guide future learning

    Evaluation vs Feedback

  • Hawk vs. Dove: Know your tendencies for how you evaluate. Acknowledge your subjective expectations for a particular domain of assessment. (Cydulka et al 1996)

    Pitfalls of assessment (A Practical Guide for Medical Teachers, Dent 2005)

  • Halo vs millstone effect: Well documented and accepted as a source of bias in learner evaluation

    Pitfalls of assessment (A Practical Guide for Medical Teachers, Dent 2005)

  • Leniency bias (Bandiera et al 2008)

    Pitfalls of assessment

  • Leniency bias and range restriction (Jouriles et al 2002): No use of the lowest score despite previously identified problems

    Pitfalls of assessment

  • Possible reasons for leniency bias and range restriction. Dudek et al 2005: lack of documentation of specific events; lack of knowledge about what to document; anticipation of an appeal process; lack of remediation options

    Jouriles et al 2002: avoidance of negative interactions; fear of a negative teaching evaluation; worry about the time commitment to justify the evaluation; worry about time requirements and potential responsibility for remediation

    Gray et al 1996: weaknesses inherent to the ITER as an evaluation tool; lack of training on proper use of the ITER or the other assessment tools used

    Pitfalls of assessment

  • Take home points: Be aware of your pre-existing perceptions about the learner

    Be aware of your biases

    Don't be afraid to give a representative evaluation

    Pitfalls of assessment

  • What should we be assessing?

    What is the best method of assessment?

    What factors influence assessment?

    What are the tools available to evaluate learners? ITER; encounter cards; 360-degree feedback; checklists

    Tips for Delivering Feedback

    Objectives

  • ITER/Global Rating Forms

  • Pros: Ease of administration; allows for longitudinal assessments (Sherbino et al 2008)

    Cons: Bias introduced into the evaluation: recall, halo/millstone, leniency and range restriction (Sherbino et al 2008; A Practical Guide for Medical Teachers, Dent 2005; Gray et al 1996)

    ITER/Global Rating Forms

  • Cons (cont.): Poor reliability; poor discrimination between constructs or behaviours (Donnon et al, not yet published; Silber et al 2004)

    Take home: Residents: deliver ITERs earlier to minimize recall bias, and tell staff you are sending them. Staff: be as objective as possible, include written comments, and be aware of bias.

    ITER/Global Rating Forms

  • Daily Encounter Cards

  • Pros: Less recall bias; can be structured to facilitate evaluation of CanMEDS roles (Bandiera et al 2008)

    Cons: Leniency bias; recall bias; needs further reliability and validity assessment (Kim et al 2005; Paukert et al 2002; Brennan et al 1997)

    Daily Encounter Cards

  • Pros: ?More representative assessment of teamwork, leadership, communication, collaboration and professionalism (Sherbino et al 2008); ?stimulus for positive change (Lockyer 2003)

    Cons: No true MSF research in postgraduate medical education (Rodgers et al 2002); numbers required to achieve reliability (Wood et al 2006)

    Multisource Feedback (MSF)

  • Take home: Input from allied health professionals, colleagues, and patients may contribute to a more complete assessment of resident competencies if done appropriately

    Caution: introduction of bias; questionable reliability if only a few comments

    Multisource Feedback (MSF)

  • Checklists

  • Pros: No recall bias, +/- reduced leniency bias; over 55 published tools for use during direct observation of clinical behaviour (Kogan et al 2009)

    Cons: Evaluates specific behaviours, NOT global performance (ACGME Toolbox of Assessment Methods 2000); extensive process to develop a reliable, valid tool (Cooper et al 2010); requires direct observation without interference (Dorfsman et al 2009; Shayne et al 2006)

    Checklists

  • Take home points: Good for specific behavioural assessment, e.g. leadership; extensive process to develop a tool; significant research potential in this area

    Checklists

  • What should we be assessing?

    What is the best method of assessment?

    What factors influence assessment?

    What are the tools available to evaluate learners?

    Tips for Delivering Feedback

    Objectives

  • Brief

    Formal

    Major

    Types of Feedback

  • Timing and location

    Feedback on your performance

    Learner self assessment

    Tips for Effective Feedback

  • Feedback content

    Tips for Effective Feedback

  • Direct observation represents the highest fidelity measurement of true behaviour

    Feedback and evaluation are different processes and have different goals

    Be aware of your biases and the limitations of the evaluation tools: Hawk vs Dove; halo vs millstone effect; recall bias; leniency and range restriction

    Feedback should be specific and identify modifiable behaviours

    Take Home Messages

  • (1) Dorfsman ML, Wolfson AB. Direct observation of residents in the emergency department: a structured educational program. Acad Emerg Med. 2009 Apr;16(4):343-351.
    (2) Sherbino J, Bandiera G, Frank JR. Assessing competence in emergency medicine trainees: an overview of effective methodologies. CJEM. 2008 Jul;10(4):365-371.
    (3) Hobgood CD, Riviello RJ, Jouriles N, Hamilton G. Assessment of communication and interpersonal skills competencies. Acad Emerg Med. 2002;9(11):1257-1269.
    (4) Jouriles NJ, Emerman CL, Cydulka RK. Direct observation for assessing emergency medicine core competencies: interpersonal skills. Acad Emerg Med. 2002 Nov;9(11):1338-1341.
    (5) Kogan JR, Holmboe ES, Hauer KE. Tools for direct observation and assessment of clinical skills of medical trainees: a systematic review. JAMA. 2009 Sep 23;302(12):1316-1326.
    (6) Andersen PO, Jensen MK, Lippert A, Ostergaard D, Klausen TW. Development of a formative assessment tool for measurement of performance in multi-professional resuscitation teams. Resuscitation. 2010 Mar 24.
    (7) Kim J, Neilipovitz D, Cardinal P, Chiu M. A comparison of global rating scale and checklist scores in the validation of an evaluation tool to assess performance in the resuscitation of critically ill patients during simulated emergencies (abbreviated as "CRM simulator study IB"). Simul Healthc. 2009 Spring;4(1):6-16.
    (8) Norcini JJ, Blank LL, Duffy FD, Fortna GS. The mini-CEX: a method for assessing clinical skills. Ann Intern Med. 2003 Mar 18;138(6):476-481.
    (9) Cooper S, Cant R, Porter J, Sellick K, Somers G, Kinsman L, et al. Rating medical emergency teamwork performance: development of the Team Emergency Assessment Measure (TEAM). Resuscitation. 2010 Apr;81(4):446-452.
    (10) Fink A, Kosecoff J, Chassin M, Brook RH. Consensus methods: characteristics and guidelines for use. Am J Public Health. 1984 Sep;74(9):979-983.
    (11) Morgan PJ, Lam-McCulloch J, Herold-McIlroy J, Tarshis J. Simulation performance checklist generation using the Delphi technique. Can J Anaesth. 2007 Dec;54(12):992-997.
    (12) Lockyer J, Singhal N, Fidler H, Weiner G, Aziz K, Curran V. The development and testing of a performance checklist to assess neonatal resuscitation megacode skill. Pediatrics. 2006 Dec;118(6):e1739-44.
    (13) Ringsted C, Ostergaard D, Ravn L, Pedersen JA, Berlac PA, van der Vleuten CP. A feasibility study comparing checklists and global rating forms to assess resident performance in clinical skills. Med Teach. 2003 Nov;25(6):654-658.
    (14) Friedman Z, Katznelson R, Devito I, Siddiqui M, Chan V. Objective assessment of manual skills and proficiency in performing epidural anesthesia--video-assisted validation. Reg Anesth Pain Med. 2006 Jul-Aug;31(4):304-310.
    (15) Morgan PJ, Cleave-Hogg D, Guest CB. A comparison of global ratings and checklist scores from an undergraduate assessment using an anesthesia simulator. Acad Med. 2001 Oct;76(10):1053-1055.
    (16) Morgan PJ, Cleave-Hogg D, DeSousa S, Tarshis J. High-fidelity patient simulation: validation of performance checklists. Br J Anaesth. 2004 Mar;92(3):388-392.
    (17) Wright MC, Phillips-Bute BG, Petrusa ER, Griffin KL, Hobbs GW, Taekman JM. Assessing teamwork in medical education and practice: relating behavioural teamwork ratings and clinical performance. Med Teach. 2009 Jan;31(1):30-38.
    (18) Jefferies A, Simmons B, Tabak D, McIlroy JH, Lee KS, Roukema H, et al. Using an objective structured clinical examination (OSCE) to assess multiple physician competencies in postgraduate training. Med Teach. 2007 Mar;29(2-3):183-191.
    (19) Cydulka RK, Emerman CL, Jouriles NJ. Evaluation of resident performance and intensive bedside teaching during direct observation. Acad Emerg Med. 1996;3(4):345-351.
    (20) Shayne P, Heilpern K, Ander D, Palmer-Smith V, Emory University Department of Emergency Medicine Education Committee. Protected clinical teaching time and a bedside clinical evaluation instrument in an emergency medicine training program. Acad Emerg Med. 2002 Nov;9(11):1342-1349.
    (21) Shayne P, Gallahue F, Rinnert S, Anderson CL, Hern G, Katz E, et al. Reliability of a core competency checklist assessment in the emergency department: the Standardized Direct Observation Assessment Tool. Acad Emerg Med. 2006 Jul;13(7):727-732.
    (22) LaMantia J, Panacek EA. Core Competencies Conference: executive summary. Acad Emerg Med. 2002;9(11):1213-1215.
    (23) Bandiera G, Lendrum D. Daily encounter cards facilitate competency-based feedback while leniency bias persists. CJEM. 2008 Jan;10(1):44-50.
    (24) Paukert JL, Richards ML, Olney C. An encounter card system for increasing feedback to students. Am J Surg. 2002 Mar;183(3):300-304.
    (25) Kim S, Kogan JR, Bellini LM, Shea JA. A randomized-controlled study of encounter cards to improve oral case presentation skills of medical students. J Gen Intern Med. 2005 Aug;20(8):743-747.
    (26) Brennan BG, Norman GR. Use of encounter cards for evaluation of residents in obstetrics. Acad Med. 1997 Oct;72(10 Suppl 1):S43-4.
    (27) Dudek NL, Marks MB, Regehr G. Failure to fail: the perspectives of clinical supervisors. Acad Med. 2005 Oct;80(10 Suppl):S84-7.
    (28) Frank JR, Danoff D. The CanMEDS initiative: implementing an outcomes-based framework of physician competencies. Med Teach. 2007 Sep;29(7):642-647.
    (29) Zibrowski EM, Singh SI, Goldszmidt MA, Watling CJ, Kenyon CF, Schulz V, et al. The sum of the parts detracts from the intended whole: competencies and in-training assessments. Med Educ. 2009 Aug;43(8):741-748.
    (30) Epstein RM. Assessment in medical education. N Engl J Med. 2007 Jan 25;356(4):387-396.
    (31) Gray JD. Global rating scales in residency education. Acad Med. 1996 Jan;71(1 Suppl):S55-63.

    References

  • Quinn!

    Take things in context. Individual evals are NOT summative. I believe the goal should be to provide representative evaluations for the performance that you observe, so that when the time comes for a summative assessment, we can gain an accurate overall impression of learner performance. I recognize that there are significant challenges - time limitations, 1-2 shifts. Goals: not to provide a miracle solution but instead to offer some things to think about as you are working with learners in the department. Direct observation - theoretical basis and practical implications. The CanMEDS framework is a logical and practical framework from which you can decide what characteristics you would like to assess in your learner. Logical - Royal College (FRCPC) implementation in educational objectives and assessment strategies; medical schools are moving towards a competency-based structure. Practical - the ED provides a different environment than many other specialties, and we use skills from most of the CanMEDS roles on a regular basis. Therefore, it is reasonable to assume that we can assess our learner performance using this framework.

    In addition to the role of medical expert, pick 1 or 2 other roles and focus on performance in those domains.

    History: the development of this outcome-based competency framework for physician assessment and education started in the early 90s. It was driven by several factors, including trends in societal expectations of medical personnel, access to medical information on the internet, government encroachment on medical regulation, and litigation imperatives. The initial framework was adopted by the FRCPC in 1996 and updated in 2005. The expectation is that all specialties represented by the FRCPC will establish learning objectives, evaluation tools, and methods that adequately and reliably assess the competencies. What are the roles? Communicator, Collaborator, Professional, Manager, Health Advocate, Scholar, Medical Expert.

    Remember, we are not talking about a complete assessment of CanMEDS roles - this is a huge topic and requires multiple methods of assessment - we are talking about evaluation in the emergency department. What is direct observation? The medical education literature defines it as observation of learner performance in any real or simulated clinical scenario. This includes: patient encounters, mini clinical exams (with real patients and evaluation by checklists), simulation, OSCEs, and standardized patient exams.

    WHY BOTHER? In medical education, we talk about assessing learners' knowledge, skills, and attitudes. In the ED we have the opportunity to assess all of these. Observational studies have shown that many faculty members base evaluation of resident performance on behaviours such as interest in learning, accuracy of patient care, aggressiveness of patient care, and verbal skills in presenting cases (Cydulka et al 1996). Direct observation is considered the highest-fidelity sample of performance for assessment. There are still problems with the method, but it is accepted that direct observation will give the closest approximation to true unobserved behaviour.

    What counts as direct observation:
    - being in the room and observing part or all of an interaction (can be sick or well patients)
    - supervising resident procedures and consents
    - listening behind the curtain
    - case presentations
    - observing interactions with allied health professionals
    - observing interactions with consultants

    Challenges: busy department; delivering patient care; maintaining ED flow; collaborating/coordinating with consultants and allied health professionals; teaching responsibilities; managing learners; Hawthorne effect. While observation of EM residents occurs in the department in many different settings, it is often sporadic. An American group in Pittsburgh developed and published their experiences with a formalized direct observation program. In this program, observers with no clinical responsibility spent 4-5 hours with second-year residents (32 residents). The observation included watching multiple clinical encounters, patient presentations, management of the emergency department, management of IT and clinical resources, and interaction with the allied health care team. After observing several patient encounters, the observer would provide some specific immediate feedback. At the end of the observation period, feedback was given. The evaluators used the SDOT (standardized direct observation tool) and found the form easy to complete while observing the learners and easy to compile the data. Residents felt that the experience helped provide insight into specific strengths and weaknesses, and did not feel threatened or intimidated by the experience. Faculty stated that they were able to identify concrete examples of individual difficulties in different domains of the core competencies that were previously impossible or challenging to identify. Faculty who were responsible for the clinical supervision of the observed resident did not feel intimidated and noted that the program did not interfere with regular clinical duties and supervision. At the end of the program trial period, residents were asking for additional observation time to aid in the improvement of their clinical skills and ongoing refinement of the ACGME core competencies.

    SDOT: now in its ninth iteration, the SDOT was developed by the Council of Emergency Medicine Residency Directors in the United States to assess the ACGME core competencies in emergency medicine residents. Twenty-six clinical behaviours are mapped to the core competencies on an educational blueprint linked with objectives. The tool also has seven pages of references to provide ample information for assessors to evaluate learners. How easy is it to use? Assessors using the tool in the formalized program in Pittsburgh found it quite easy to simultaneously observe the residents and use the form effectively. It seems a bit bulky at first glance, but it provides ample information to aid ease of assessment.

    Reliability: assessed with video performances by 82 assessors at 16 different sites. Correlation coefficients (a measure of the extent to which the raters agree, i.e. the variation between the raters is small compared to the variation between the subjects assessed) were calculated for scores in each of the ACGME core competency domains: an overall alpha of 0.95, ICC 0.81 (95% CI 0.45 to 1.0). Interestingly, the lowest alpha and ICC were for the medical knowledge competency. Multivariate analysis revealed no differences in scoring with experience, academic title, location of practice, or previous SDOT use. A second study used the tool in the real department and found that reliability was significantly lower, which questions the utility of the tool.
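    For readers who want to see how an internal-consistency figure like the alpha above is computed, here is a minimal illustrative sketch on simulated rater data (the function name, data, and numbers are assumptions for illustration only, not taken from the SDOT studies):

```python
# Illustrative sketch: Cronbach's alpha for a set of rater scores (simulated data).
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: rows = subjects (e.g. residents), columns = items or raters."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

rng = np.random.default_rng(0)
true_ability = rng.normal(3, 1, size=(50, 1))              # 50 hypothetical residents
ratings = true_ability + rng.normal(0, 0.5, size=(50, 5))  # 5 raters, each adding noise
print(round(cronbach_alpha(ratings), 2))                   # correlated raters -> alpha near 1
```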

    Content validity was established by using a panel of experts to develop the tool. There is, however, no evidence of criterion validity. Evaluation: this is specifically to tell the learner how they have performed. We use many different assessment tools to help us quantify performance for an evaluation, e.g. ITER, DEC, MSF. Typically this will appear in the resident's file or records. Feedback: less formalized, i.e. it doesn't use tools but depends on the ability of the assessor to watch the learner. The purpose is to guide the learner in the process of improving their clinical performance.

    An example to differentiate the two forms of learner assessment: "I have given you a score of 2/5 for professionalism because of your poor interactions with the nursing staff" vs. "I noticed that your interactions with the nurses seemed strained at times. Let's talk about this and see whether we can identify the source of the problem. After that, we can set some goals for improving this."


    Observational studies have shown that many faculty members base evaluation of resident performance on behaviours such as interest in learning, accuracy of patient care, aggressiveness of patient care, and verbal skills in presenting cases (Cydulka et al 1996). Therefore you must recognize your personal tendencies and expectations for each area of CanMEDS.

    2-3 characteristics of the learner will influence the entire evaluation, either positively or negatively. These do not need to be clinical characteristics - they can be personality, initial interaction with staff, or the presence of features that the staff relate to, e.g. hockey player or musician. They can also be based on comments from other staff.

  • DEC used to evaluate resident performance: 54 learners, 801 evaluations, 43 staff evaluators. Binary evaluation: "area of strength" vs "needs attention". Only 1.3% of total evaluations used "needs attention". Of 43 staff, 33 never used the "needs attention" option.

  • May result in a limitation of the assessment of true performance. Jouriles et al: secondary data analysis of direct observation assessment forms for a single American residency program over 7 years (number of residents assessed not given; 17 faculty assessors), looking at scores for interpersonal-skills-related components. They demonstrated excellent internal consistency and reliability, with a Cronbach's alpha of 0.98. A greater proportion of residents received above-average assessments than is observed for emergency medicine residents nationally, and residents who were previously noted to have deficiencies with interpersonal and communication skills were ranked higher than expected. Possibly a Hawthorne effect, that is, residents perform at a level higher than expected when clinical behaviours are observed.

    While clinical supervisors are able to recognize a learner with unsatisfactory performance, most demonstrate a reluctance to fail learners even when performance is judged to be unsatisfactory (Dudek et al 2005). Themes: documentation, personal and professional concerns, tool use.

    Dudek: semi-structured interviews with faculty in Ottawa. Jouriles. Gray 1996: review of the utility and pros/cons of ITERs and global rating scales.

    While the ITER is considered the gold standard of resident evaluation - >92% of all programs in Canada use the ITER to evaluate all the CanMEDS roles (in one study with a 67% response rate across all specialty programs; no information about who didn't respond). Typically a descriptive anchor with a 5-7 point Likert scale, often used at the end of the rotation rather than as a daily assessment. Bias: recall bias is predominant here. Reliability does not appear to be good; although there are many studies, it depends on the assessor and is easily influenced by the assessor's perception of resident characteristics, e.g. fun, boring, interested, not interested. Tyrone's study looked at ITERs for internal medicine, surgery, and pediatrics: IM - 2 factors, surgery - 3 factors, pediatrics - 5 or 6, but we don't know what each of them is. This demonstrates that the ITER isn't necessarily measuring what we think it is. Factor analysis is a method we use to determine whether a tool is assessing more than one group of behaviours; in this case, we are looking at CanMEDS roles, and this tells us that we aren't able to measure all the domains independently. Two things to take from this: we need a better tool (new ITERs are being used now but haven't been assessed yet), and we need to add comments to the evaluations to give a better assessment of the roles for resident improvement. Also, we may not be assessing people as rigorously as we need to be - if you are just putting 3s or 4s all the way down without carefully looking at the characteristics, then you need to change!
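    To make the factor-analysis point concrete, here is a minimal illustrative sketch on simulated Likert-style ratings (the item structure and numbers are assumptions for illustration, not data from the studies discussed): eigenvalues of the item correlation matrix that are well above 1 suggest how many distinct constructs a rating form is actually measuring.

```python
# Illustrative sketch: how many constructs do the items of a rating form measure?
import numpy as np

rng = np.random.default_rng(0)
n_residents = 200
# Two underlying constructs drive ten hypothetical items (five items each).
construct = rng.normal(size=(n_residents, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 1.0
loadings[1, 5:] = 1.0
items = construct @ loadings + rng.normal(0, 0.7, size=(n_residents, 10))

corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
print(np.round(eigenvalues, 2))  # roughly two eigenvalues well above 1 -> two factors
```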

  • Silber et al: ITER developed with 23 behavioural descriptors that corresponded to the ACGME competencies; 1295 ratings. Factor analysis showed 2 main constructs: 1 = interpersonal, communication, professional; 2 = knowledge, patient care, practice improvement, and systems-based care. They forced a 6-factor solution but the loading on the other 4 factors was poor (eigen values