2010; 32: e96–e100
WEB PAPER
A teaching encounter card to evaluate clinical supervisors across clerkship rotations
ERIN KEELY, LAWRENCE OPPENHEIMER, TIMOTHY WOODS & MERIDITH MARKS
University of Ottawa, Canada
Abstract
Background: Evaluation of faculty teaching is critical to improving the educational experience for both students and faculty.
Aim: Our objectives were to implement an evaluation system, using the teaching encounter card, across multiple rotations in the
clerkship and determine the feasibility, reliability and validity of this evaluation tool in this expanded setting.
Methods: Students were asked to rate clinical supervisors on nine teaching behaviours using a 6-point rating scale and asked
whether they would like to nominate the teacher for a clinical teaching award.
Results: A total of 3971 cards for 587 clinical supervisors across seven clerkship rotations were analyzed. There was an average
of 7.3 cards per supervisor (median = 5, range 2–66). There was high internal consistency between items on the card
(Cronbach's alpha 0.965). The reliability was fair at 0.63. Seventeen cards per supervisor would be required to achieve a reliability
>0.8 (G study). Ratings were higher for encounters that occurred in the operating room and within the anaesthesia rotation.
The teachers who had a positive recommendation for teaching award nomination received higher scores than their colleagues.
Conclusion: We successfully implemented a faculty evaluation card across clerkship rotations that was flexible enough to use
in multiple learning environments and allowed the identification of outstanding clinical teachers.
Introduction
Quality clinical teaching by enthusiastic and committed faculty
is of utmost importance in a medical programme. Evaluation
of faculty teaching is critical to improving the educational
experience for students and faculty. Evaluation of teaching
effectiveness facilitates recognition for excellence in teaching,
application for academic promotion, allocation of teaching
responsibilities and identification of common weakness to
focus on through faculty development programmes (Williams
et al. 2002). At some universities, results may even translate to
financial rewards (Williams et al. 2002). Feedback is highly
valued by faculty and is identified by community-based faculty
as the most important recognition of their commitment and
service (Dent et al. 2004). Despite the importance of faculty
evaluation, it is often difficult to collect and compare across
teaching services.
There are many challenges in providing timely, effective
faculty evaluation. An instrument must capture enough
information about specific teaching behaviours to guide
change, yet remain short enough that completion rates do not
suffer. Encounter cards are inexpensive, portable tools that
can be completed promptly after the defined time or event
being evaluated, and they permit multiple evaluations within
a single rotation.
Although used in trainee evaluation, they have not been well
studied for faculty evaluation (Brennan & Norman 1997;
Kernan et al. 2004; Richards et al. 2007). Faculty evaluation has
been investigated by others using various methods and tools
that vary in length from 54 items to a single global rating scale
(Irby et al. 1987; Ramsey & Gillmore 1988; Ramsbottom-Lucier
et al. 1994; Litzelman et al. 1998a; Copeland & Norman 2000;
Steiner et al. 2000; Williams et al. 2002; Kernan et al. 2004;
Smith et al. 2004; Zuberi et al. 2007). Timely completion is
essential for accuracy, especially for short exposures to faculty.
Typically evaluations are completed at the end of clinical
rotations, but timing may vary from immediately after a specific
patient encounter (Kernan et al. 2004), to the end of the
academic year (Williams et al. 2002). Clinical rotations have
inherent differences in the learning setting (e.g. outpatient
clinics, operating rooms and inpatient units), number and
Practice points
- Standardized faculty evaluation is an important part of assessing the quality of clinical teaching within a programme.
- The teaching encounter card presented is a feasible way of collecting faculty evaluations from students, across varying disciplines and learning environments.
- Encounter cards allow collection of enough data on an individual teacher for a statistically valid evaluation of their teaching abilities.
- Clinical teachers deserving particular recognition for their excellence with students can be identified through a standardized faculty evaluation process.
Correspondence: Erin Keely, Ottawa Hospital, Riverside Campus, Ottawa, ON K1H 7W9, Canada. Tel: 613 738 8400 ext. 81941; fax: 613 738 8296;
email: [email protected]
ISSN 0142–159X print/ISSN 1466–187X online/10/020096–5 © 2010 Informa Healthcare Ltd.
DOI: 10.3109/01421590903202496
degree of exposure to faculty, and complexity of patient
problems. The majority of studies have considered specific
learning settings (e.g. inpatient or ambulatory care) or
rotations (e.g. obstetrics, emergency medicine and internal
medicine), with few studies examining evaluation across
disciplines (Copeland & Hewson 2000; Zuberi et al. 2007).
Our goal was to standardize faculty evaluation across
all clerkship rotations using a practical instrument flexible
enough to meet the challenges of faculty evaluation including
inherent differences between varied learning environments,
unpredictable and varied patient encounters, different length
of rotations, and varying number of supervisors encountered.
This study expanded the use of a teaching encounter card
that was previously piloted in the Department of Obstetrics
and Gynaecology (Oppenheimer et al. 2006).
Our objectives were to implement an evaluation system,
using the teaching encounter card, across multiple rotations
in the clerkship and determine the feasibility, reliability and
validity of this evaluation tool in this expanded setting.
Methods
Setting
The University of Ottawa Medical School, with 112 anglophone
students per year, begins a 48-week clinical clerkship
in the third year of a 4-year programme, using tertiary care
and community clinical settings. The core clinical rotations
include ambulatory care, anaesthesia, general internal medicine
(inpatient units), obstetrics and gynaecology, paediatrics,
psychiatry and general surgery. Clinical supervisors include
university faculty, community preceptors and residents/
fellows.
Instrument refinement
The items of the original faculty evaluation card were
generated from a review of the literature on ideal clinical
teaching and a review of other tools, including those used
in our Emergency Medicine Department and others available
in the literature (Irby 1986; Irby et al. 1987). The original rating scale
included 10 key aspects of teaching, rated on a 4-point rating
scale anchored by the extent to which the student agreed
that the particular teaching behaviour had been provided,
and a global item on the value of the educational experience.
For construct validation purposes, students were asked
whether they would like to recommend this teacher for a
clinical teaching award (yes/no).
This faculty evaluation card was pilot-tested on the
obstetrics and gynaecology clerkship rotation from March to
September 2004 to assess its performance and feasibility
(Oppenheimer et al. 2006). Our pilot project confirmed the
acceptability and the face and content validity of the encounter
card (Oppenheimer et al. 2006). Despite the encouraging
results from the pilot, changes were made to reduce redundancy
and to increase the distinction between very good
and outstanding teachers (the right-hand side of the scale
was expanded from 4 points to 6 points). Other information
was added: the clinical rotation, the campus/location
of the teaching encounter, the learning setting and the length of
teaching exposure. The revised card used is
displayed in Figure 1. Students were not asked to identify
themselves, to ensure anonymity. Although this limits
the analysis that can be done, student candour is increased
if the student cannot be identified (Willett et al. 2007).
Implementation
From 1 December 2004 to 17 January 2007, all students
rotating through clerkship rotations, except for emergency and
family medicine (which already had their own evaluation systems in place),
were asked to complete a card on their clinical supervisor at
the end of the teaching encounter. Participation was voluntary
and anonymous. The cards were deposited in a drop box or
given to the administrative rotation coordinator.
Statistical analysis
Before analysis, all evaluation cards were reviewed for
completeness of data and for clear identification of the clinical
supervisor being evaluated. Incomplete cards were removed
from the analysis. In addition, to ensure a balanced design
with complete data on all items, only cards with all the rating
items completed were included in the analysis. Following
these steps, any faculty member with fewer than two evaluation cards
was excluded from the analysis.
To study the performance of individual items on the card,
descriptive statistics for each item were calculated, as well as
item-total correlations. Ratings as a function of learning
environment and clerkship rotation were analyzed using
analysis of variance (ANOVA). To assess the reliability of
reported ratings, two types of reliability coefficients were
calculated: internal consistency of items across the instrument
was assessed using Cronbach’s alpha and a generalizability
coefficient was used to assess the reliability of the scale as
a whole. For the generalizability analysis, each card was
considered as the unit of measurement and was nested within
supervisor. Supervisor was treated as a between subject factor
and was crossed with items.
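The internal-consistency calculation described above can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' SPSS output; the ratings matrix is hypothetical data on the card's 6-point scale, shown with only three items for brevity.

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Cronbach's alpha for a (cards x items) matrix of ratings.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    k = ratings.shape[1]                         # number of items (9 on the card)
    item_vars = ratings.var(axis=0, ddof=1)      # sample variance of each item
    total_var = ratings.sum(axis=1).var(ddof=1)  # variance of the summed score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical example: 6 cards rating 3 items on the 6-point scale.
cards = np.array([
    [6, 5, 6],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
    [3, 2, 3],
    [6, 6, 5],
])
alpha = cronbach_alpha(cards)  # high alpha: items move together across cards
```

As in the study, a very high alpha signals that some items may be redundant, since they add little information beyond the other items.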
Composite scores were analyzed using a two-factor between-subjects
ANOVA to determine whether mean ratings differed depending
on whether the students thought their supervisor should
be nominated for a teaching award. For construct validation
purposes, differences in ratings were compared depending on
whether the faculty person was recommended for nomination
of a teaching award or not recommended. If the cards are
functioning as intended, then it is expected that ratings
should be higher for those faculty who are recommended
for a nomination than for those who are not.
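The study ran this comparison as a two-factor between-subjects ANOVA in SPSS; a minimal one-way sketch captures the core contrast (with two groups, F is simply the squared t statistic). The composite scores below are hypothetical illustration data, not the study's.

```python
import numpy as np

def one_way_anova_F(*groups):
    """F statistic for a one-way between-subjects ANOVA:
    between-group mean square divided by within-group mean square."""
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    k = len(groups)                  # number of groups
    n = all_vals.size                # total observations
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical composite scores: cards where the student would nominate
# the supervisor for a teaching award vs. cards where they would not.
nominate = np.array([5.4, 5.1, 5.6, 5.2, 5.3, 5.0])
no_nominate = np.array([3.9, 3.5, 3.8, 3.6, 3.4, 4.0])
F = one_way_anova_F(nominate, no_nominate)  # large F: group means clearly differ
```

A large F relative to the F(1, n-2) distribution corresponds to the kind of highly significant nomination effect the study reports.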
The descriptive statistics and correlations were computed
using SPSS 15.0; the generalizability analysis used G-String
(Bloch & Norman 2006) with urGenova (Brennan 2001).
Results
We collected a total of 5408 cards. Of these, 573 were rejected
because they failed to clearly identify the faculty person
being evaluated. Another 831 cards were flagged for incomplete
data on the rating scale items. After removing these cards,
a further 33 supervisors were revealed to have only one rating,
and cards from these supervisors were removed. This left a
total of 3971 cards available for 587 clinical supervisors. There
was an average of 6.8 cards per supervisor (median = 5,
range = 2 to 66 cards per supervisor).
Table 1 describes the mean scores for each item. For all
items, the full spectrum of responses from one to six was used,
indicating that students were willing to provide low ratings
to some supervisors. The items which were ranked the lowest
included orientation to the teaching session, organization of
teaching, assessment of knowledge and observation of skills.
The internal consistency of the items was relatively
high at 0.97 suggesting that scores on some of the items
may be redundant. This observation is supported by the high
item-total correlations of the items displayed in Table 1.
From the generalizability analysis, the facet accounting for
the largest proportion of variance was the rater-nested-within-supervisor
facet (r:s), indicating that ratings for supervisors
varied a great deal between students (64% of the
variance). The g-coefficient for the instrument, generalizing
over the nine items and with a mean of 7.3 cards per supervisor,
was 0.64. To achieve a g-coefficient of 0.80, which would be
required for high stakes decisions, 17 cards/supervisor would
be required. Forty-eight of our 587 supervisors had more than
17 cards completed.
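The projection from the observed g-coefficient to the number of cards needed for high-stakes decisions follows standard decision-study (Spearman-Brown) logic. The sketch below is our reconstruction rather than the authors' G-String/urGenova output, and it treats cards as the only random facet, consistent with the nested design described in the Methods; it recovers the reported figure of 17 cards.

```python
import math

def single_card_reliability(g_obs: float, n_obs: float) -> float:
    """Invert G = n*rho / (1 + (n - 1)*rho) to recover the
    reliability rho of a single card."""
    return g_obs / (n_obs - (n_obs - 1) * g_obs)

def cards_needed(rho: float, g_target: float) -> float:
    """Number of cards per supervisor needed to reach the target
    g-coefficient, by the Spearman-Brown projection."""
    return (g_target / (1 - g_target)) * ((1 - rho) / rho)

# Observed in the study: G = 0.64 with a mean of 7.3 cards per supervisor.
rho = single_card_reliability(g_obs=0.64, n_obs=7.3)
n = cards_needed(rho, g_target=0.80)
print(math.ceil(n))  # 17 cards per supervisor, matching the G-study result
```

Each card alone is quite unreliable (rho is roughly 0.2), which is why many cards per supervisor must be aggregated before scores can support high-stakes decisions.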
To determine if there were differences across learning
environments and rotations, a composite total score was
created by determining the average of the nine items for each
student. Table 2 displays the mean composite scores for
each of the learning environments. There was a significant
effect of learning environment (F(5, 3243) = 4.08, p < 0.001).
The mean composite scores for learning sessions that occurred
in the operating room (which includes surgery and anaesthesia
encounters) were significantly higher than the mean composite
ratings for learning sessions that occurred in the ward
(p = 0.001), clinic (p = 0.03) or ER (p = 0.002). Table 3
displays the mean composite scores for each clerkship rotation.
There was a significant main effect of the clerkship rotation
(F(6, 3964) = 11.13, p < 0.001), but post-hoc tests showed that
this significant effect occurred because ratings for anaesthesia
Evaluation Card of Clinical Supervisor, University of Ottawa Faculty of Medicine
Date: ______ Supervisor's name: ______ Campus: ______ Group: ______ Rotation: ______ Half day ___ Whole day ___ Other ______
Session type: Ward ___ Hosp. clinic ___ OR ___ ER ___ Private office ___ Other ______
Please rate your clinical supervisor in providing (Poor / Fair / Good / Very Good / Excellent / Outstanding):
- Enthusiasm for teaching
- Orientation to teaching session objectives
- Organization of teaching
- Friendly learning environment
- Some pearls of wisdom
- Assessment of my knowledge
- Observation of my skills
- Helpful feedback
- OVERALL valuable educational experience
Completion of my evaluation form: Yes / No
I would like to nominate this teacher for a Clinical Teaching Award: Yes / No
Please provide any comments on the back. Thank you for completing this confidential form.

Figure 1. Teaching encounter card.
Table 1. Scores on individual items.

Item                                         Mean (±SD)    Item-to-total score correlation coefficient
Enthusiasm for teaching                      4.30 (1.47)   0.90
Orientation to teaching session objectives   4.03 (1.25)   0.89
Organization of teaching                     4.09 (1.27)   0.91
Friendly learning environment                4.53 (1.30)   0.85
Some pearls of wisdom                        4.35 (1.26)   0.90
Assessment of my knowledge                   4.05 (1.30)   0.90
Observation of my skills                     3.98 (1.34)   0.86
Helpful feedback                             4.12 (1.33)   0.91
Overall valuable educational experience      4.25 (1.30)   0.93
were higher than ratings for all other rotations (p < 0.001 to
p < 0.003).
The "nomination for teaching award" item was completed
on 2947 (83.7%) of cards, of which 33% indicated a positive
response for wishing to nominate the faculty for a teaching
award. Of the 587 supervisors, 147 (25%) had at least half of
their completed cards suggesting nomination, whereas 180
supervisors (31%) did not receive any nominations for a
teaching award. There was a significant difference in composite
score when there was a positive recommendation for
nomination (M = 5.29) versus no recommendation (M = 3.71,
F(1, 2945) = 1729, p < 0.001). This was consistent across clinical
rotations (Table 4).
Discussion
Standardization of faculty evaluation across disciplines and
learning environments using a practical tool is important
for faculty wide comparisons and development. Our scale
with eight items and a global rating is feasible to format on
a card and is easy to distribute, carry and return. Based on
the findings, we successfully implemented a revised faculty
evaluation card in the clinical clerkship that was flexible
enough to use across multiple learning environments.
Although use of a rating scale for faculty evaluation is not
itself unique, the widespread implementation across varying
learning environments and specialties has not previously been
reported.
The purpose of the data collected from teaching encounter
cards is twofold: it allows programmes to ensure the quality of
teaching being provided, while also providing faculty members
with formative feedback. Thus a balance between the
purpose of the scale and the measurement properties is
needed. Despite the high correlations between scores on some
items, we feel that each item represents an important clinical
teaching behaviour that may provide supervisors with valuable
feedback. Others have shown that feedback to faculty on
individual teaching behaviours may result in individual
improvement (Maker et al. 2004).
The advantage of using a single tool across disciplines is
to distinguish excellent teachers within the faculty. The close
correlation between the combined score on the rating card and
nomination for a teaching award across clinical rotations
indicates that our teaching encounter card is a valid means of
identifying the top- and bottom-rated teachers. For high stakes
decisions, a minimum of 17 evaluations per supervisor is
required.
While the majority of studies have been conducted in a single
teaching setting, we have looked across learning environments.
Teaching sessions that occurred in the operating room, and
with anaesthesia faculty in general, were rated higher than
others. Copeland and Norman (2000) implemented a standard
faculty evaluation form across departments and all levels of
trainees; however, the effect of different learning environ-
ments was not considered (Copeland & Norman 2000).
Focused student–faculty interaction, which occurs in outpatient
settings and operating rooms, may positively influence
teaching evaluations. A comparison of general internal medicine
faculty evaluations between inpatient and outpatient
rotations demonstrated lower ratings in the inpatient setting
(Ramsbottom-Lucier et al. 1994). The perceived higher degree
of involvement with the supervisor in the ambulatory setting
accounted for a significant amount of the difference between
evaluations (Ramsey & Gillmore 1988). Further studies are
needed to determine the extent to which learning
Table 2. Encounter card results of global score by learning environment.

Learning environment   Number of cards   Number of faculty evaluated   Mean (±SE)
Ward                   1443              375                           4.16 (0.03)
Ambulatory care        578               211                           4.16 (0.05)
OR*                    683               216                           4.36 (0.04)
Emergency room         255               124                           4.04 (0.07)
Private office         162               67                            4.23 (0.09)
Other                  128               87                            4.17 (0.10)
Total                  3249              536                           4.20 (0.02)

Note: *p < 0.01.
Table 3. Encounter card results of global scale by clerkship rotation.

Clerkship rotation       Number of cards   Number of faculty   Mean (±SD)
Adult ambulatory         260               63                  4.10 (1.24)
Anaesthesia*             442               84                  4.58 (1.04)
Internal medicine        636               126                 4.22 (1.18)
Obstetrics/Gynaecology   1650              123                 4.10 (1.19)
Paediatrics              501               91                  4.09 (1.19)
Psychiatry               160               46                  4.10 (1.05)
Surgery                  322               81                  4.26 (1.14)
Total                    3971              544                 4.19 (1.18)

Note: *p < 0.01.
Table 4. Comparison of overall score (mean, SE) and nomination for a teaching award by clinical rotation.

Clinical rotation        N      Would nominate   Would not nominate   % Nominated
Ambulatory               198    5.46 (0.09)      3.72 (0.09)          19.2
Anaesthesia              288    5.36 (0.06)      4.17 (0.08)          26.7
Internal medicine        467    5.31 (0.06)      3.68 (0.06)          27.7
Obstetrics/Gynaecology   1251   5.29 (0.04)      3.61 (0.04)          23.4
Paediatrics              386    5.16 (0.06)      3.60 (0.07)          25.3
Psychiatry               108    5.16 (0.13)      3.89 (0.11)          20.0
Surgery                  249    5.21 (0.08)      3.87 (0.09)          26.1
Total                    2947   5.29 (0.02)      3.71 (0.02)          24.5
environments might influence the evaluations provided for
individual faculty members. For example, are ratings the same
for faculty in anaesthesia when the teaching encounter is in the
pain or preoperative clinic compared to the operating room?
The limitations of this study include the voluntary and
anonymous submission of the teaching encounter cards.
Although it is essential to protect the privacy of students to
ensure candid completion, there was no way to collect the number
of trainees rating each faculty member, and it is possible
that some students rated one supervisor more than once.
There may be significant differences between those students
who chose to complete evaluations and those who did not.
The students also selected the supervisors for whom they
submitted cards. This selection bias may reduce the likelihood
of receiving encounter cards for "middle of the road" teachers,
i.e. those who do not stand out as excellent or poor.
The variability in the number of responses across rotations may
reduce the generalizability of our findings to specific rotations;
however, all were well represented. We only included clinical
clerks and not other levels of trainees. Further studies would
need to be done to ensure generalizability across all trainees.
It is now important to study whether the feedback provided
by these evaluation cards influences the performance of individual
teaching faculty, changes to clinical rotations and faculty
satisfaction with their teaching efforts. The format in which this
information is relayed to faculty must be carefully planned
and evaluated to encourage improvement and reduce the risk
of disengagement among teaching faculty with suboptimal scores
(Litzelman et al. 1998b).
Conclusion
We successfully implemented an anonymous, standardized
faculty evaluation card across a range of clerkship rotations.
This evaluation tool allows for individualized feedback to
faculty members, comparison across rotations and identification
of personal and programme areas of weakness and
strength.
Declaration of interest: The authors report no conflicts of
interest. The authors alone are responsible for the content and
writing of the article.
Notes on contributors
ERIN KEELY, MD FRCPC, is currently Chief, Division of Endocrinology
and Metabolism at the Ottawa Hospital. As a clinician-educator, she has
developed an interest in ambulatory care teaching.
LAWRENCE OPPENHEIMER is Division Head of Maternal–Foetal Medicine
in the Department of Obstetrics and Gynaecology and Director of the
University of Ottawa Clerkship programme.
TIMOTHY J. WOOD is currently the Manager, Research and Development
for the Medical Council of Canada and is an Adjunct Professor with the
Department of Medicine, University of Ottawa. He has a PhD in Cognitive
Psychology from McMaster University. His research interests are in
evaluation, licensure and expertise.
MERIDITH MARKS, MD, MEd, is a clinician educator with a particular
interest in faculty development and the assessment of interventions to
improve teaching quality.
References
Bloch R, Norman GR. 2006. G-String II, version 4.2. Available from:
www.fhs.mcmaster.ca/perd/download/
Brennan RL. 2001. Manual for urGenova. Iowa City, IA: Iowa Testing
Programs, University of Iowa.
Brennan BG, Norman GR. 1997. Use of encounter cards for evaluation
of residents in obstetrics. Acad Med 72:S43–S44.
Copeland HL, Hewson MG. 2000. Developing and testing an instrument
to measure the effectiveness of clinical teaching in an academic medical
center. Acad Med 75:11–16.
Dent MM, Boltri J, Okosun IS. 2004. Do volunteer community-based
preceptors value students’ feedback? Acad Med 79:1103–1107.
Irby DM. 1986. Clinical teaching and the clinical teacher. J Med Educ
61:35–45.
Irby DM, Gillmore GM, Ramsey PG. 1987. Factors affecting ratings
of clinical teachers by medical students and residents. J Med Educ
62:1–7.
Kernan WN, Holmboe E, O’Connor PG. 2004. Assessing the
teaching behaviors of ambulatory care preceptors. Acad Med
79:1088–1094.
Litzelman DK, Stratos GA, Marriott DJ, Lazaridis EN, Skeff KM. 1998b.
Beneficial and harmful effects of augmented feedback on physicians’
clinical-teaching performances. Acad Med 73:324–332.
Litzelman DK, Stratos GA, Marriott DJ, Skeff KM. 1998a. Factorial validation
of a widely disseminated educational framework for evaluating clinical
teachers. Acad Med 73:688–695.
Maker VK, Curtis KD, Donnelly MB. 2004. Faculty evaluations: Diagnostic
and therapeutic. Curr Surg 61:597–601.
Oppenheimer L, Keely E, Marks M. 2006. An encounter card to evaluate
teachers in clerkship. Med Educ 40:474–475.
Ramsbottom-Lucier MT, Gillmore GM, Irby DM, Ramsey PG. 1994.
Evaluation of clinical teaching by general internal medicine faculty
in outpatient and inpatient settings. Acad Med 69:152–154.
Ramsey PG, Gillmore GM, Irby DM. 1988. Evaluating clinical teaching
in the medicine clerkship: Relationship of instructor experience and
training setting to ratings of teaching effectiveness. J Gen Intern Med
3:351–355.
Richards ML, Paukert JL, Downing SM, Bordage G. 2007. Reliability and
usefulness of clinical encounter cards for a third-year surgical clerkship.
J Surg Res 140:139–148.
Smith CA, Varkey AB, Evans AT, Reilly BM. 2004. Evaluating the
performance of inpatient attending physicians. A new instrument for
today’s teaching hospitals. J Gen Intern Med 19:766–777.
Steiner IP, Franc-Law J, Kelly KD, Rowe BH. 2000. Faculty evaluation
by residents in an emergency medicine program: A new evaluation
instrument. Acad Emerg Med 7:1015–1021.
Willett RM, Lawson SR, Gary JS, Kancitis IA. 2007. Medical student
evaluation of faculty in student–preceptor pairs. Acad Med 82(10
Suppl.):S30–S33.
Williams BC, Litzelman DK, Babbott SF, Lubitz RM, Hofer TP. 2002.
Validation of a global measure of faculty’s clinical teaching perfor-
mance. Acad Med 77:177–180.
Zuberi RW, Bordage G, Norman GR. 2007. Validation of the SETOC
instrument – Student evaluation of teaching in outpatient clinics.
Adv Health Sci Educ Theory Pract 12(1):55–69.