Med Teach 2013; 35: 720–726
Does the think-aloud protocol reflect thinking? Exploring functional neuroimaging differences with thinking (answering multiple choice questions) versus thinking aloud
STEVEN J. DURNING1, ANTHONY R. ARTINO JR.1, THOMAS J. BECKMAN2, JOHN GRANER1, CEES VAN DER VLEUTEN3, ERIC HOLMBOE4 & LAMBERT SCHUWIRTH5
1Uniformed Services University of the Health Sciences, USA, 2Mayo Clinic, USA, 3Maastricht University, the Netherlands, 4American Board of Internal Medicine, USA, 5Flinders University, Australia
Abstract
Background: Whether the think-aloud protocol is a valid measure of thinking remains uncertain. Therefore, we used functional
magnetic resonance imaging (fMRI) to investigate potential functional neuroanatomic differences between thinking (answering
multiple-choice questions in real time) versus thinking aloud (on review of items).
Methods: Board-certified internal medicine physicians underwent formal think-aloud training. Next, they answered validated
multiple-choice questions in an fMRI scanner while both answering (thinking) and thinking aloud about the questions, and we
compared fMRI images obtained during both periods.
Results: Seventeen physicians (15 men and 2 women) participated in the study. Mean physician age was 39.5 ± 7 years (range: 32–51
years). The mean number of correct responses was 18.5/32 questions (range: 15–25). Statistically significant differences were
found between answering (thinking) and thinking aloud in the following regions: motor cortex, bilateral prefrontal cortex, bilateral
cerebellum, and the basal ganglia (p < 0.01).
Discussion: We identified significant differences between answering and thinking aloud within the motor cortex, prefrontal
cortex, cerebellum, and basal ganglia. These differences were ones of degree (more focal activation in these areas with thinking aloud
as opposed to answering). We attributed prefrontal cortex and cerebellum activity to working memory, and basal ganglia activity
to the reward of answering a question. The identified neuroimaging differences between answering and thinking
aloud were expected based on existing theory and research in other fields. These findings add evidence to the notion that the
think-aloud protocol is a reasonable measure of thinking.
Background
Clinical reasoning lies at the heart of any successful clinical
practice. Although there have been numerous definitions of
clinical reasoning, they all converge on the idea that clinical
reasoning entails cognitive operations that allow physicians to
observe, interpret and analyze information, and determine
diagnoses and further management (Higgs & Jones 2008).
Studying the phenomenon of clinical reasoning is no
simple matter, mainly because cognitive operations cannot be
observed directly and therefore must be inferred from
observable behavior. A major challenge with conducting
research on cognition pertains to the potentially confounding
factors that exist in the sequence from cognitive processes (i.e.
intermediate steps to the diagnosis or therapy) to the eventual
observable behaviors (i.e. the final diagnosis or therapy for a
patient). According to Ericsson and Simon (1987), ‘It is
important to note that any observable behavior used as data
for a thought process requires an explicit account of its relation
to the states of the thought processes and any mediating
additional cognitive processes’.
One commonly used method for studying cognition is the
think-aloud protocol. Currently, think-aloud protocol
methodology, used either during the task or retrospectively
after the task, is considered an optimal methodology to capture
thought processes (Ericsson 2006). Think-aloud protocols
have been used to study clinical reasoning. Nonetheless,
there has been extensive debate about the validity of think-aloud
methodology. Emerging methods, such as functional
magnetic resonance imaging (fMRI), may help resolve the
ongoing debate.
Practice points
• Assessing clinical reasoning is challenging as cognitive
operations cannot be directly observed.
• A commonly used method for studying cognition is the
think-aloud protocol.
• Functional MRI introduces the possibility of examining
the relationship between think-aloud data and brain
activation patterns.
• Our exploratory findings provide some neuroimaging
evidence that the think-aloud protocol can help educators
with assessing cognition.
Correspondence: Steven J. Durning, MD, PhD, FACP, Professor of Medicine and Pathology, Uniformed Services University of the Health Sciences,
4301 Jones Bridge Road, Bethesda MD 20814-4799, USA. Tel: +1 301-295-3603; fax: +1 301-295-5792; email: [email protected]
720 ISSN 0142–159X print/ISSN 1466–187X online/13/90720–7 © 2013 Informa UK Ltd.
DOI: 10.3109/0142159X.2013.801938
Med Teach Downloaded from informahealthcare.com by Monash University on 09/06/13. For personal use only.
There are two general categories of confounding factors
during think-aloud protocols: reactiveness and nonveridicality
(Russo et al. 1989). Reactiveness pertains to the effect that
verbalization has on the underlying cognitive processes.
According to information-processing theory (Newell & Simon
1972), recently acquired information is stored and processed in
the short-term memory (STM) as a series of consecutive ‘steps’,
which eventually lead to the outcome of the problem-solving
process. Asking someone to verbalize his/her thoughts places
additional requirements on the STM. In the simplest case, this
may prolong the response time because the act of verbalization
adds to the time required for the ‘internal voice’ or
thinking (Ericsson et al. 2006). In more complex cases,
verbalization could interrupt the ‘internal voice’ in the STM,
cause loss of information from the STM, and fundamentally
change the cognitive processes. Despite this theoretical
concern, and although empirical evidence is lacking, many believe that
‘the verbal protocol procedure slows down the process slightly
but does not change it fundamentally’ (Russo et al. 1989).
The second source of confounding is nonveridicality,
which is based on differences between the information that
is processed during unprompted versus think-aloud situations.
Errors of omission and commission are relevant to nonveridicality
and are analogous to Messick’s construct underrepresentation
and construct-irrelevant variance (Messick 1989).
Thus far, reactiveness and nonveridicality have been studied
mainly by comparing different types of think-aloud procedures
or by comparing performance on think-aloud procedures
with versus without instruction (Russo et al.
1989). Notably, all these studies have investigated various
aspects of translating cognitive processes into observable
behaviors (Ericsson et al. 2006). However, we could not
identify previous research that has used functional neuroimaging
and think-aloud protocols to study clinical reasoning.
The recent emergence of fMRI introduces the possibility of
examining the relationship between think-aloud data and
visualization of the artifacts of thought processes within
specific brain regions. Previous work (Rosen et al. 2000) has
compared neuroimaging differences that occur when thinking
aloud versus vocalizing silently with a stem completion task
(three-letter word stems completed into common English words).
This study found differences between these activities for both
word retrieval and comparison within the primary motor
cortex, frontal operculum, and the dorsolateral cortex; these
differences were attributable to the motor activation associated
with thinking aloud (speaking) versus silent vocalization. Also
identified were asymmetric left thalamus and putamen activations
with speaking versus thinking. Overall, the authors
concluded that fMRI findings associated with speaking versus
thinking differ only with respect to the motor areas, although
the differential activations of the putamen and thalamus cannot
be explained by motor activity alone. In another study,
Klasen et al. (2008) used fMRI to identify hypothesized brain
activations that occur with emotional states and verbalized
feelings among subjects playing computer games. They found
good matches between the fMRI data and behavioral and
experiential content, and concluded that their results added to
the validity of verbal protocols. Based on these findings, one
would expect predictable functional neuroimaging changes to
occur during clinical reasoning, yet we could
find no published research regarding the use of fMRI and
think-aloud protocols for studying clinical reasoning among
physicians.
Many diagnostic and treatment mistakes are due to preventable
cognitive errors (Brennan et al. 1991; Croskerry
2003). Indeed, physician certification and recertification
attempt to measure cognitive ability through the use of
multiple-choice questions (MCQs) that pose diagnostic or
treatment questions. These questions are considered the ‘gold standard’
for physician competence because they have undergone
rigorous psychometric analysis and have reliability and validity
evidence based on thousands of test-takers. We sought
to determine if the functional neuroimaging of thinking
(answering) differed from that of thinking aloud about validated
MCQs that assess the construct of clinical reasoning. The
identification of minimal differences could provide physiological,
criterion-related validity evidence for using the think-aloud
protocol in educational research.
Methods
Participants
Participants were board-certified internal medicine attending
physicians with faculty appointments at the Uniformed
Services University of the Health Sciences (USU). Exclusion
criteria were the presence of shrapnel or surgical metal
devices, inability to complete an fMRI due to anxiety or
claustrophobia, taking calcium channel blockers, which can
impact regional blood flow, or pregnancy. The study was
approved by the Institutional Review Boards of the Uniformed
Services University and the Walter Reed Army Medical Center
and participants provided informed consent.
Measurements
Multiple-Choice Questions (MCQs). We used validated MCQs
from the American Board of Internal Medicine (ABIM), which
is the organization responsible for the certification and
recertification of internal medicine physicians in the United
States, and MCQs from the National Board of Medical
Examiners (NBME), to assess physician expertise. The specific
NBME items chosen were those on the United States Medical
Licensing Exam (USMLE) Step 2 Clinical Knowledge (CK)
Examination. All MCQs from both the ABIM and NBME are
validated on thousands of subjects and are then modified, as
needed, to optimize their psychometric performance (Melnick
2009). Given the small number of items answered in the
scanner, we selected questions from two fields in medicine:
cardiology and rheumatology. The MCQs selected required the
integration and synthesis of data. Participants answered a total
of 32 questions: 16 NBME items (United States Medical
Licensing Exam Step 2 Clinical Knowledge items) and 16
ABIM items (Maintenance of Certification [MOC] MCQ) that are
currently being used to recertify physicians.
We selected questions that fit on a single screen and
contained only words (i.e. no images), given limitations on
screen size and resolution. The selected items also had
favorable discrimination, such that high-performing individuals
answered the hard questions correctly and only the lowest
performing individuals missed the easy items. In addition, the
MCQ format (participants pushed handheld buttons for
answer options ‘A’ to ‘E’) made them ideal for use in the
fMRI scanner, which eliminated the need for participants to
speak, as jaw motion impairs fMRI image interpretation. As
participants were asked to remain completely still in the fMRI
scanner, formally assessing for evidence of subvocalization
was not possible. The questions selected focused on diagnos-
tic reasoning; they all posed a vignette and then asked
participants ‘What is the most likely diagnosis’ or a related
diagnostic question.
fMRI process
Prior to entering the fMRI scanner, participants were formally
trained in the think-aloud procedure using standard guidelines
(Ericsson & Simon 1984). Thinking aloud involves stating
whatever comes to mind (all the information that passes
through a subject’s mind) as one works through a task
(Ericsson et al. 2006). Following this training and completion
of a pre-fMRI questionnaire, participants entered the fMRI
scanner.
The details of the fMRI data acquisition and assessment
methodology are described in Appendix 1. Subjects were
scanned on a 3T Signa MRI scanner (General Electric,
Milwaukee, WI) with a 32-channel head coil. The 32 questions
were presented to participants in a random order.
Each MCQ was projected to participants in the fMRI scanner
in three phases. In the first phase, the stem (question)
appeared (‘reading’ phase). Each question stem ended with
‘what is the most likely diagnosis?’ or a related diagnostic
question. The participants were given 60 s to read the stem and
push any button to move on to the answer options (the second
or ‘answering’ phase). Subjects were then given 7 s to choose
an answer option using the finger buttons. Once the answer
options were presented, subjects could not return to the
previous (‘question’) screen. After the answering phase, the
final phase (‘think-aloud’ phase) began. Here, participants
were instructed to ‘think aloud’ without speaking (jaw
motion would compromise functional neuroanatomic images)
about how they arrived at their chosen answer (‘how did you
establish the diagnosis for this item’; modified think-aloud).
The ‘think-aloud’ phase lasted 14 s.
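The three-phase trial structure described above can be written down as a simple schedule. This is a minimal sketch, not the authors' presentation software: the reading times are invented, the names are hypothetical, and it assumes that a stem not advanced by the participant moves on automatically at the 60 s limit, which the text does not state.

```python
from dataclasses import dataclass

MAX_READ_S = 60.0  # reading limit stated in the text
ANSWER_S = 7.0     # answering window stated in the text
THINK_S = 14.0     # think-aloud window stated in the text


@dataclass
class Phase:
    question: int
    name: str
    onset: float     # seconds from the start of the run
    duration: float  # seconds


def schedule(reading_times):
    """Lay out reading -> answering -> think-aloud phases back to back (no rest period)."""
    phases, t = [], 0.0
    for q, read in enumerate(reading_times, start=1):
        read = min(read, MAX_READ_S)  # assumption: the stem advances automatically at 60 s
        for name, dur in (("reading", read), ("answering", ANSWER_S), ("think-aloud", THINK_S)):
            phases.append(Phase(q, name, t, dur))
            t += dur
    return phases


# Hypothetical reading times for two questions (the second one hits the 60 s limit).
for phase in schedule([35.2, 61.0]):
    print(phase)
```

Because reading time is self-paced, answering and think-aloud onsets drift from trial to trial, which helps keep their modeled responses distinguishable.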
There was no formal resting period between questions.
Instead, the ‘reading’ phase was used as the baseline for
making comparisons with the answering and thinking aloud
phases, which were the focus of the current study. We
believed that the reading phase would represent a better
baseline for comparison than an unguided ‘rest’ period,
yielding more meaningful, task-specific findings. This choice
was also intended to control for reading, because we were
interested in answering and in thinking aloud about
one’s answers. We compared ‘answering phase’ images with
‘think-aloud phase’ images to ascertain the presence of
functional neuroimaging differences between answering and
thinking aloud on these ‘gold standard’ items used for licensing
physicians in the United States.
The participants also underwent a formal think-aloud
protocol immediately after the imaging session. In this session,
participants’ utterances about the same items they had seen in the scanner
were recorded. This formal think-aloud session was done to
help validate the pre-fMRI think-aloud training and the thought
processes elicited while going through the think-aloud phase
in the scanner.
Data analysis
All fMRI data were processed using the AFNI software package
(Cox 1996). Neuroimaging activation analysis was performed
using a general linear model (GLM) approach concatenating
the four data sets for each subject. Hemodynamic response
estimates were modeled for the answering and thinking aloud
question phases. As a first level of analysis, contrasts were
made between the ‘answer’ and ‘think-aloud’ significance
estimates in each voxel for each subject. A group analysis
comparing this contrast across the 15 subjects with usable fMRI data was also
performed using a linear mixed-effects modeling approach.
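The first-level step can be illustrated at a single voxel. This is a minimal sketch in plain Python, not the AFNI pipeline the authors used, and it rests on several assumptions: a repetition time of 2 s, a single-gamma haemodynamic response, one run instead of four concatenated data sets, invented reading times, and a noiseless simulated signal. Because there was no rest period, the model contains only an intercept plus answering and think-aloud regressors, so reading acts as the implicit baseline; the answer-minus-think-aloud contrast is the difference of the two coefficients.

```python
import math

TR = 2.0  # hypothetical repetition time (s); acquisition details are in Appendix 1


def hrf(t):
    """Single-gamma haemodynamic response peaking near 5 s (illustrative assumption)."""
    return t ** 5 * math.exp(-t) / math.factorial(5) if t >= 0 else 0.0


def regressor(events, n_scans, kernel_len=16):
    """Boxcar for (onset, duration) events, sampled every TR and convolved with the HRF."""
    box = [1.0 if any(on <= i * TR < on + dur for on, dur in events) else 0.0
           for i in range(n_scans)]
    kernel = [hrf(k * TR) for k in range(kernel_len)]
    return [sum(kernel[k] * box[i - k] for k in range(min(kernel_len, i + 1)))
            for i in range(n_scans)]


def ols(columns, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination with pivoting."""
    p = len(columns)
    a = [[sum(u * v for u, v in zip(columns[r], columns[c])) for c in range(p)]
         + [sum(u * v for u, v in zip(columns[r], y))] for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * z for x, z in zip(a[r], a[col])]
    return [a[r][p] / a[r][r] for r in range(p)]


# Hypothetical reading times (s) for six questions; answering = 7 s, think-aloud = 14 s.
answer_ev, think_ev, t = [], [], 0.0
for read in [31.0, 44.0, 27.0, 52.0, 38.0, 23.0]:
    t += read
    answer_ev.append((t, 7.0)); t += 7.0
    think_ev.append((t, 14.0)); t += 14.0
n_scans = int(t // TR)

intercept = [1.0] * n_scans  # no reading regressor: reading is the implicit baseline
x_answer = regressor(answer_ev, n_scans)
x_think = regressor(think_ev, n_scans)

# Simulated noiseless voxel: baseline 100, answering effect 2.0, think-aloud effect 1.0.
y = [100 + 2.0 * a + 1.0 * k for a, k in zip(x_answer, x_think)]
b0, b_answer, b_think = ols([intercept, x_answer, x_think], y)
contrast = b_answer - b_think  # the "answer vs think-aloud" contrast at this voxel
print(round(b_answer, 3), round(b_think, 3), round(contrast, 3))
```

With noiseless data the fit recovers the simulated effects, so the contrast equals their difference; a real analysis would also include noise and, typically, drift and motion regressors.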
For determining functional neuroimaging differences with
thinking (answering) and thinking aloud, we compared the
answer phase with the think-aloud phase. This was done to
capture the hemodynamic differences that might be seen
when participants decided on an answer (answering phase)
versus when they thought about their answer (think-aloud
phase).
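The group step asks whether the per-subject answer-minus-think-aloud contrast differs from zero across participants. The authors used a linear mixed-effects model; the sketch below swaps in the simpler one-sample t-test at a single voxel, with invented contrast values for 15 participants and an uncorrected two-sided threshold (t = 2.977 for 14 degrees of freedom at p = 0.01). It therefore does not reproduce the corrected p values reported in Table 1.

```python
import math
import statistics

T_CRIT_DF14 = 2.977  # two-sided p = 0.01 critical t for 14 degrees of freedom (tabulated)


def one_sample_t(contrasts):
    """t statistic and degrees of freedom for H0: mean per-subject contrast == 0."""
    n = len(contrasts)
    se = statistics.stdev(contrasts) / math.sqrt(n)
    return statistics.mean(contrasts) / se, n - 1


# Hypothetical answer-minus-think-aloud estimates at one voxel, one per analysed subject.
voxel_contrasts = [0.8, 1.1, 0.5, 0.9, 1.3, 0.7, 1.0, 0.6, 1.2, 0.9, 0.4, 1.1, 0.8, 1.0, 0.7]
t_stat, df = one_sample_t(voxel_contrasts)
print(f"t({df}) = {t_stat:.2f}; exceeds uncorrected threshold: {abs(t_stat) > T_CRIT_DF14}")
```

In a whole-brain analysis this test runs at every voxel, which is why a multiple-comparison correction is needed before reporting regions.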
Results
Seventeen physicians (15 men and 2 women) participated in
the study. Mean physician age was 39.5 ± 7 years (range: 32–51
years). Two participants’ fMRI data sets were dropped due to
excessive motion in the scanner and thus the final cohort was
15 physicians. The mean number of correct responses was
18.5/32 questions (range 15–25). Participant specialties were: 9
general internal medicine, 3 cardiology, 2 pulmonary/critical
care, 1 gastroenterology, 1 infectious diseases, and 1
hematology-oncology. All participants were board certified in
internal medicine, and all but one in their respective
subspecialty (a cardiology fellow).
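The counts in this paragraph can be cross-checked with simple arithmetic. The snippet only re-expresses numbers reported above: 17 enrolled minus 2 excluded for motion leaves 15 analysed, the specialty breakdown sums to the 17 enrolled physicians, and a mean of 18.5 correct out of 32 items corresponds to roughly 58% correct. Variable names are illustrative.

```python
enrolled, excluded_for_motion = 17, 2
analysed = enrolled - excluded_for_motion

specialties = {
    "general internal medicine": 9, "cardiology": 3, "pulmonary/critical care": 2,
    "gastroenterology": 1, "infectious diseases": 1, "hematology-oncology": 1,
}
n_items = 16 + 16  # NBME items + ABIM items
mean_correct, low, high = 18.5, 15, 25

print(analysed, sum(specialties.values()))
print(f"{mean_correct / n_items:.1%} (range {low / n_items:.1%}-{high / n_items:.1%})")
```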
Functional neuroimaging comparisons between the answering (thinking)
and thinking-aloud phases were exploratory, examining all areas of the
brain; they are presented in Table 1 and
Figure 1. Statistically significant differences were found
between answering (thinking) and thinking aloud in the
following brain regions: motor cortex, bilateral prefrontal
cortex, bilateral cerebellum, and the basal ganglia (p < 0.01).
These changes were also seen when conducting the analysis
separately for easy and hard items.
There was more activation with answering compared to
think-aloud for the following areas: motor cortex (which was
expected as participants pushed the buttons in the answering
phase but not in the think-aloud phase), bilateral prefrontal
cortex, bilateral cerebellum, and the caudate nucleus and
putamen. Brain activity was greater with thinking aloud versus
answering in the following areas (all p < 0.01): occipital
cortex and precuneus (PCC).
Formal think-aloud data obtained after participants left the
scanner indicated, through the rich descriptions and the
spontaneity of their responses, that subjects had been properly
trained in the think-aloud procedure. Respondents did not state that
they forgot items and stated that they were able to focus on the
task in the fMRI scanner.
Discussion
To our knowledge, this is the first study to demonstrate
differences in functional neuroimaging between answering
and thinking aloud about validated MCQs among board-certified
physicians. Specifically, we identified differences
between answering and thinking aloud that pertained to
the prefrontal cortex, basal ganglia (caudate and putamen),
occipital cortex, and cerebellum. These differences were ones
of degree within the same brain regions rather than differences
in which regions were activated.
These findings provide new insights into memory retrieval
and clinical reasoning among physicians, along with preliminary
and novel physiological, criterion validity evidence for
using the think-aloud protocol in educational research. What
follows is a detailed description of how the neuroimaging
findings differed for answering versus thinking aloud and how
our findings build upon previous neuroimaging research.
Prefrontal cortex activation
We identified statistically significant increases in the prefrontal
cortex and the basal ganglia for answering questions versus
thinking aloud. The prefrontal cortex is widely believed to be
involved in executive functions, such as decision making
(Stuss & Knight 2002), which is consistent with our finding
of prefrontal activation with silent thinking. The prefrontal
cortex has also been implicated as a potential area for
working memory processing (Baddeley 1986, 1992;
Sammer et al. 2006). Because the question is reviewed a second
time during thinking aloud, it is plausible that fewer working
memory resources are recruited, resulting in a greater
signal within this area with answering than with thinking aloud
(i.e. a more focal signal in this area with thinking
aloud). Our research expands upon the study by Rosen et al.
(2000), which did not identify prefrontal cortex changes
between speaking and silent thinking. Our findings may differ
from those of Rosen et al. because their study involved the
simple task of retrieving and comparing words, whereas our
study involved the much more intricate task of solving complex
clinical problems through the use of carefully validated MCQs.
Caudate and precuneus
The caudate and putamen are structures in the basal ganglia
that have recently been implicated in learning
and memory. We found increased activation in these areas
with answering versus thinking aloud on items, which is
consistent with prior research (Graybiel 2005). Similarly, recent
fMRI studies (Cavanna & Trimble 2006) implicate the
precuneus in both self-referential goal-directed actions and
memory retrieval. Additionally, the striatal regions receive dense
dopaminergic input, which has been shown to be
central to reward mechanisms (Koepp et al. 1988). The
increased activation in these regions suggests that the act of
answering the question may produce a reward signal, which is
plausible in this group of highly motivated individuals accustomed
to being tested. Likewise, recent fMRI work has shown
activation of the medial prefrontal cortex and the precuneus
with emotionally salient (‘hot’) reasoning (Shaefer & Rotte 2007),
which is consistent with our study of expert physicians
answering licensure examination questions in the fMRI
scanner. The precuneus
has also been implicated as an area of working memory
(Sammer et al. 2006).
Table 1. Statistically significant contrasts (comparisons) with functional neuroimaging*.
                               MNI coordinates**
Comparison / Location          X      Y      Z      Corrected p value
Answer vs. think-aloud
   Occipital cortex           −6     71      6      <0.01
   L basal ganglia            18    −14      6      <0.01
   R basal ganglia           −14     −6      2      <0.01
   R PFC                      40    −26     22      <0.01
   L PFC                     −52     −8     28      <0.01
   L cerebellum               19     52     21      <0.01
   R cerebellum              −26     62    −35      <0.01
   Precuneus                   2     48     32      <0.01
PFC, prefrontal cortex.
**MNI, Montreal Neurological Institute; brain coordinates that differed between answering and reflecting (by X, Y, Z spatial brain dimensions).
Occipital cortex
Another significant finding is that the occipital cortex was more
activated during thinking aloud than during answering. One
potential explanation is that our physician experts were
visualizing actual clinical scenarios while reflecting on their
answers. Another potential explanation is that different
background colors for items were used as a cue to participants that
they were in the think-aloud period of each MCQ. Different
colors have been shown to trigger various changes in brain
activation (Beauchamp et al. 1999). Also, the finding of greater mid
occipital gyrus activation while answering versus thinking
aloud may be due to its potential role in working memory
(Baddeley 1986, 1992). We would anticipate that repetitive,
immediate processing of how one arrived at an answer on a
question just answered (thinking aloud after answering an
item) would result in less excitation of working memory
(because the relevant pathways have just been established) than
the initial answer. This hypothesis, however, should be tested in
future research, as it is also possible that activities in the STM
differ between thinking aloud and actual thoughts,
which would have negative consequences for both the
veridicality and reactiveness of this method.
Cerebellum
Our finding of increased cerebellum activation with answering
items versus thinking aloud is also consistent with emerging
work implicating the cerebellum as a working memory area
(Desmond 2003). Similar to other fields (Baddeley 1986;
Anderson 2010), and as noted above, we found several areas
that may be involved in working memory with clinical
reasoning tasks.
We would expect that the act of answering (versus thinking
aloud) would involve more working memory resources
(prefrontal cortex, cerebellum) and also increased activation
of the striatal regions that are involved in both answering and
reward; it is also plausible that the increased activation of the
precuneus is due to working memory (Sammer et al. 2006).
More specifically, we would expect less activation of working
memory resources when thinking aloud on the same item
immediately after answering it, as learning could occur
between the first and second viewings of the item.
It is also possible that these changes reflect
differences in the veridicality of the method, although we believe
our interpretation is consistent with current theory and thus
provides some supporting evidence, which should be
confirmed in larger studies. Our expectation is consistent with
emerging theory on complex cognition: many cognitive
functions, such as working memory, can be performed by
more than one area (Just & Varma 2007).
The occipital differences are most consistent with changes
in the screen color, and motor region differences are attributable
to participants physically pushing buttons for their
A–E answers. Indeed, motor differences were expected, and
this finding supports the validity of our approach.
Furthermore, participants underwent a formal think-aloud
procedure immediately after leaving the fMRI scanner and the
time spent thinking aloud, as well as the quality of their
spontaneous utterances (remembering the items and the
quality of elicited thought processes), provide some additional
evidence to support the adequacy of think-aloud training and
our modified think-aloud approach.
The observed functional neuroimaging differences are
consistent with current understanding of clinical reasoning
theory and provide some evidence to support the use of think-aloud
protocols for measuring cognitive processes. The
presumption that ‘thinking about one’s thinking’ for purposes
of these protocols introduces bias may be inaccurate (Ericsson &
Simon 1984; van der Vleuten & Newble 1995; Ericsson et al.
2006). Our findings would support the practice of adding
think-aloud protocols to tools that measure behaviors, such as
post-encounter forms in Objective Structured Clinical Exams or
answers to MCQs. Our preliminary findings would also
support the continued use of think-aloud protocols alone for
exploring clinical reasoning. This is encouraging, as think-aloud
protocols are a feasible and relatively inexpensive method,
especially when compared with methods such as fMRI.
Figure 1. Regional differences with answering versus
thinking aloud: panels A (motor cortex), B (bilateral prefrontal
cortex), C (bilateral cerebellum), and D (basal ganglia)
showed more activation while answering (red) the multiple-choice
questions than while thinking aloud after arriving at a
chosen answer. Regions of the occipital cortex and precuneus
showed greater activation (blue) while thinking aloud than
while answering.
This study has several important limitations. First, it may
have had insufficient power to detect fMRI differences in other
brain regions. The sample size, however, would generally be
considered ‘acceptable to robust’ in the fMRI literature.
Alternatively, we may not have detected differences in other
brain regions due to the complexity of the assessed construct.
For example, brain imaging research investigating creativity
(and arguably, at least in some cases, diagnosing and treating a
patient is a creative act) has found that creativity is a challenging
phenomenon to map, likely because creative performance is
difficult to operationalize in the experimental setting (Fink
et al. 2006). Likewise, we know surprisingly little about the
brain processes that generate insights in problem solving (Luo
2006). Also, recent fMRI work suggests that each cortical area
can perform multiple cognitive functions (Just & Varma 2007).
Second, we may have failed to find fMRI differences due to
range restriction from sampling only experts, though the wide
range of item performance (means and standard deviations)
argues against this. However, future work should consider
comparing experts to novices. Third, we did not hear the
participants’ utterances during the modified think-aloud in the
fMRI scanner. Nonetheless, the high quality of their think-aloud
utterances immediately after leaving the scanner reduces
the potential impact of this limitation. Fourth, the reasoning
associated with answering MCQs may be quite different from
reasoning applied to seeing actual patients. Future work could
consider having participants view videotapes of patient–physician
encounters. Fifth, there may be differences between conventional
thinking aloud and the modified method used in this study.
Our exploratory findings support emerging fMRI literature
(Jeste & Harris 2010; Riess 2010), and the fMRI data provide
evidence that think-aloud protocols can help educators with
assessing complex cognitive processes, such as diagnostic
reasoning. This is an encouraging finding, as think-aloud
protocols are far more feasible than conducting fMRI studies at
the current time. Ultimately, we believe the measurement
approach of combining neuroimaging and think-aloud data
with more traditional assessment measures offers great potential
for advancing our understanding of diagnostic reasoning.
Declaration of interest: The authors report no conflicts of
interest. The authors alone are responsible for the content and
writing of the article.
This study was supported by grant funding from the
American Board of Internal Medicine Foundation.
The views expressed in this article are those of the authors
and do not reflect the views of the Department of Defense or
other federal agencies.
Notes on contributors
STEVEN J. DURNING, MD, PhD, is Professor of Medicine and Pathology;
he directs an introduction to clinical reasoning course and co-directs the
Long Term Career Outcome Study at USU.
ANTHONY R. ARTINO, Jr., PhD, is Associate Professor of Medicine and
Preventive Medicine & Biometrics. He has a PhD in educational psychology
and co-directs the Long Term Career Outcome Study at USU.
THOMAS J. BECKMAN, MD, is Professor of Medicine and Medical Education
at the Mayo Clinic.
JOHN GRANER, PhD, is an fMRI analyst at the Uniformed Services
University.
CEES VAN DER VLEUTEN, PhD, is Professor of Education at Maastricht
University.
ERIC HOLMBOE is the Senior Vice President and Chief Medical Officer for the
American Board of Internal Medicine.
LAMBERT SCHUWIRTH, MD, PhD, is Professor of Medical Education at
Flinders University.
Glossary of terms in medical education
Think-aloud protocol: A technique whereby participants
state (think aloud) whatever they are thinking, feeling, or
doing as they complete a task.
References: Ericsson, K., & Simon, H. (1993). Protocol
Analysis: Verbal Reports as Data (2nd ed.). Cambridge, MA: MIT
Press.
Ericsson KA, Charness N, Feltovich PJ, and Hoffman RR
(2006). The Cambridge Handbook of Expertise and Expert
Performance. New York: Cambridge University Press.
Functional magnetic resonance imaging or functional
MRI (fMRI): A procedure that measures brain
activity by detecting changes in blood flow associated with energy
use by regions of neurons.
Reference: Huettel, S. A.; Song, A. W.; McCarthy, G. (2009),
Functional Magnetic Resonance Imaging (2nd ed.),
Sunderland, MA: Sinauer.
References
Anderson JR. 2010. Cognitive psychology and its implications. 7th ed.
New York: Worth Publishers.
Baddeley AD. 1986. Working memory. Oxford: Clarendon Press.
Baddeley A. 1992. Working Memory. Science 255:556–559.
Beauchamp MS, Haxby JV, Jennings JE, DeYoe EA. 1999. An fMRI version
of the Farnsworth-Munsell 100-hue test reveals multiple color-selective
areas in human ventral occipitotemporal cortex. Cereb Cortex
9:257–263.
Brennan TA, Leape LL, Laird NM, et al. 1991. Incidence of adverse events
and negligence in hospitalized patients: Results of the Harvard Medical
Practice Study. N Engl J Med 324:370–376.
Cavanna AE, Trimble MR. 2006. The precuneus: A review of its functional
anatomy and behavioural correlates. Brain 129:564–583.
Cox RW. 1996. AFNI: Software for analysis and visualization of functional
magnetic resonance neuroimages. Comput Biomed Res 29:162–173.
Croskerry P. 2003. The importance of cognitive errors in diagnosis and
strategies to minimize them. Acad Med 78:775–780.
Desmond JE, Chen SH, DeRosa E, Pryor MR, Pfefferbaum A, Sullivan EV.
2003. Increased frontocerebellar activation in alcoholics during verbal
working memory: An fMRI study. Neuroimage 19:1510–1520.
Ericsson KA, Simon HA. 1984. Protocol analysis: Verbal reports as data.
Cambridge, MA: MIT Press.
Ericsson KA, Simon HA. 1987. Verbal reports on thinking. In: Faerch C,
Kasper G, editors. Introspection in second language research.
Clevedon, England: Multilingual Matters.
Ericsson KA, Charness N, Feltovich PJ, Hoffman RR. 2006. The Cambridge
Handbook of expertise and expert performance. New York: Cambridge
University Press.
Graybiel AM. 2005. The basal ganglia: Learning new tricks and loving it.
Curr Opin Neurobiol 15:638–644.
Higgs J, Jones MA. 2008. Clinical decision making and multiple problem
spaces. In: Higgs J, Jones MA, Loftus S, Christensen N, editors. Clinical
reasoning in the health professions. 3rd ed. Amsterdam: Elsevier
Butterworth Heinemann.
Jeste DV, Harris JC. 2010. Wisdom – A neuroscience perspective. JAMA
304(14):1602–1603.
Klasen M, Zvyagintsev M, Weber R, Mathiak KA, Mathiak K. 2008. Think
aloud during fMRI: Neuronal correlates of subjective experience in
video games. Lect Notes Comput Sc 5294:132–138.
Koepp MJ, Gunn RN, Lawrence AD, Cunningham VJ, Dagher A, Jones T,
Brooks DJ, Bench CJ, Grasby PM. 1998. Evidence for striatal dopamine
release during a video game. Nature 393:266–268.
Melnick DE. 2009. Licensing examinations in North America: Is external
audit valuable? Med Teach 31:212–214.
Messick S. 1989. Meaning and values in test validation: The science and
ethics of assessment. Educ Res 18:5–11.
Newell A, Simon HA. 1972. Human problem solving. Englewood Cliffs, NJ:
Prentice Hall.
Riess H. 2010. Empathy in medicine – A neurobiological perspective. JAMA
304(14):1604–1605.
Rosen HJ, Ojemann JG, Ollinger JM, Petersen SE. 2000. Comparison of
brain activation during word retrieval done silently and aloud using
fMRI. Brain Cogn 42:201–217.
Russo JE, Johnson EJ, Stephens DL. 1989. The validity of verbal protocols.
Mem Cognition 17:759–769.
Stuss D, Knight RT. 2002. The frontal lobes. New York: Oxford University
Press.
Van der Vleuten CPM, Newble DI. 1995. How can we test clinical
reasoning? Lancet 345:1032–1034.
Appendix 1: fMRI data acquisition and analysis methodology
Data acquisition
Acquisitions were performed using an echo-planar imaging
(EPI) sequence of 40 contiguous sagittal slices per brain
volume (TR = 2000 ms, TE = 25 ms, flip angle = 60°, slice
thickness = 4.0 mm). In-plane resolution was 3.75 × 3.75 mm
(64 × 64 voxels). An fMRI task presentation of the 32 questions
was created using E-Prime software (Psychology Software
Tools, Inc.) and displayed via a goggle system (Nordic
NeuroLab Inc., Milwaukee, WI) while each participant was
in the fMRI scanner.
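The acquisition parameters above fix the spatial sampling of each functional volume. The short Python sketch below simply multiplies them out (field of view, slice coverage, voxel size, volumes per minute); it introduces no parameters beyond those reported, and the variable names are illustrative only.

```python
# Derived geometry of the functional EPI acquisition (arithmetic on the reported values).
MATRIX = 64               # in-plane voxels per side (64 x 64)
IN_PLANE_MM = 3.75        # in-plane resolution, mm
SLICES = 40               # contiguous sagittal slices per volume (no gap)
SLICE_THICKNESS_MM = 4.0  # mm
TR_S = 2.0                # repetition time: one whole-brain volume every 2 s

fov_mm = MATRIX * IN_PLANE_MM                # in-plane field of view
coverage_mm = SLICES * SLICE_THICKNESS_MM    # left-right extent of the sagittal stack
voxel_volume_mm3 = IN_PLANE_MM ** 2 * SLICE_THICKNESS_MM
voxels_per_volume = MATRIX * MATRIX * SLICES
volumes_per_minute = 60.0 / TR_S

print(f"in-plane FOV: {fov_mm} mm; slab coverage: {coverage_mm} mm")
print(f"voxel: {voxel_volume_mm3} mm^3; voxels per volume: {voxels_per_volume}")
print(f"volumes per minute: {volumes_per_minute}")
```

For these values the in-plane field of view is 240 mm, the 40 slices span 160 mm, each voxel occupies 56.25 mm³, and 30 volumes are acquired per minute.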
The questions were presented in a random order over four
fMRI acquisition runs. Each run contained eight questions. The
exact length of each run varied, depending on the amount of
time each subject took to progress through the reading and
answering phases of each question. Mean run length (± standard
deviation) was 392 ± 62 s. Subjects underwent pre-training
to acquaint them with the method and layout of question
presentation and the correct use of the buttons corresponding
to answer options A to E. In addition to the functional imaging,
a high-resolution T1-weighted image was acquired for anatomical
reference (three-dimensional GRE; TR = 6.6 ms,
TE = 2.5 ms, flip angle = 12°). This anatomical T1
image consisted of 312 sagittal slices, a slice thickness of
0.6 mm and an in-plane resolution of 0.468 × 0.468 mm
(512 × 512 voxels).
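Two details of this paragraph lend themselves to a quick illustration: splitting the 32 questions into four randomly ordered runs of eight, and converting the self-paced run length into an approximate number of acquired volumes. The sketch below is hypothetical: the appendix does not say whether the random order was drawn once or per participant, so the seeded `assign_runs` helper is an assumption, not a reconstruction of the E-Prime task.

```python
import random

N_QUESTIONS = 32
N_RUNS = 4
PER_RUN = N_QUESTIONS // N_RUNS  # eight questions per run
TR_S = 2.0                       # one volume per 2 s

def assign_runs(seed):
    """Hypothetical helper: shuffle question IDs 1..32 and split them into
    four consecutive runs of eight. A fixed seed makes the order reproducible."""
    rng = random.Random(seed)
    order = list(range(1, N_QUESTIONS + 1))
    rng.shuffle(order)
    return [order[i * PER_RUN:(i + 1) * PER_RUN] for i in range(N_RUNS)]

runs = assign_runs(seed=7)
for k, run in enumerate(runs, start=1):
    print(f"run {k}: {run}")

# Runs were self-paced (mean 392 s, SD 62 s), so the number of volumes varies.
mean_run_s, sd_run_s = 392.0, 62.0
mean_vols = mean_run_s / TR_S
low_vols = (mean_run_s - sd_run_s) / TR_S
high_vols = (mean_run_s + sd_run_s) / TR_S
print(f"about {mean_vols:.0f} volumes per run "
      f"(mean +/- 1 SD: {low_vols:.0f} to {high_vols:.0f})")
```

At the mean run length this corresponds to about 196 volumes per run, or roughly 165 to 227 volumes within one standard deviation, before the initial volumes are discarded.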
Data analysis
Image preprocessing included typical fMRI procedures:
removal of the first three brain volumes (6 s) from each 4D
time series, slice-time correction, motion correction, registration
to the T1 anatomical image, smoothing with an 8 mm full-width
at half-maximum (FWHM) Gaussian kernel, and conversion of voxel
values from absolute intensity to voxel-wise percent change
from the mean.
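Three of the preprocessing steps reduce to simple per-voxel arithmetic: discarding the first three volumes (3 × 2 s = 6 s), rescaling a voxel's time series to percent change from its own mean, and converting the 8 mm FWHM of the smoothing kernel to a Gaussian standard deviation, σ = FWHM / (2√(2 ln 2)) ≈ 3.40 mm. The Python sketch below shows only these steps on a made-up single-voxel series; slice-timing correction, motion correction, registration, and the spatial smoothing itself are handled by dedicated imaging software and are not reproduced here.

```python
import math

TR_S = 2.0
N_DISCARD = 3  # first three volumes removed from each run (3 x 2 s = 6 s)

def drop_initial_volumes(series, n=N_DISCARD):
    """Remove the first n time points, before the signal reaches steady state."""
    return series[n:]

def percent_change_from_mean(series):
    """Express each time point as percent deviation from the series mean."""
    mean = sum(series) / len(series)
    return [100.0 * (x - mean) / mean for x in series]

def fwhm_to_sigma(fwhm_mm):
    """Standard deviation of a Gaussian kernel with the given FWHM."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

# Made-up single-voxel series (arbitrary units); the first values mimic the
# elevated signal often seen before longitudinal magnetization stabilizes.
raw = [1500.0, 1200.0, 1050.0, 1000.0, 1010.0, 990.0, 1000.0]
trimmed = drop_initial_volumes(raw)
print("discarded seconds:", N_DISCARD * TR_S)                 # 6.0
print("trimmed:", trimmed)                                    # [1000.0, 1010.0, 990.0, 1000.0]
print("percent change:", percent_change_from_mean(trimmed))   # [0.0, 1.0, -1.0, 0.0]
print("sigma (mm):", round(fwhm_to_sigma(8.0), 2))            # 3.4
```

Expressing each voxel relative to its own mean puts all voxels on a common scale, so model fits can be read as percent signal change rather than scanner-specific intensity units.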
The ‘answer’ times varied from question to question and
were modeled with a gamma-variate function of variable
duration and variable amplitude (the amplitude varied with
the duration). The ‘think-aloud’ time was constant at 14 s
and was modeled with a fixed-duration gamma-variate.
A general linear model (GLM) analysis was used to determine
the significance of these model time courses within each
voxel, with the ‘reading’ phase serving as the baseline. The
motion-correction parameter time courses estimated during
preprocessing were also included as regressors in the GLM
to remove any residual image intensity changes due to motion.
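The modeling described above can be sketched as follows: each event type becomes a boxcar (onset, duration) convolved with a gamma-variate response and sampled once per TR, so a longer ‘answer’ period produces a larger predicted response, which is one way amplitude can follow duration; the think-aloud regressor uses a fixed 14 s duration. Unmodeled time (here standing in for the reading phase) is absorbed by the intercept, so each beta is an effect relative to that baseline. This is a minimal standard-library Python illustration under stated assumptions, not the authors' pipeline: the gamma-variate parameters (p = 8.6, q = 0.547 s), the event timings, and the single drift column standing in for the motion-parameter regressors are all hypothetical, and significance testing is omitted.

```python
import math

TR = 2.0   # s, one volume per TR
DT = 0.1   # s, fine time grid used to build regressors before sampling at TR

def gamma_variate(t, p=8.6, q=0.547):
    """Gamma-variate response peaking at 1.0 when t = p*q (about 4.7 s).
    p and q are assumed values, not reported in the appendix."""
    if t <= 0.0:
        return 0.0
    return (t / (p * q)) ** p * math.exp(p - t / q)

def regressor(events, n_vols):
    """Convolve boxcars given as (onset_s, duration_s) with the gamma-variate
    and sample the result at the start of each TR."""
    n_fine = int(round(n_vols * TR / DT))
    box = [0.0] * n_fine
    for onset, dur in events:
        start = int(round(onset / DT))
        stop = min(n_fine, int(round((onset + dur) / DT)))
        for i in range(start, stop):
            box[i] = 1.0
    hrf = [gamma_variate(k * DT) for k in range(int(round(20.0 / DT)))]
    conv = [DT * sum(box[i - k] * hrf[k] for k in range(min(i + 1, len(hrf))))
            for i in range(n_fine)]
    step = int(round(TR / DT))
    return [conv[v * step] for v in range(n_vols)]

def ols(columns, y):
    """Least-squares betas by solving the normal equations (Gauss-Jordan)."""
    m = len(columns)
    a = [[sum(u * v for u, v in zip(columns[r], columns[c])) for c in range(m)]
         + [sum(u * v for u, v in zip(columns[r], y))] for r in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(m):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * z for x, z in zip(a[r], a[col])]
    return [a[r][m] / a[r][r] for r in range(m)]

# Hypothetical timing for one 200 s run: variable-length answer periods,
# each followed by a fixed 14 s think-aloud period.
n_vols = 100
answer_events = [(10.0, 8.0), (60.0, 12.0), (110.0, 6.0), (150.0, 10.0)]
think_events = [(onset + dur, 14.0) for onset, dur in answer_events]

intercept = [1.0] * n_vols
answer = regressor(answer_events, n_vols)
think = regressor(think_events, n_vols)
drift = [0.01 * v for v in range(n_vols)]  # stand-in for a motion-parameter time course

# Noise-free synthetic voxel with known betas, so the fit should recover them.
true_betas = [0.2, 0.5, 1.2, 0.3]
design = [intercept, answer, think, drift]
y = [sum(b * col[v] for b, col in zip(true_betas, design)) for v in range(n_vols)]

betas = ols(design, y)
print("estimated betas:", [round(b, 3) for b in betas])
```

Because the synthetic voxel has no noise, the estimated betas match the simulated ones up to floating-point error; with real data each beta would then be tested for significance voxel by voxel, a step this sketch leaves out.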