8/14/2019 Erik Pukinskis "Embodied Programming"
Embodied Programming
Erik Pukinskis
UC San Diego
Cognitive Science
Abstract
Past research into the nature of programming has focused on programmers as information
processors, but there is a recent move in Cognitive Science towards an embodied view of
cognition. This work presents results of a naturalistic observation of student pair-programmers
which show that programming is, in fact, an embodied activity. Analysis shows that the body is used to enact code and to carry out complex pointing simultaneously in multiple semiotic fields,
and that programmers use different modes afforded by the mouse cursor and selection to perform
a wide variety of points.
Introduction
What is programming? We know that it is carried out primarily by highly educated or skilled
programmers and that it results in the software systems that run everything from the pacemakers
that keep our hearts beating to the Large Hadron Collider with its 17-mile circumference. But how exactly does
programming happen? What are the cognitive systems at work in the construction of software?
Studies of programming have been conducted, but programming has typically been treated as an information-processing activity. Ko and Myers (2005) developed a model of software errors which breaks down software error production into four layers: specifications, programmer, programming system, and program. They call errors in the programmer layer cognitive breakdowns, which can be attributed to problems with knowledge, attention, and strategies. The types of breakdowns they describe range from issues with ambiguous signs to programmers
choosing the wrong rule to apply or using a faulty model. This method of describing cognition falls firmly within the information-processing model, a rich model which certainly bears some fruit, and Ko and Myers arrive at many valuable insights about programming.

However, in recent years there has been an increasing interest within Cognitive Science in the body, and Ko and Myers ignore the body almost entirely. The basic embodiment hypothesis is
that the body plays a critical role in cognition, but there are as many formulations of the
embodiment hypothesis as there are researchers interested in the phenomenon. These range
from the notion that our language and cognition are grounded in embodied metaphors (Lakoff and Johnson, 1980) to more radical beliefs that much of actual online cognition occurs outside the head (Hutchins, 1995; 2007). Many other cognitive scientists (Gibbs, 2006; Noë, 2004; Hurley, 1998; Lakoff & Núñez, 2000) have pushed the notion that the body is an important part of the
cognitive apparatus.
In addition, Andy Clark (2007) has proposed that the embodiment hypothesis is particularly
applicable in Human-Computer Interaction, because of what he calls "radical embodiment."
Clark cites several new pieces of research which support this notion. Maravita and Iriki (2004)
performed a set of experiments recording from bimodal neurons that respond both to activity in a
somatosensory receptive field (sRF) and to a visual receptive field (vRF) adjacent to the sRF,
which remain anchored to the body as it moves. Trials showed two distinct types of bimodal cells, both of which show an expansion of the vRF following tool use. The vRF of the "distal" type
initially included only the hand, but after learning to use a rake in a reaching task expanded to
include the length of the tool. The "proximal" type cells responded initially to the area within
reach of the body, and expanded to include the area within reach of the tool. Further
experiments involved training in a virtual reality situation, where the monkeys were prevented
from viewing their hand or the rake directly, instead viewing them via a video monitor. In this
situation the vRF shifted to include the on-screen representations of the hand and the tool. The
explanation Maravita and Iriki offer is that the body schema is being reused for the tool and tool
activity.
In addition, there is quite a bit of research which supports the idea that the body is playing a
critical role in cognitive activities. Hutchins (2007) describes--by way of Spivey (2007)--a
finding from Glucksberg (1964). Glucksberg's study required participants to find a clever solution for mounting a candle on a wall with a box of tacks. What is noteworthy about this study is that participants were often observed to be handling the box immediately before they realized that the solution was to use the box itself to hang the candle. This suggests that the
physical handling of the box facilitated the insight. And Hutchins himself (2007) describes a
situation in which a navy shipman's hands are doing cognitive work that allows him to discover a
missing term in a calculation.
Indeed, the notion that cognition is offloaded onto external representations is not new (Scaife and
Rogers, 1996; Hutchins, 1995), and there is no reason why the body could not be used for such a
purpose. It seems likely, then, that the body is recruited in programming; the big question is how.
Method
The study presented here utilizes naturalistic observation techniques borrowed from the practice
of cognitive ethnography. A programming class at UC San Diego was identified as a potential
source of data. The class is specifically focused on debugging. Students are assigned readings
from a software development text each week. In class they take a quiz on the readings and are
given a short lecture. After the lecture, they form pairs and are given a small, self-contained
debugging project. They are given 1-5 source files and a brief description of the desired
behavior of the application. The source files have several bugs which must be fixed within the
remaining approximately 90 minutes in the class period. At the end of the period, students turn
in the fixed code, along with a log of their debugging activities. They are encouraged to use a
scientific method, forming and testing hypotheses. Several instructors were on hand to answer
questions, and would often interrupt students to give advice.
Generally, the students observed were novice programmers, this typically being only their
second programming course. In addition, each class period the students find new partners, so
each session is a chance to work with someone they've never worked with before.
Data was captured from 8 different pairs in 8 different sessions, each lasting 40-80 minutes.
Three video streams were captured (Figure 1). Two were from digital camcorders, one with a
wide angle lens pointed at the programmers' bodies and faces from behind their monitor, and
another pointed at the monitor to capture interaction between participants' hands and the screen.
The third video stream was a full resolution digital capture of the desktop activity, including
mouse activity. A directional microphone recorded high-quality audio, a necessity in a noisy
classroom.
Results were analyzed using micro-analysis. Two of the eight video sessions received full
analysis. Because of problems with the desktop capture, full analysis was not possible on the
other data. The analysis process began with the author watching the videos and adding subtitles
as necessary when audio quality was too poor for easy listening. The videos were then watched
through completely, while creating an index of participants' goals, strategies, and the gestures
and speech that seemed strongly correlated to the identification and solution of debugging
problems. Once these indices were created, key moments showing examples of embodiment
were analyzed in detail, frame by frame, to attempt to identify the function of embodiment.
To facilitate this kind of detailed analysis, a new video analysis tool called 3stream (Figure 1) was built, allowing all three video streams to be watched simultaneously. 3stream allows the analyst to loop over key segments, scrub back and forth over all three video streams at
once, and step forward and backward through the streams. In addition, facilities are provided to
adjust the synchronization of the streams.
Such a tool is necessary because the story of a given embodied activity is not fully present in any
one stream. Watching the desktop capture alone won't allow the analyst to understand what is being gestured at with the mouse, because they cannot tell what the programmers are saying.
Similarly, watching the video of the participants will not reveal what they are looking at, so it is
often impossible to know what they are talking about. Speech, gesture, and mouse activity are tightly coupled, as will be shown below, which is why the frame-by-frame simultaneous analysis facilitated
by 3stream is so critical.
Results
While much of the observed activity involved very limited movement, often just eye movements
and scrolling, there were several instances of dramatic embodied activity. This paper will focus
on two kinds of activity: code enactment and pointing.
Code Enactment
Figure 2 shows a series of movements made by a programmer who was debugging a Tetris game.
The programmer and his partner were attempting to debug a faulty loop in the program, and
there was some confusion about the matrix which was being operated on. The participant on the
right at several points asks whether the problem is that the array indexes appear in different
orders through the program, sometimes in the form [x][y], and sometimes in the form [y][x]. Up
until this point, their discussion never got past the suggestion that this might be a problem.
We can see at the beginning of this sequence, in Figure 2b, that as the programmer says "going through the x position," he draws out a line on the table, creating a mapping between the x position and that space on the table. He then returns his hand to the mouse and begins reading code off of the screen: "less than width... x plus plus."
He then gestures back and forth near the left side of the axis he drew with his hand while saying "cause it's within the border," suggesting he is still using the same mapping. (Figure 2c) Then,
he retraces with his hand formed into a claw shape the same line he traced in Figure 2b, although
drawing a somewhat shorter line. (Figure 2d) He makes a little hop, and then stretches his hand
out into a broad stance along the line, with his pinky at his far right side, and his pointer at his far
left. (Figure 2f)
Next he brings the middle finger of his opposite hand up next to his pointer (Figure 2g), and while saying "plus plus... plus plus..." moves his pointer finger towards his pinky in two small movements, timed with each "plus plus" (Figure 2h-i). At this point, the programmer is able to state more confidently: "So it starts with... probably starts off with zero, goes to width." (Figure 2j)
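The exercise's actual source code is not preserved in the data, but a minimal sketch of the kind of loop being enacted, with hypothetical names modeled on the isLineFull method mentioned in Figure 2 (the [y][x] index order is likewise an assumption), might look like this:

```java
// Hypothetical reconstruction of the loop being enacted; the names and
// the [y][x] index order are assumptions, not the actual exercise source.
class LineCheck {
    // True if every cell in row y of the board is occupied.
    static boolean isLineFull(boolean[][] board, int y) {
        int width = board[0].length;
        // "starts off with zero, goes to width": the span the programmer
        // traced on the table, from pointer finger to pinky
        for (int x = 0; x < width; x++) {
            if (!board[y][x]) {   // [y][x] here; whether the rest of the
                return false;     // program used [x][y] was the pair's worry
            }
        }
        return true;
    }
}
```

The bounds are the point of the enactment: x begins at zero and stops just short of width, the span the programmer marked out between pointer and pinky.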
Discussion
The correlation between movement and speech seems to show clearly that the movements are
meaningfully connected to the participant's cognitive activity, but there is an open question as to
whether the programmer's body is doing cognitive work, or whether perhaps it is just a side
effect of his internal cognitive processes.
Certainly, I am not the first to suggest that such bodily activities are doing meaningful work. The Glucksberg (1964) result reported earlier and Hutchins' (2007) analysis of the shipman's use of his hands are preexisting examples. But a slightly deeper analysis is in order.
Hutchins uses "the enactments of external representations habitually performed by practitioners
who live and work in complex culturally constituted settings" to explain his navigator's "aha
moment" and I think we see something similar in the example presented here. It is plainly
obvious in any recording of programmers that they habitually read code. And it is shown here
that they sometimes act it out. These are cultural practices, and the events described above
constitute an example of these cultural practices coming together and resulting in the
programmer being able to make a claim about the boundary conditions of the loop. It is possible
that the enactment was not strictly necessary for the programmer to reach this conclusion, but the
fact is that he didn't reach the conclusion until after he had performed the enactment.
And certainly there appear to be constraints that the programmer leans on. The importance of constraints in cognitive offloading is described well by Scaife and Rogers (1996). Constraints
limit the amount of work that must be done inside the head by creating impossibilities outside the
head. Gibbs (2006) claims that a form of this happens in the body, suggesting that the body can
be used to create stable multimodal enactions.
In our case, the enactment is stable in many ways: the movement of his pointer is confined by the limits of his tendons. He cannot move it further to the left than its initial position, and he cannot move it further to the right than his pinky. This constraint neatly mirrors the constraint of
the edges of a fixed size array. He even goes to the length of bringing his middle finger on the
other hand next to his pointer to maintain the stability of this constraint.
Pointing
A second example of embodied activity which was very frequently observed was pointing. In
one particular example, participants were debugging a loop. The faulty condition read, in pseudocode: "if something AND NOT something else THEN return false". In the correct solution, the condition reads: "if something OR something else THEN return true."
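Rendered as code, with hypothetical predicate names (inside and empty merely echo the dialog in Figure 3; the actual conditions are not in the data), the buggy and corrected logic might look like this:

```java
// Sketch of the bug under discussion; the predicate names are hypothetical.
class ConditionBug {
    // Buggy: "if something AND NOT something else THEN return false"
    static boolean buggy(boolean inside, boolean empty) {
        if (inside && !empty) {
            return false;
        }
        return true;
    }

    // Fixed: "if something OR something else THEN return true"
    static boolean fixed(boolean inside, boolean empty) {
        if (inside || empty) {
            return true;
        }
        return false;
    }
}
```

For inside = true and empty = false the two versions disagree (false versus true), which is the kind of divergence the pair was chasing.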
Figure 3 shows a few seconds of mousing activity coupled with speech from the pair. The
details of the dialog are not nearly as important here as the structure of the mousing and the way
it couples with speech. First, there are at least four different kinds of points shown here:
mouse hovering, clicking, selecting, and scrolling. Second, these different kinds of points are
shown to be carefully co-timed with speech, suggesting real cognitive significance.
Discussion
Again, this is not the first work to suggest that such gestures are doing real cognitive work.
Casasanto (in press) provides experimental evidence that "motor programs, themselves, are the
active ingredient in the cognitive function of gesture." Carlson et al. (2007) provide evidence
that the hands are externalizing cognitive work while being used to solve math problems.
So it seems entirely possible, but what is really happening here? There are two things that are noteworthy about the kind of pointing observed in the present study. First, it appears to be closely related to what Chuck Goodwin describes in his 2003 paper on pointing. He describes the way we use various tools and body parts simultaneously on multiple semiotic fields to accomplish pointing. In his example, someone points a trowel at a map, but combines that gesture with a head nod towards a nearby space to establish the target of the point. We seem to be seeing something similar here, where the programmer is indicating both the code and the output, using multiple semiotic fields to triangulate a target for pointing.
Second, it is common knowledge that people point at things with the mouse cursor, but what this data reveals is that the picture is far more complex than just one kind of indicating. This user, in the span of maybe two sentences, uses at least four different kinds of points: clicking, hovering, selecting (with double click and with drag), and scrolling.
The present data isn't rich enough to say what the different functions of these different modes
might be, but it is not hard to speculate. Clicking creates a sound which integrates tightly with speech. Selection can indicate a range, where clicking indicates a single point. A hover gesture can indicate motion and can create icons where a selection can only indicate a range. These
affordances are extremely varied.
Conclusion
We guessed that programming was embodied, and that has been borne out. It seems likely that
these embodied activities are doing real cognitive work, and there are some unique properties to
human-computer embodiment in particular. As humans we opportunistically use whatever
References
Gibbs, Raymond W. 2006. Embodiment and cognitive science. New York: Cambridge
University Press.
Glucksberg, S. (1964). Functional fixedness: Problem solution as a function of observing responses. Psychonomic Science, 1: 117-118.
Goodwin, C. (2003) "Pointing as Situated Practice", in S. Kita (ed.) Pointing: Where Language, Culture, and Cognition Meet, pp. 217-241. Hillsdale, NJ: Lawrence Erlbaum Associates.
Hurley, Susan (1998) Consciousness in Action. Harvard University Press.
Hutchins, E. (1995) Cognition in the wild. MIT Press.
Hutchins, E. (2007) Enaction, imagination, and insight. (In press)
Ko, A. J. and Myers, B. A. (2005). A Framework and Methodology for Studying the Causes of Software Errors in Programming Systems. Journal of Visual Languages and Computing, 16(1-2), 41-84.
Lakoff, George & Johnson, Mark (1980) Metaphors We Live By. Chicago: University of
Chicago Press.
Lakoff, G. & Núñez, R. (2000). Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being. New York: Basic Books.
Noë, Alva (2004) Action in perception. MIT Press.
Scaife, M. & Rogers, Y. (1996). External cognition: How do graphical representations work? International Journal of Human-Computer Studies, vol. 45, pp. 185-213.
Spivey, M. (2007). The Continuity of Mind. Oxford: Oxford University Press.
Figure 1: The 3stream video analysis tool. Multiple video streams can be watched simultaneously, with scrubbing and single-frame movement features, the ability to control audio playback independently on each stream, and a built-in synchronization tool.
Figure 2: Hand movements of participant B while talking about the for loop in the isLineFull method.
a) so
22:58:20
b) going through the x position
22:59:26 23:00:22
uh so less than width x plus
plus Y should be ok. Y is ok
c) cause it's within the border.
23:10:02 - 23:10:11
uh x
d)
23:12:13 23:13:10
e)
23:13:20
f)
23:14:22
returns false
g) so
23:15:14
h) plus plus
23:17:00
i) plus plus
23:17:06
equals zero.
j) So it starts off
23:20:20
probably starts off with zero, goes to
width.
Figure 3: Indicating with clicking, selecting, hovering, and scrolling.
A: so both of these are either true
or false so changing this
1.
won't really make a difference.
2.
B: no, this should be true
A: oh, this right here?
3. 4.
and then this should be an OR?
5.
B: let me think.
A: because this one doesn't really
matter cuz this is false, right?
6.
B: well it can't be both inside and empty
5:12 A: but if this is true and
7. 8.
this is false then it
9.
will stop all the way up here.
10. 11.
B: why
A: because it's false false
12. 13.
Figure 4: Images taken at the same time of a programmer simultaneously pointing with his hand, by resting it on the display bezel next to a region of interest, while moving the mouse pointer over the same area. The typing caret is also close by.
Figure 5: Two instances (separated by many minutes) of programmers averting their eyes away from the workspace while attempting to solve a puzzle. In both cases their partners appear not to be attending to their gaze, suggesting the aversion of eyes aids thinking rather than communication.