
Embodied Programming

Erik Pukinskis
UC San Diego
Cognitive Science

Abstract

Past research into the nature of programming has focused on programmers as information processors, but there is a recent move in Cognitive Science towards an embodied view of cognition. This work presents results of a naturalistic observation of student pair-programmers which show that programming is, in fact, an embodied activity. Analysis shows that the body is used to enact code and to carry out complex pointing simultaneously in multiple semiotic fields, and that programmers use the different modes afforded by the mouse cursor and selection to perform a wide variety of points.

Introduction

What is programming? We know that it is carried out primarily by highly educated or skilled programmers, and that it results in the software systems that run everything from the pacemakers that keep our hearts beating to the 17-mile-circumference Large Hadron Collider. But how exactly does programming happen? What are the cognitive systems at work in the construction of software?

Studies of programming have been conducted, but programming has typically been treated as an information-processing activity. Ko and Myers (2005) developed a model of software errors which breaks software error production down into four layers: specifications, programmer, programming system, and program. They call errors in the programmer layer cognitive breakdowns, which can be attributed to problems with knowledge, attention, and strategies. The types of breakdowns they describe range from issues with ambiguous signs to programmers choosing the wrong rule to apply or using a faulty model. This way of describing cognition falls firmly within the information processing model, a rich model which certainly bears some fruit, and Ko and Myers arrive at many valuable insights about programming.

However, in recent years there has been an increasing interest within Cognitive Science in the body, and Ko and Myers ignore the body almost entirely. The basic embodiment hypothesis is that the body plays a critical role in cognition, but there are as many formulations of the embodiment hypothesis as there are researchers interested in the phenomenon. These range from the notion that our language and cognition are grounded in embodied metaphors (Lakoff and Johnson, 1980) to more radical claims that much of actual online cognition occurs outside the head (Hutchins, 1995; 2007). Many other cognitive scientists (Gibbs, 2006; Noë, 2004; Hurley, 1998; Lakoff & Núñez, 2000) have pushed the notion that the body is an important part of the cognitive apparatus.

In addition, Andy Clark (2007) has proposed that the embodiment hypothesis is particularly applicable in Human-Computer Interaction, because of what he calls "radical embodiment." Clark cites several new pieces of research which support this notion. Maravita and Iriki (2004) performed a set of experiments recording from bimodal neurons that respond both to activity in a somatosensory receptive field (sRF) and to a visual receptive field (vRF) adjacent to the sRF, which remains anchored to the body as it moves. Trials showed two distinct types of bimodal cells, both of which show an expansion of the vRF following tool use. The vRF of the "distal" type initially included only the hand, but after the monkey learned to use a rake in a reaching task, it expanded to include the length of the tool. The "proximal" type cells responded initially to the area within reach of the body, and expanded to include the area within reach of the tool. Further experiments involved training in a virtual reality situation, where the monkeys were prevented from viewing their hand or the rake directly, instead viewing them via a video monitor. In this situation the vRF shifted to include the on-screen representations of the hand and the tool. The explanation Maravita and Iriki offer is that the body schema is being reused for the tool and tool activity.


In addition, there is quite a bit of research which supports the idea that the body plays a critical role in cognitive activities. Hutchins (2007) describes, by way of Spivey (2007), a finding from Glucksberg (1964). Glucksberg's study required participants to find a clever solution for mounting a candle on a wall using a box of tacks. What is noteworthy about this study is that participants were often observed handling the box immediately before they realized that the solution was to use the box itself to hang the candle. This suggests that the physical handling of the box facilitated the insight. And Hutchins himself (2007) describes a situation in which a navy navigator's hands are doing cognitive work that allows him to discover a missing term in a calculation.

Indeed, the notion that cognition is offloaded onto external representations is not new (Scaife and Rogers, 1996; Hutchins, 1995), and there is no reason why the body could not be used for such a purpose. It seems likely, then, that the body is recruited in programming; the big question is how.

Method

The study presented here utilizes naturalistic observation techniques borrowed from the practice of cognitive ethnography. A programming class at UC San Diego was identified as a potential source of data. The class is specifically focused on debugging. Students are assigned readings from a software development text each week. In class they take a quiz on the readings and are given a short lecture. After the lecture, they form pairs and are given a small, self-contained debugging project. They are given 1-5 source files and a brief description of the desired behavior of the application. The source files have several bugs which must be fixed within the approximately 90 minutes remaining in the class period. At the end of the period, students turn in the fixed code, along with a log of their debugging activities. They are encouraged to use a scientific method, forming and testing hypotheses. Several instructors are on hand to answer questions, and often interrupt students to give advice.


Generally, the students observed are novice programmers, this typically being only their second programming course. In addition, each class period the students find new partners, so each session is a chance to work with someone they have never worked with before.

Data was captured from 8 different pairs in 8 different sessions, each lasting 40-80 minutes. Three video streams were captured. (Figure 1) Two were from digital camcorders: one with a wide-angle lens pointed at the programmers' bodies and faces from behind their monitor, and another pointed at the monitor to capture interaction between participants' hands and the screen. The third video stream was a full-resolution digital capture of the desktop activity, including mouse activity. A directional microphone recorded high-quality audio, a necessity in a noisy classroom.

Results were analyzed using micro-analysis. Two of the eight video sessions received full analysis. Because of problems with the desktop capture, full analysis was not possible on the other data. The analysis process began with the author watching the videos and adding subtitles as necessary when audio quality was too poor for easy listening. The videos were then watched through completely, while creating an index of participants' goals, strategies, and the gestures and speech that seemed strongly correlated with the identification and solution of debugging problems. Once these indices were created, key moments showing examples of embodiment were analyzed in detail, frame by frame, to attempt to identify the function of embodiment.

In order to support this kind of detailed analysis, a new video analysis tool called 3stream (Figure 1) was built to facilitate the watching of all three video streams simultaneously. 3stream allows the analyst to loop over key segments, scrub back and forth over all three video streams at once, and step forward and backward through the streams. In addition, facilities are provided to adjust the synchronization of the streams.
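The paper does not describe 3stream's implementation, but a minimal sketch of how such per-stream synchronization might work is shown below, assuming each stream is modeled as a frame sequence with a user-adjustable offset relative to a shared playhead. All class, method, and variable names here are hypothetical and are not taken from the actual tool.

    // Hypothetical sketch of multi-stream playback synchronization.
    // Nothing here is taken from the real 3stream code.
    import java.util.ArrayList;
    import java.util.List;

    class StreamTrack {
        final double frameRate;   // frames per second of this stream
        double offsetSeconds;     // adjustable sync offset against the shared playhead

        StreamTrack(double frameRate, double offsetSeconds) {
            this.frameRate = frameRate;
            this.offsetSeconds = offsetSeconds;
        }

        // Map the shared playhead position to this stream's frame index.
        int frameAt(double playheadSeconds) {
            return (int) Math.max(0, Math.round((playheadSeconds + offsetSeconds) * frameRate));
        }
    }

    class MultiStreamPlayhead {
        private final List<StreamTrack> tracks = new ArrayList<>();
        private double playheadSeconds = 0.0;
        private double loopStart = 0.0;
        private double loopEnd = Double.MAX_VALUE;

        void addTrack(StreamTrack track) { tracks.add(track); }

        void setLoop(double startSeconds, double endSeconds) {
            loopStart = startSeconds;
            loopEnd = endSeconds;
        }

        // Scrubbing: move the playhead, wrapping around the looped key segment.
        void seek(double seconds) {
            playheadSeconds = seconds > loopEnd ? loopStart : seconds;
        }

        // Stepping: advance all streams by one frame of a reference stream.
        void stepFrames(int delta, StreamTrack reference) {
            seek(playheadSeconds + delta / reference.frameRate);
        }

        // Frame index to display in each stream at the current playhead.
        int[] currentFrames() {
            int[] frames = new int[tracks.size()];
            for (int i = 0; i < tracks.size(); i++) {
                frames[i] = tracks.get(i).frameAt(playheadSeconds);
            }
            return frames;
        }
    }

Under these assumptions, a desktop capture running at 15 frames per second that starts two seconds after the camcorders would be added as a StreamTrack with an offset of -2.0 seconds, and scrubbing the shared playhead would keep all three frame indices aligned.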

Such a tool is necessary because the story of a given embodied activity is not fully present in any one stream. Watching the desktop capture alone will not allow the analyst to understand what is being gestured at with the mouse, because it does not reveal what the programmers are saying. Similarly, watching the video of the participants will not reveal what they are looking at, so it is often impossible to know what they are talking about. Speech, gesture, and mouse activity are tightly coupled, as will be shown below, which is why the frame-by-frame simultaneous analysis facilitated by 3stream is so critical.

Results

While much of the observed activity involved very limited movement, often just eye movements and scrolling, there were several instances of dramatic embodied activity. This paper will focus on two kinds of activity: code enactment and pointing.

Code Enactment

Figure 2 shows a series of movements made by a programmer who was debugging a Tetris game. The programmer and his partner were attempting to debug a faulty loop in the program, and there was some confusion about the matrix being operated on. The participant on the right asks at several points whether the problem is that the array indexes appear in different orders through the program, sometimes in the form [x][y], and sometimes in the form [y][x]. Up until this point, their discussion never got past the suggestion that this might be a problem.
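The students' source files are not reproduced here, so the following is only a rough sketch of the kind of loop under discussion. The method name isLineFull comes from the caption of Figure 2; the grid variable, its orientation, and the meaning of its values are assumptions made for illustration.

    // Hedged sketch of the kind of loop the pair is debugging; not the actual
    // assignment code, which the paper does not reproduce.
    static boolean isLineFull(int[][] grid, int y, int width) {
        // Read aloud by the programmer as "less than width... x plus plus":
        // x starts at zero and runs up to width.
        for (int x = 0; x < width; x++) {
            // The source of the pair's confusion: if the grid is indexed as
            // grid[row][column] elsewhere, then grid[x][y] here quietly swaps
            // the two dimensions relative to grid[y][x].
            if (grid[x][y] == 0) {  // assumption: 0 marks an empty cell
                return false;
            }
        }
        return true;
    }

Whether the indexing in such a loop matches the rest of the program is exactly the question the programmers are trying to settle with the enactment described below.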

We can see at the beginning of this sequence, in Figure 2b, that as the programmer says "going through the x position," he draws out a line on the table, creating a mapping between the x position and that space on the table. He then returns his hand to the mouse and begins reading code off of the screen: "less than width... x plus plus."

He then gestures back and forth near the left side of the axis he drew with his hand while saying "cause it's within the border," suggesting he is still using the same mapping. (Figure 2c) Then, with his hand formed into a claw shape, he retraces the same line he traced in Figure 2b, although drawing a somewhat shorter line. (Figure 2d) He makes a little hop, and then stretches his hand out into a broad stance along the line, with his pinky at his far right side and his pointer at his far left. (Figure 2f)

Next he brings the middle finger of his opposite hand up next to his pointer (Figure 2g), and while saying "plus plus... plus plus..." moves his pointer finger towards his pinky in two small movements, timed with each "plus plus" (Figure 2h-i). At this point, the programmer is able to state more confidently: "So it starts with... probably starts off with zero, goes to width." (Figure 2j)

Discussion

The correlation between movement and speech seems to show clearly that the movements are meaningfully connected to the participant's cognitive activity, but there is an open question as to whether the programmer's body is doing cognitive work, or whether perhaps it is just a side effect of his internal cognitive processes.

Certainly, I am not the first to suggest that such bodily activities are doing meaningful work. The Glucksberg (1964) result reported earlier and Hutchins' (2007) analysis of the navigator's use of his hands are preexisting examples. But a slightly deeper analysis is in order.

Hutchins uses "the enactments of external representations habitually performed by practitioners who live and work in complex culturally constituted settings" to explain his navigator's "aha moment," and I think we see something similar in the example presented here. It is plainly obvious in any recording of programmers that they habitually read code. And it is shown here that they sometimes act it out. These are cultural practices, and the events described above constitute an example of these cultural practices coming together and resulting in the programmer being able to make a claim about the boundary conditions of the loop. It is possible that the enactment was not strictly necessary for the programmer to reach this conclusion, but the fact is that he did not reach the conclusion until after he had performed the enactment.


And certainly there appear to be constraints that the programmer leans on. The importance of constraints in cognitive offloading is described well by Scaife and Rogers (1996). Constraints limit the amount of work that must be done inside the head by creating impossibilities outside the head. Gibbs (2006) claims that a form of this happens in the body, suggesting that the body can be used to create stable multimodal enactions.

In our case, the enactment is stable in many ways: the movement of his pointer is confined by the limits of his tendons. He cannot move it further to the left than its initial position, and he cannot move it further to the right than his pinky. This constraint neatly mirrors the constraint of the edges of a fixed-size array. He even goes to the length of bringing the middle finger of his other hand next to his pointer to maintain the stability of this constraint.

Pointing

A second example of embodied activity which was very frequently observed was pointing. In one particular example, participants were debugging a loop. The buggy condition read, in pseudocode: "if something AND NOT something else THEN return false". In the correct solution, however, it reads: "if something OR something else THEN return true."
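In concrete terms, the difference might look roughly like the sketch below. The predicate names are hypothetical, loosely suggested by the dialogue in Figure 3 (where the pair speak of a cell being "inside" and "empty"); the students' actual code is not given in the paper.

    // Hedged illustration of the bug and its fix; predicate names are assumptions.

    // Buggy form: "if something AND NOT something else THEN return false"
    static boolean checkBuggy(boolean isInside, boolean isEmpty) {
        if (isInside && !isEmpty) {
            return false;
        }
        return true;
    }

    // Corrected form: "if something OR something else THEN return true"
    static boolean checkFixed(boolean isInside, boolean isEmpty) {
        if (isInside || isEmpty) {
            return true;
        }
        return false;
    }

The two forms disagree, for example, when isInside is true and isEmpty is false, which is the kind of case the pair walk through in the dialogue shown in Figure 3.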

Figure 3 shows a few seconds of mousing activity coupled with speech from the pair. The details of the dialog are not nearly as important here as the structure of the mousing and the way it couples with speech. First, there are at least four different kinds of points shown here: mouse hovering, clicking, selecting, and scrolling. Second, these different kinds of points are shown to be carefully co-timed with speech, suggesting real cognitive significance.

Discussion

Again, this is not the first work to suggest that such gestures are doing real cognitive work. Casasanto (in press) provides experimental evidence that "motor programs, themselves, are the active ingredient in the cognitive function of gesture." Carlson et al. (2007) provide evidence that the hands are externalizing cognitive work while being used to solve math problems.

So it seems entirely possible, but what is really happening here? There are two things that are noteworthy about the kind of pointing observed in the present study. First, it appears to be closely related to what Chuck Goodwin describes in his 2003 paper on pointing. He describes the way we use various tools and body parts simultaneously on multiple semiotic fields to accomplish pointing. In his example, someone points a trowel at a map, but combines that gesture with a head nod towards a nearby space to establish the target of the point. We seem to be seeing something similar here, where the programmer is indicating both the code and the output, using multiple semiotic fields to triangulate a target for pointing.

Second, it is common knowledge that people point at things with the mouse cursor, but what this data reveals is that the picture is far more complex than just one kind of indicating. This user, in the span of maybe two sentences, uses at least four different kinds of points: clicking, hovering, selecting (with double click and with drag), and scrolling.

The present data isn't rich enough to say what the different functions of these different modes might be, but it is not hard to speculate. Clicking creates a sound which integrates tightly with speech. Selection can indicate a range, where clicking indicates a single point. A hover gesture can indicate motion and can create icons, where a selection can only indicate a range. These affordances are extremely varied.

Conclusion

We guessed that programming was embodied, and that has been borne out. It seems likely that these embodied activities are doing real cognitive work, and there are some unique properties to human-computer embodiment in particular. As humans we opportunistically use whatever

References

Gibbs, R. W. (2006). Embodiment and Cognitive Science. New York: Cambridge University Press.

Glucksberg, S. (1964). Functional fixedness: Problem solution as a function of observing responses. Psychonomic Science, 1, 117-118.

Goodwin, C. (2003). Pointing as situated practice. In S. Kita (Ed.), Pointing: Where Language, Culture, and Cognition Meet (pp. 217-241). Hillsdale, NJ: Lawrence Erlbaum Associates. http://www.sscnet.ucla.edu/clic/cgoodwin/03pointing.pdf

Hurley, S. (1998). Consciousness in Action. Harvard University Press.

Hutchins, E. (1995). Cognition in the Wild. MIT Press.

Hutchins, E. (2007). Enaction, imagination, and insight. In press. http://hci.ucsd.edu/234/260-W2008/readings/EnactInsight.pdf

Ko, A. J., & Myers, B. A. (2005). A framework and methodology for studying the causes of software errors in programming systems. Journal of Visual Languages and Computing, 16(1-2), 41-84. http://www.cs.cmu.edu/~ajko/papers/Ko2004SoftwareErrorsFramework.pdf

Lakoff, G., & Johnson, M. (1980). Metaphors We Live By. Chicago: University of Chicago Press.

Lakoff, G., & Núñez, R. (2000). Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being. New York: Basic Books.

Noë, A. (2004). Action in Perception. MIT Press.

Scaife, M., & Rogers, Y. (1996). External cognition: How do graphical representations work? International Journal of Human-Computer Studies, 45, 185-213.

Spivey, M. (2007). The Continuity of Mind. Oxford: Oxford University Press.


Figure 1: The 3stream video analysis tool. Multiple video streams can be watched simultaneously, with scrubbing and single-frame movement features, the ability to control audio playback independently on each stream, and a built-in synchronization tool.


Figure 2: Hand movements of participant B while talking about the for loop in the isLineFull method.

a) (22:58:20) "so"
b) (22:59:26-23:00:22) "going through the x position... uh so less than width x plus plus. Y should be ok. Y is ok"
c) (23:10:02-23:10:11) "cause it's within the border... uh x"
d) (23:12:13-23:13:10)
e) (23:13:20)
f) (23:14:22) "returns false"
g) (23:15:14) "so"
h) (23:17:00) "plus plus"
i) (23:17:06) "plus plus... equals zero."
j) (23:20:20) "So it starts off... probably starts off with zero, goes to width."


Figure 3: Indicating with clicking, selecting, hovering, and scrolling. Bracketed numbers mark the successive mouse actions and where they fall in the speech.

A: so both of these are either true or false so changing this [1] won't really make a difference. [2]
B: no, this should be true
A: oh, this right here? [3] [4] and then this should be an OR? [5]
B: let me think.
A: because this one doesn't really matter cuz this is false, right? [6]
B: well it can't be both inside and empty
(5:12) A: but if this is true and [7] [8] this is false then it [9] will stop all the way up here. [10] [11]
B: why
A: because it's false false [12] [13]


Figure 4: Images taken at the same time of a programmer simultaneously pointing with his hand, by resting it on the display bezel next to a region of interest, while moving the mouse pointer over the same area. The typing caret is also close by.

Figure 5: Two instances (separated by many minutes) of programmers averting their eyes from the workspace while attempting to solve a puzzle. In both cases their partners appear not to be attending to their gaze, suggesting that the aversion of eyes aids thinking rather than communication.