
Embodied Programming

Erik Pukinskis
UC San Diego
Cognitive Science

Abstract

Past research into the nature of programming has focused on programmers as information processors, but there is a recent move in Cognitive Science towards an embodied view of cognition. This work presents results of a naturalistic observation of student pair-programmers which show that programming is, in fact, an embodied activity. Analysis shows that the body is used to enact code and to carry out complex pointing simultaneously in multiple semiotic fields, and that programmers use the different modes afforded by the mouse cursor and selection to perform a wide variety of points.

Introduction

What is programming? We know that it is carried out primarily by highly educated or skilled programmers, and that it results in the software systems that run everything from the pacemakers that keep our hearts beating to the 17-mile-circumference Large Hadron Collider. But how exactly does programming happen? What are the cognitive systems at work in the construction of software?

Studies of programming have been conducted, but programming has typically been treated as an information-processing activity. Ko and Myers (2005) developed a model of software errors which breaks software error production down into four layers: specifications, programmer, programming system, and program. They call errors in the programmer layer cognitive breakdowns, which can be attributed to problems with knowledge, attention, and strategies. The types of breakdowns they describe range from issues with ambiguous signs to programmers choosing the wrong rule to apply or using a faulty model. This way of describing cognition falls firmly within the information processing model, a rich model which certainly bears some fruit, and Ko and Myers arrive at many valuable insights about programming.

However, in recent years there has been an increasing interest within Cognitive Science in the body, and Ko and Myers ignore the body almost entirely. The basic embodiment hypothesis is that the body plays a critical role in cognition, but there are as many formulations of the embodiment hypothesis as there are researchers interested in the phenomenon. These range from the notion that our language and cognition are grounded in embodied metaphors (Lakoff and Johnson, 1980) to more radical claims that much of actual online cognition occurs outside the head (Hutchins, 1995; 2007). Many other cognitive scientists (Gibbs, 2006; Noë, 2004; Hurley, 1998; Lakoff & Núñez, 2000) have pushed the notion that the body is an important part of the cognitive apparatus.

In addition, Andy Clark (2007) has proposed that the embodiment hypothesis is particularly applicable in Human-Computer Interaction, because of what he calls "radical embodiment." Clark cites several new pieces of research which support this notion. Maravita and Iriki (2004) performed a set of experiments recording from bimodal neurons that respond both to activity in a somatosensory receptive field (sRF) and to a visual receptive field (vRF) adjacent to the sRF, which remains anchored to the body as it moves. Trials showed two distinct types of bimodal cells, both of which show an expansion of the vRF following tool use. The vRF of the "distal" type initially included only the hand, but after the monkey learned to use a rake in a reaching task, it expanded to include the length of the tool. The "proximal" type cells responded initially to the area within reach of the body, and expanded to include the area within reach of the tool. Further experiments involved training in a virtual reality situation, where the monkeys were prevented from viewing their hand or the rake directly, instead viewing them via a video monitor. In this situation the vRF shifted to include the on-screen representations of the hand and the tool. The explanation Maravita and Iriki offer is that the body schema is being reused for the tool and tool activity.


In addition, there is quite a bit of research which supports the idea that the body plays a critical role in cognitive activities. Hutchins (2007) describes, by way of Spivey (2007), a finding from Glucksberg (1964). Glucksberg's study required participants to find a clever solution for mounting a candle on a wall using a box of tacks. What is noteworthy about this study is that participants were often observed handling the box immediately before they realized that the solution was to use the box itself to hang the candle. This suggests that the physical handling of the box facilitated the insight. And Hutchins himself (2007) describes a situation in which a navy navigator's hands are doing cognitive work that allows him to discover a missing term in a calculation.

Indeed, the notion that cognition is offloaded onto external representations is not new (Scaife and Rogers, 1996; Hutchins, 1995), and there is no reason why the body could not be used for such a purpose. It seems likely, then, that the body is recruited in programming; the big question is how.

Method

The study presented here utilizes naturalistic observation techniques borrowed from the practice of cognitive ethnography. A programming class at UC San Diego was identified as a potential source of data. The class is specifically focused on debugging. Students are assigned readings from a software development text each week. In class they take a quiz on the readings and are given a short lecture. After the lecture, they form pairs and are given a small, self-contained debugging project. They are given 1-5 source files and a brief description of the desired behavior of the application. The source files have several bugs which must be fixed within the approximately 90 minutes remaining in the class period. At the end of the period, students turn in the fixed code, along with a log of their debugging activities. They are encouraged to use a scientific method, forming and testing hypotheses. Several instructors are on hand to answer questions, and often interrupt students to give advice.


Generally, the students observed are novice programmers, this typically being only their second programming course. In addition, each class period the students find new partners, so each session is a chance to work with someone they have never worked with before.

Data was captured from 8 different pairs in 8 different sessions, each lasting 40-80 minutes. Three video streams were captured. (Figure 1) Two were from digital camcorders: one with a wide-angle lens pointed at the programmers' bodies and faces from behind their monitor, and another pointed at the monitor to capture interaction between participants' hands and the screen. The third video stream was a full-resolution digital capture of the desktop activity, including mouse activity. A directional microphone recorded high-quality audio, a necessity in a noisy classroom.

Results were analyzed using micro-analysis. Two of the eight video sessions received full analysis. Because of problems with the desktop capture, full analysis was not possible on the other data. The analysis process began with the author watching the videos and adding subtitles as necessary when audio quality was too poor for easy listening. The videos were then watched through completely, while creating an index of participants' goals, strategies, and the gestures and speech that seemed strongly correlated with the identification and solution of debugging problems. Once these indices were created, key moments showing examples of embodiment were analyzed in detail, frame by frame, to attempt to identify the function of embodiment.

In order to support this kind of detailed analysis, a new video analysis tool called 3stream (Figure 1) was built to facilitate the watching of all three video streams simultaneously. 3stream allows the analyst to loop over key segments, scrub back and forth over all three video streams at once, and step forward and backward through the streams. In addition, facilities are provided to adjust the synchronization of the streams.
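The paper does not describe 3stream's implementation, but a minimal sketch of how such per-stream synchronization might work is shown below, assuming each stream is modeled as a frame sequence with a user-adjustable offset relative to a shared playhead. All class, method, and variable names here are hypothetical and are not taken from the actual tool.

    // Hypothetical sketch of multi-stream playback synchronization.
    // Nothing here is taken from the real 3stream code.
    import java.util.ArrayList;
    import java.util.List;

    class StreamTrack {
        final double frameRate;   // frames per second of this stream
        double offsetSeconds;     // adjustable sync offset against the shared playhead

        StreamTrack(double frameRate, double offsetSeconds) {
            this.frameRate = frameRate;
            this.offsetSeconds = offsetSeconds;
        }

        // Map the shared playhead position to this stream's frame index.
        int frameAt(double playheadSeconds) {
            return (int) Math.max(0, Math.round((playheadSeconds + offsetSeconds) * frameRate));
        }
    }

    class MultiStreamPlayhead {
        private final List<StreamTrack> tracks = new ArrayList<>();
        private double playheadSeconds = 0.0;
        private double loopStart = 0.0;
        private double loopEnd = Double.MAX_VALUE;

        void addTrack(StreamTrack track) { tracks.add(track); }

        void setLoop(double startSeconds, double endSeconds) {
            loopStart = startSeconds;
            loopEnd = endSeconds;
        }

        // Scrubbing: move the playhead, wrapping around the looped key segment.
        void seek(double seconds) {
            playheadSeconds = seconds > loopEnd ? loopStart : seconds;
        }

        // Stepping: advance all streams by one frame of a reference stream.
        void stepFrames(int delta, StreamTrack reference) {
            seek(playheadSeconds + delta / reference.frameRate);
        }

        // Frame index to display in each stream at the current playhead.
        int[] currentFrames() {
            int[] frames = new int[tracks.size()];
            for (int i = 0; i < tracks.size(); i++) {
                frames[i] = tracks.get(i).frameAt(playheadSeconds);
            }
            return frames;
        }
    }

Under these assumptions, a desktop capture running at 15 frames per second that starts two seconds after the camcorders would be added as a StreamTrack with an offset of -2.0 seconds, and scrubbing the shared playhead would keep all three frame indices aligned.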

Such a tool is necessary because the story of a given embodied activity is not fully present in any one stream. Watching the desktop capture alone will not allow the analyst to understand what is being gestured at with the mouse, because it does not reveal what the programmers are saying. Similarly, watching the video of the participants will not reveal what they are looking at, so it is often impossible to know what they are talking about. Speech, gesture, and mouse activity are tightly coupled, as will be shown below, which is why the frame-by-frame simultaneous analysis facilitated by 3stream is so critical.

Results

While much of the observed activity involved very limited movement, often just eye movements and scrolling, there were several instances of dramatic embodied activity. This paper will focus on two kinds of activity: code enactment and pointing.

Code Enactment

Figure 2 shows a series of movements made by a programmer who was debugging a Tetris game. The programmer and his partner were attempting to debug a faulty loop in the program, and there was some confusion about the matrix being operated on. The participant on the right asks at several points whether the problem is that the array indexes appear in different orders through the program, sometimes in the form [x][y], and sometimes in the form [y][x]. Up until this point, their discussion never got past the suggestion that this might be a problem.
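The students' source files are not reproduced here, so the following is only a rough sketch of the kind of loop under discussion. The method name isLineFull comes from the caption of Figure 2; the grid variable, its orientation, and the meaning of its values are assumptions made for illustration.

    // Hedged sketch of the kind of loop the pair is debugging; not the actual
    // assignment code, which the paper does not reproduce.
    static boolean isLineFull(int[][] grid, int y, int width) {
        // Read aloud by the programmer as "less than width... x plus plus":
        // x starts at zero and runs up to width.
        for (int x = 0; x < width; x++) {
            // The source of the pair's confusion: if the grid is indexed as
            // grid[row][column] elsewhere, then grid[x][y] here quietly swaps
            // the two dimensions relative to grid[y][x].
            if (grid[x][y] == 0) {  // assumption: 0 marks an empty cell
                return false;
            }
        }
        return true;
    }

Whether the indexing in such a loop matches the rest of the program is exactly the question the programmers are trying to settle with the enactment described below.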

We can see at the beginning of this sequence, in Figure 2b, that as the programmer says "going through the x position," he draws out a line on the table, creating a mapping between the x position and that space on the table. He then returns his hand to the mouse and begins reading code off of the screen: "less than width... x plus plus."

He then gestures back and forth near the left side of the axis he drew with his hand while saying "cause it's within the border," suggesting he is still using the same mapping. (Figure 2c) Then, with his hand formed into a claw shape, he retraces the same line he traced in Figure 2b, although drawing a somewhat shorter line. (Figure 2d) He makes a little hop, and then stretches his hand out into a broad stance along the line, with his pinky at his far right side and his pointer at his far left. (Figure 2f)

Next he brings the middle finger of his opposite hand up next to his pointer (Figure 2g), and while saying "plus plus... plus plus..." moves his pointer finger towards his pinky in two small movements, timed with each "plus plus" (Figure 2h-i). At this point, the programmer is able to state more confidently: "So it starts with... probably starts off with zero, goes to width." (Figure 2j)

Discussion

The correlation between movement and speech seems to show clearly that the movements are meaningfully connected to the participant's cognitive activity, but there is an open question as to whether the programmer's body is doing cognitive work, or whether perhaps it is just a side effect of his internal cognitive processes.

Certainly, I am not the first to suggest that such bodily activities are doing meaningful work. The Glucksberg (1964) result reported earlier and Hutchins' (2007) analysis of the navigator's use of his hands are preexisting examples. But a slightly deeper analysis is in order.

Hutchins uses "the enactments of external representations habitually performed by practitioners who live and work in complex culturally constituted settings" to explain his navigator's "aha moment," and I think we see something similar in the example presented here. It is plainly obvious in any recording of programmers that they habitually read code. And it is shown here that they sometimes act it out. These are cultural practices, and the events described above constitute an example of these cultural practices coming together and resulting in the programmer being able to make a claim about the boundary conditions of the loop. It is possible that the enactment was not strictly necessary for the programmer to reach this conclusion, but the fact is that he did not reach the conclusion until after he had performed the enactment.


And certainly there appear to be constraints that the programmer leans on. The importance of constraints in cognitive offloading is described well by Scaife and Rogers (1996). Constraints limit the amount of work that must be done inside the head by creating impossibilities outside the head. Gibbs (2006) claims that a form of this happens in the body, suggesting that the body can be used to create stable multimodal enactions.

In our case, the enactment is stable in many ways: the movement of his pointer is confined by the limits of his tendons. He cannot move it further to the left than its initial position, and he cannot move it further to the right than his pinky. This constraint neatly mirrors the constraint of the edges of a fixed-size array. He even goes to the length of bringing the middle finger of his other hand next to his pointer to maintain the stability of this constraint.

Pointing

A second example of embodied activity which was very frequently observed was pointing. In one particular example, participants were debugging a loop. The buggy condition read, in pseudocode: "if something AND NOT something else THEN return false". In the correct solution, however, it reads: "if something OR something else THEN return true."
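In concrete terms, the difference might look roughly like the sketch below. The predicate names are hypothetical, loosely suggested by the dialogue in Figure 3 (where the pair speak of a cell being "inside" and "empty"); the students' actual code is not given in the paper.

    // Hedged illustration of the bug and its fix; predicate names are assumptions.

    // Buggy form: "if something AND NOT something else THEN return false"
    static boolean checkBuggy(boolean isInside, boolean isEmpty) {
        if (isInside && !isEmpty) {
            return false;
        }
        return true;
    }

    // Corrected form: "if something OR something else THEN return true"
    static boolean checkFixed(boolean isInside, boolean isEmpty) {
        if (isInside || isEmpty) {
            return true;
        }
        return false;
    }

The two forms disagree, for example, when isInside is true and isEmpty is false, which is the kind of case the pair walk through in the dialogue shown in Figure 3.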

Figure 3 shows a few seconds of mousing activity coupled with speech from the pair. The details of the dialog are not nearly as important here as the structure of the mousing and the way it couples with speech. First, there are at least four different kinds of points shown here: mouse hovering, clicking, selecting, and scrolling. Second, these different kinds of points are shown to be carefully co-timed with speech, suggesting real cognitive significance.

Discussion

Again, this is not the first work to suggest that such gestures are doing real cognitive work. Casasanto (in press) provides experimental evidence that "motor programs, themselves, are the active ingredient in the cognitive function of gesture." Carlson et al. (2007) provide evidence that the hands are externalizing cognitive work while being used to solve math problems.

So it seems entirely possible, but what is really happening here? There are two things that are noteworthy about the kind of pointing observed in the present study. First, it appears to be closely related to what Chuck Goodwin describes in his 2003 paper on pointing. He describes the way we use various tools and body parts simultaneously on multiple semiotic fields to accomplish pointing. In his example, someone points a trowel at a map, but combines that gesture with a head nod towards a nearby space to establish the target of the point. We seem to be seeing something similar here, where the programmer is indicating both the code and the output, using multiple semiotic fields to triangulate a target for pointing.

Second, it is common knowledge that people point at things with the mouse cursor, but what this data reveals is that the picture is far more complex than just one kind of indicating. This user, in the span of maybe two sentences, uses at least four different kinds of points: clicking, hovering, selecting (with double click and with drag), and scrolling.

The present data isn't rich enough to say what the different functions of these different modes might be, but it is not hard to speculate. Clicking creates a sound which integrates tightly with speech. Selection can indicate a range, where clicking indicates a single point. A hover gesture can indicate motion and can create icons, where a selection can only indicate a range. These affordances are extremely varied.

Conclusion

We guessed that programming was embodied, and that has been borne out. It seems likely that these embodied activities are doing real cognitive work, and there are some unique properties to human-computer embodiment in particular. As humans we opportunistically use whatever

References

Gibbs, R. W. (2006). Embodiment and Cognitive Science. New York: Cambridge University Press.

Glucksberg, S. (1964). Functional fixedness: Problem solution as a function of observing responses. Psychonomic Science, 1, 117-118.

Goodwin, C. (2003). Pointing as situated practice. In S. Kita (Ed.), Pointing: Where Language, Culture, and Cognition Meet (pp. 217-241). Hillsdale, NJ: Lawrence Erlbaum Associates. http://www.sscnet.ucla.edu/clic/cgoodwin/03pointing.pdf

Hurley, S. (1998). Consciousness in Action. Harvard University Press.

Hutchins, E. (1995). Cognition in the Wild. MIT Press.

Hutchins, E. (2007). Enaction, imagination, and insight. In press. http://hci.ucsd.edu/234/260-W2008/readings/EnactInsight.pdf

Ko, A. J., & Myers, B. A. (2005). A framework and methodology for studying the causes of software errors in programming systems. Journal of Visual Languages and Computing, 16(1-2), 41-84. http://www.cs.cmu.edu/~ajko/papers/Ko2004SoftwareErrorsFramework.pdf

Lakoff, G., & Johnson, M. (1980). Metaphors We Live By. Chicago: University of Chicago Press.

Lakoff, G., & Núñez, R. (2000). Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being. New York: Basic Books.

Noë, A. (2004). Action in Perception. MIT Press.

Scaife, M., & Rogers, Y. (1996). External cognition: How do graphical representations work? International Journal of Human-Computer Studies, 45, 185-213.

Spivey, M. (2007). The Continuity of Mind. Oxford: Oxford University Press.


Figure 1: The 3stream video analysis tool. Multiple video streams can be watched simultaneously, with scrubbing and single-frame movement features, the ability to control audio playback independently on each stream, and a built-in synchronization tool.


Figure 2: Hand movements of participant B while talking about the for loop in the isLineFull method.

a) (22:58:20) "so"
b) (22:59:26-23:00:22) "going through the x position... uh so less than width x plus plus. Y should be ok. Y is ok"
c) (23:10:02-23:10:11) "cause it's within the border... uh x"
d) (23:12:13-23:13:10)
e) (23:13:20)
f) (23:14:22) "returns false"
g) (23:15:14) "so"
h) (23:17:00) "plus plus"
i) (23:17:06) "plus plus... equals zero."
j) (23:20:20) "So it starts off... probably starts off with zero, goes to width."


Figure 3: Indicating with clicking, selecting, hovering, and scrolling. Bracketed numbers mark the successive mouse actions and where they fall in the speech.

A: so both of these are either true or false so changing this [1] won't really make a difference. [2]
B: no, this should be true
A: oh, this right here? [3] [4] and then this should be an OR? [5]
B: let me think.
A: because this one doesn't really matter cuz this is false, right? [6]
B: well it can't be both inside and empty
(5:12) A: but if this is true and [7] [8] this is false then it [9] will stop all the way up here. [10] [11]
B: why
A: because it's false false [12] [13]


Figure 4: Images taken at the same time of a programmer simultaneously pointing with his hand, by resting it on the display bezel next to a region of interest, while moving the mouse pointer over the same area. The typing caret is also close by.

Figure 5: Two instances (separated by many minutes) of programmers averting their eyes from the workspace while attempting to solve a puzzle. In both cases their partners appear not to be attending to their gaze, suggesting that the aversion of eyes aids thinking rather than communication.