collocating interface objects : jon may tim gamble contact ... · collocating interface objects 2...

Collocating interface objects :

Zooming into Maps and Databases

Jon May

University of Plymouth

Tim Gamble

University of Sheffield

Contact Author: Professor Jon May

School of Psychology

University of Plymouth

Drake Circus

PL4 8AA

[email protected]

Running Head: Collocating interface objects

Collocating interface objects

2

Abstract

May, Dean and Barnard (2003) used a theoretically based model of visual human-

computer interaction to argue that interface objects should be collocated following screen

changes such as a zoom-in to detail. Existing online maps do not follow this principle, but

uniformly move a clicked point to the centre of the subsequent display, leaving the user

looking at an unrelated location. This paper presents three experiments showing that

collocating the point clicked so that the detailed location appears in the place previously

occupied by the overview location makes the map easier to use, reflected in a reduction in

eye movements and interaction duration. Two further experiments show that a list-like

database interface which benefits from the persistence of contextual information does not

show the same benefit of collocation, supporting the claim that the nature of the task must

be taken into account in choosing how to design dynamic displays. We discuss the benefit

of basing design principles on theoretical models so that they can be extended to novel

situations.

(168 words)

Acknowledgement

This work was supported by the Engineering and Physical Sciences Research Council

[EP/C515528/2]


3

Collocating interface objects : Zooming into Maps and Databases

Interactive maps are everywhere nowadays, and map use has become routine.

Ramblers and drivers no longer have to battle with unwieldy sheets of paper, compromising

breadth of representation against level of detail, with the crucial junction lost in the fold of

the map, and the new road not even shown. Up to date cartographical information is now

instantly accessible on computers and in cars, even on handheld devices that are wirelessly

connected to the internet, accurately positioned by GPS. In principle, relevant objects or

businesses can be highlighted and irrelevant detail removed from the view, and the maps

can be drawn at whatever level of detail the user requires for their current goal, zooming in

from a low detail overview to successively more detailed local views of an area.

Such flexibility comes at a cost, however, because the representation on the device

cannot necessarily be the same as that on a paper map. A paper map is a designed

representation of reality. It has been constructed by a skilled practitioner to take advantage

of the map readers’ attentional and perceptual processing, to allow them to selectively

attend to certain classes of information in alternation. Enormously successful, cartographic

conventions have not developed overnight, and their evolution has relied upon

developments in printing technology as well as the skill of the cartographer. In moving

from a static, paper to dynamic, electronic representations, with far smaller screens and

lower resolutions than are achievable in print, cartography faces a challenge. Allowing

classes of information to be hidden or revealed adds interactional overheads to the design

and use of a map that are absent when the map design takes advantage of selective

attention. Changing between different scales of map as the user zooms in and out requires

design decisions to be made about how successive views should be related. Designing an

interactive map becomes a problem in human-computer interaction, prone to the same


4

conflicting demands as other computer interface designs. Technological change makes

trade-offs between speed, capacity and mode of delivery difficult to anticipate. Fashions in

implementation flourish and designers imitate more often than they innovate.

One thing that does not change, however, is the mental architecture of the map

reader. We have previously argued that computer interface design should be a science as

much as a craft, and should benefit by a principled understanding of the tasks users

undertake, in order to capitalise upon the mental resources available for interaction. Early

in the development of computer based maps, Moellering (1975) noted that ‘when one

begins to work with interactive display systems and particularly animations … displays

must be considered in a dynamic setting.’ Some researchers have since advocated that

interface designers learn from the dynamics of film editing, and should manage changes in

their interfaces in the same way that film editors manipulated temporal and spatial jumps in

films (e.g., Young and Clanton, 1993). In May and Barnard (1995), we argued that rather

than naively imitate cinematographic editing conventions, designers needed to understand

why these conventions worked for films, and to create parallel changes in their interfaces

based upon that understanding. Film editing had evolved as a craft skill, and its rules of

thumb were situated within the domain of camera angles, lighting, and narrative, as sets of

‘dos and don’ts’, with no real account of why certain practices worked and were ‘filmic’,

while others were ‘unfilmic’. While the work of researchers such as Kraft (1986, 1987) had

shown that conventions in film cutting did have a sound psychological basis, their domain

specific nature made it hard for designers to know what it was they were to apply to their

interfaces.

May and Barnard (1995) presented a theoretical analysis of cinematography based

upon Barnard’s (1985) Interacting Cognitive Subsystems model of cognition, and made


5

several extrapolations to interface design, including one scenario of a tourist information

system combining large-scale and detailed views of a locale to help people find places of

interest and plan routes (originally discussed in Barnard, May & Green, 1991). The

problem posed was how to combine these two representations. In contrast to ingenious and

novel combinations of fish-eye views, magnifying lenses and multiple simultaneous

displays, we argued that to be like a film the map interface should cut directly from an

overview to a detailed view, without any swooping animation (as advocated in the context

of data spaces by Mackinlay, Card and Robertson, 1990).

The cognitive model of film watching that we proposed assumed that the main task of

someone watching a film was to follow the narrative (in ICS terms, at the propositional and

implicational levels of representation), and that any requirement imposed upon them to

search the screen for relevant material (in ICS terms, at the object and propositional levels

of representation) interfered with this task, detracting from their immersion in the story and

making them aware of the surface detail at the expense of the story. Similarly, we argued

that in using a computer interface to carry out a task, any requirement for users to search

the screen to relocate elements following a scene change would interfere with their main

task, making the system less usable.

In the context of map use, we argued that the point that the user clicked on in the

overview was where they were interested in for their task, and would be where they were

still looking immediately after any screen change, and so the detailed view of the point

clicked should be redisplayed in the same physical position on the screen: a principle that

we called collocation. At that time, there were few such interactive maps readily available,

but we noted the similarity between zooming-in with a map and with graphics packages.

All but one of the graphics packages we surveyed broke the collocation principle: they did


6

a variety of things, often inconsistently, but most often they would translate the clicked

element to the centre of the new view. The exception in 1995 was Adobe Illustrator, the

vector based partner of Adobe Photoshop (although in a subsequent release, it too joined

the ‘move to centre’ camp).

One problem with our theoretically based principle of collocation was that we had no

empirical proof that film-makers actually did make use of it in the way that we claimed.

Our next step, then, was to obtain such evidence. In May, Dean and Barnard (2003) we

gave a more detailed account of the cognitive model of film watching, and reported an eye-

tracking study in which participants watched a commercially released film in its entirety.

We measured the amount that they moved their gaze location in the frames immediately

after each of ten classes of cuts, and found that there was minimal eye movement in cuts

where the cut zoomed-in to a detail, followed a moving actor or object, showed objects

whose position was predictable, or showed the result or consequence of an action, or

showed novel unexpected objects. In all of these circumstances, film makers had succeeded

in collocating salient regions of the screen before and after cuts, so that the viewers had not

had to move their eyes to locate something on the screen in order to comprehend the

unfolding narrative. In five other classes of cut, large eye movements were found. Based on

these findings, we specifically argued that map interfaces should make use of collocation

when zooming in and out to help map users rapidly locate the topic of their enquiry without

having to search the screen for it. We also made recommendations about interactions where

collocation would not be necessary, but where structural changes in the display might guide

a user to the novel information, while leaving previous information visible to provide a

context or history of the interaction. Such a situation might arise in the use of hierarchical

databases, for example, where an object or name makes sense in the context of


7

superordinate information and might be ambiguous or difficult to parse without this context

remaining on the screen.

In recent years, however, we have noted with dismay the adoption of the move-to-

centre convention for computer based maps. In Google™ Maps UK, for example, if one

has Dartmoor in the centre of the screen and Plymouth in the lower left (e.g.,

<http://maps.google.co.uk/maps?ll=50.529143,-3.915253&z=11>), then double-clicking

upon the place name Plymouth results in its replacement by a broad expanse of blue,

representing the sea. The same occurs in every online map we have found that includes a

click-to-zoom function. There are clearly reasons why this translation convention has been

adopted, other than imitation or the re-use of open source code. Computationally, all that

needs to be returned to the map server are the absolute co-ordinates of the clicked location

and a scaling factor (either of the current or of the new view), as can be seen by inspecting

the URLs, which generally include a latitude, longitude and level parameter. Without

knowing where within the previous image the clicked location was, the obvious place to put

it in the new view is in the centre. If the user is going to zoom in further, then it will remain

at the centre. It could even be argued that such interfaces are consistent, which has become

a watchword of design guidelines, and hence easy to learn and use.

However, translation also has disadvantages, from the point of view of the user. If

they have been looking in one point on the screen, the place they are interested is now not

there, and they have to search for it. If the new representation is stylistically different to the

original, as is often the case with maps of different scale, then it is not an easy task to

retopicalise oneself to the new view and to relocate the place name of interest. If one has

not clicked on the actual place, but on or near its name, which is usually to one side of the

place, then the actual location desired will not be in the centre of the screen, and if its name


8

has been repositioned on the new view it may be anywhere within the screen, depending

upon the scaling factors. If one is unfamiliar with a geographical region, as is often the case

when using a map, each wrong place name one reads gives no help; serial search is all that

is possible. On the other hand, translation has some advantages for the programmer,

because to collocate the result with the original map, two additional parameters need to be

returned to the server to indicate the relative location of the point of interest within the first

map, so that it can re-occupy that position in the new map. In essence, the decision between

the two modes of presentation appears to be a contest between ease of use and ease of

programming.

From a psychological point of view, the only way that a move-to-centre translation

algorithm could be of benefit to map readers rather than to map programmers, is if the

widespread adoption of the convention has been learnt by the general public and so they

expect to look in the centre of maps as a result of an interaction with a place on the

periphery. This is the question that we set out to answer with the first three experiments

reported in this paper. By contrasting an interface using collocation with the same design

but using translation, we should be able to see if our theoretically principled

recommendations provided any advantage for users, or whether they had grown used to the

convention and by knowing where to look would be able to interact as easily with the

translated interfaces.

Experiment 1

Method

There were 26 participants in the study, all students at the University of Sheffield,

who were paid £5 for their participation. They were told that the study was investigating


9

eye movements people made while zooming into maps, but not that either method of

zooming was of particular interest.

The two interface techniques of collocation and translation were presented to the

participants as different computer systems, both using maps. One was a traffic control

system, in which their task was to locate traffic monitors; the other was a gas supply system

and their task was to locate gas meters. The interface was written in RealBasic and

presented by 2.7GHz G5 Mac a on a 20” NEC Multisync 2080UX+ LCD monitor at 1024 x

768 pixel resolution, viewed by participants from a static operator’s chair with headrest (to

minimise head movement) at a distance of approximately 75 cm. Each map was 297mm

square, subtending approximately 22º, with the centre of the screen approximately level

with the participants’ eyes.

Thirteen scenarios were created, using Ordnance Survey maps of different cities in

Britain (the maps were obtained from the Edina Digimaps service under an educational

license). Each scenario began with a text message asking participants to locate a target in a

specific named location within a city, e.g. ‘There is a traffic monitor located near Walkley

in Sheffield. See if you can find it’. On clicking OK, the screen cleared and a start button

was displayed centrally. When clicked, an overview map of 1:50,000 scale was displayed

and the participant had to find and click on the specific local placename (e.g., Walkley).

Following the click, a 1:25,000 scale map of the area clicked on was displayed, and the

participants had to relocate and click on the placename, to be shown the final 1:12,500 map

(produced by rescaling the OS 1:10,000 map). This final map had a target symbol (a black

diamond 4mm high and wide) embedded within it, upon which the participant had to click

to complete a trial. The target was placed in the area that the participant had been asked to

search, but was not systematically located in relation to the place name on that view. If at


10

any point the participant clicked on a region not containing the target area, they could

‘zoom out’ to the previous map using option-click.

Twelve of the scenarios were divided between two experimental blocks. In one block

the maps were collocated such that the geographical location under the mouse cursor was

identical before and after the click, notwithstanding the change in scale. In the other block

the conventional translation approach was used, such that the geographical location clicked

on the first map was placed in the centre of the second map (Figure 1). The remaining

scenario, based on maps of Sheffield, was used as a practice trial before each block.

The traffic monitor and gas meter cover storied were balanced over experimental

conditions, and the order of the conditions was also balanced over participants. The twelve

scenarios were presented in a latin square design over participants, such that after twelve

participants each scenario had been used in each serial position within each condition.

Figure 1 about here

Each participant was tested individually. Participants’ eye movements were recorded

using a SensoMotoric Instruments iView X eye tracker, which requires no head restraint or

worn head position tracking device, thus making the situation as naturalistic as possible.

The screen coordinates of each participants’ gaze location were logged every 20 msec for

the duration of the experiment. The presentation computer recorded reaction times for each

experimental event.

Results

Reliable eye tracking data could not be obtained for four participants. Of the 22

remaining participants, ages ranged from 18y 9m to 27y 2m, and three were male. Eleven


11

completed the study in each order (i.e., Collocate-Translate or Translate-Collocate). Raw

eye tracking data was analysed using the BeGaze software, with parameters defined to

detect fixations with a minimum duration of 80 msec and a maximum dispersion of one-

sixth the screen height. This reduces the raw data to a series of fixations (where the pupil is

relatively stationary), saccades (where the pupil is moving) and blinks (where the pupil is

not detectable for several frames). If the software reported more than three blinks in a step,

we discarded the data as unreliable, because the software does not distinguish between true

blinks and episodes where the pupil cannot be detected due to tracking problems such as

head movement. Trials in which the participant used the zoom-out function to step back

were also discarded. Over the 264 trials from the 22 participants, data was rejected from 39

of the Find steps, 14 of the Relocate steps, and 18 of the Target steps (9% overall).

From the resulting analyses, the number of fixations, their total path length

(expressed as a percentage of the width of the map), and the overall duration of each step in

each trial was obtained (Figure 2). These three dependent variables were entered into a

MANOVA with the factors of condition and task step. There was no overall effect of

condition (Pillai’s trace F3,19=0.69), but there was an effect of task step (Pillai’s trace

F6,82=13.5, p<.001, partial eta sq=.497) and an interaction (Pillai’s trace F6,82=2.60,

p=.023, partial eta sq=.160).

Figure 2 about here

Univariate tests showed no overall effects of condition for any of the three measures

(all Fs<1), but significant effects of task step for all three (Fixations : F2,42=52.6,

MSE=12.3, p<.001, partial eta-sq=.715; Path lengths F2,42=58.5, MSE=1.06, p<.001,

partial eta-sq=.736; Durations F2,42=70.9, MSE=1.27, p<.001, partial eta-sq=.771), and


12

interactions of condition with task step for Fixations (F2,42=3.60, MSE=5.64, p=.036,

partial eta-sq=.146) and Durations (F2,42=5.04, MSE=0.85, p=.011, partial eta-sq=.194),

with a marginal interaction for Path lengths (F2,42=2.80, MSE=0.67, p=.07, partial eta-

sq=.118).

To interpret the interactions, two-tailed paired sample t tests were then conducted

between the two conditions in each task step. There were no differences between the two

conditions in the Find step (Fixations t(21)=1.22; Path-length t(21)=1.05; Duration

t(21)=1.72; all p > .23) or for the Target step (Fixations t(21)=1.17; Path-length

t(21)=0.15; Duration t(21)=0.34; all p ! .10), but on the Relocate step the translate interface

required more fixations (Col=2.76, Trans=3.89, t(21)= 2.36, p=.028), longer path lengths

(Col=65%, Trans=105%, t(21)=2.59, p = .017), and took longer to complete (Col=1.54s,

Trans=2.00s, t(21)=2.22, p=.038).

Discussion

The two interfaces did not differ on the initial Find step, where participants were

searching for the name for the first time, nor on the Target step, where they were searching

for the diamond symbol, but they did differ on the crucial Relocate step. At this point in the

task, participants had already found the name and clicked on it on the first map, and had to

find and click it again. The conventional translation approach of presenting the clicked

point in the centre of the new view required more and longer eye-movements and took

participants longer to execute than our principled approach of collocating the position on

successive maps. This difference exists even though the translation approach is consistent,

because the clicked point is always in the same physical screen position in the new view

and so participants could simply make a single change in gaze direction. Instead, they are

not able to make use of this consistency, having to search for the place name again.


13

The lack of benefit of collocation on the target step supports our argument because

the diamond was fixed on the map, and not collocated with the place name or the location

clicked by the user. In practice, the locate step required visual search in both interfaces.

It could be argued that this version of a map-searching task does not give much

benefit to the consistency of the translation approach because with only three levels of map

to search through, there is only one critical trial upon which this consistency can be used:

the transition from the second map (in which the clicked point has been translated to the

centre) to the final map. The next experiment extends the task to a five level task, so that

having selected a point on the first map, and having had it translated to the centre,

subsequent clicks on maps two, three and four should also be close to the centre of the map,

potentially making a translation approach more useful, and in practice, more like a

collocation approach.

Experiment 2

Method

There were 25 participants in this experiment, all students at the University of

Sheffield, and each was paid £10 for their participation. They were told that the study was

investigating eye movements people made while navigating computer-based maps. The

experiment was conducted using the same equipment as Experiment 1.

Twenty four map-reading scenarios were created using Ordnance Survey maps of the

Plymouth area. Each scenario began with a text message instructing participants to locate a

local place name in a numbered region of Plymouth. Participants read this instruction

aloud. On clicking OK, an 250mm square overview of Plymouth was displayed, based on a

1:50,000 scale map, upon which the numbers 1 to 12 had been superimposed in a clock


14

face arrangement (Figure 3). After clicking on or near the relevant number, a twice as

detailed view of a quarter of the original map was displayed. Clicking on this displayed a

third level view, again twice as detailed, and so this time of 1/16th of the original map.

Clicking on this presented a fourth (1/64th) and finally a fifth level (1/256th) view. The first

three levels were all based on the same 1:50,000 scale map, level four on a 1:25,000 scale

map, and level five on a 1:12,500 scale map (rescaled from the OS 1:10,000 map). When

the target place name had been clicked in the fifth level map, a congratulation message was

displayed. If at any time a click would have resulted in the placename being off-screen, an

error message was displayed and the trial was restarted from the instruction message (this

meant that the zoom-out function did not have to be made available).

The place names used as targets were spurious, being taken from a different region

(Northumberland) and added to the maps so that while they were in roughly the same

geographical location on each map, their actual position and typeface varied in the same

way that real place names varied in typeface and position on the three original maps used.

Figure 3 about here

Participants undertook two sessions (approximately three months apart), each of 24

trials divided into two blocks. One block in each session used the Translate method of

zooming in, so that the new view was centred upon the location represented by the point

clicked in the previous level; and the other block used the Collocate method, so that

location represented by the point clicked in the previous level was presented in the same

relative location within the new map window.

Block order was balanced over sessions and participants, so that half experienced

Translate first followed by Collocate in the first session, and then Collocate followed by


15

Translate in the second session. The twenty four scenarios were balanced so that half were

used in the translate condition in session one and the collocate condition in session two,

while the others were used in the collocate condition in session one and the translate block

in session two. Within each block, the order of trials was rotated over participants in a Latin

square, so that each target appeared at least once at each position in the block.

Results

Three participants were unable to return for the second session, and eye-tracking data

from one participant was too poor for analysis. This left 21 participants’ data for analysis (6

males; age range 20-36 years, median 24 years). Of these 1008 trials, 48 (5.2%) were

repeated due to errors, with one participant making 8 errors (17%); the next worst made 4

errors (8%). Errors were more frequent in the second session (n=29) than the first (n=19),

the reverse of a practice effect, but were more likely near the start of a block (number of

errors per four trials: 17; 10; 5; 7; 5; 4). They were also more frequent in the Translate

condition (n= 30) than in the Collocate condition (n=18). A binomial test gives a

probability of .056 of observing 18 outcomes or fewer out of 48, so this is a nearly

significant difference in favour of the collocate method.

For each of the five steps in each trial, three measures were computed as in the

previous experiment: the number of fixations, the total path-length, and the duration of each

step. For each participant, a mean for these measures was computed over the twelve trials

in each block, resulting in a 2x2x5 repeated measures design with the factors of Session,

Method, and Step.

Figure 4 about here


16

The three dependent variables of path-length, fixations and duration were entered into

a MANOVA, and multivariate tests showed that all factors and interactions produced

significant effects (Figure 4). Because of the effects of Session factor, separate MANOVAs

were then conducted upon each sessions’ data, in a 2x5 repeated measures design. The

effects of Method, Step and their interaction were significant in both of these analyses. Five

further MANOVAs were then conducted upon each Step, pooled over both Sessions, with

the single factor of Method. The multivariate tests showed no differences between the

conditions for Step 1 (F3,18=0.211, p=.887) or Step 5 (F3,18=1.16, p=.352). Steps 2, 3 and

4 showed significant differences between the methods (Step 2 F3,18=4.75, p=.013; Step 3:

F3,18=10.9, p<.001; Step 4 F3,18=8.29, p=.001), with the Translate condition resulting in

longer path-lengths, more fixations, and longer durations for each of these three steps

(Figure 4). Univariate tests within each of these MANOVAs showed the same pattern of

effects for all three dependent variables.

Discussion

Despite the consistent placing of the clicked points at the centre of the screen

following a transition to a new map, the translation interface is used less efficiently than the

collocated interface, where the new point is in the same physical location as the old point,

wherever this is on the screen. This is despite the fact that the place names themselves are

not in exactly the same positions on successive maps, so that some search is required in

both interfaces. The need to move gaze from the clicked location back to the centre is an

impediment to the usability of the translation interface.

While watching participants use the maps, it became apparent that the typefaces we

had used for our spurious place names did not look like the simple serif and sans serif faces

used for real names on the original map (Table 1). This could have made them stand out


17

more from the background, perhaps giving some advantage or disadvantage to one of the

interfaces that might not be apparent in a real map. In fact, the differences between the

typefaces might explain why the difference between the two interfaces is larger for Step 3

than for Steps 2 or 4. Step 2 is the first step on which the place name has to be found, and it

was in a Helvetica Bold face. It then changes to Times Normal on Step 3, and on Step 4 to

Georgia Normal. The appearance of the name is thus quite different between the second

and third steps, but quite similar between steps 3 and 4. The change from Georgia Normal

to Palatino Italic on step 5 is also quite different, although both are serif faces, and here the

interfaces performed equivalently. We therefore ran the task again using different

typefaces, changing between Times and Helvetica on each step. and including one change

from normal to italic, one between two italic faces, and one from an italic to a normal face.

Table 1 about here

Experiment Three

Method

Twelve participants, all postgraduate students (8 males and 4 females; age range 21:5

to 33:11, median 28:5), took part in this experiment. They were paid £5 for their assistance.

This experiment used the same materials and design as Experiment 2, except that the

typefaces used for the spurious place names was altered to make them more like those of

other names on the maps and the changes more systematic over different steps in the task

(Table 1).

Results

Error rates in this experiment were very low, with only five trials needing to be

repeated. Three of the errors were made by one participant, who also took abnormally long


18

to complete several steps of the Translate task (over a minute in one case), and so his data

were excluded. The three dependent measures of fixations, path-length and duration were

computed for each of the five task steps in both conditions, as in the previous experiments.

A MANOVA showed a main effect of condition (Multivariate F3,8=10.8, p=.003, partial

eta-sq=.802), Step (Multivariate F12,120=8.1, p<.001, partial eta-sq=.448), and their

interaction (Multivariate F12,120=2.28, p=.012, partial eta-sq=.186). Separate MANOVAs

at each step showed no effects of condition for Step 1 (Mult F3,8=0.51), Step 4 (Mult

F3,8=1.74) or Step 5 (Mult F3,8=0.58, but significant effects for Step 2 (Multivariate

F3,8=10.8, p=.003, partial eta-sq=.802) and Step 3 (Multivariate F3,8=10.8, p=.003, partial

eta-sq=.802). Two-tailed paired t tests showed that all three measures differed for Step 2

and Step 3.

Figure 5 about here

Discussion

This experiment provides a close replication of the previous findings, despite the

changes in typeface. The advantage of the collocated interface is the same on steps 2 and 3

(when the place name changes from a sans-serif normal to a serif italic face), but declines

on step 4 to become non-significant (when the place name changes between two italic

faces). In all three experiments, the initial and final steps do not differ. The initial step, of

course, is in fact the same for both interfaces, because the place name has not yet been

found and so there is nothing to collocate. On the final step in Experiment 1, the target was

a symbol rather than the place name, and was not collocated with the point clicked and so

the interfaces are again equivalent. However, in Experiments 2 and 3 the last four task steps

all require participants to click on the place name, and so there is no obvious reason why

the advantage of the collocated interface should disappear so consistently in the last step.


19

One possibility is that the low level of detail on these final maps makes it easy to

retopicalise on the place name, so that the searching required by the translated interface

becomes trivial.

Overall, this series of three experiments has shown that the advantage of collocating

maps over the conventional approach of translating them to the centre is robust, supporting

our argument that the design principle derived from the analysis of film cutting techniques

can be used to improve interface design. We now need to show that the advantage of

collocating is not generic, but depends upon the task being performed. There is some

indication in the Maps tasks that collocation does not always improve performance, when

the search is simple in the final step, for example. May, Dean and Barnard (2003) found

that collocation was not used in film editing in situations where the location of the topic of

interest was predictable and when the occurrence of the cut could be anticipated, as when

two actors were talking to each other. In such conversational scenes, cuts would alternate

between views of the two actors faces, on alternate sides of the screen. The resulting

sequence resembled the view of the actors one would have if standing next to them while

they were talking, moving ones gaze between them, prompted by the dialogue. Another

class of cut in which collocation did not occur were ‘over the shoulder’ cuts, where an actor

would be seen doing something before a cut, and the camera would then show the result of

their action from behind them, with the rear of their head and a shoulder occupying a corner

of the frame. Such framing of the action serves to provide a context, linking the results of

the action to the person responsible, and also allowing one to see it from their point of

view.

Extrapolating to computer interfaces, a similar lack of benefit from collocation should

be found when context is important, or when translation of focal point after a screen change


20

is predictable. Earlier in this paper, we suggested that a scenario that might benefit from the

retention of contextual information would be searching through a hierarchical database,

where one is successively refining the category within which one is searching. Instead of

replacing the initial set of categories with a second, finer-grained subset, a translated

interface would place the new subset off to one side, with the original selection still visible

so that the result of the interaction is contextualised by the super-ordinate category name.

This form of interface is in fact used in Apple’s iTunes ™software, in which three panes

arranged horizontally contain genre, artist and album titles. When nothing has been

selected, each pane shows all of the information in the Library. Clicking upon a genre in

the leftmost pane refines the content of the other two panes so that only artists and albums

within that genre are now listed. Clicking again upon an artist further refines the album list

to only those albums by that artist within that genre.

In the final two experiments in this paper, we contrast databases that work in a similar

manner to this, with the new information appearing alongside previous windows of

information, with collocated designs where the new information replaces the previous

window. If the collocation principle supported in the Maps interface were working simply

because it minimised eye movements, then the collocated versions of the databases should

similarly benefit, because the new information is physically located close to the previous

focus of attention. If, however, collocation worked in the Maps interfaces because it was

concordant with the users’ task, then it should not be as useful here, because the task

involves successively moving through a data structure in a predictable manner, with detail

unfolding on each operation. In our next study, we contrast a translated design in which the

new window opens to the right of the previous window, with a collocated design in which

the new window opens where the old window was, and the old window appears to the right.


21

The collocated design thus removes the sense of moving successively through the data

while retaining the availability of contextual information.

Experiment 4

Method

The same 26 participants who completed Experiment 1 also took part in this

experiment, in the same session. Half of them completed this experiment before

Experiment 1, and half after it. They were told that they would be searching two databases.

In one database, their task was to search for a specific car part, and in the other they would

be ordering a specific musical recording. The same equipment as in previous experiments

was used to conduct this experiment.

In each task, participants read an instruction message asking them either to order an

item, e.g., ‘Please order part number: CC05 for model: Seicento for make: Fiat’ or Please

order album: 05AT for artist: Nina Simone from Genre: Jazz’. On clicking the OK button,

they were presented with a list of ten car makers or musical genres and had to locate the

appropriate one, click it, and then click an OK button in the lower right of the window.

This then opened a second list in another window, containing the ten second-order items

(car models or musical artists), with a Back button and an OK button. Clicking the Back

button dismissed this window and returned participants to the previous step, intended to

allow participants to continue a trial having made an error on the previous step. On

selecting the appropriate item and clicking the OK button, the third and final window was

opened, displaying ten alphanumerical codes, again with Back and OK buttons. After

selecting an item and clicking OK, the trial ended with either a confirmation of correct


22

ordering or an error message. The next trial began when this message was dismissed by

clicking OK.

The three list windows each measured 120 mm wide by 100 mm high. The first

window was always located in the left third of the screen (which was 405mm wide),

centred vertically. The list items were displayed in Helvetica font, with capitals 4mm high.

In the translated interface, the second window was positioned in the middle third of the

screen, and the third window in the right third of the screen. In the collocated interface,

each window opened in the left third of the screen, with the previous windows(s) moving to

the right (see Figure 6), so that on the final step they were in the reverse order to the

translated interface.

Figure 6 about here

The order of interface design was balanced over participants, and the task content

(music or car parts) was balanced within interface design. Ten trials were run with each

task content, with the items chosen so that each position within the list was used once in

each window. The order of the ten trials was rotated in a Latin square so that after ten

participants each trial had been presented in each serial position within the task.

Results

Reliable eye-tracking data was not obtained for five participants. Four of these had

also been excluded from the analyses in Experiment 1, and one additional female (aged

27:2) was excluded. The remaining 21 participants completed a total of 420 trials with three

task steps in each trial, of which eye-tracking data was rejected for 89 task steps overall

(7%).


23

As in the previous experiments, the mean number of fixations, path-length and

duration were computed. Because the first task step was identical in the two conditions,

these data were not analysed, giving two task steps in each two conditions. Each step

consisted of selecting an item from the list and then clicking OK, so two parallel sets of

data were obtained and these were analysed in a single MANOVA as six measures (Figure

7). There were significant main effects of condition (Pillai’s Trace F6,15=4.14, p=.012,

partial eta sq=.62), task step (Pillai’s trace F6,15=8.15, p<.001, partial eta sq=.77) and an

interaction (Pillai’s trace F6,15=3.42, p=.025, partial eta sq=.58).

Figure 7 about here

For the item selection measures, Univariate Fs showed effects of task step upon

duration (F1,20=12.8, p=.002, partial eta sq=.39, MSE=.533), with the third step taking

longer, and an interaction for fixations (F1,20=6.10, p=.010, partial eta sq=.29, MSE=.745),

with the translate interface requiring more fixations on the third step. For the OK Clicks,

there were effects of condition upon duration (F1,20=8.63, p=.008, partial eta sq=.30,

MSE=.067), with the collocated condition taking longer; of task step upon duration

(F1,20=8.99, p=.007, partial eta sq=.31, MSE=.057), with the click being completed faster;

and interactions for path length (F1,20=8.01, p=.010, partial eta sq=.29, MSE=.004), with

the colocated interface being longer for the third step; and duration (F1,20=6.59, p=.018,

partial eta sq=.25, MSE=.066) with the translated interface being faster for the second step.

We had collected separate sets of data for the item selection and OK clicks because

once an item has been selected, the OK button is in the same relative position for both

interfaces, and so we had not anticipated there being any effects for these measures.

Because there were effects, we recombined the item selection and OK click measures to


24

produce overall measures reflecting performance on both steps. Now the MANOVA

produced a main effect only for condition (Mult F3,18=3.92, p=.026, partial eta sq=.395),

with task step (Mult F3,18=2.49, p=.093, partial eta sq=.293) and their interaction (Mult

F3,18=2.72, p=.075, partial eta sq=.312) falling short of significance. The Univariate Fs

showed no effects of condition for any of the measures, with an effect of task step upon

duration (F1,20=5.02, p=.037, partial eta sq=.201, MSE=), and interactions for fixations

(F1,20=8.95, p=.007, partial eta sq=.309, MSE=1.04) and path length (F1,20=4.50, p=.047,

partial eta sq=.183, MSE=.041), with a marginal effect for duration (F1,20=4.15, p=.055,

partial eta sq=.172, MSE=.541). The means indicate that that these interactions reflect an

advantage for the translated interface on the second step, but for the collocated interface on

the third step.

Discussion

Unlike the clear advantage for the collocated interface found in all three experiments

using the map task, there is no real superiority for either interface here. In terms of

selecting an item from the list, the interfaces are equivalent on the second step, and

collocate is better on the final step; but clicking the OK button is easier for the translated

interface on the second task step. The OK click data is surprising, because we had not

anticipated there being any differences between the interfaces. Once the item in the list has

been selected, the OK button is in the same relative position within the window for both

interfaces, and so there should be no difference in the number of fixations needed to direct

gaze towards it – in fact, the collocated interface might benefit here, because the OK button

is also in the same physical screen location, and could potentially be clicked without

needing to be looked at. The same considerations apply to the distances needed to move the

mouse between clicks – for the collocated interface the mouse must be moved down and

right from the selection to the OK button, and then back up and left to make the next


25

selection. In the translated interface, the same movement down and right must be made to

click the OK button, and the mouse must then move up and right to make the next

selection. The overall distance is equivalent. Nevertheless, the translated interface comes

out better on these measures.

There were a number of peculiarities in this task that might have made it more

difficult for participants than we had intended and which thus limit the generalization of the

results, especially since we are arguing that the data shows no clear advantage for either

interface. Participants reported that the meaningless alphanumeric code for the third task

step was particularly difficult to recall, especially since the task instructions presented the

items in the reverse sequence to that required by the interface. The presence of the Back

button on steps two and three was in retrospect unnecessary and while made the task more

like a real interface led to in increase in errors. Most obviously, though, the reappearance of

completed windows to the right of the new window in the collocated interface was

unrealistic. While the two designs both displayed contextual information in the form of

previously selected information, a real collocated design would simply replace the old

window with the new one rather than moving the old window elsewhere (the multiple

levels of dialogue boxes in many Microsoft applications, for example, are generally

superimposed upon each other, leading to a cascade of Apply and OK clicks being required

to complete the instruction). These issues were addressed in the final experiment.

Experiment 5

Method

Twenty first year undergraduates studying Psychology at the University of Plymouth

took part in this study, in return for participation points to be used to reward participants in


26

their own experiments. This experiment replicated the design and apparatus of Experiment

Four, with changes to the phrasing of the instructions, the content of the final items, the

ordering within the lists, treatment of errors, and the location of the screen in the collocated

interface. The meaningless alphanumeric album codes and car parts were replaced with real

titles or car parts, and the instructions were reordered so that the items were in the same

order as the interface, e.g., ‘please order Fiat - Seicento – fuel filter’ or ‘Please order Jazz -

Nina Simone - Ain’t got no..’. Within the lists, the items were no longer alphabetically

ordered, to avoid participants anticipating a likely location. The Back button was no longer

displayed, and if OK was clicked with an incorrect item selected at any point, an error

message was displayed and the trial was repeated from the instruction screen.

The translate interface window positions were the same as in Experiment Four, with

the focal window moving from left to right across the screen leaving the prior windows in

their original positions to its left. In the collocated interface all windows were now located

in the centre of the screen, replacing the previous window, so only the current list was

visible at any time.

Results

Complete eye-tracking data could not be obtained from four participants. The

remaining sixteen included 5 males and 11 females, and were aged between 20 y 10 m and

62 y 0 m (median 23 y 0 m).

As in Experiment Four, the mean number of fixations, path-length and duration were

computed for item selection and OK click on the second and third task steps of the two

interfaces, and these six measures were entered into a MANOVA with condition and task

step as factors (Figure 8). There were again main effects of condition (Pillai’s trace


27

F6,10=5.21, p=.011, partial eta sq=.76), task step (Pillai’s trace F6,10=5.16, p=.012, partial

eta sq=.76) and an interaction (Pillai’s trace F6,10=4.33, p=.021, partial eta sq=.72).

Figure 8 about here

For the item selection measures, Univariate Fs showed that there was an effect of

condition for path length (F1,15=15.2, p=.001, partial eta sq=.504, MSE=.046) with the

translate interface requiring longer paths; of task step for fixations (F1,15=23.1, p<.001,

partial eta sq=.606, MSE=1.344), path length (F1,15=13.6, p=.002, partial eta sq=.476,

MSE=.047) and duration (F1,15=20.2, p<.001, partial eta sq=.574, MSE=.155), with the

third step requiring more fixations, a longer path and taking longer to complete, and an

interaction for path length (F1,15=13.3, p=.002, partial eta sq=.470, MSE=.019), with the

translated interface requiring a longer path length for the third step. For the OK clicks,

Univariate Fs showed that there was an effect of condition for fixations (F1,15=17.7,

p=.001, partial eta sq=.542, MSE=.351), with the translated interface requiring fewer; there

were no effects of task step and no interactions.

When the item selection and OK click measures were combined to produce overall

measures for the two steps, there were again main effects of condition (Pillai’s trace

F3,13=4.92, p=.017, partial eta sq=.53), task step (Pillai’s trace F3,13=8.29, p=.002, partial

eta sq=.66) and an interaction (Pillai’s trace F3,13=10.0, p=.001, partial eta sq=.70). The

Univariate Fs indicated that there were effects of condition upon path length (F1,15=11.1,

p=.005, partial eta sq=.43, MSE=.036) and marginally upon duration (F1,15=4.40, p=.053,

partial eta sq=.23, MSE=.283); of task step upon fixations (F1,15=15.7, p=.001, partial eta

sq=.51, MSE=1.84) and duration (F1,15=21.5, p<.001, partial eta sq=.59, MSE=0.18); with

an interaction for path length (F1,15=17.9, p=.001, partial eta sq=.54, MSE=0.022). The


28

means indicate that the interaction reflects an advantage for the collocated interface on the

final task step.

Discussion

With the revised and more realistic interface, in which the collocated windows

replace each other, and the instructions present the information in the same order as the task

requires it to be used, the two interfaces are indistinguishable for the second task step, but

the collocated interface now has an advantage over the translated interface in terms of eye-

movement path length and duration for the final task step. Overall, combining both item

selection and OK clicking, there is no difference in the number of fixations required. As in

Experiment Four, there is a benefit for the translated interface in using the OK button, this

time in number of fixations.

The advantage for collocation on the final step is a consequence of the translated

interface requiring a longer path and duration here, compared to the previous step. There is

no obvious reason why the translated interface should perform worse in the final step

(moving from the artist to the song) than it did on the second step (moving from the genre

to the artist). Both interfaces show a rise in fixations for the item selection element of these

task steps, but only in the translate interface does this turn into a longer path and duration.

However, this is the same pattern as observed in the previous experiment, and so may be an

example of consistency of location improving performance within the trial, if not over

trials.

General Discussion

Our first three experiments showed a consistent benefit for collocating the point of

interaction with its result in a map interface; our final two have shown little consistent


29

benefit for collocation in a list-style database, although there is an indication that it may

lead to faster performance and shorter search paths on the last of three steps.

In summary, this is much what May, Dean and Barnard (2003) predicted, but which

the designers of online maps have not yet implemented. While it would be nice if we found

the conventional design of maps to change as a result of this work, we would also be happy

if it leads to a better understanding of how principles and theory can be used to guide

design decisions in interface design. In previous papers we have followed a path from

anecdotal observation (that film editors seem to control the position of elements within the

frame) to a cognitive model of film watching (in which any demand for visual search

interrupts the comprehension of narrative), collected empirical data to support the model (in

terms of eye movements following cuts in a commercial film), and applied it to interface

design (in the form of contextualised but generic principles such as collocation). Here we

have shown that this principle does what our model says it will, making it easier for a map

user to understand what they are looking at when they click to zoom-in to a map. We have

also shown that it does not seem to benefit the more verbal, context-laden task of using a

database, although here there is some evidence of another principle, that of consistency of

operation, coming into play. Making this path of inference and deduction apparent provides

a rationale for a design decision, that allows the designer to decide when a principle really

applies to their design, or whether other specific issues mean that another, more relevant

but contradictory principle might come into play. Without the rationale, there is no support

for the designer to make a reasoned choice amongst the guidance they are offered.

In the case of maps, it is not as if the designers are unaware of the confusion caused

by the unfilmic, move-to-centre algorithm. Most online maps now try to guide the user’s

gaze to the new, central location by making use of a form of animation, interpolating four


30

or five views of the initial map progressively moving a little each frame so that the clicked

point moves toward the centre of the screen, before then displaying the new map. Whether

intentionally or not, they are following the suggestion of Mackinlay, Card and Robertson

(1990) that the user should be supported in relocating points of view by moving them

through virtual space rather than jumping directly from point to point. They are making

their interfaces resemble fictional interfaces such as the 3d hologram viewer used by

Deckard in Ridley Scott’s (1982) film ‘Blade Runner’. This is not how film editors create

their own material, however, because the experience of watching uncut footage shot while

the cameraman is moving through a scene is not only time consuming but often nauseating,

when the viewer’s bodily and visual sensations do not correlate. Cutting directly between

two different view-points can work, if collocation is used, as film makers do know, and as

interface designers should know.

The collocation principle applies to more than map interfaces. It should also work

whenever the user wants to move directly from one object to another, as yet unrepresented,

object, and where the previous view is now superfluous. Clicking on an entry in a table of

contents in a word-processor can allow you to jump to that heading in the body of the text,

but the heading is rarely located on-screen in the same position as was the corresponding

entry in the table of contents, and must be searched for. Our simple database began to show

benefits of collocation, even though we had not expected it to, perhaps because there was

not sufficient need for the contextual information remaining from previous task steps.

Increasing memory load, or ambiguity of material, as in our Experiment Four, makes the

contextual information more useful and collocation less beneficial. These predictions and

post hoc explanations are not plucked from thin air but based upon the underlying cognitive

model of film watching in May, Dean and Barnard (2003) and its contrast between the


31

propositional-implication loop required for narrative comprehension and the propositional-

object loop required for visual search.

From a psychological point of view, the model of interface use and the ICS account

of cognition can be used to reason about the cognitive consequences of changes to the task,

such as the use of verbal material of increasing ambiguity (increasing demand upon a

propositional-morphonolexical loop). Without the link between the principle and its

theoretical derivation, such extrapolation and extension would be impossible and the

guidance to designers would become rapidly outdated, as new devices are developed, and

novel tasks digitized. It also provides a link between the applied domain of human-

computer interaction and computational models of visual search, (see, for example,

Zelinsky, 2008), where top-down models in which search is driven by object based

information are simulating human behaviour with increasing accuracy.

References

Barnard, P. J. (1985). Interacting cognitive subsystems : A psycholinguistic approach to

short-term memory. In A. Ellis, (ed.), Progress in the Psychology of Language, Vol.

2, 197-258. Erlbaum: London,.

Barnard, P. J., May, J., & Green, A. J. K. (1991). Preliminaries for the Application of

Approximate Cognitive Modeling to Design Scenarios. Esprit BRA3066 Amodeus

working paper RP4/WP11. Cambridge, UK: Medical Research Council Applied

Psychology Unit.

Kraft, R. N. (1986). The role of cutting in the evaluation and retention of film. Journal of

Experimental Psychology: Learning Memory and Cognition, 12(1), 155-163.


32

Kraft, R. N. (1987). Rules and strategies of visual narratives. Perceptual and Motor Skills,

64, 3-14.

Mackinlay, J.D., Card., S.K., & Robertson, G.G. (1990) ‘Rapid controlled movement

through a virtual 3d workspace’, In Proceedings of Siggraph’90, 171-176

May, J., & Barnard, P. J. (1995). Cinematography and interface design. In K. Nordby, P.

Helmersen, D. J. Gilmore, & S. Arnesen (Eds.), Human-computer interaction:

Interact'95 (pp. 26-31): Chapman and Hall.

May, J., Dean, M.P. & Barnard, P.J. (2003) Using film cutting techniques in interface

design. Human-Computer Interaction, 18, 325-372.

Moellering, H. (1975) At the interface of cartography and computer graphics. Proceedings

of the 2nd annual conference on computer graphics and interactive techniques,

Bowling Green Ohio, June 25-27. pp.256-259. ACM Press.

Scott, R. (Director). (1982) Blade Runner [MotionPicture]. United States: Warner Brothers.

Young, E., & Clanton, C. (1993). Film craft in user interface design. Tutorial presented at

the InterCHI’93 conference, Amsterdam.

Zelinksy, G. J. (2008) Eye movements during target acquisition. Psychological Review,

115(4), 787-835.


33

Tables and Figures

Table 1: The typefaces used in experiment three were chosen so that they were more

consistent than those in experiment two, were more like the faces used for real place names

on the OS maps, and the differences between successive maps were equivalent.

Figure 1: In the collocate condition (upper row) the geographical detail clicked on is

positioned under the cursor on the new, higher scale map. In the translate condition (bottom

row), the clicked detail is moved to the centre of the new map. In both conditions, the task

is to find a place name in the first map (the Splott area of Cardiff), relocate it in the second

map, and select a diamond shaped target symbol in the third map. The simulated arrow

cursors indicate the point that is about to be clicked, and the lines show the optimum

movement from the previous click. Note that the location of the place name relative to the

street layout varies from map to map. Maps (c) Crown Copyright/database right 2008. An

Ordnance Survey/EDINA supplied service.

Figure 2: The two interfaces performed identically on the find and target steps, but

the translate interface (bold line) required more gaze fixations, a longer search path, and

took longer to complete on the critical relocate step, than did the collocated interface

(narrow line). Bars indicate one standard error.

Figure 3: In the second experiment, participants had to locate a place name that had

been added to five increasingly detailed maps such that its appearance and geographical

position varied from map to map, as real place names do on different scale maps of the

same region. In this example, the task is to locate Bowsden using the collocated interface.

Maps (c) Crown Copyright/database right 2008. An Ordnance Survey/EDINA supplied

service.


34

Figure 4: The Translate method (bold line) required a longer path length, more

fixations, and took more time than the Collocate method (thin line) for steps 2, 3 and 4 of

the map task. The initial steps (locating the number on the clock face) and the final steps

did not differ (bars indicate one standard error).

Figure 5: Changing the typefaces of the place names shown in experiment three did

not remove the advantage of the collocated interface (thin line) over the translated (bold

line) interface for the intermediate steps of the task (bars indicate one standard error).

Figure 6: Three successive screens from an interaction with the car-parts database in

experiment four, with the translated interface on the left and the collocated interface on the

right. The participant has been asked to ‘Please order part: CC05, for model: Seicento, for

make: Fiat’.

Figure 7: In experiment four, the translated interface (bold lines) is better on the

second task step but the collocated interface (narrow lines) is better on the third step. Step

totals shown by solid lines with circles; item selection solid lines without markers; OK

clicks dashed lines; bars indicate standard errors.

Figure 8: In experiment five, on the second task step the interfaces are

indistinguishable, but on the third step the translated interface (bold lines) requires a longer

path-length than the collocated interface (narrow lines); step totals shown by lines with

circles; item selection solid lines without markers; OK clicks dashed lines; bars indicate

standard errors.


35

Level Experiment 2 Experiment 3

Step2 Helvetica Bold Times Normal

Step 3 Times Normal Helvetica Italic

Step 4 Georgia Normal Times Italic

Step 5 Palatino Italic Helvetica Normal

Table 1:

Fig

ure

1:

Coll

oca

ting i

nte

rfac

e obje

cts

37

Fig

ure

2

Coll

oca

ting i

nte

rfac

e obje

cts

38

Fig

ure

3:

Coll

oca

ting i

nte

rfac

e obje

cts

39

Fig

ure

4:

Coll

oca

ting i

nte

rfac

e obje

cts

40

Fig

ure

5:

Coll

oca

ting i

nte

rfac

e obje

cts

41

Fig

ure

6:

Coll

oca

ting i

nte

rfac

e obje

cts

42

Fig

ure

7

Coll

oca

ting i

nte

rfac

e obje

cts

43

Fig

ure

8:

collocating interface objects : jon may tim gamble contact ... · collocating interface objects 2...

Documents