collocating interface objects : jon may tim gamble contact ... · collocating interface objects 2...
TRANSCRIPT
Collocating interface objects :
Zooming into Maps and Databases
Jon May
University of Plymouth
Tim Gamble
University of Sheffield
Contact Author: Professor Jon May
School of Psychology
University of Plymouth
Drake Circus
PL4 8AA
Running Head: Collocating interface objects
Collocating interface objects
2
Abstract
May, Dean and Barnard (2003) used a theoretically based model of visual human-
computer interaction to argue that interface objects should be collocated following screen
changes such as a zoom-in to detail. Existing online maps do not follow this principle, but
uniformly move a clicked point to the centre of the subsequent display, leaving the user
looking at an unrelated location. This paper presents three experiments showing that
collocating the point clicked so that the detailed location appears in the place previously
occupied by the overview location makes the map easier to use, reflected in a reduction in
eye movements and interaction duration. Two further experiments show that a list-like
database interface which benefits from the persistence of contextual information does not
show the same benefit of collocation, supporting the claim that the nature of the task must
be taken into account in choosing how to design dynamic displays. We discuss the benefit
of basing design principles on theoretical models so that they can be extended to novel
situations.
(168 words)
Acknowledgement
This work was supported by the Engineering and Physical Sciences Research Council
[EP/C515528/2]
Collocating interface objects
3
Collocating interface objects : Zooming into Maps and Databases
Interactive maps are everywhere nowadays, and map use has become routine.
Ramblers and drivers no longer have to battle with unwieldy sheets of paper, compromising
breadth of representation against level of detail, with the crucial junction lost in the fold of
the map, and the new road not even shown. Up to date cartographical information is now
instantly accessible on computers and in cars, even on handheld devices that are wirelessly
connected to the internet, accurately positioned by GPS. In principle, relevant objects or
businesses can be highlighted and irrelevant detail removed from the view, and the maps
can be drawn at whatever level of detail the user requires for their current goal, zooming in
from a low detail overview to successively more detailed local views of an area.
Such flexibility comes at a cost, however, because the representation on the device
cannot necessarily be the same as that on a paper map. A paper map is a designed
representation of reality. It has been constructed by a skilled practitioner to take advantage
of the map readers’ attentional and perceptual processing, to allow them to selectively
attend to certain classes of information in alternation. Enormously successful, cartographic
conventions have not developed overnight, and their evolution has relied upon
developments in printing technology as well as the skill of the cartographer. In moving
from a static, paper to dynamic, electronic representations, with far smaller screens and
lower resolutions than are achievable in print, cartography faces a challenge. Allowing
classes of information to be hidden or revealed adds interactional overheads to the design
and use of a map that are absent when the map design takes advantage of selective
attention. Changing between different scales of map as the user zooms in and out requires
design decisions to be made about how successive views should be related. Designing an
interactive map becomes a problem in human-computer interaction, prone to the same
Collocating interface objects
4
conflicting demands as other computer interface designs. Technological change makes
trade-offs between speed, capacity and mode of delivery difficult to anticipate. Fashions in
implementation flourish and designers imitate more often than they innovate.
One thing that does not change, however, is the mental architecture of the map
reader. We have previously argued that computer interface design should be a science as
much as a craft, and should benefit by a principled understanding of the tasks users
undertake, in order to capitalise upon the mental resources available for interaction. Early
in the development of computer based maps, Moellering (1975) noted that ‘when one
begins to work with interactive display systems and particularly animations … displays
must be considered in a dynamic setting.’ Some researchers have since advocated that
interface designers learn from the dynamics of film editing, and should manage changes in
their interfaces in the same way that film editors manipulated temporal and spatial jumps in
films (e.g., Young and Clanton, 1993). In May and Barnard (1995), we argued that rather
than naively imitate cinematographic editing conventions, designers needed to understand
why these conventions worked for films, and to create parallel changes in their interfaces
based upon that understanding. Film editing had evolved as a craft skill, and its rules of
thumb were situated within the domain of camera angles, lighting, and narrative, as sets of
‘dos and don’ts’, with no real account of why certain practices worked and were ‘filmic’,
while others were ‘unfilmic’. While the work of researchers such as Kraft (1986, 1987) had
shown that conventions in film cutting did have a sound psychological basis, their domain
specific nature made it hard for designers to know what it was they were to apply to their
interfaces.
May and Barnard (1995) presented a theoretical analysis of cinematography based
upon Barnard’s (1985) Interacting Cognitive Subsystems model of cognition, and made
Collocating interface objects
5
several extrapolations to interface design, including one scenario of a tourist information
system combining large-scale and detailed views of a locale to help people find places of
interest and plan routes (originally discussed in Barnard, May & Green, 1991). The
problem posed was how to combine these two representations. In contrast to ingenious and
novel combinations of fish-eye views, magnifying lenses and multiple simultaneous
displays, we argued that to be like a film the map interface should cut directly from an
overview to a detailed view, without any swooping animation (as advocated in the context
of data spaces by Mackinlay, Card and Robertson, 1990).
The cognitive model of film watching that we proposed assumed that the main task of
someone watching a film was to follow the narrative (in ICS terms, at the propositional and
implicational levels of representation), and that any requirement imposed upon them to
search the screen for relevant material (in ICS terms, at the object and propositional levels
of representation) interfered with this task, detracting from their immersion in the story and
making them aware of the surface detail at the expense of the story. Similarly, we argued
that in using a computer interface to carry out a task, any requirement for users to search
the screen to relocate elements following a scene change would interfere with their main
task, making the system less usable.
In the context of map use, we argued that the point that the user clicked on in the
overview was where they were interested in for their task, and would be where they were
still looking immediately after any screen change, and so the detailed view of the point
clicked should be redisplayed in the same physical position on the screen: a principle that
we called collocation. At that time, there were few such interactive maps readily available,
but we noted the similarity between zooming-in with a map and with graphics packages.
All but one of the graphics packages we surveyed broke the collocation principle: they did
Collocating interface objects
6
a variety of things, often inconsistently, but most often they would translate the clicked
element to the centre of the new view. The exception in 1995 was Adobe Illustrator, the
vector based partner of Adobe Photoshop (although in a subsequent release, it too joined
the ‘move to centre’ camp).
One problem with our theoretically based principle of collocation was that we had no
empirical proof that film-makers actually did make use of it in the way that we claimed.
Our next step, then, was to obtain such evidence. In May, Dean and Barnard (2003) we
gave a more detailed account of the cognitive model of film watching, and reported an eye-
tracking study in which participants watched a commercially released film in its entirety.
We measured the amount that they moved their gaze location in the frames immediately
after each of ten classes of cuts, and found that there was minimal eye movement in cuts
where the cut zoomed-in to a detail, followed a moving actor or object, showed objects
whose position was predictable, or showed the result or consequence of an action, or
showed novel unexpected objects. In all of these circumstances, film makers had succeeded
in collocating salient regions of the screen before and after cuts, so that the viewers had not
had to move their eyes to locate something on the screen in order to comprehend the
unfolding narrative. In five other classes of cut, large eye movements were found. Based on
these findings, we specifically argued that map interfaces should make use of collocation
when zooming in and out to help map users rapidly locate the topic of their enquiry without
having to search the screen for it. We also made recommendations about interactions where
collocation would not be necessary, but where structural changes in the display might guide
a user to the novel information, while leaving previous information visible to provide a
context or history of the interaction. Such a situation might arise in the use of hierarchical
databases, for example, where an object or name makes sense in the context of
Collocating interface objects
7
superordinate information and might be ambiguous or difficult to parse without this context
remaining on the screen.
In recent years, however, we have noted with dismay the adoption of the move-to-
centre convention for computer based maps. In Google™ Maps UK, for example, if one
has Dartmoor in the centre of the screen and Plymouth in the lower left (e.g.,
<http://maps.google.co.uk/maps?ll=50.529143,-3.915253&z=11>), then double-clicking
upon the place name Plymouth results in its replacement by a broad expanse of blue,
representing the sea. The same occurs in every online map we have found that includes a
click-to-zoom function. There are clearly reasons why this translation convention has been
adopted, other than imitation or the re-use of open source code. Computationally, all that
needs to be returned to the map server are the absolute co-ordinates of the clicked location
and a scaling factor (either of the current or of the new view), as can be seen by inspecting
the URLs, which generally include a latitude, longitude and level parameter. Without
knowing where within the previous image the clicked location was, the obvious place to put
it in the new view is in the centre. If the user is going to zoom in further, then it will remain
at the centre. It could even be argued that such interfaces are consistent, which has become
a watchword of design guidelines, and hence easy to learn and use.
However, translation also has disadvantages, from the point of view of the user. If
they have been looking in one point on the screen, the place they are interested is now not
there, and they have to search for it. If the new representation is stylistically different to the
original, as is often the case with maps of different scale, then it is not an easy task to
retopicalise oneself to the new view and to relocate the place name of interest. If one has
not clicked on the actual place, but on or near its name, which is usually to one side of the
place, then the actual location desired will not be in the centre of the screen, and if its name
Collocating interface objects
8
has been repositioned on the new view it may be anywhere within the screen, depending
upon the scaling factors. If one is unfamiliar with a geographical region, as is often the case
when using a map, each wrong place name one reads gives no help; serial search is all that
is possible. On the other hand, translation has some advantages for the programmer,
because to collocate the result with the original map, two additional parameters need to be
returned to the server to indicate the relative location of the point of interest within the first
map, so that it can re-occupy that position in the new map. In essence, the decision between
the two modes of presentation appears to be a contest between ease of use and ease of
programming.
From a psychological point of view, the only way that a move-to-centre translation
algorithm could be of benefit to map readers rather than to map programmers, is if the
widespread adoption of the convention has been learnt by the general public and so they
expect to look in the centre of maps as a result of an interaction with a place on the
periphery. This is the question that we set out to answer with the first three experiments
reported in this paper. By contrasting an interface using collocation with the same design
but using translation, we should be able to see if our theoretically principled
recommendations provided any advantage for users, or whether they had grown used to the
convention and by knowing where to look would be able to interact as easily with the
translated interfaces.
Experiment 1
Method
There were 26 participants in the study, all students at the University of Sheffield,
who were paid £5 for their participation. They were told that the study was investigating
Collocating interface objects
9
eye movements people made while zooming into maps, but not that either method of
zooming was of particular interest.
The two interface techniques of collocation and translation were presented to the
participants as different computer systems, both using maps. One was a traffic control
system, in which their task was to locate traffic monitors; the other was a gas supply system
and their task was to locate gas meters. The interface was written in RealBasic and
presented by 2.7GHz G5 Mac a on a 20” NEC Multisync 2080UX+ LCD monitor at 1024 x
768 pixel resolution, viewed by participants from a static operator’s chair with headrest (to
minimise head movement) at a distance of approximately 75 cm. Each map was 297mm
square, subtending approximately 22º, with the centre of the screen approximately level
with the participants’ eyes.
Thirteen scenarios were created, using Ordnance Survey maps of different cities in
Britain (the maps were obtained from the Edina Digimaps service under an educational
license). Each scenario began with a text message asking participants to locate a target in a
specific named location within a city, e.g. ‘There is a traffic monitor located near Walkley
in Sheffield. See if you can find it’. On clicking OK, the screen cleared and a start button
was displayed centrally. When clicked, an overview map of 1:50,000 scale was displayed
and the participant had to find and click on the specific local placename (e.g., Walkley).
Following the click, a 1:25,000 scale map of the area clicked on was displayed, and the
participants had to relocate and click on the placename, to be shown the final 1:12,500 map
(produced by rescaling the OS 1:10,000 map). This final map had a target symbol (a black
diamond 4mm high and wide) embedded within it, upon which the participant had to click
to complete a trial. The target was placed in the area that the participant had been asked to
search, but was not systematically located in relation to the place name on that view. If at
Collocating interface objects
10
any point the participant clicked on a region not containing the target area, they could
‘zoom out’ to the previous map using option-click.
Twelve of the scenarios were divided between two experimental blocks. In one block
the maps were collocated such that the geographical location under the mouse cursor was
identical before and after the click, notwithstanding the change in scale. In the other block
the conventional translation approach was used, such that the geographical location clicked
on the first map was placed in the centre of the second map (Figure 1). The remaining
scenario, based on maps of Sheffield, was used as a practice trial before each block.
The traffic monitor and gas meter cover storied were balanced over experimental
conditions, and the order of the conditions was also balanced over participants. The twelve
scenarios were presented in a latin square design over participants, such that after twelve
participants each scenario had been used in each serial position within each condition.
Figure 1 about here
Each participant was tested individually. Participants’ eye movements were recorded
using a SensoMotoric Instruments iView X eye tracker, which requires no head restraint or
worn head position tracking device, thus making the situation as naturalistic as possible.
The screen coordinates of each participants’ gaze location were logged every 20 msec for
the duration of the experiment. The presentation computer recorded reaction times for each
experimental event.
Results
Reliable eye tracking data could not be obtained for four participants. Of the 22
remaining participants, ages ranged from 18y 9m to 27y 2m, and three were male. Eleven
Collocating interface objects
11
completed the study in each order (i.e., Collocate-Translate or Translate-Collocate). Raw
eye tracking data was analysed using the BeGaze software, with parameters defined to
detect fixations with a minimum duration of 80 msec and a maximum dispersion of one-
sixth the screen height. This reduces the raw data to a series of fixations (where the pupil is
relatively stationary), saccades (where the pupil is moving) and blinks (where the pupil is
not detectable for several frames). If the software reported more than three blinks in a step,
we discarded the data as unreliable, because the software does not distinguish between true
blinks and episodes where the pupil cannot be detected due to tracking problems such as
head movement. Trials in which the participant used the zoom-out function to step back
were also discarded. Over the 264 trials from the 22 participants, data was rejected from 39
of the Find steps, 14 of the Relocate steps, and 18 of the Target steps (9% overall).
From the resulting analyses, the number of fixations, their total path length
(expressed as a percentage of the width of the map), and the overall duration of each step in
each trial was obtained (Figure 2). These three dependent variables were entered into a
MANOVA with the factors of condition and task step. There was no overall effect of
condition (Pillai’s trace F3,19=0.69), but there was an effect of task step (Pillai’s trace
F6,82=13.5, p<.001, partial eta sq=.497) and an interaction (Pillai’s trace F6,82=2.60,
p=.023, partial eta sq=.160).
Figure 2 about here
Univariate tests showed no overall effects of condition for any of the three measures
(all Fs<1), but significant effects of task step for all three (Fixations : F2,42=52.6,
MSE=12.3, p<.001, partial eta-sq=.715; Path lengths F2,42=58.5, MSE=1.06, p<.001,
partial eta-sq=.736; Durations F2,42=70.9, MSE=1.27, p<.001, partial eta-sq=.771), and
Collocating interface objects
12
interactions of condition with task step for Fixations (F2,42=3.60, MSE=5.64, p=.036,
partial eta-sq=.146) and Durations (F2,42=5.04, MSE=0.85, p=.011, partial eta-sq=.194),
with a marginal interaction for Path lengths (F2,42=2.80, MSE=0.67, p=.07, partial eta-
sq=.118).
To interpret the interactions, two-tailed paired sample t tests were then conducted
between the two conditions in each task step. There were no differences between the two
conditions in the Find step (Fixations t(21)=1.22; Path-length t(21)=1.05; Duration
t(21)=1.72; all p > .23) or for the Target step (Fixations t(21)=1.17; Path-length
t(21)=0.15; Duration t(21)=0.34; all p ! .10), but on the Relocate step the translate interface
required more fixations (Col=2.76, Trans=3.89, t(21)= 2.36, p=.028), longer path lengths
(Col=65%, Trans=105%, t(21)=2.59, p = .017), and took longer to complete (Col=1.54s,
Trans=2.00s, t(21)=2.22, p=.038).
Discussion
The two interfaces did not differ on the initial Find step, where participants were
searching for the name for the first time, nor on the Target step, where they were searching
for the diamond symbol, but they did differ on the crucial Relocate step. At this point in the
task, participants had already found the name and clicked on it on the first map, and had to
find and click it again. The conventional translation approach of presenting the clicked
point in the centre of the new view required more and longer eye-movements and took
participants longer to execute than our principled approach of collocating the position on
successive maps. This difference exists even though the translation approach is consistent,
because the clicked point is always in the same physical screen position in the new view
and so participants could simply make a single change in gaze direction. Instead, they are
not able to make use of this consistency, having to search for the place name again.
Collocating interface objects
13
The lack of benefit of collocation on the target step supports our argument because
the diamond was fixed on the map, and not collocated with the place name or the location
clicked by the user. In practice, the locate step required visual search in both interfaces.
It could be argued that this version of a map-searching task does not give much
benefit to the consistency of the translation approach because with only three levels of map
to search through, there is only one critical trial upon which this consistency can be used:
the transition from the second map (in which the clicked point has been translated to the
centre) to the final map. The next experiment extends the task to a five level task, so that
having selected a point on the first map, and having had it translated to the centre,
subsequent clicks on maps two, three and four should also be close to the centre of the map,
potentially making a translation approach more useful, and in practice, more like a
collocation approach.
Experiment 2
Method
There were 25 participants in this experiment, all students at the University of
Sheffield, and each was paid £10 for their participation. They were told that the study was
investigating eye movements people made while navigating computer-based maps. The
experiment was conducted using the same equipment as Experiment 1.
Twenty four map-reading scenarios were created using Ordnance Survey maps of the
Plymouth area. Each scenario began with a text message instructing participants to locate a
local place name in a numbered region of Plymouth. Participants read this instruction
aloud. On clicking OK, an 250mm square overview of Plymouth was displayed, based on a
1:50,000 scale map, upon which the numbers 1 to 12 had been superimposed in a clock
Collocating interface objects
14
face arrangement (Figure 3). After clicking on or near the relevant number, a twice as
detailed view of a quarter of the original map was displayed. Clicking on this displayed a
third level view, again twice as detailed, and so this time of 1/16th of the original map.
Clicking on this presented a fourth (1/64th) and finally a fifth level (1/256th) view. The first
three levels were all based on the same 1:50,000 scale map, level four on a 1:25,000 scale
map, and level five on a 1:12,500 scale map (rescaled from the OS 1:10,000 map). When
the target place name had been clicked in the fifth level map, a congratulation message was
displayed. If at any time a click would have resulted in the placename being off-screen, an
error message was displayed and the trial was restarted from the instruction message (this
meant that the zoom-out function did not have to be made available).
The place names used as targets were spurious, being taken from a different region
(Northumberland) and added to the maps so that while they were in roughly the same
geographical location on each map, their actual position and typeface varied in the same
way that real place names varied in typeface and position on the three original maps used.
Figure 3 about here
Participants undertook two sessions (approximately three months apart), each of 24
trials divided into two blocks. One block in each session used the Translate method of
zooming in, so that the new view was centred upon the location represented by the point
clicked in the previous level; and the other block used the Collocate method, so that
location represented by the point clicked in the previous level was presented in the same
relative location within the new map window.
Block order was balanced over sessions and participants, so that half experienced
Translate first followed by Collocate in the first session, and then Collocate followed by
Collocating interface objects
15
Translate in the second session. The twenty four scenarios were balanced so that half were
used in the translate condition in session one and the collocate condition in session two,
while the others were used in the collocate condition in session one and the translate block
in session two. Within each block, the order of trials was rotated over participants in a Latin
square, so that each target appeared at least once at each position in the block.
Results
Three participants were unable to return for the second session, and eye-tracking data
from one participant was too poor for analysis. This left 21 participants’ data for analysis (6
males; age range 20-36 years, median 24 years). Of these 1008 trials, 48 (5.2%) were
repeated due to errors, with one participant making 8 errors (17%); the next worst made 4
errors (8%). Errors were more frequent in the second session (n=29) than the first (n=19),
the reverse of a practice effect, but were more likely near the start of a block (number of
errors per four trials: 17; 10; 5; 7; 5; 4). They were also more frequent in the Translate
condition (n= 30) than in the Collocate condition (n=18). A binomial test gives a
probability of .056 of observing 18 outcomes or fewer out of 48, so this is a nearly
significant difference in favour of the collocate method.
For each of the five steps in each trial, three measures were computed as in the
previous experiment: the number of fixations, the total path-length, and the duration of each
step. For each participant, a mean for these measures was computed over the twelve trials
in each block, resulting in a 2x2x5 repeated measures design with the factors of Session,
Method, and Step.
Figure 4 about here
Collocating interface objects
16
The three dependent variables of path-length, fixations and duration were entered into
a MANOVA, and multivariate tests showed that all factors and interactions produced
significant effects (Figure 4). Because of the effects of Session factor, separate MANOVAs
were then conducted upon each sessions’ data, in a 2x5 repeated measures design. The
effects of Method, Step and their interaction were significant in both of these analyses. Five
further MANOVAs were then conducted upon each Step, pooled over both Sessions, with
the single factor of Method. The multivariate tests showed no differences between the
conditions for Step 1 (F3,18=0.211, p=.887) or Step 5 (F3,18=1.16, p=.352). Steps 2, 3 and
4 showed significant differences between the methods (Step 2 F3,18=4.75, p=.013; Step 3:
F3,18=10.9, p<.001; Step 4 F3,18=8.29, p=.001), with the Translate condition resulting in
longer path-lengths, more fixations, and longer durations for each of these three steps
(Figure 4). Univariate tests within each of these MANOVAs showed the same pattern of
effects for all three dependent variables.
Discussion
Despite the consistent placing of the clicked points at the centre of the screen
following a transition to a new map, the translation interface is used less efficiently than the
collocated interface, where the new point is in the same physical location as the old point,
wherever this is on the screen. This is despite the fact that the place names themselves are
not in exactly the same positions on successive maps, so that some search is required in
both interfaces. The need to move gaze from the clicked location back to the centre is an
impediment to the usability of the translation interface.
While watching participants use the maps, it became apparent that the typefaces we
had used for our spurious place names did not look like the simple serif and sans serif faces
used for real names on the original map (Table 1). This could have made them stand out
Collocating interface objects
17
more from the background, perhaps giving some advantage or disadvantage to one of the
interfaces that might not be apparent in a real map. In fact, the differences between the
typefaces might explain why the difference between the two interfaces is larger for Step 3
than for Steps 2 or 4. Step 2 is the first step on which the place name has to be found, and it
was in a Helvetica Bold face. It then changes to Times Normal on Step 3, and on Step 4 to
Georgia Normal. The appearance of the name is thus quite different between the second
and third steps, but quite similar between steps 3 and 4. The change from Georgia Normal
to Palatino Italic on step 5 is also quite different, although both are serif faces, and here the
interfaces performed equivalently. We therefore ran the task again using different
typefaces, changing between Times and Helvetica on each step. and including one change
from normal to italic, one between two italic faces, and one from an italic to a normal face.
Table 1 about here
Experiment Three
Method
Twelve participants, all postgraduate students (8 males and 4 females; age range 21:5
to 33:11, median 28:5), took part in this experiment. They were paid £5 for their assistance.
This experiment used the same materials and design as Experiment 2, except that the
typefaces used for the spurious place names was altered to make them more like those of
other names on the maps and the changes more systematic over different steps in the task
(Table 1).
Results
Error rates in this experiment were very low, with only five trials needing to be
repeated. Three of the errors were made by one participant, who also took abnormally long
Collocating interface objects
18
to complete several steps of the Translate task (over a minute in one case), and so his data
were excluded. The three dependent measures of fixations, path-length and duration were
computed for each of the five task steps in both conditions, as in the previous experiments.
A MANOVA showed a main effect of condition (Multivariate F3,8=10.8, p=.003, partial
eta-sq=.802), Step (Multivariate F12,120=8.1, p<.001, partial eta-sq=.448), and their
interaction (Multivariate F12,120=2.28, p=.012, partial eta-sq=.186). Separate MANOVAs
at each step showed no effects of condition for Step 1 (Mult F3,8=0.51), Step 4 (Mult
F3,8=1.74) or Step 5 (Mult F3,8=0.58, but significant effects for Step 2 (Multivariate
F3,8=10.8, p=.003, partial eta-sq=.802) and Step 3 (Multivariate F3,8=10.8, p=.003, partial
eta-sq=.802). Two-tailed paired t tests showed that all three measures differed for Step 2
and Step 3.
Figure 5 about here
Discussion
This experiment provides a close replication of the previous findings, despite the
changes in typeface. The advantage of the collocated interface is the same on steps 2 and 3
(when the place name changes from a sans-serif normal to a serif italic face), but declines
on step 4 to become non-significant (when the place name changes between two italic
faces). In all three experiments, the initial and final steps do not differ. The initial step, of
course, is in fact the same for both interfaces, because the place name has not yet been
found and so there is nothing to collocate. On the final step in Experiment 1, the target was
a symbol rather than the place name, and was not collocated with the point clicked and so
the interfaces are again equivalent. However, in Experiments 2 and 3 the last four task steps
all require participants to click on the place name, and so there is no obvious reason why
the advantage of the collocated interface should disappear so consistently in the last step.
Collocating interface objects
19
One possibility is that the low level of detail on these final maps makes it easy to
retopicalise on the place name, so that the searching required by the translated interface
becomes trivial.
Overall, this series of three experiments has shown that the advantage of collocating
maps over the conventional approach of translating them to the centre is robust, supporting
our argument that the design principle derived from the analysis of film cutting techniques
can be used to improve interface design. We now need to show that the advantage of
collocating is not generic, but depends upon the task being performed. There is some
indication in the Maps tasks that collocation does not always improve performance, when
the search is simple in the final step, for example. May, Dean and Barnard (2003) found
that collocation was not used in film editing in situations where the location of the topic of
interest was predictable and when the occurrence of the cut could be anticipated, as when
two actors were talking to each other. In such conversational scenes, cuts would alternate
between views of the two actors faces, on alternate sides of the screen. The resulting
sequence resembled the view of the actors one would have if standing next to them while
they were talking, moving ones gaze between them, prompted by the dialogue. Another
class of cut in which collocation did not occur were ‘over the shoulder’ cuts, where an actor
would be seen doing something before a cut, and the camera would then show the result of
their action from behind them, with the rear of their head and a shoulder occupying a corner
of the frame. Such framing of the action serves to provide a context, linking the results of
the action to the person responsible, and also allowing one to see it from their point of
view.
Extrapolating to computer interfaces, a similar lack of benefit from collocation should
be found when context is important, or when translation of focal point after a screen change
Collocating interface objects
20
is predictable. Earlier in this paper, we suggested that a scenario that might benefit from the
retention of contextual information would be searching through a hierarchical database,
where one is successively refining the category within which one is searching. Instead of
replacing the initial set of categories with a second, finer-grained subset, a translated
interface would place the new subset off to one side, with the original selection still visible
so that the result of the interaction is contextualised by the super-ordinate category name.
This form of interface is in fact used in Apple’s iTunes ™software, in which three panes
arranged horizontally contain genre, artist and album titles. When nothing has been
selected, each pane shows all of the information in the Library. Clicking upon a genre in
the leftmost pane refines the content of the other two panes so that only artists and albums
within that genre are now listed. Clicking again upon an artist further refines the album list
to only those albums by that artist within that genre.
In the final two experiments in this paper, we contrast databases that work in a similar
manner to this, with the new information appearing alongside previous windows of
information, with collocated designs where the new information replaces the previous
window. If the collocation principle supported in the Maps interface were working simply
because it minimised eye movements, then the collocated versions of the databases should
similarly benefit, because the new information is physically located close to the previous
focus of attention. If, however, collocation worked in the Maps interfaces because it was
concordant with the users’ task, then it should not be as useful here, because the task
involves successively moving through a data structure in a predictable manner, with detail
unfolding on each operation. In our next study, we contrast a translated design in which the
new window opens to the right of the previous window, with a collocated design in which
the new window opens where the old window was, and the old window appears to the right.
Collocating interface objects
21
The collocated design thus removes the sense of moving successively through the data
while retaining the availability of contextual information.
Experiment 4
Method
The same 26 participants who completed Experiment 1 also took part in this
experiment, in the same session. Half of them completed this experiment before
Experiment 1, and half after it. They were told that they would be searching two databases.
In one database, their task was to search for a specific car part, and in the other they would
be ordering a specific musical recording. The same equipment as in previous experiments
was used to conduct this experiment.
In each task, participants read an instruction message asking them either to order an
item, e.g., ‘Please order part number: CC05 for model: Seicento for make: Fiat’ or Please
order album: 05AT for artist: Nina Simone from Genre: Jazz’. On clicking the OK button,
they were presented with a list of ten car makers or musical genres and had to locate the
appropriate one, click it, and then click an OK button in the lower right of the window.
This then opened a second list in another window, containing the ten second-order items
(car models or musical artists), with a Back button and an OK button. Clicking the Back
button dismissed this window and returned participants to the previous step, intended to
allow participants to continue a trial having made an error on the previous step. On
selecting the appropriate item and clicking the OK button, the third and final window was
opened, displaying ten alphanumerical codes, again with Back and OK buttons. After
selecting an item and clicking OK, the trial ended with either a confirmation of correct
Collocating interface objects
22
ordering or an error message. The next trial began when this message was dismissed by
clicking OK.
The three list windows each measured 120 mm wide by 100 mm high. The first
window was always located in the left third of the screen (which was 405mm wide),
centred vertically. The list items were displayed in Helvetica font, with capitals 4mm high.
In the translated interface, the second window was positioned in the middle third of the
screen, and the third window in the right third of the screen. In the collocated interface,
each window opened in the left third of the screen, with the previous windows(s) moving to
the right (see Figure 6), so that on the final step they were in the reverse order to the
translated interface.
Figure 6 about here
The order of interface design was balanced over participants, and the task content
(music or car parts) was balanced within interface design. Ten trials were run with each
task content, with the items chosen so that each position within the list was used once in
each window. The order of the ten trials was rotated in a Latin square so that after ten
participants each trial had been presented in each serial position within the task.
Results
Reliable eye-tracking data was not obtained for five participants. Four of these had
also been excluded from the analyses in Experiment 1, and one additional female (aged
27:2) was excluded. The remaining 21 participants completed a total of 420 trials with three
task steps in each trial, of which eye-tracking data was rejected for 89 task steps overall
(7%).
Collocating interface objects
23
As in the previous experiments, the mean number of fixations, path-length and
duration were computed. Because the first task step was identical in the two conditions,
these data were not analysed, giving two task steps in each two conditions. Each step
consisted of selecting an item from the list and then clicking OK, so two parallel sets of
data were obtained and these were analysed in a single MANOVA as six measures (Figure
7). There were significant main effects of condition (Pillai’s Trace F6,15=4.14, p=.012,
partial eta sq=.62), task step (Pillai’s trace F6,15=8.15, p<.001, partial eta sq=.77) and an
interaction (Pillai’s trace F6,15=3.42, p=.025, partial eta sq=.58).
Figure 7 about here
For the item selection measures, Univariate Fs showed effects of task step upon
duration (F1,20=12.8, p=.002, partial eta sq=.39, MSE=.533), with the third step taking
longer, and an interaction for fixations (F1,20=6.10, p=.010, partial eta sq=.29, MSE=.745),
with the translate interface requiring more fixations on the third step. For the OK Clicks,
there were effects of condition upon duration (F1,20=8.63, p=.008, partial eta sq=.30,
MSE=.067), with the collocated condition taking longer; of task step upon duration
(F1,20=8.99, p=.007, partial eta sq=.31, MSE=.057), with the click being completed faster;
and interactions for path length (F1,20=8.01, p=.010, partial eta sq=.29, MSE=.004), with
the colocated interface being longer for the third step; and duration (F1,20=6.59, p=.018,
partial eta sq=.25, MSE=.066) with the translated interface being faster for the second step.
We had collected separate sets of data for the item selection and OK clicks because
once an item has been selected, the OK button is in the same relative position for both
interfaces, and so we had not anticipated there being any effects for these measures.
Because there were effects, we recombined the item selection and OK click measures to
Collocating interface objects
24
produce overall measures reflecting performance on both steps. Now the MANOVA
produced a main effect only for condition (Mult F3,18=3.92, p=.026, partial eta sq=.395),
with task step (Mult F3,18=2.49, p=.093, partial eta sq=.293) and their interaction (Mult
F3,18=2.72, p=.075, partial eta sq=.312) falling short of significance. The Univariate Fs
showed no effects of condition for any of the measures, with an effect of task step upon
duration (F1,20=5.02, p=.037, partial eta sq=.201, MSE=), and interactions for fixations
(F1,20=8.95, p=.007, partial eta sq=.309, MSE=1.04) and path length (F1,20=4.50, p=.047,
partial eta sq=.183, MSE=.041), with a marginal effect for duration (F1,20=4.15, p=.055,
partial eta sq=.172, MSE=.541). The means indicate that that these interactions reflect an
advantage for the translated interface on the second step, but for the collocated interface on
the third step.
Discussion
Unlike the clear advantage for the collocated interface found in all three experiments
using the map task, there is no real superiority for either interface here. In terms of
selecting an item from the list, the interfaces are equivalent on the second step, and
collocate is better on the final step; but clicking the OK button is easier for the translated
interface on the second task step. The OK click data is surprising, because we had not
anticipated there being any differences between the interfaces. Once the item in the list has
been selected, the OK button is in the same relative position within the window for both
interfaces, and so there should be no difference in the number of fixations needed to direct
gaze towards it – in fact, the collocated interface might benefit here, because the OK button
is also in the same physical screen location, and could potentially be clicked without
needing to be looked at. The same considerations apply to the distances needed to move the
mouse between clicks – for the collocated interface the mouse must be moved down and
right from the selection to the OK button, and then back up and left to make the next
Collocating interface objects
25
selection. In the translated interface, the same movement down and right must be made to
click the OK button, and the mouse must then move up and right to make the next
selection. The overall distance is equivalent. Nevertheless, the translated interface comes
out better on these measures.
There were a number of peculiarities in this task that might have made it more
difficult for participants than we had intended and which thus limit the generalization of the
results, especially since we are arguing that the data shows no clear advantage for either
interface. Participants reported that the meaningless alphanumeric code for the third task
step was particularly difficult to recall, especially since the task instructions presented the
items in the reverse sequence to that required by the interface. The presence of the Back
button on steps two and three was in retrospect unnecessary and while made the task more
like a real interface led to in increase in errors. Most obviously, though, the reappearance of
completed windows to the right of the new window in the collocated interface was
unrealistic. While the two designs both displayed contextual information in the form of
previously selected information, a real collocated design would simply replace the old
window with the new one rather than moving the old window elsewhere (the multiple
levels of dialogue boxes in many Microsoft applications, for example, are generally
superimposed upon each other, leading to a cascade of Apply and OK clicks being required
to complete the instruction). These issues were addressed in the final experiment.
Experiment 5
Method
Twenty first year undergraduates studying Psychology at the University of Plymouth
took part in this study, in return for participation points to be used to reward participants in
Collocating interface objects
26
their own experiments. This experiment replicated the design and apparatus of Experiment
Four, with changes to the phrasing of the instructions, the content of the final items, the
ordering within the lists, treatment of errors, and the location of the screen in the collocated
interface. The meaningless alphanumeric album codes and car parts were replaced with real
titles or car parts, and the instructions were reordered so that the items were in the same
order as the interface, e.g., ‘please order Fiat - Seicento – fuel filter’ or ‘Please order Jazz -
Nina Simone - Ain’t got no..’. Within the lists, the items were no longer alphabetically
ordered, to avoid participants anticipating a likely location. The Back button was no longer
displayed, and if OK was clicked with an incorrect item selected at any point, an error
message was displayed and the trial was repeated from the instruction screen.
The translate interface window positions were the same as in Experiment Four, with
the focal window moving from left to right across the screen leaving the prior windows in
their original positions to its left. In the collocated interface all windows were now located
in the centre of the screen, replacing the previous window, so only the current list was
visible at any time.
Results
Complete eye-tracking data could not be obtained from four participants. The
remaining sixteen included 5 males and 11 females, and were aged between 20 y 10 m and
62 y 0 m (median 23 y 0 m).
As in Experiment Four, the mean number of fixations, path-length and duration were
computed for item selection and OK click on the second and third task steps of the two
interfaces, and these six measures were entered into a MANOVA with condition and task
step as factors (Figure 8). There were again main effects of condition (Pillai’s trace
Collocating interface objects
27
F6,10=5.21, p=.011, partial eta sq=.76), task step (Pillai’s trace F6,10=5.16, p=.012, partial
eta sq=.76) and an interaction (Pillai’s trace F6,10=4.33, p=.021, partial eta sq=.72).
Figure 8 about here
For the item selection measures, Univariate Fs showed that there was an effect of
condition for path length (F1,15=15.2, p=.001, partial eta sq=.504, MSE=.046) with the
translate interface requiring longer paths; of task step for fixations (F1,15=23.1, p<.001,
partial eta sq=.606, MSE=1.344), path length (F1,15=13.6, p=.002, partial eta sq=.476,
MSE=.047) and duration (F1,15=20.2, p<.001, partial eta sq=.574, MSE=.155), with the
third step requiring more fixations, a longer path and taking longer to complete, and an
interaction for path length (F1,15=13.3, p=.002, partial eta sq=.470, MSE=.019), with the
translated interface requiring a longer path length for the third step. For the OK clicks,
Univariate Fs showed that there was an effect of condition for fixations (F1,15=17.7,
p=.001, partial eta sq=.542, MSE=.351), with the translated interface requiring fewer; there
were no effects of task step and no interactions.
When the item selection and OK click measures were combined to produce overall
measures for the two steps, there were again main effects of condition (Pillai’s trace
F3,13=4.92, p=.017, partial eta sq=.53), task step (Pillai’s trace F3,13=8.29, p=.002, partial
eta sq=.66) and an interaction (Pillai’s trace F3,13=10.0, p=.001, partial eta sq=.70). The
Univariate Fs indicated that there were effects of condition upon path length (F1,15=11.1,
p=.005, partial eta sq=.43, MSE=.036) and marginally upon duration (F1,15=4.40, p=.053,
partial eta sq=.23, MSE=.283); of task step upon fixations (F1,15=15.7, p=.001, partial eta
sq=.51, MSE=1.84) and duration (F1,15=21.5, p<.001, partial eta sq=.59, MSE=0.18); with
an interaction for path length (F1,15=17.9, p=.001, partial eta sq=.54, MSE=0.022). The
Collocating interface objects
28
means indicate that the interaction reflects an advantage for the collocated interface on the
final task step.
Discussion
With the revised and more realistic interface, in which the collocated windows
replace each other, and the instructions present the information in the same order as the task
requires it to be used, the two interfaces are indistinguishable for the second task step, but
the collocated interface now has an advantage over the translated interface in terms of eye-
movement path length and duration for the final task step. Overall, combining both item
selection and OK clicking, there is no difference in the number of fixations required. As in
Experiment Four, there is a benefit for the translated interface in using the OK button, this
time in number of fixations.
The advantage for collocation on the final step is a consequence of the translated
interface requiring a longer path and duration here, compared to the previous step. There is
no obvious reason why the translated interface should perform worse in the final step
(moving from the artist to the song) than it did on the second step (moving from the genre
to the artist). Both interfaces show a rise in fixations for the item selection element of these
task steps, but only in the translate interface does this turn into a longer path and duration.
However, this is the same pattern as observed in the previous experiment, and so may be an
example of consistency of location improving performance within the trial, if not over
trials.
General Discussion
Our first three experiments showed a consistent benefit for collocating the point of
interaction with its result in a map interface; our final two have shown little consistent
Collocating interface objects
29
benefit for collocation in a list-style database, although there is an indication that it may
lead to faster performance and shorter search paths on the last of three steps.
In summary, this is much what May, Dean and Barnard (2003) predicted, but which
the designers of online maps have not yet implemented. While it would be nice if we found
the conventional design of maps to change as a result of this work, we would also be happy
if it leads to a better understanding of how principles and theory can be used to guide
design decisions in interface design. In previous papers we have followed a path from
anecdotal observation (that film editors seem to control the position of elements within the
frame) to a cognitive model of film watching (in which any demand for visual search
interrupts the comprehension of narrative), collected empirical data to support the model (in
terms of eye movements following cuts in a commercial film), and applied it to interface
design (in the form of contextualised but generic principles such as collocation). Here we
have shown that this principle does what our model says it will, making it easier for a map
user to understand what they are looking at when they click to zoom-in to a map. We have
also shown that it does not seem to benefit the more verbal, context-laden task of using a
database, although here there is some evidence of another principle, that of consistency of
operation, coming into play. Making this path of inference and deduction apparent provides
a rationale for a design decision, that allows the designer to decide when a principle really
applies to their design, or whether other specific issues mean that another, more relevant
but contradictory principle might come into play. Without the rationale, there is no support
for the designer to make a reasoned choice amongst the guidance they are offered.
In the case of maps, it is not as if the designers are unaware of the confusion caused
by the unfilmic, move-to-centre algorithm. Most online maps now try to guide the user’s
gaze to the new, central location by making use of a form of animation, interpolating four
Collocating interface objects
30
or five views of the initial map progressively moving a little each frame so that the clicked
point moves toward the centre of the screen, before then displaying the new map. Whether
intentionally or not, they are following the suggestion of Mackinlay, Card and Robertson
(1990) that the user should be supported in relocating points of view by moving them
through virtual space rather than jumping directly from point to point. They are making
their interfaces resemble fictional interfaces such as the 3d hologram viewer used by
Deckard in Ridley Scott’s (1982) film ‘Blade Runner’. This is not how film editors create
their own material, however, because the experience of watching uncut footage shot while
the cameraman is moving through a scene is not only time consuming but often nauseating,
when the viewer’s bodily and visual sensations do not correlate. Cutting directly between
two different view-points can work, if collocation is used, as film makers do know, and as
interface designers should know.
The collocation principle applies to more than map interfaces. It should also work
whenever the user wants to move directly from one object to another, as yet unrepresented,
object, and where the previous view is now superfluous. Clicking on an entry in a table of
contents in a word-processor can allow you to jump to that heading in the body of the text,
but the heading is rarely located on-screen in the same position as was the corresponding
entry in the table of contents, and must be searched for. Our simple database began to show
benefits of collocation, even though we had not expected it to, perhaps because there was
not sufficient need for the contextual information remaining from previous task steps.
Increasing memory load, or ambiguity of material, as in our Experiment Four, makes the
contextual information more useful and collocation less beneficial. These predictions and
post hoc explanations are not plucked from thin air but based upon the underlying cognitive
model of film watching in May, Dean and Barnard (2003) and its contrast between the
Collocating interface objects
31
propositional-implication loop required for narrative comprehension and the propositional-
object loop required for visual search.
From a psychological point of view, the model of interface use and the ICS account
of cognition can be used to reason about the cognitive consequences of changes to the task,
such as the use of verbal material of increasing ambiguity (increasing demand upon a
propositional-morphonolexical loop). Without the link between the principle and its
theoretical derivation, such extrapolation and extension would be impossible and the
guidance to designers would become rapidly outdated, as new devices are developed, and
novel tasks digitized. It also provides a link between the applied domain of human-
computer interaction and computational models of visual search, (see, for example,
Zelinsky, 2008), where top-down models in which search is driven by object based
information are simulating human behaviour with increasing accuracy.
References
Barnard, P. J. (1985). Interacting cognitive subsystems : A psycholinguistic approach to
short-term memory. In A. Ellis, (ed.), Progress in the Psychology of Language, Vol.
2, 197-258. Erlbaum: London,.
Barnard, P. J., May, J., & Green, A. J. K. (1991). Preliminaries for the Application of
Approximate Cognitive Modeling to Design Scenarios. Esprit BRA3066 Amodeus
working paper RP4/WP11. Cambridge, UK: Medical Research Council Applied
Psychology Unit.
Kraft, R. N. (1986). The role of cutting in the evaluation and retention of film. Journal of
Experimental Psychology: Learning Memory and Cognition, 12(1), 155-163.
Collocating interface objects
32
Kraft, R. N. (1987). Rules and strategies of visual narratives. Perceptual and Motor Skills,
64, 3-14.
Mackinlay, J.D., Card., S.K., & Robertson, G.G. (1990) ‘Rapid controlled movement
through a virtual 3d workspace’, In Proceedings of Siggraph’90, 171-176
May, J., & Barnard, P. J. (1995). Cinematography and interface design. In K. Nordby, P.
Helmersen, D. J. Gilmore, & S. Arnesen (Eds.), Human-computer interaction:
Interact'95 (pp. 26-31): Chapman and Hall.
May, J., Dean, M.P. & Barnard, P.J. (2003) Using film cutting techniques in interface
design. Human-Computer Interaction, 18, 325-372.
Moellering, H. (1975) At the interface of cartography and computer graphics. Proceedings
of the 2nd annual conference on computer graphics and interactive techniques,
Bowling Green Ohio, June 25-27. pp.256-259. ACM Press.
Scott, R. (Director). (1982) Blade Runner [MotionPicture]. United States: Warner Brothers.
Young, E., & Clanton, C. (1993). Film craft in user interface design. Tutorial presented at
the InterCHI’93 conference, Amsterdam.
Zelinksy, G. J. (2008) Eye movements during target acquisition. Psychological Review,
115(4), 787-835.
Collocating interface objects
33
Tables and Figures
Table 1: The typefaces used in experiment three were chosen so that they were more
consistent than those in experiment two, were more like the faces used for real place names
on the OS maps, and the differences between successive maps were equivalent.
Figure 1: In the collocate condition (upper row) the geographical detail clicked on is
positioned under the cursor on the new, higher scale map. In the translate condition (bottom
row), the clicked detail is moved to the centre of the new map. In both conditions, the task
is to find a place name in the first map (the Splott area of Cardiff), relocate it in the second
map, and select a diamond shaped target symbol in the third map. The simulated arrow
cursors indicate the point that is about to be clicked, and the lines show the optimum
movement from the previous click. Note that the location of the place name relative to the
street layout varies from map to map. Maps (c) Crown Copyright/database right 2008. An
Ordnance Survey/EDINA supplied service.
Figure 2: The two interfaces performed identically on the find and target steps, but
the translate interface (bold line) required more gaze fixations, a longer search path, and
took longer to complete on the critical relocate step, than did the collocated interface
(narrow line). Bars indicate one standard error.
Figure 3: In the second experiment, participants had to locate a place name that had
been added to five increasingly detailed maps such that its appearance and geographical
position varied from map to map, as real place names do on different scale maps of the
same region. In this example, the task is to locate Bowsden using the collocated interface.
Maps (c) Crown Copyright/database right 2008. An Ordnance Survey/EDINA supplied
service.
Collocating interface objects
34
Figure 4: The Translate method (bold line) required a longer path length, more
fixations, and took more time than the Collocate method (thin line) for steps 2, 3 and 4 of
the map task. The initial steps (locating the number on the clock face) and the final steps
did not differ (bars indicate one standard error).
Figure 5: Changing the typefaces of the place names shown in experiment three did
not remove the advantage of the collocated interface (thin line) over the translated (bold
line) interface for the intermediate steps of the task (bars indicate one standard error).
Figure 6: Three successive screens from an interaction with the car-parts database in
experiment four, with the translated interface on the left and the collocated interface on the
right. The participant has been asked to ‘Please order part: CC05, for model: Seicento, for
make: Fiat’.
Figure 7: In experiment four, the translated interface (bold lines) is better on the
second task step but the collocated interface (narrow lines) is better on the third step. Step
totals shown by solid lines with circles; item selection solid lines without markers; OK
clicks dashed lines; bars indicate standard errors.
Figure 8: In experiment five, on the second task step the interfaces are
indistinguishable, but on the third step the translated interface (bold lines) requires a longer
path-length than the collocated interface (narrow lines); step totals shown by lines with
circles; item selection solid lines without markers; OK clicks dashed lines; bars indicate
standard errors.
Collocating interface objects
35
Level Experiment 2 Experiment 3
Step2 Helvetica Bold Times Normal
Step 3 Times Normal Helvetica Italic
Step 4 Georgia Normal Times Italic
Step 5 Palatino Italic Helvetica Normal
Table 1:
Fig
ure
1:
Coll
oca
ting i
nte
rfac
e obje
cts
37
Fig
ure
2
Coll
oca
ting i
nte
rfac
e obje
cts
38
Fig
ure
3:
Coll
oca
ting i
nte
rfac
e obje
cts
39
Fig
ure
4:
Coll
oca
ting i
nte
rfac
e obje
cts
40
Fig
ure
5:
Coll
oca
ting i
nte
rfac
e obje
cts
41
Fig
ure
6:
Coll
oca
ting i
nte
rfac
e obje
cts
42
Fig
ure
7
Coll
oca
ting i
nte
rfac
e obje
cts
43
Fig
ure
8: