readability of text ar.pdf
TRANSCRIPT
-
8/9/2019 readability of text AR.pdf
1/21
Michele Fiorentino*
Saverio Debernardis
Antonio E. Uva
Giuseppe Monno
Dipartimento di Meccanica,
Matematica e Management
(DMMM)
Politecnico di Bari
70126 Bari, Italy
Augmented Reality Text Style Readability with See-Through Head-Mounted Displays in Industrial Context
Abstract
The application of augmented reality in industrial environments requires an effective
visualization of text on a see-through head-mounted display (HMD). The main contri-
bution of this work is an empirical study of text styles as viewed through a monocular
optical see-through display on three real workshop backgrounds, examining four colors
and four different text styles. We ran 2,520 test trials with 14 participants using a
mixed design and evaluated completion time and error rates. We found that both pre-
sentation mode and background influence the readability of text, but there is no inter-
action effect between these two variables. Another interesting aspect is that the pre-
sentation mode differentially influences completion time and error rate. The present
study allows us to draw some guidelines for an effective use of AR text visualization in
industrial environments. We suggest maximum contrast when reading time is impor-
tant, and the use of colors to reduce errors. We also recommend a colored billboard
with transparent text where colors have a specific meaning.
1 Introduction
A valuable application of augmented reality (AR) in an industrial context is to superimpose technical information on the real world. The main advantage
of this approach when compared to paper/screen-based documentation is that
the added graphics is co-located and visualized in real time. This feature is very
useful to support complex maintenance or assembly processes where most of
personnel time is spent retrieving technical task instructions, localizing parts,
and operating on them in the right order (Uva, Cristiano, Fiorentino, &
Monno, 2010). In this case, a solution could be offered by head-mounted dis-
plays (HMDs) that allow task instructions to be superimposed on the real-world
view of the operator. Two main technologies are available for HMD: (1) video
and (2) optical see-through; these technologies have different trade-offs as
described by van Krevelen and Poelman (2010). An optical see-through HMD could be the ideal candidate for industrial use, because of the real environment
awareness and ergonomics. However, optical see-through systems require a
comprehensive study of the visual perception: color and brightness of the real
environment visually conflicts with the color and/or contrast of the superimposed graphical elements (see Figure 1).
Presence, Vol. 22, No. 2, Spring 2013, 171-190
doi:10.1162/PRES_a_00146
© 2013 by the Massachusetts Institute of Technology
*Correspondence to [email protected].
Fiorentino et al. 171
The main problem of the current technology is that only bright objects
can overlap on the background. In practice, dark colors
appear as semi-transparent and they mix in with the
background. This makes the use of see-through HMD
very challenging, especially in outdoor environments,
where the brightness of the background overcomes the
brightness of the display.
Industrial environments usually are indoor and charac-
terized by controlled lighting, as given by the standard
ISO 8995-1 (ISO, 2002), but visibility problems com-
monly arise in the readability of the technical text labels
and these limit the effectiveness of this technology.
Literature on the readability of simple text in optical
see-through HMDs is scattered among different disci-
plines (computer graphics, human-computer interac-
tion, etc.) and usually it addresses general problems
without satisfying the specific requirements and con-
straints of industrial workspaces (e.g., standard color
coding, industrial practice, and workshop backgrounds).
It is common practice in industry to follow standard or
personalized color coding rules. For example, the 5S,
one of the most popular workplace organization meth-
ods, suggests the use of colors in the workspace to enforce sorting, straightening, systematic cleaning, standardizing,
and sustaining (Hirano, 1995). A very common
practice of industrial data visualization which can be sup-
ported by AR technology is the use of shop floor paper
tags (see Figure 2). The tags carry important production
information in a cheap, simple, and effective way by text
and color coding. In an aerospace facility, red tags mean
defective products, green tags identify items to be
repaired, and yellow tags classify products that passed
quality control tests and are ready to be shipped out.
Another relevant example of color coding in industrial
practice is in piping. The standard colors for industrial
piping are given in ASME A13.1 (ASME, 2007) which
describes the content of the pipes, potential hazards, and
the direction of flow. Properly labeled pipes improve
safety and productivity by providing employees and
emergency responders with key information. An AR-
based visualization system can be very supportive by fil-
tering the technical database and displaying or labeling
only the pipes of interest to the user. A further industrial
reference is the standard ISO 3864 (ISO, 2011a), which defines safety colors and safety signs for graphical sym-
bols. It describes design principles for safety signs and
markings, product safety labels, graphical symbols, and
their colorimetric and photometric properties. Another
well-known color coding scheme in industry is the
OSHA safety color code for marking physical hazards
(29 CFR 1910.144; OSHA, 2007). This standard states
that red identifies fire protection equipment, emergency
stop devices, and containers holding dangerous materi-
als. Yellow indicates physical safety hazards, such as strik-
ing against, stumbling, falling, tripping, and so on.
All these standards refer specifically to colors of
printed/painted supports, and not as visualized on digi-
tal displays. In order to fulfill color coding when passing
to an AR-based information system, a methodical
approach is needed. In fact, specific guidelines are
needed for AR devices where color perception could
Figure 1. Text and crosshair superimposed on the HMD during the
user tests.
Figure 2. Example of industrial color tag commonly used in manufac-
turing.
172 PRESENCE: VOLUME 22, NUMBER 2
change, as opposed to printed signs; and as color percep-
tion varies, so would readability. Industrial applications
would benefit from these guidelines.
Previous work reports general optimization of visual-
ization, without providing color-based readability
guidelines. The main goal of the presented work is to
study the readability of textual information in indoor
industrial environments with an optical see-through
HMD. Different colors and text styles were combined
to investigate textual visualization on industrial back-
grounds. For this purpose, we developed an open-
source test workspace to support readability test experi-
ments and made it available to the academic community
(Fiorentino, 2012).
The paper is organized as follows. In Section 2, we
present previous literature, followed by the description
of our approach in Section 3. In Sections 4 and 5, we
present the design of experiments, the results, and a
related discussion. Finally, we present a conclusion and
future work in Section 6.
2 Related Work
The readability of text is strictly related to aspects
of human cognition and perception. In particular,
human beings are sensitive to the contrast between text
and the background on which text is superimposed (Legge, Parish, Leubker, & Wurm, 1990). In fact, the
International Standards Organization ISO 9241 stand-
ard-3 (ISO, 1993) recommends a minimum luminance
ratio of 3:1 and a preferred value of 10:1. Text readabil-
ity is a complex problem and involves different sciences
(e.g., cognitive research, psycholinguistics, and human
factors). Physiological and psychological effects influ-
ence text readability on displays, as demonstrated by
Fukuzimi, Yamazaki, Kamijo, and Hayashi (1998). They
studied the physical parameters that influence human
color perception on CRT displays: dominant wavelength, stimulus purity, and luminance. They analyzed
results from subjective evaluations combining: (1) some
dominant wavelengths, (2) stimulus purities, and (3)
luminance. They also studied the readability of colors
using an objective method using measurements from
electroencephalogram signals. Their results demon-
strated that an optimal stimulus purity exists in each
dominant wavelength, and further, that it is independ-
ent of luminance.
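The ISO 9241-3 luminance-ratio recommendation cited above can be sketched as a quick check. The Rec. 709 luminance weights and the small ambient-light term are our assumptions, not part of the standard or of this study.

```python
def relative_luminance(rgb):
    """Approximate relative luminance of an 8-bit RGB color
    (Rec. 709 weights; an assumption for this sketch)."""
    r, g, b = (c / 255.0 for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def luminance_ratio(text_rgb, background_rgb, ambient=0.05):
    """Ratio of the brighter to the darker luminance; `ambient` stands in
    for stray light and avoids division by zero (an assumption)."""
    lt = relative_luminance(text_rgb) + ambient
    lb = relative_luminance(background_rgb) + ambient
    return max(lt, lb) / min(lt, lb)
```

Under this model, white text on a mid-gray background clears the 3:1 minimum but not the preferred 10:1, illustrating why high-contrast pairings matter on low-brightness displays.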
Harrison and Vicente (1996) explored text readability
in the design of transparent 2D menus superimposed
over different graphical user interface (GUI) background
content. They presented a novel anti-interference (AI)
font that uses luminance values to create a contrasting
outline. Their work includes an empirical evaluation of
the effect of varying transparency levels, visual interfer-
ence produced by different types of background content,
and the performance of AI fonts on text-menu selection
tasks. Testing demonstrated that the closer the
shade and hue of the background are to the text color, the
higher the interference, and the worse the
resulting performance. AI fonts produced a substantially
flatter performance curve, shifted toward better (i.e.,
faster) performance, especially at higher transparency lev-
els (i.e., over 50%), which is exactly the condition we
have in a see-through display.
A basic study on the perception of gray text on a non-
uniform gray background was presented by Petkov and
Westenberg (2003). They conducted psychophysical
experiments to demonstrate that the spatial frequency of
the patterns in the background has a relevant effect on
readability. The masking effect of the background is
higher when its characteristic pattern width is comparable to the letter stroke (or weight), while the letter size
shows no main effect. Their research can be a valid justi-
fication for the use of the outline style, and even more of
a billboard, which removes the background texturized
pattern around the text strokes. Nevertheless, this
research does not address color issues.
A more specific study on text readability for AR appli-
cations was presented by Leykin and Tuceryan (2004)
using a calibrated desktop CRT monitor at an approxi-
mate distance of 50 cm from the user's head. They
implemented seven real-time supervised classifiers, trained them with user data, and evaluated whether text
placed on a particular background was readable or not
on the screen. They concluded that textured background
variations affect readability only when the text contrast is
low. Their study considered only the luminance informa-
tion of grayscale images and not different colors.
An interesting study of augmented reality viewability
in outdoor environments was conducted by Gabbard,
Swan, and Hix (2006). They evaluated text legibility
using an optical see-through display and different text
styles superimposed on matte-finished printed poster
backgrounds (40 in × 60 in). They used six text drawing
styles, three static and three active (meaning that the text
color changed depending upon the presented back-
ground poster), six backgrounds (pavement, granite, red
brick, sidewalk, foliage, and sky), and three distances
(1 m, 2 m, and 4 m from the user). Their approach pre-
sented three different active algorithms to determine the
best color to use: Complement, Maximum HSV Com-
plement, and Maximum Brightness Contrast. They
chose blue text to replace black text, which is impossible
to produce on see-through displays. Their most impor-
tant finding was the empirical evidence that user per-
formance is significantly affected by background texture,
text drawing style, and their interaction. The billboard
drawing style (blue text on pure white), and green text,
provided the fastest performance. Visually complex back-
ground textures performed very well (red brick) and
intermediately well (foliage), contradicting the initial
hypothesis that a complex background must reduce per-
formance. Surprisingly, the active text drawing styles did
not perform better than the static styles in the practical
tests. Their final guidelines suggested the use of fully saturated green labels, and the avoidance of fully satu-
rated red labels. An important aspect for our research is
that the error rate was very small (1.9%), and they did
not analyze it further.
Tanaka, Kishino, Miyamae, Terada, and Nishio
(2008) proposed an unusual approach to address optical
see-through HMD limitations by using a fixed camera
mounted on the visor and directed forward. The camera
faced two mirrors that separated the left and right view.
Their approach was based on using the camera to evalu-
ate the visibility in the user's periphery and a suggestion to turn the head whenever this would lead to
better conditions. Their visibility model considered: (1)
the average of RGB and HSV color spaces, (2) the var-
iances in RGB, YCbCr, and HSV color spaces, (3) how
information was tied to a precise area, and (4) which
movements were possible. However, their layout strat-
egy expressly did not preserve the registration of the dig-
ital information on the real objects.
Jankowski, Samp, Irzynska, Jozwowicz, and Decker
(2010) explored the effects of varying four text drawing
styles (plain, billboard, anti-interference, and shadow),
image polarity (positive when dark characters are on a
light-colored panel and conversely for negative), and
two backgrounds: the first one with videos recorded in
urban and outdoor environments and the second one
recorded in 3D video games. They found out that there
was little difference in reading performance for video and
3D backgrounds. Furthermore, they concluded that
negative presentation is faster and more accurate than
positive presentation. Therefore, billboard styles resulted
in the easiest to read and the most immune to back-
ground distractions.
From the presented works, we can conclude that the
knowledge on the readability of text on HMD is scat-
tered among different disciplines, application fields, and
hardware setups, and, at the moment, it is not adequate
to provide standard and reliable guidelines for the appli-
cation developers. In particular, we found no previous
work addressing specific industrial environments.
This study draws inspiration from Gabbard's experi-
ments that address mainly outdoor environments and
textures. Our idea is to apply a similar approach to indus-
trial context. Therefore, the motivation of this work is to
study and find effective text styles for monocular optical
see-through presentation, specifically for the industrial
context.
3 Our Approach
We used a mixed-design approach to examine user
performance in a text-identification task similar to Gab-
bard's test (Gabbard, Swan, Hix, Kim, & Fitch, 2007).
Gabbard considered text as one of the most fundamental
graphical elements in any user interface and therefore the identification task is text-based (as opposed to icon-,
line-, or bitmap-based).
Because of the limited previous work on displaying in-
formation in AR in an industrial context, we wanted to
focus on text readability in real workshop scenarios. Spe-
cifically, we designed an experiment that abstracted the
short reading tasks that are very common in technical
AR applications in industry. For this study, we used a
low-level identification and visual search task, since we
did not want to address the semantics (e.g., cognitively
understanding the contents/meaning of the text). We
simply evaluated whether or not users could quickly and
accurately read information (i.e., text legibility), asking
the user to perform the following tasks:
- Scan a meaningless short random text string.
- Identify a target letter.
- Count letters.
- Provide a response.
The user is asked to perform these tasks in different
presentation modes, obtained with text styles and colors
used to convey mandatory information for the aforemen-
tioned industrial motivations. In this work, we limited
text styles to four types: (1) simple text, (2) text with
outline, (3) text with billboard, and (4) text with outline
and billboard. The text, outline, and billboard can all be
of different colors. The combination of text styles and
colors creates the presentation modes used for our
experiment, which will be detailed in Sections 4 and 5.
We also want to make it clear that the experiment task is
a foreground-only activity, and we did not measure any-
thing about the user's awareness of background content
or changes (i.e., this was not a divided attention task).
In the initial stage, we ran preliminary tests needed to
detect the most significant parameters to be used as the
independent variables of our experiments. We used an
optical see-through HMD, the Liteye LE 750A, with an
800 × 600 OLED display. However, the parameters involved
in the visualization of the text (text font, color, size,
position, etc.) are too numerous for extensive user tests.
For this reason, we developed a specific software tool,
called HMD test, written in C++ using the Qt library,
which has two main functions: editing the parameters
(editor mode) and running the user tests (player mode).
In the editing phase, the user can interactively change
all the parameters of the text visualization with a simple
GUI and preview the final effect in real time on the mon-
itor (see Figure 3).
If the HMD is connected to the second video port,
the user can preview the visualization directly; otherwise
he or she can simulate the result by loading a back-
ground image on a desktop screen.
The user can change and test the following text style
parameters: font size, text color and transparency, text
billboard color and transparency, outline width, and out-
line color and transparency. In our preliminary phase, we
simulated different configurations using a library of 100
pictures downloaded from the internet using Google
Images with specific keywords (i.e., workshop, shop
floor, manufacturing, etc.; see Figure 4). In this way, we
were able to evaluate the experimental settings and to
plan the user tests.
The HMD testbed automated the execution of tests in
player mode: the test configurations are retrieved from
the test template file, shuffled randomly, and then dis-
played on the HMD. The application acquires and ar-
chives the following data in a simple log file: subject
username, time and date of the test, displayed text
strings, text style, user's answer, and response time.
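The per-trial log described above might look like the following CSV sketch. The field names and file layout are our guesses; the paper only lists what is recorded, not the format.

```python
import csv
import datetime

# Hypothetical column layout for the trial log (an assumption).
LOG_FIELDS = ["username", "timestamp", "text_string",
              "text_style", "answer", "response_time_ms"]

def append_log_record(path, username, text_string, text_style,
                      answer, response_time_ms):
    """Append one trial record to a simple CSV log file."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            username,
            datetime.datetime.now().isoformat(timespec="seconds"),
            text_string, text_style, answer, response_time_ms])
```

Appending one row per trial keeps the log robust to crashes mid-session, since every completed trial is already on disk.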
During the test, the performance, the progress bar,
the current score, and the top score are displayed on the
service desktop screen to monitor the test (see Figure
5). The score is added to motivate the user to maximize
performance during the test; since it is not visible to
the participants, it cannot influence the test results. The
software is publicly available on our website
(Fiorentino, 2012). We are interested in comparing the results from other researchers using different display
configurations.
4 Design of Experiment
We differentiate our experiments from Gabbard's
tests by the usage of real industrial backgrounds. Our
software displays two different text blocks on the HMD
view area. The upper text block is composed of three
randomly generated strings with alternating uppercase
and lowercase letters, while the lower block consists of
three strings of capital letters. In the upper block, one of
the three sets of letter pairs consists of the same letter,
given once as uppercase and once as lowercase (e.g.,
mM, Pp). This is the target letter. Each user has to iden-
tify the target letter and he or she has to count out how
many times it appears in the lower block. The partici-
pants should input the result on a provided numeric key-
pad. The possible answers are 1, 2, 3, or 0 in the case of
an unreadable or not-found letter. The alphabet is restricted to
the following letters: C, K, M, O, P, S, U, V, W, X, Z.
These letters have graphical similarity in uppercase and
lowercase, therefore this restriction makes the difficulty
associated with the target identification uniform. Our
software generates and visualizes the text blocks on theHMD and records response time and user errors. A
crosshair viewfinder is displayed, and the user must point
to a specific target in the real scene (see Figure 1). This
solution avoids the chance that users may turn the HMD
to a more favorable position (i.e., to choose a specific
background point).
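The trial-generation procedure above can be sketched as follows. The number of letter pairs in the upper block and the length of the lower block are our assumptions; the restricted alphabet and the 1-3 answer range come from the paper.

```python
import random

# Letters whose upper- and lowercase forms look alike (from the paper).
ALPHABET = "CKMOPSUVWXZ"

def make_trial(rng=random):
    """Sketch of one trial: pick a target letter, build an upper block of
    letter pairs in which exactly one pair is the same letter in lower- and
    uppercase, and a lower block of capitals containing the target 1-3 times."""
    target = rng.choice(ALPHABET)
    others = [c for c in ALPHABET if c != target]
    pairs = []
    for _ in range(2):
        a, b = rng.sample(others, 2)      # two different letters, so only
        pairs.append(a.lower() + b)       # the target pair matches case-wise
    pairs.append(target.lower() + target)
    rng.shuffle(pairs)
    upper_block = " ".join(pairs)
    count = rng.randint(1, 3)             # the correct answer
    letters = [target] * count + [rng.choice(others) for _ in range(9 - count)]
    rng.shuffle(letters)
    lower_block = "".join(letters)
    return upper_block, lower_block, count
```

Drawing the distractor pairs from two different letters guarantees that only the target pair matches case-insensitively, so the target is always unambiguous.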
4.1 Measures
We focused on the following experiment inde-
pendent variables (see Table 1) and the dependent varia-
bles (see Table 2) that we collected for the subsequent
statistical analysis.
Apart from measuring efficiency (completion time)
and effectiveness (error rate), a 5-point Likert scale is
used to measure user preferences with a post-experiment
questionnaire.
4.2 Backgrounds
Most of the industrial backgrounds we have
encountered, especially those related to production
Figure 3. HMD testbed in editor mode: user can design the text style and preview it on the screen or HMD.
facilities, present some common characteristics. They are
indoors, uniformly lit, quite dirty, and mainly gray in
color. They often present sparse saturated colors (e.g.,
tools, signs, etc.). We used three real-world backgrounds:
(1) testbed frame, (2) tool workbench, and (3) motor-
bike engine (see Figure 6).
We chose the backgrounds with the intention to pro-
vide three different luminance profiles: negative
(testbed), positive (tool workbench), and neutral
(engine). We took pictures from the user's point of view
with a digital camera and we display in Figure 7 the
related histograms. The test was carried out with the user sitting on a swivel chair in order to have the same head
position for all participants: about 50 cm in height and
60 cm in depth from the background center point (see
Figure 8). All tests were performed in a laboratory with
fully shaded windows and artificial lighting (overhead
fluorescent lights). We measured the illuminance value
with a lux meter and we registered an average illumination over the work area of about 300 lux.
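The three luminance profiles could be characterized, very roughly, by the mean gray level of each background photograph. The one-third/two-thirds thresholds below are our assumption; the paper classifies the histograms by inspection.

```python
def luminance_profile(pixels):
    """Classify a background from its 8-bit grayscale pixel values as
    'negative' (dark), 'positive' (bright), or 'neutral' (in between).
    The thresholds at 1/3 and 2/3 of the range are an assumption."""
    mean = sum(pixels) / len(pixels)
    if mean < 255 / 3:
        return "negative"
    if mean > 2 * 255 / 3:
        return "positive"
    return "neutral"
```

Under this crude rule, the testbed frame would fall in the dark ("negative") band, the tool workbench in the bright ("positive") band, and the engine in between.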
4.3 Colors
Our setup required users to be concentrated for
the whole task. Studies on the length of human sustained
attention reported a maximum of around 20 min for
adults (Cornish & Dukette, 2009). To keep the experi-
ment time within 20 min, we limited the color range to
only four options. In particular, following the specifica-
tions defined by the ISO 3864 standard (colors for safety signs; ISO, 2011a), we decided to use the red and the
green as safety colors (i.e., colors with special properties
to which a safety meaning is attributed), and white and
black as contrast colors. Apart from their general mes-
sages (safety and prohibition), green and red are worth a
deeper investigation in AR setup because the literature
Figure 4. A sample of the images used in the preliminary design phase.
reports green as one of the best colors for reading, and
red among the worst performing colors on CRT displays
(Fukuzimi et al., 1998). Colors are displayed on our
uncalibrated optical see-through HMD. We tested the
following colors, defined in an RGB color space.
- White: RGB (255, 255, 255)
- Red: RGB (255, 0, 0)
- Green: RGB (0, 255, 0)
- Black: RGB (0, 0, 0)
An important issue is the visualization of the color
black on the optical see-through HMDs. In fact, the RGB (0,0,0) means, in additive color composition, that
all the display pixels are off, so it will be transparent on
an optical see-through device. Therefore, in this work,
when we speak about the color black, we mean no
added color, which is the background color bleeding
through these transparent pixels. In our presentation
Figure 5. Player mode: during the test, the service screen (not seen by the participant) shows the list of configurations
(center), progress bar (lower left), and the user performance and top score.
Table 1. Independent Variables of Our Experiment

Variable           Levels  Values
Participant users  14      11 males, 3 females
Backgrounds        3       Testbed frame, tool workbench, engine
Text styles        4       Text only, with outline, with billboard, with outline and billboard
Colors             4       Black, green, red, white (when applicable)
Repetitions        5       Five for each presentation mode and background
Total trials       2,520   14 × 3 × 12 × 5
Table 2. Dependent Variables of Our Designed Experiment

Dependent variable  Coding
Completion time     measured in ms
Error rate          correct task completion = 1, wrong task completion = 0
modes, the color black can only be used as text or as out-
line on a differently colored billboard, because a black
billboard is equivalent to no billboard. Our experimental
indoor environment has a low illuminance (around 300
lux); therefore, the transparent stroke of black text or of the outline is perceived as dark enough to be called
black. With this meaning we considered black as a color
in our experiment. Our purpose is to study its perform-
ance as a contrast color to background colors (i.e., white,
red, and green), as indicated by ISO standards (ISO
2011a, 2011b, 2004).
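The black-is-transparent behavior follows from additive compositing on an optical combiner. A minimal sketch, assuming the eye sees the display emission added to the background attenuated by the combiner; the linear additive model and the 0.70 transmission factor (taken from the HMD's 70/30 specification in Section 4.6) are simplifying assumptions.

```python
def perceived_color(display_rgb, background_rgb, transmission=0.70):
    """Simplified optical see-through model: emitted display light plus
    the background seen through the combiner, clamped to 8-bit range."""
    return tuple(min(255, round(d + transmission * b))
                 for d, b in zip(display_rgb, background_rgb))
```

Black pixels (0, 0, 0) add nothing, so the wearer simply sees the attenuated background, which is why black can only act as a transparent "color" on this class of display.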
4.4 Text Styles
Font type and size were not considered as variables.
We chose the sans serif Helvetica font because it was
used in most of the readability experiments in the literature, and we chose 22 points as the font height, the
smallest size that we could clearly read in the pre-test
phase. Indeed, we focused on four different text styles.
There are two well-known techniques in the literature to
isolate text from a variable background: the outline,
inspired by Harrison's AI font, and the billboard, which
Figure 6. The three real backgrounds used in the tests: testbed frame (left), tool workbench (center), motorbike engine (right).
Figure 7. Normalized luminance histograms (number of pixels for each luminance value) of the pictures of the background used in the test:
negative (testbed frame), positive (tool workbench), and neutral (engine).
Figure 8. The experiment setups for the three backgrounds.
has proven effective but costly in terms of pixels. We
used four main text styles in our experiments: the first,
the simplest, is text only; the second is text with a 2-
point-wide outline; the third is text with a rectangular
billboard; and finally, the fourth is text with a combina-
tion of outline and billboard.
Table 3 shows the 12 presentation modes used in the
experiments, preliminarily selected from the possible
combinations of text color (black, green, red, white), outline
color (black, i.e., transparent, green, red), and billboard
color (green, red, white).
4.5 Participants
Fourteen unpaid participants were recruited for the
study among undergraduates in technical subjects. They
were 11 males and three females with the following age
distribution: seven from 21 to 25 and seven from 26 to
30, with an average age of 25. Six participants wore
glasses but none had a color deficiency. All were right-eye dominant. The users wore the HMD in front of the right
eye, and they received adequate instruction and per-
formed a trial session. The participants could discontinue
the test at any time and the break time was not limited.
The subjects performed a total of 2,520 trials (14 partici-
pants × 12 presentation modes × 3 backgrounds × 5
repetitions), ensuring a Latin square design. Each subject
saw, on each background, a total of 60 visualization
queries. At the end of the complete trial, each participant
filled out a questionnaire to detect particular problems and to collect evaluations and opinions.
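The trial bookkeeping above can be sketched as follows. We use plain per-participant shuffling, whereas the paper ensures a Latin square, so the counterbalancing below is a simplification.

```python
import itertools
import random

def make_schedule(participants=14, modes=12, backgrounds=3,
                  repetitions=5, seed=1):
    """Sketch of the trial schedule: every participant sees every
    (mode, background) pair `repetitions` times, in shuffled order."""
    rng = random.Random(seed)
    schedule = {}
    for p in range(participants):
        trials = [(m, b) for m, b, _ in itertools.product(
            range(modes), range(backgrounds), range(repetitions))]
        rng.shuffle(trials)
        schedule[p] = trials
    return schedule

schedule = make_schedule()
total = sum(len(t) for t in schedule.values())  # 14 * 12 * 3 * 5 = 2,520
```

Each participant's list has 12 × 3 × 5 = 180 trials, 60 per background, matching the counts reported above.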
4.6 Apparatus
Our hardware system for experimental tests con-
sisted of the following.
- Notebook: HP Pavilion dv6-6150sl Entertainment Notebook PC, Intel Core i5-2410M, 2.30 GHz, 4 GB of DDR3 RAM, AMD Radeon HD 6770M graphics card, with Windows 7 and the HMD test software.
- Viewer: Liteye LE 750A, OLED display, 800 × 600 at 60 Hz, contrast 100:1, transmission 70/30, luminance 300 cd/m², 28.8° diagonal FOV, mounted on an ergonomic support and connected by VGA (see Figure 9). We set the diopter adjustment to 0 for all users.
- Input: wireless numeric keypad by Targus, model AKP02EU, battery powered, to collect participants' answers.
4.7 Hypotheses
Prior to conducting the study, we formulated the
following hypotheses.
H1. Different workshop backgrounds will affect user
test performance (completion time and error): text
readability is background-dependent.
Table 3. The 12 Presentation Modes Used in the Experiments

Mode  Text color  Outline color  Billboard color
1     Black       -              Green
2     Black       -              Red
3     Black       -              White
4     Green       -              -
5     Green       -              White
6     Green       Black          White
7     Red         -              White
8     Red         -              -
9     Red         Black          White
10    White       -              -
11    White       Green          -
12    White       Red            -
Figure 9. The optical see-through HMD used in our experiments
(Liteye LE 750A).
H2. The presentation mode will affect performance.
H3. Text style will affect performance.
H4. Text color will affect performance.
H5. Outline color will affect performance.
H6. Billboard color will affect performance.
5 Results
We analyzed the acquired data to evaluate the main
effect of background, color, and text style on readability
performance. We used quantitative and qualitative data.
The completion time and error rate were quantitative data, while the subjective responses were the qualitative
data. In a preliminary phase of the analysis, we removed
the outliers with Tukey's outlier filter based on the
interquartile range. To make statistical inferences, we
started to inquire whether the completion time data fol-
lowed a normal distribution. We used the Shapiro-Wilk
normal test, the AS R94 algorithm, which rejected the
normal distribution for all the samples (p < .05). The
skewness analysis showed a positive value for all samples:
this is typical in task-time-completion measures that fol-
low a lognormal distribution. We log10-transformed all the completion times prior to statistical analysis. To eval-
uate the homoscedasticity, we applied the Levene test,
because this test does not require equal dimensions for
all the groups.
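The preprocessing described in this paragraph (Tukey's fences on the interquartile range, then a log10 transform) can be sketched as follows. The k = 1.5 fence multiplier and the quartile interpolation method are our choices, as the paper does not state them.

```python
import math

def tukey_filter(times, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    s = sorted(times)

    def quantile(q):
        # Linear interpolation between order statistics (our choice).
        pos = q * (len(s) - 1)
        lo, hi = int(math.floor(pos)), int(math.ceil(pos))
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    return [t for t in times if lo_fence <= t <= hi_fence]

def log10_times(times):
    """Log-transform completion times before parametric analysis."""
    return [math.log10(t) for t in times]
```

An extreme completion time (e.g., a distracted participant) falls outside the fences and is removed before the log transform normalizes the remaining right-skewed data.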
As to the error rate, the faults considered in our analy-
sis are users' wrong answers. We used the method of
N × 2 contingency tables to do statistical inference (p < .05)
on error data. We used the following error rate definition:

ER% = (Number of errors / Number of targets) × 100
Each sample of 12 modes could have 70 possible
errors (14 participants × 5 repetitions), that is, the num-
ber of targets in the error rate definition.
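In code, the definition reads:

```python
def error_rate_percent(num_errors, num_targets):
    """ER% = (number of errors / number of targets) * 100."""
    return 100.0 * num_errors / num_targets

# Each presentation mode has 14 participants * 5 repetitions = 70 targets.
```

For example, 5 wrong answers out of 70 targets gives an error rate of roughly 7.1%.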
In the following sections, we detail the results as to
the background effect, the text style effect, and the color
effect with a discussion on how to optimize readability
when the color message is required, such as for safety
warnings.
5.1 Background Effect
With regard to completion times, the ANOVA
showed a main effect of background, F(2, 2442) =
49.377; p < .001. Figure 10 shows the box plot of the
completion times. On each box, the central mark is the
median, the edges of the box are the 25th and 75th per-
centiles, the whiskers extend to the most extreme data
points not considered outliers, and the outliers are plot-
ted individually. Considering the mean completion time,
the engine background had times 13% lower than the
tool workbench and 17% lower than the testbed frame.
The application of the Shapiro-Wilk test revealed that
the answer distributions for the three backgrounds were
not all normal, and homoscedasticity was not verified.
Therefore, we applied the Friedman test because it is
more indicated than ANOVA in these conditions. The
Friedman test showed a significant difference among the
three backgrounds (see Table 4). We used as the post
hoc pair-comparison the Wilcoxon signed-ranked test,
with Bonferroni correction (a 0.017), which con-
firmed that the engine background had the lowest
response time (with respect to the testbed frame,Z
22.601,p< .001; with respect to the tool workbench,
Z 23.907,p< .001), while the testbed frame back-
ground had statistically the highest answer time (with
respect to the tool workbench,Z 23.526,p< .001).
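As a sketch, the Friedman statistic can be recovered from the reported mean ranks; the block count of 824 is our back-solved assumption (roughly the 840 trials per background minus filtered outliers), not a figure stated in the paper:

```python
def friedman_chi2(mean_ranks, n_blocks):
    # Friedman statistic from per-condition mean ranks R_j over n ranked
    # blocks: chi2 = (12 * n / (k * (k + 1))) * sum((R_j - (k + 1) / 2)**2)
    k = len(mean_ranks)
    dev = sum((r - (k + 1) / 2) ** 2 for r in mean_ranks)
    return 12 * n_blocks / (k * (k + 1)) * dev

# Mean ranks for the testbed frame, tool workbench, and engine backgrounds.
chi2 = friedman_chi2([2.95, 2.01, 1.03], n_blocks=824)  # ~1519

# Bonferroni-corrected alpha for the 3 pairwise Wilcoxon comparisons.
alpha = 0.05 / 3  # ~0.017
```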
An explanation could be that neutral backgrounds are
the best for text readability. As a result, we can confirm
hypothesis H1 with regard to the completion time: text
readability depends on the background.

Figure 10. Box plot of the completion times for the three backgrounds
(the X marks the mean of samples).

Fiorentino et al. 181
As to the error rates for the three different back-
grounds, we computed an average error rate of 6.67% on
the testbed frame background, 7.26% on the tool work-
bench background, and 6.55% on the engine back-
ground (see Figure 11).

Comparing the three sample error rates with contin-
gency tables, we did not find statistically significant dif-
ferences among the three backgrounds, χ²(2) = 0.3869
< 5.991. Unlike the completion-time analysis, this
result, limited to error rates, does not support hypothesis
H1.
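The contingency-table test above can be reproduced from integer error counts of 56, 61, and 55 out of 840 trials per background; these counts are our reconstruction (they match the reported rates of 6.67%, 7.26%, and 6.55%), not values stated explicitly in the paper:

```python
def chi2_contingency(error_counts, n_per_group):
    # Pearson chi-square over an N x 2 (error / correct) contingency table
    # with equal group sizes.
    groups = len(error_counts)
    total_err = sum(error_counts)
    exp_err = total_err / groups      # expected errors per group
    exp_ok = n_per_group - exp_err    # expected correct answers per group
    chi2 = 0.0
    for e in error_counts:
        chi2 += (e - exp_err) ** 2 / exp_err
        chi2 += ((n_per_group - e) - exp_ok) ** 2 / exp_ok
    return chi2

chi2 = chi2_contingency([56, 61, 55], 840)  # ~0.387
critical = 5.991  # chi-square critical value for df = 2, alpha = .05
# chi2 < critical, so no significant background effect on error rate.
```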
5.2 Presentation Mode Effect
With regard to completion times, the results in Table 5
on normality and homoscedasticity called for the appli-
cation of the Welch ANOVA test for all of the 12 combi-
nations, and the Games–Howell test for the 66 pair com-
parisons. Differences among all samples were statistically
shown; thus, hypothesis H2 was supported (see Figure
12).
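Welch's F statistic, used here because the variances were heterogeneous, can be sketched in a few lines. This is a generic textbook implementation, not the authors' analysis software, and the sample groups are hypothetical:

```python
import statistics

def welch_anova_F(groups):
    # Welch's F for k groups, robust to unequal variances:
    # weights w_j = n_j / s_j^2, weighted grand mean, then
    # F = [sum w_j (m_j - grand)^2 / (k - 1)] /
    #     [1 + 2(k - 2)/(k^2 - 1) * sum (1 - w_j/W)^2 / (n_j - 1)]
    k = len(groups)
    n = [len(g) for g in groups]
    m = [statistics.fmean(g) for g in groups]
    v = [statistics.variance(g) for g in groups]
    w = [n[j] / v[j] for j in range(k)]
    W = sum(w)
    grand = sum(w[j] * m[j] for j in range(k)) / W
    num = sum(w[j] * (m[j] - grand) ** 2 for j in range(k)) / (k - 1)
    den = 1 + (2 * (k - 2) / (k * k - 1)) * sum(
        (1 - w[j] / W) ** 2 / (n[j] - 1) for j in range(k))
    return num / den

# Hypothetical completion-time samples (ms) for three presentation modes.
fast = [5300, 5500, 5450, 5600]
mid = [6000, 6150, 6300, 6100]
slow = [7200, 7400, 7900, 7500]
F = welch_anova_F([fast, mid, slow])  # large F -> the means clearly differ
```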
The fastest presentation mode was the black
text with no outline on a white billboard (mode 3). The best
performance of mode 3 was statistically confirmed
against modes 9, 7, 4, 11, and 8 (d = 0.092, p = .001).
It is important to note that mode 5 (green, no outline, white)
and mode 6 (green, black, white) showed no statistical
difference, F(1) = 0.012, p = 0.912, while mode 7 (red,
no outline, white) and mode 9 (red, black, white) are not
strongly different, F(1) = 3.962; p = .047. This confirms
that black is effectively not a color, since it
shows no main effect when used as an outline.
An error-rate comparison (see Figure 13) among all
the 12 combinations revealed a statistically significant
difference, χ²(11) = 37.811 > 19.675, allowing us to
accept hypothesis H2; but in this case, the performance
distribution is different from the completion-time analy-
sis. The best performing modes were modes 2, 5, and
12. An interesting result is that presentation modes dis-
playing red text (modes 7, 8, and 9) scored badly,
both for completion times and error rates.
5.2.1 Interaction Background × Presentation
Mode. We tested the interaction between the back-
ground and the presentation-mode effects with a two-
way unbalanced ANOVA, which showed that there is
no interaction effect, F(22, 2427) = 0.778; p = 0.756.
Every presentation mode displays results respecting the
background ranking (with the partial exception of modes
2 and 8), as shown in the radar plot in Figure 14.
Table 4. Background Data Analysis

Background         Testbed frame                Tool workbench             Engine
Shapiro–Wilk test  W = 0.996, p = .031          W = 0.997, p = .097        W = 0.995, p = .015
Levene test        F(2, 2442) = 9.895; p < .001
Mean rank          2.95                         2.01                       1.03
Friedman test      χ²(2) = 1518.9; p < .001
Figure 11. Box plot of error rate for the three backgrounds (the X
marks the mean of samples).
182 PRESENCE: VOLUME 22, NUMBER 2
Table 5. Completion-Time Analysis of the Presentation Modes, Sorted by the Mean Response Time

Presentation mode        Mean response time (ms)  Shapiro–Wilk test
3 (black, -, white)      5,441                    W(205) = 0.996, p = .841
12 (white, red, -)       5,909                    W(209) = 0.989, p = .119
1 (black, -, green)      6,108                    W(207) = 0.992, p = .306
6 (green, black, white)  6,109                    W(203) = 0.992, p = .385
10 (white, -, -)         6,112                    W(197) = 0.994, p = .599
5 (green, -, white)      6,139                    W(208) = 0.995, p = .764
2 (black, -, red)        6,239                    W(206) = 0.990, p = .189
8 (red, -, -)            6,728                    W(206) = 0.990, p = .189
11 (white, green, -)     6,864                    W(204) = 0.990, p = .184
4 (green, -, -)          6,966                    W(209) = 0.995, p = .763
7 (red, -, white)        7,276                    W(205) = 0.997, p = .927
9 (red, black, white)    7,914                    W(204) = 0.996, p = .813
Levene test: F(11, 2451) = 4.359; p < .001
Welch ANOVA test: F(11) = 12.653; p < .001
Figure 12. Box plot of completion time for each presentation mode (the X marks the mean of samples).
Figure 13. Box plot of error rate for each presentation mode (the X marks the mean of samples).
Figure 14. Radar plot of response times (ms) for the three backgrounds in the 12 presentation modes.
5.3 Text Style Effect
To analyze the effect of text styles, we gathered all
the presentation modes into four groups, as presented in
Table 6. The completion-time analysis showed that the
data did not pass the homoscedasticity test; therefore,
we used the Welch ANOVA test, which revealed a signif-
icant difference among the styles (see Table 6). This
result allowed us to accept hypothesis H3.

The Games–Howell post hoc test showed clearly that
the text-and-billboard style performed better than the
text-only style and the text-outline-and-billboard style.
The text-and-outline style is better only than the worst
style, text outline and billboard (see Table 6). As to error
rates, there is no significant difference among the text
styles, χ²(3) = 5.14 < 7.82.
5.4 Color Effect
5.4.1 Text Color. We explored the text colors by
collecting all data in four groups: black (1, 2, and 3),
green (4, 5, 6), red (7, 8, 9), and white (10, 11, 12).
The results are represented in Figure 15. The compari-
son of these four samples showed that the color
black seems to outperform all the other colors (ANOVA:
F(3, 2448) = 25.420, p < .001); however, this good per-
formance can be attributed to the presence of the bill-
board, which is always associated with black text, as
reported in Section 5.3. Therefore, we removed the
black group and proceeded to compare the green,
red, and white text colors.
The Shapiro–Wilk test showed that all groups had a
normal distribution, but the Levene test showed that the
variances were different (see Table 7).

In this case, we applied the Welch ANOVA test to
compare the three samples. There was a statistically sig-
nificant difference in the text color choice; thus, hypoth-
esis H4 is accepted. The white text group had the lowest
answer time (see Figure 16). Games–Howell post hoc
tests allowed pair-wise comparisons, and they revealed
that (1) the green text color group is statistically better
than red (d = 0.052, p < .001), and (2) the white text color
is better than red (d = 0.061, p < .001).
As to error rates, there were statistically significant dif-
ferences among the three text color samples: χ²(2) =
7.563 > 5.991. The green text group has the minimum
average error rate, at 6.03%, compared to 6.34% for the
white text group and 9.68% for the red text group (see
Figure 16). The black (transparent) text group has an av-
erage error rate of 5.24%. This is in accord with hypothe-
sis H4.
Moreover, the red text group did not perform as well
as the other colors, as confirmed by the completion-time
results and as reported in the previous literature.
5.4.2 Outline Color. As stated in Section 5.2,
black, for the reasons discussed in that section,
shows no main effect when used as the outline. Therefore,
Table 6. Text Style Comparison

Text style          Text only (T)  Text and outline (TO)  Text and billboard (TB)  Text outline and billboard (TOB)
Presentation modes  4, 8, 10       11, 12                 1, 2, 3, 5, 7            6, 9
Shapiro–Wilk test   W = 0.996      W = 0.994              W = 0.997                W = 0.996
                    p = .101       p = .092               p = .072                 p = .406
Levene test         F(3, 2249) = 5.944; p < .001
Mean (ms)           6,622          6,368                  6,237                    6,887
Welch ANOVA test    F(3) = 5.977; p = .001
Games–Howell test   TB better than TOB: d = 0.042, p = .001
                    TB better than T: d = 0.026, p = .022
                    TO better than TOB: d = 0.034, p = .035
we could compare only the green and the red outline
(mode 11 and mode 12) applied on white text. The sta-
tistical results support hypothesis H5 (12 better than
11) for both the completion-time (Games–Howell post hoc
test: d = 0.065, p = .002) and error-rate (see Figure 16)
analyses. The red outline performs better than the green
outline, probably because of the higher contrast between
the text and the outline.
5.4.3 Billboard Color. Next, we wanted to ana-
lyze which was the best billboard among the three combi-
nations available in the black text group. Therefore,
we kept the black text and focused our attention on con-
veying the color information using the billboards (green,
Table 7. Text Color Analysis

Text color         Green                     Red                       White
Shapiro–Wilk test  W(617) = 0.998, p = .653  W(611) = 0.997, p = .248  W(606) = 0.996, p = .103
Levene test        F(2, 1831) = 3.688; p = .025
Means (ms)         6,412                     7,228                     6,281
Welch ANOVA test   F(2) = 21.344; p < .001
Figure 15. Box plot of the completion-time data for each text color group (the X marks
the mean of samples).
Figure 16. Box plot of error rate for each grouped text color presenta-
tion mode (the X marks the mean of samples).
red, and white). For these presentation modes we had
three normal distributions and homogeneity of variance
(see Table 8).

The one-way ANOVA revealed a statistically signifi-
cant difference among modes 1, 2, and 3 (black
text, no outline, with green, red, and white billboards,
respectively), thus accepting hypothesis H6 on the billboard
color influence (see Figure 17). Tukey post hoc tests confirmed
statistically significant differences between mode 3 and
mode 1 (d = 0.050, p = .009), and between mode 3 and
mode 2 (d = 0.059, p = .017), showing that the best presen-
tation mode is black text over a white billboard.
5.5 Qualitative Results
The post-experiment questionnaire was composed
of two parts, both using a Likert scale. The subjects were
presented with the stimuli as a reminder. In the first part,
the participants had to mark every presentation mode
with a vote from 1 to 5. In the second part, the partici-
pants answered questions about their opinions using
five judgment values: "not at all," "a little," "on average,"
"enough," and "much." Figure 18 shows the cumulative
responses of the user interviews.
5.6 Discussion
A first result is that the real industrial background
(300 lux) influenced text readability with regard to
completion time. This is in accordance with previous
work that used different setups: a printed poster and
video on display monitors. Unlike the completion time,
the analysis of the error rates showed that the background
did not have a significant influence. This last aspect
should be further investigated, because our results are in
contrast with general expectation and previous results
(e.g., Jankowski et al., 2010). Gabbard's tests (Gabbard
et al., 2006), which are closer to our setup, revealed an
error rate that was not significant, and therefore they
ignored it in the statistics. Our tests, on the contrary,
showed higher error rates (6.54% vs. 1.9%). The engine
background performed better than the other two. This
result may depend on several factors, including the lumi-
nance profile, which, in this specific case, is neutral,
unlike the other two. The presentation mode showed a
main effect on both completion time and error rate. This
result is in accordance with the literature. A non-trivial
statistical outcome is that there is no interaction effect
between background and presentation mode. This result
is in contrast to previous findings in outdoor environ-
Table 8. Comparison of Billboard Colors (Black Text)

Black text on billboard  Green                     Red                       White
Shapiro–Wilk test        W(207) = 0.992, p = .306  W(206) = 0.990, p = .189  W(205) = 0.996, p = .841
Levene test              F(2, 615) = 0.126; p = .882
Means (ms)               6,109                     6,237                     5,445
One-way ANOVA            F(2, 615) = 7.061; p = .001
Figure 17. Box plot of the completion-time data for black text
and the three different billboard colors (the X marks the mean of
samples).
ments. Gabbard, Zedlitz, Swan, and Winchester (2010)
found strong interactions between background and dis-
play color. However, those outcomes were reported for
outdoor environments with 800–1000 lux. In our opin-
ion, our result justifies the effort of finding an optimal
presentation mode, since it will be independent of the
background when the luminance of the display is much
brighter than that of the environment, as in an indoor
industrial setting.
Among all the presentation modes, the text style
revealed a main effect on completion time but not on
error rates. Post hoc analysis showed that the billboard is
more effective than the outline or the text only, in accordance
with the literature. The good performance of the bill-
board has a drawback in terms of scene occlusion.

The results achieved revealed that completion time,
error rate, and user interviews are not coherent in defin-
ing a unique ranking of presentation modes. According
to response time, the best results are obtained by mode
3 (black text, white billboard) and by mode 12 (white
text, red outline). The third-best performer is mode 1
(black text, green billboard), very similar to mode 3. An
explanation can be found in the higher contrast between
text and background, in accordance with the ISO recom-
mendations on text readability. We validated, in the
industrial context, the principle of using maximum con-
trast in order to achieve fast readability on a see-through
HMD. Therefore, our results suggest either black text
on a white billboard or white text only when reading time
is important, for example, for maintenance instructions.
In contrast to the results obtained from completion
time, the error rates showed a different ranking. The best
results are obtained by mode 2 (black text, red bill-
board), followed by mode 5 (green text, white bill-
board) and by mode 12 (white text, red outline).
Although our results suggest the use of colors when the
information is critical and accuracy is mandatory (e.g., a
warning signal), a deeper study is necessary. Also, the pre-
sentation-mode qualitative ranking obtained from the user
interviews is not in concert with the users' quantitative per-
formance in terms of completion time and/or error rate.
This is quite interesting, since it proves that the user is
not able to choose the best presentation mode in terms
of performance. We therefore suggest that AR application
design prevent users from freely customizing visual-
ization preferences. The only result that is confirmed by
completion time, error rate, and user interviews is that
presentation modes displaying red text perform poorly,
as already shown in the literature (e.g., Gabbard et al.,
2007; Fukuzimi et al., 1998). In industrial applications,
it can be necessary to convey specific color information
along with the textual description. In this case, our tests
recommend the use of a specific color for the billboard
and black (transparent) for the text.
Figure 18. Cumulative marks given by participants at the end of their test trials (range 1–5).
6 Conclusion
We presented an empirical study on the readability
of text styles using an optical see-through HMD in dif-
ferent industrial scenarios. A preliminary test, supported
by a software tool implemented by the authors, was used
to explore a large number of configurations against a gal-
lery of industrial images taken from the internet. We
selected and tested 12 presentation modes using four
main colors (black/transparent, white, red, and green),
four different text styles (text only, text and outline, text
and billboard, and text with outline and billboard), and three
different real workshop backgrounds (a testbed frame, a
tool workbench, and a motorbike engine). We ran 2,520
test trials with 14 participants, who were interviewed after
the experiment.
The first finding of this work is that both the presenta-
tion mode and the background influence the readability
of text, but there is no interaction effect between these
two variables. An important result is that an optimal pre-
sentation mode will work well, independent of the back-
ground in indoor industrial lighting conditions (300
lux). We also note that the user is not able to choose the
best performing presentation mode, and therefore we
recommend that an AR application should not allow the
user to customize the visualization preferences. Another
interesting aspect is that the presentation mode differentially influences completion time and error rate.
The present study allows us to draw some guidelines
for an effective use of AR text visualization in industrial
environments. In particular, we suggest maximum con-
trast styles, such as black text and white billboard or
white text only, when reading time is important, and the
use of colors when avoiding errors in readability is criti-
cal. We also suggest a colored billboard with black text
where colors have a specific meaning. Billboards provide
the best performance, but at the cost of scene occlusion.
Future investigation is needed to explore billboard area
optimization. Apart from black and white, we
tested only red and green; future work will involve test-
ing other colors such as blue, yellow, and orange.
As a final remark, our findings are HMD-device-
dependent, and for this reason, we provide our software
and the test configurations presented in this paper on
our website, in order to allow other researchers to collect
and compare results using different devices.
Acknowledgments
The authors would like to thank Michele Mazzoccoli and
Michele Gattullo for the useful help provided in the test design
and execution, and all the students who took part in the test.
References
ASME. (2007). Scheme for the identification of piping sys-
tems (ASME A13.1). Retrieved September 4, 2012 from
https://www.asme.org/products/codes-standards/
scheme-for-the-identification-of-piping-systems

Cornish, D., & Dukette, D. (2009). The essential 20: Twenty
components of an excellent health care team (pp. 72–73).
Pittsburgh, PA: RoseDog Books.

Fiorentino, M. (2012). HMD test [software]. Retrieved from
Polytechnic of Bari, Department of Mechanics, Mathematics
and Management, Vr3Lab website: http://www.dimeg
.poliba.it/vr3lab/

Fukuzimi, S., Yamazaki, T., Kamijo, K., & Hayashi, Y. (1998).
Physiological and psychological evaluation for visual display
colour readability: A visual evoked potential study and a sub-
jective evaluation study. Ergonomics, 41(1), 89–108.
doi:10.1080/001401398187341

Gabbard, J. L., Swan, J. E., II, & Hix, D. (2006). The effects
of text drawing styles, background textures, and natural
lighting on text legibility in outdoor augmented reality.
Presence: Teleoperators and Virtual Environments, 15(1),
16–32. doi:10.1162/pres.2006.15.1.16

Gabbard, J. L., Swan, J. E., II, Hix, D., Kim, S.-J., & Fitch, G.
(2007). Active text drawing styles for outdoor augmented
reality: A user-based study and design implications. Proceed-
ings of the IEEE Virtual Reality Conference, VR '07, 35–42.
doi:10.1109/VR.2007.352461

Gabbard, J. L., Zedlitz, J., Swan, J. E., II, & Winchester, W.
W., III (2010). More than meets the eye: An engineering
study to empirically examine the blending of real and virtual
color spaces. Technical Papers, Proceedings of IEEE Virtual
Reality, 10, 79–86. doi:10.1109/VR.2010.5444808

Harrison, B. L., & Vicente, K. J. (1996). An experimental eval-
uation of transparent menu usage. Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems: Com-
mon Ground, CHI '96, 391–398. doi:10.1145/238386
.238583

Hecht, E. (1987). Optics (2nd ed.). Reading, MA: Addison-
Wesley.

Hirano, H. (1995). 5 pillars of the visual workplace. Cam-
bridge, MA: Productivity Press.

ISO. (1993). Ergonomic requirements for office work with
visual display terminals (VDTs). Part 3: Visual display
requirements. ISO 9241-3. Geneva: ISO.

ISO. (2002). Lighting of work places. Part 1: Indoor. ISO
8995-1. Geneva: ISO.

ISO. (2004). Graphical symbols. Safety colours and safety
signs. Part 2: Design principles for product safety labels.
ISO 3864, Part 2. Geneva: ISO.

ISO. (2011a). Graphical symbols. Safety colours and safety
signs. Part 1: Design principles for safety signs and safety
markings. ISO 3864, Part 1. Geneva: ISO.

ISO. (2011b). Graphical symbols. Safety colours and safety
signs. Part 4: Colorimetric and photometric properties of
safety sign materials. ISO 3864, Part 4. Geneva: ISO.

Jankowski, J., Samp, K., Irzynska, I., Jozwowicz, M., &
Decker, S. (2010). Integrating text with video and 3D
graphics: The effects of text drawing styles on text readabil-
ity. Proceedings of the 28th International Conference on
Human Factors in Computing Systems (CHI '10), 1321–
1330. doi:10.1145/1753326.1753524

Legge, G. E., Parish, H. D., Leubker, A., & Wurm, H. L.
(1990). Psychophysics of reading. XI. Comparing color con-
trast and luminance contrast. Journal of the Optical Society of
America, 7(10), 2002–2010. doi:10.1364/JOSAA
.7.002002

Leykin, A., & Tuceryan, M. (2004). Automatic determination
of text readability over textured backgrounds for augmented
reality systems. Proceedings of the 3rd IEEE/ACM Interna-
tional Symposium on Mixed and Augmented Reality, ISMAR
'04, 224–230. doi:10.1109/ISMAR.2004.22

OSHA. (2007). Safety color code for marking physical hazards.
U.S. Department of Labor Regulations (Standards-29 CFR),
1910.144. Washington, DC: OSHA.

Petkov, N., & Westenberg, M. A. (2003). Suppression of
contour perception by band-limited noise and its relation
to nonclassical receptive field inhibition. Biological Cybernet-
ics, 88(3), 236–246. doi:10.1007/s00422-002-0378-2

Tanaka, K., Kishino, Y., Miyamae, M., Terada, T., & Nishio, S.
(2008). An information layout method for an optical see-
through head-mounted display focusing on the viewability.
Proceedings of the 7th IEEE/ACM International Symposium
on Mixed and Augmented Reality, 139–142. doi:10.1109
/ISMAR.2008.4637340

Uva, A. E., Cristiano, S., Fiorentino, M., & Monno, G.
(2010). Distributed design review using tangible augmented
technical drawings. Computer-Aided Design, 42(5), 364–
372. doi:10.1016/j.cad.2008.10.015

van Krevelen, D. W. F., & Poelman, R. (2010). A survey of
augmented reality technologies, applications and limitations.
The International Journal of Virtual Reality, 9(2), 1–20.
Copyright of Presence: Teleoperators & Virtual Environments is the property of MIT Press
and its content may not be copied or emailed to multiple sites or posted to a listserv without
the copyright holder's express written permission. However, users may print, download, or
email articles for individual use.