
To appear in the International Symposium on Mixed and Augmented Reality (ISMAR) 2014.

A Study of Depth Perception in Hand-Held Augmented Reality using Autostereoscopic Displays

Matthias Berning∗ Daniel Kleinert Till Riedel Michael Beigl

Karlsruhe Institute of Technology (KIT), TECO, Karlsruhe, Germany

ABSTRACT

Displaying three-dimensional content on a flat display is bound to reduce the impression of depth, particularly for mobile video see-through augmented reality. Several applications in this domain can benefit from accurate depth perception, especially if there are contradictory depth cues, like occlusion in an x-ray visualization. The use of stereoscopy for this effect is already prevalent in head-mounted displays, but there is little research on its applicability for hand-held augmented reality. We have implemented such a prototype using an off-the-shelf smartphone equipped with a stereo camera and an autostereoscopic display. We designed and conducted an extensive user study to explore the effects of stereoscopic hand-held augmented reality on depth perception. The results show that in this scenario depth judgment is mostly influenced by monoscopic depth cues, but our system can improve positioning accuracy in challenging scenes.

Keywords: Autostereoscopy, mobile devices, depth perception, augmented reality, user study

Index Terms: H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Artificial, augmented, and virtual realities; H.5.2 [Information Interfaces and Presentation]: User Interfaces—Ergonomics

1 INTRODUCTION

Depth distortion is still one of the most common perceptual problems found in Augmented Reality (AR) today [8]. It describes the issue that users cannot correctly identify the spatial relation between objects based on their viewpoint, which is especially the case for the combination of real and virtual objects. Depending on the application, this information can be crucial for good performance, e.g. in maintenance tasks, manuals, or when visualizing occluded objects [16].

There is already a large body of research in the area of depth perception for AR, most of it focused on head-mounted displays [18, 10, 17, 5]. This is viable for professional applications, but for commercial reasons the current platform of choice for consumer-oriented AR is the mobile hand-held device. Since they are limited to video see-through AR and offer only a small field of view, these devices are even more prone to depth distortion.

The availability of smartphones featuring an autostereoscopic display could enable the use of binocular depth cues in this area. These displays promise a high degree of immersion and increased spatial awareness through a single screen, without the need for additional glasses worn by the user (e.g. polarized or shutter glasses). In combination with the two cameras on the back of the device, it is possible to realize true stereoscopic AR.

Our contribution in this work is an experimental evaluation of the effect of stereopsis on depth perception in mobile hand-held

∗e-mail: [email protected]

AR. Different virtual objects were embedded in a real scene and displayed to the user. The participants in our study had to align them accurately with a real object. The goals were a) to verify whether the autostereoscopic display can help a user to understand the spatial relation between real and virtual objects and b) to compare the effect to other depth cues.

2 DEPTH PERCEPTION AND STEREOSCOPIC DISPLAYS

According to Cutting [2], the relative importance of different depth cues is determined by the distance of the objects to the user. There are three different areas: in personal space, from 0-2m directly in front of the observer, binocular disparity provides the most accurate depth judgments. It is the most important depth cue provided by stereoscopic vision and is particularly useful to resolve ambiguities created by other perceptual cues. Kytö et al. [9] list some of its special benefits for AR, e.g. the layering of augmentations to increase information density.

Current autostereoscopic displays are either of the lenticular sheet or the parallax barrier type [13]. Both have in common that a normal display is used and the images for the left and right eye are arranged in interleaved columns. By applying a sheet to structure the emitted light, each image is only visible from a specific angle, which should correspond to the respective eye of the user. Due to this arrangement, stereopsis is only achieved at a specific distance and orientation to the display, which is one of the major drawbacks. Another problem is the conflict between accommodation and convergence, which are directly linked in normal vision. When consuming binocular content on a flat screen, the relative orientation of the eyes will change (convergence) but the focus has to stay fixed on the screen (accommodation). Nevertheless, recent studies have shown that these devices can improve the user experience when consuming content on a hand-held device [14]. To date, off-the-shelf smartphones are only available with parallax barrier type displays. They structure light by superimposing fine vertical lines onto the display, blocking every second column for each eye. Through the use of liquid-crystal arrays for this barrier, the stereoscopic function can be controlled dynamically.
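The column interleaving described above can be sketched as follows (a simplified model in Python with NumPy; which column parity maps to which eye is panel-specific and assumed here):

```python
import numpy as np

def interleave_columns(left, right):
    """Interleave two equal-sized views column-wise, as a parallax-barrier
    panel does: even columns show the left-eye image, odd columns the
    right-eye image. Horizontal resolution per eye is effectively halved."""
    assert left.shape == right.shape
    out = np.empty_like(left)
    out[:, 0::2] = left[:, 0::2]   # even columns -> left eye
    out[:, 1::2] = right[:, 1::2]  # odd columns -> right eye
    return out
```

The barrier then blocks each column set for the opposite eye, so each eye sees only its own half of the columns.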

3 RELATED WORK

Understanding the perception of depth is a widely researched topic in VR and AR, although most of it is focused on HMDs. The work of Swan et al. [18] is exemplary for the conducted studies and is listed here explicitly because of the survey of the field included in the publication. They conducted two experiments to investigate the effects of stereoscopic vision on depth judgment in head-worn AR. Both showed a main effect of stereoscopy on depth judgment, resulting in greater accuracy.

Dey et al. [3] looked at the effect of different hand-held screen sizes and resolutions on depth perception. In several AR test scenarios, they found a significant effect of resolution on distance estimation. Autostereoscopic displays were not part of their evaluation.

The authors of [1] conducted a Wizard-of-Oz study, using a cutout phone to simulate the effects of binocular vision on positioning accuracy of a hand behind the device. They show that performance was worse using a monocular setting. A functional prototype


Figure 1: Basic frame processing pipeline of our stereoscopic AR framework: (1) read camera frame from stereo pair, (2) detect markers in left image, (3) convert frame and draw it to the canvas, (4) transform augmentations for left side and render, (5) apply extrinsic camera transformation and render right side.

using video see-through could not be tested.

The Nintendo 3DS¹ is a hand-held gaming console that supports stereoscopic AR. There are some games that use this feature, but development on this platform is restricted by the manufacturer. Structured evaluations of the effects on depth perception were not possible with the available software.

The work of Kerber et al. [7] is the most relevant for our own experiments. Similar to our work, they conducted an experiment to determine the depth discrimination ability of participants using a smartphone with an autostereoscopic display. For this purpose, two virtual textured cubes of varying size were displayed above a real table and the subjects had to decide which one was closer. In their analysis, they found no significant effect of the stereoscopic condition on depth perception. Unfortunately, the paper does not clarify how they handled the automatic disparity remapping performed by the phone. As detailed in the next section, this can have a severe effect on the perceived AR experience. Another aspect that is not addressed is the alignment of real and virtual objects, since the tested scenario was deliberately chosen to be very close to VR. In our opinion, based on previous research, it is especially this interaction that could benefit the most from stereoscopic vision, which is why it is the focus of our work.

Mikkola et al. [14] investigated the effect of small autostereoscopic displays on depth perception in static scenes. They found a significant improvement in accuracy and speed of depth judgments when introducing binocular disparity. The addition of secondary depth cues such as shadowing, texture gradient, or depth blur had no measurable effect on this result.

4 PROTOTYPE IMPLEMENTATION

We implemented our prototype on the off-the-shelf smartphone LG Optimus 3D Max (P720), featuring a stereo camera and an autostereoscopic display of the parallax barrier type. The system is based on Android and runs the latest official firmware for this device, with version 2.3 of the mobile operating system. Processing resources are limited by the dual-core ARM processor, rated at 1.2GHz, and 1GB of RAM. The 4.3in display has a native resolution of 800×480 in landscape orientation, which is halved horizontally when enabling the parallax barrier. The camera pair has a stereo basis of 24mm and a resolution of 5MP each, although we only use a much lower resolution for tracking. Both hardware units are controlled via the proprietary Real3D API provided by LG, which extends the Android framework to support 3D user interfaces.

The software is based on AndAR², which provides a solid architecture including the necessary modules for camera control, tracking and rendering, mapped to the Android OS. The basic frame processing is depicted in Figure 1. We extended the framework in several aspects to increase performance, while maintaining or even reducing resource consumption. This includes upgrading the tracking library to ARToolKitPlus³ [19] and replacing the renderer with Rajawali⁴. To maximize throughput in the stereoscopic mode, we limit tracking to the left video stream and calculate the transformation for the virtual objects in the right half based on the camera offset. In addition to the lower resource consumption, this also prevents tracking errors from interfering with the stereoscopic effect.

¹http://www.nintendo.com/3ds
²https://code.google.com/p/andar/

We had to find the optimal configuration for the LG Real3D API, which provides fine-grained control over the camera and display hardware. Camera frames are set to be placed in top-bottom arrangement in memory, meaning the left frame is located before the right, allowing for sequential access to each one. By using the YUV image format, the luminance component can be separated easily and forwarded to the tracker. The tracking resolution used is 640×480. Camera calibration with the OpenCV package produced the intrinsic camera parameters, as well as the extrinsic transformation matrix ME from the left to the right camera image.
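A minimal sketch of the luminance extraction, assuming two planar YUV (I420-style) frames stacked top-bottom with the left frame first; the exact memory layout is device-specific:

```python
import numpy as np

def left_luminance(frame_buffer, width, height):
    """Extract the luminance (Y) plane of the left camera image from a
    top-bottom stereo buffer. Assumes two I420-style frames stacked in
    memory, left first; in planar YUV the Y plane precedes the subsampled
    U and V planes, so the left Y plane is simply the first
    width*height bytes of the buffer."""
    y_size = width * height
    y = np.frombuffer(frame_buffer, dtype=np.uint8, count=y_size)
    return y.reshape(height, width)
```

Only this plane needs to be copied out for the tracker, which is what makes the separation cheap.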

After our first trials provided an unpleasant viewing experience due to unsatisfactory alignment, we identified the automatic disparity remapping of the phone as responsible. Disparity remapping is used in stereoscopic video recordings to warp the source images, shifting the perceived depth of the scene to a range that is comfortable for the user on the employed display [12]. Our hardware, like most other consumer electronics, enables such an algorithm by default. The algorithm interferes with our camera calibration and its effects on depth perception are neither quantifiable nor correctable, so we disabled it completely via the auto-convergence parameter. This is also necessary to ensure that the displayable depth range is not skewed and lies completely behind the display from the user's perspective.
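To illustrate why a remap invalidates a fixed calibration, the simplest form of disparity remapping, a uniform convergence shift, can be sketched as follows (the phone's actual algorithm warps adaptively and is not public):

```python
import numpy as np

def convergence_shift(right_image, shift_px):
    """Simplest form of disparity remapping: translate the right-eye image
    horizontally by shift_px pixels, moving the whole perceived scene
    toward or away from the screen plane. Vacated columns are zero-filled.
    Any such shift changes the effective disparity and thus no longer
    matches a calibration done on the unshifted images."""
    out = np.zeros_like(right_image)
    if shift_px > 0:
        out[:, shift_px:] = right_image[:, :-shift_px]
    elif shift_px < 0:
        out[:, :shift_px] = right_image[:, -shift_px:]
    else:
        out[:] = right_image
    return out
```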

The rendering component takes two inputs: the full video frame and the marker homography HM for each detected marker in the left image. The output is a side-by-side image with the two perspectives adjacent to each other. First, the video frame is converted on the GPU and applied to a static plane in full width. Next, the virtual objects are transformed according to HM and the scene is rendered to the left half. After applying the extrinsic matrix ME, the scene is rendered again on the right half. When the parallax barrier is enabled, the side-by-side image is automatically converted by the display driver. With the described pipeline, stereoscopic AR can be switched on and off instantaneously, enabling direct comparison. In stereo mode, we achieved interactive frame rates of 25-30fps for tracking and 20-25fps for rendering.
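The transform chain for the two render passes can be sketched as follows, representing the tracked marker pose HM and the extrinsic offset ME as hypothetical 4×4 matrices:

```python
import numpy as np

def eye_transforms(H_M, M_E):
    """Given the marker pose H_M estimated in the left camera image (4x4)
    and the extrinsic transform M_E from the left to the right camera (4x4),
    return the model-view matrices for both render passes. Only the left
    image is tracked; the right-eye pose is derived by chaining the fixed
    camera offset, so tracking jitter cannot differ between the eyes and
    break the stereoscopic effect."""
    left = H_M
    right = M_E @ H_M   # chain the fixed camera offset onto the tracked pose
    return left, right
```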

During development we encountered some issues resulting from the employed display technology. The accommodation-convergence conflict, typical for autostereoscopic displays, decreased after a short period of use. What remained was ghosting, which occurs when bright pixels for one eye illuminate dark regions in the other image. Despite these shortcomings, initial users reported an intense feeling of immersion and depth.

5 EXPERIMENTAL EVALUATION

In order to evaluate its effect on the perception of spatial relations and to compare binocular disparity to other depth cues in hand-held AR, we conducted a user study to quantify a) accuracy and b) completion time of depth positioning tasks between real and virtual objects in a scene.

5.1 Study Design

The most common experiments to study depth perception are ver-bal report, perceptual matching, and open-loop action-based tasks

³https://launchpad.net/artoolkitplus
⁴https://github.com/MasDennis/Rajawali


Figure 2: Experimental setup as displayed on the device. The virtual objects presented in the different test cases are: (a) same-sized object, always placed directly on the table; (b) cube with the same depth as the reference object and a drop shadow on the table; (c) cube with the same depth and textured faces; (d) 10-Euro banknote as an object of known size; (e) sphere with the same diameter as the reference object.

Figure 3: Schematic overview of the experimental setting on an office desk from the participant's point of view. The virtual object was moved along the y-axis to be aligned with the real object.

[18]. They are based on egocentric depth judgments performed by the test subject, which are either reported verbally, matched to a reference distance, or recreated after the observation. In this study we chose perceptual matching, which is an action-based closed-loop task. The reasons for this are twofold: first, these tasks resemble several interactions found in hand-held AR. Second, due to the mismatch between the stereo base of our device and the human inter-ocular distance, we expected that first-time users would be unable to correctly translate the perceived depth to a real distance.

We designed the task to represent our target area, limited to personal space (0-2m in front of the user), where stereoscopy should have the largest effect. A schematic overview of the study setup can be found in Figure 3. We displayed different objects, represented by the green cube on the left, floating above the flat surface of an office desk. The task was to match the depth (y-position) of the virtual object with a real reference object, in our case a pink cuboid standing on the right of the desk. The position of the virtual object could be varied parallel to the y-axis, along the dashed line in the schematic. This was done using a vertical swipe gesture on the touchscreen of the phone, which resulted in a relative movement of the object. To improve tracking stability, we used a marker board with six markers, which was overlaid with a large white virtual rectangle to eliminate additional perceptual cues from the marker and board dimensions. As shown in Figure 2, the white rectangle extended beyond the margins of the marker board, limited by the real object on the right and the table edge in the other directions. The space was limited in the back by a wall. All tests were conducted in our laboratory with static illumination from artificial light sources. Figure 2 gives an impression of the scene displayed on the mobile device.

We wanted to compare the influence of disparity to other depth cues typically found in AR scenarios. Therefore we devised five different scenes with distinct virtual objects, featuring a variety of depth cues (see Figure 2). The virtual cuboid in the first test case had the same dimensions as the real one and was always placed directly on the surface. This provided the visual height and the retinal size as additional cues. The two cubes were enhanced with a drop shadow and a texture, respectively. The banknote is an object of known size. The sphere added no depth information. The tests same, shadow and texture also provided linear perspective as an auxiliary depth cue. All objects, except the banknote, had the same dimension along the y-axis as the reference object. We saw no need to restrict the motion of the participants for our experiment, allowing them to use motion parallax as another depth cue. They were instructed to stay behind a red line, marked with tape on the desk surface, to prevent them from switching to a side or top-down view. Only a few participants made extensive use of motion parallax during the experiment. Because of the unrestricted motion, the depth of the real object was not controlled and only one position was tested. The distance between the observer and the object varied around 1-2m.

The main subject of our evaluation was the effect of stereopsis on hand-held augmented reality, so each participant performed each test case with and without the stereo condition. Each combination of test case and display condition was repeated four times in a within-subject study with repetition, resulting in 5 × 2 × 4 = 40 individual tests per subject. To counteract ordering effects, the test order was completely randomized for each participant. In addition, the starting position of the virtual object was selected randomly from ten different heights (evenly spaced between 50 and 140mm) and ten different depths (evenly spaced between 30 and 300mm), to reduce learning. This also ensured that the virtual object was placed in front of as well as behind the reference object situated at 200mm. Although relative height in the visual field is also a depth cue, its influence in personal space is regarded as minor.
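The randomization described above can be sketched as follows (make_schedule is a hypothetical helper, not part of the study software):

```python
import random

def make_schedule(seed=None):
    """Build the 40-trial schedule for one participant: 5 test cases x
    2 display conditions x 4 repetitions, fully randomized, with a random
    start height (50-140mm) and start depth (30-300mm), each drawn from
    ten evenly spaced values."""
    rng = random.Random(seed)
    cases = ["same", "shadow", "texture", "banknote", "sphere"]
    heights = [50 + 10 * i for i in range(10)]   # 50, 60, ... 140 mm
    depths = [30 + 30 * i for i in range(10)]    # 30, 60, ... 300 mm
    trials = [(c, d) for c in cases for d in ("mono", "stereo") for _ in range(4)]
    rng.shuffle(trials)                          # counteract ordering effects
    return [(case, display, rng.choice(heights), rng.choice(depths))
            for case, display in trials]
```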

Each of the 40 tests was started and ended with the press of a physical button located on the side of the device. Since the device was operated in landscape orientation, the button could be operated much like the shutter release of a camera.

We recorded the final depth offset between the virtual and real object and the task completion time as dependent variables. The latter is corrected to account for the fact that the visibility of the augmentation was lost for short periods when tracking failed. An overview of the experiment variables can be found in Table 1.
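The completion-time correction can be sketched as follows (a hypothetical helper; the interval bookkeeping of the actual study software is not documented here):

```python
def corrected_completion_time(duration_ms, invisible_intervals):
    """Task completion time with tracking-loss periods removed: subtract
    the summed durations (ms) of the (start, end) intervals in which the
    augmentation was not visible from the raw start-to-stop duration."""
    time_invisible = sum(end - start for start, end in invisible_intervals)
    return duration_ms - time_invisible
```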

5.2 Participants

We recruited 24 participants but had to exclude two of them because of stereo blindness and another two because of technical problems during the experiment. The remaining 20 subjects (four female, aged between 19 and 33, mean 26) had normal or corrected-to-normal vision. Most of them were members of our faculty (students and employees). Although all of them had a technical background,


Table 1: Dependent and independent variables of the experiment

Independent variables
  subject ID            20   (random variable)
  test case              5   same, shadow, texture, banknote, sphere
  display condition      2   mono, stereo
  test repetition        4   1..4
  initial depth         10   30, 60, ... 300mm (random variable)
  initial height        10   50, 60, ... 140mm (random variable)

Dependent variables
  depth offset               depthVirtual - depthReal, mm
  task completion time       duration - timeOfInvisibility, ms

Figure 4: Boxplot of the relative depth positioning offset in the different test cases and display conditions. Negative values indicate that the virtual object was placed closer to the viewer compared to the reference. The accuracy for the first two test cases is very high.

only one participant reported prior experience with autostereoscopic displays.

To monitor possible side effects, every person started by filling out a Simulator Sickness Questionnaire (SSQ) [6]. The device was handed to the participant in a demonstration mode and they were informed about the display and the optimal viewing angle and distance. After that, they were given some time to familiarize themselves with the hardware, followed by a description and a dry run of the task with a simple cube. All of the 40 variations were tested in one session, each test separated by a black screen with white text indicating the progress. The whole session lasted between 15 and 20 minutes per participant and finished with the second part of the SSQ. Due to this design, the SSQ cannot distinguish between the two display conditions, but it can serve as an indicator of the general visual strain during the session.

5.3 Results

Figure 4 provides an overview of the acquired data points for the relative depth offset between both objects, with a boxplot for each test case and both display variants. The center line at 0mm would be a perfect alignment; negative values mean the virtual object was placed closer to the viewer, positive values further away.

The first observation is that test conditions same and shadow were positioned very accurately by all participants. For same there is almost no difference between the two display conditions with

Figure 5: Mean of the absolute depth error and 95% confidence interval. Pairwise comparison between the two display conditions shows significant improvements for the banknote test case.

a Mean (M) of 3.7mm (Standard Deviation (SD): 21.8mm) in the monoscopic and 3.1mm (SD: 21.5mm) in the stereoscopic condition. The same is true for test condition shadow, at 4.6mm (SD: 25.5mm) without and 4.6mm (SD: 21.3mm) with stereoscopy. The positioning error of all samples in these two test cases lies in the range of ±5% of the object distance, except for some outliers.

The mean of the test case texture is also very close to the center in both conditions (mono: -5.7mm; stereo: -6.5mm), but the standard deviation is much larger in the first case (mono: 63.8mm; stereo: 52.5mm). The depth judgments for the banknote are prone to underestimation in both display conditions (mono: -21.8mm; stereo: -14.3mm). The test case sphere shows a strong overestimation of the distance to the reference object in the stereoscopic (M: 30.0mm, SD: 60.5mm), but not in the monoscopic case (M: 2.2mm, SD: 77.6mm). For all of the test cases, the standard deviation in stereoscopic mode is lower than in monoscopic mode. This could already be an indication that it is more precise, although not necessarily more accurate.

We calculated Spearman's correlation coefficient to estimate whether the initial starting position of the virtual object had an effect on the final depth of the object. The assumption would be that there is a monotonic relationship between those variables, if any. The low correlation between the depth offset and initial height (ρ = −0.049) or initial depth (ρ = 0.249), respectively, indicates that there is little influence.
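Spearman's ρ is the Pearson correlation computed on the ranks of the data; a minimal sketch (simple ranking, ignoring ties) looks like this:

```python
def spearman_rho(x, y):
    """Spearman's rank correlation: replace each value by its rank, then
    compute the Pearson correlation of the ranks. A simple ranking is used
    here; tied values would need average ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A value near 0, as observed here, indicates no monotonic dependence of the final depth on the starting position.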

To compare positioning accuracy, we computed the absolute value of the depth positioning error. The mean values for each test condition are shown in Figure 5, together with the 95% confidence interval based on the standard error. To investigate the effect of the different conditions on the depth error, we performed a 2 × 5 repeated measures ANOVA based on the absolute error. The results indicate that there is a significant main effect of stereoscopy on the depth error (F1,19 = 5.194, p < .05). On average, the error is 5% lower in the stereo case compared to the monoscopic display condition. Further analysis was conducted on the individual test cases using a paired t-test between the mono and stereo case. This showed a significant improvement only for the test condition banknote (t(79) = 3.56, p < 0.001). In this case the mean error is reduced by 32%.

The effects in the other tests were not significant: same (t(79) = 1.64, ns), texture (t(79) = 0.54, ns), shadow (t(79) = 1.78, ns) and sphere (t(79) = 1.36, ns).
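The paired t-statistic used in these comparisons follows the standard formula (this sketch is not the authors' analysis code):

```python
import math

def paired_t(x, y):
    """Paired t-test statistic: for per-pair differences d = x - y,
    t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom,
    where sd uses the sample (n - 1) variance."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1
```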

The mean values for the completion time plotted in Figure 6 show that test conditions banknote and sphere were finished more quickly in the stereo condition. Overall, the effect of stereoscopic vision is either negligible or tends to increase completion time.

Figure 6: Mean value of the task completion time and 95% confidence interval in the different test cases. The y-axis is truncated to improve readability. Overall, the stereo condition seems to be more time consuming.

Consequently, the ANOVA analysis could not show a significant effect of the display condition on the completion time. Using a paired t-test for the individual test cases, we find a considerable increase (19%) in completion time using stereoscopy for the shadow condition (t(79) = −2.5511, p < .05). During the experiment we could observe that the task was usually approached in several phases: in a first phase, the situation was assessed and the initial position estimated. This was followed by a rough positioning and several smaller refinements.

The results of the SSQ over all test conditions showed an increase in eye strain for 45%, blurred vision for 35%, dizziness with closed eyes for 25% and difficulties to focus for 20% of the subjects. There were also individual reports of degradation for the categories dizziness with open eyes, general discomfort, fatigue and difficulties to concentrate. All stated changes were only one step, from none to slight or from slight to moderate.

After the individual sessions of the experiment, several participants spontaneously expressed their opinion regarding the display. Four of them preferred the stereoscopic mode, while seven other subjects would choose the monoscopic option. Some others stated that they perceived no difference between the two. There were several reports of blurred or low-resolution images, ghosting, flickering and low brightness, all of which can be traced back to the parallax-barrier display.

6 DISCUSSION

In this study, we compared accuracy and speed of depth perception for monoscopic and stereoscopic hand-held AR. Five different virtual objects had to be aligned with a real object in individual test cases. Each of them provided a unique set of monoscopic depth cues.

Almost all participants were able to complete test cases same and shadow with minimal positioning error, regardless of the display condition. Apparently, the monoscopic cues were so strong that the binocular disparity provided no significant improvement. In retrospect, both test cases were very easy to adjust because of the distinct edge created by the virtual object standing (or being projected) on the table. With this information it was no problem to align them with the same edge on the real object without the need for other depth cues. Completion time for test case shadow suffered significantly from the stereoscopic condition and took longer to align. One possible reason could be that the subjects needed more time to adjust to the autostereoscopic display. This could also be an indication of confusion resulting from contradictory depth cues.

Only the test case banknote showed a significant improvement in positioning accuracy in the stereoscopic condition. It should be noted that this object did not provide any cues based on linear perspective or same object size. Of the nine participants who ranked the difficulty of the tasks, four named the sphere, three the banknote and another two voted for the textured cube as the most difficult object. This can also be seen in the notably higher error and completion time for these conditions. Test cases sphere and texture did not show any significant effect of stereoscopy on positioning error. This is similar to the findings of Kerber et al. [7], who investigated minimal depth discrimination distances between virtual cubes using the same hardware. They argue that object size already provides a strong depth cue which conflicts with binocular disparity.

Regarding the relative positioning offset, we could see varying degrees of over- and underestimation of the virtual object’s depth. This could be caused by the different colors in the individual test cases and the illumination levels between real and virtual object. In some conditions, the color red is perceived to be nearer than adjacent regions of other colors, and the same holds true for brighter objects. As studied by Guibal and Dresp [4], the effect is dependent on background, contrast and interaction with other depth cues (e.g. color stereopsis). Since the real object was red and the virtual object was brighter, there could have been contradicting cues. In general, these interactions should be part of further studies, since the background in AR applications is usually dynamic.

In conclusion, our study shows that the use of current autostereoscopic displays in hand-held AR can only improve depth perception significantly if the scene offers no other dominant depth cues. These results are in line with the weak fusion model of depth perception [11], in which all depth cues contribute with different weights. Surprisingly, binocular disparity seems to have a much smaller impact than expected in the personal space directly in front of the user.
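Under the weak fusion model [11], the combined depth percept can be sketched as a normalized weighted sum of independent per-cue depth estimates. A toy illustration of this idea (our own sketch; the numbers and weights are invented, not fitted to any data):

```python
def fused_depth(cue_estimates, weights):
    """Weak fusion sketch: the combined depth estimate is a weighted
    average of the per-cue depth estimates, with weights normalized
    to sum to one."""
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, cue_estimates)) / total

# Hypothetical example: two cues disagree; the more heavily weighted
# cue (e.g. a strong monoscopic cue) dominates the fused estimate.
combined = fused_depth([10.0, 20.0], [1.0, 3.0])  # closer to 20 than to 10
```

In this reading, our results suggest the weight on binocular disparity is small whenever strong monoscopic cues such as occlusion edges or familiar size are available.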

One of the major drawbacks in terms of user experience was the display technology. The SSQ showed that several of the participants were affected by the test procedure, although the effect was minor and could also result from the high level of concentration. We think that diminished resolution and brightness, as well as ghosting, could be reduced in future hardware generations, but the accommodation-convergence mismatch is not as easy to overcome.

One effect we witnessed during our study was that the participant who declared to use a hand-held autostereoscopic game console regularly performed better than average and also did not show any signs of visual fatigue. We speculate that this could be a sign of adaptation to the technology, but this needs to be evaluated in a long-term study, which has not been done to date.

7 CONCLUSION AND FUTURE WORK

In this paper we presented an approach to improve depth perception in hand-held augmented reality using autostereoscopic displays. We demonstrated the feasibility of implementing such a system on an off-the-shelf smartphone. The relevance of binocular depth cues was studied in an experiment conducted with our working prototype. The results indicate that stereopsis can only significantly lower the positioning error between real and virtual objects if the scene composition offers only weak monoscopic depth cues.

It should be noted that in most of our test cases other indicators for alignment had a much larger influence on the performance. Therefore it seems reasonable to suggest that designers of AR applications include different forms of depth indicators. Examples are auxiliary augmentations [9], depth blur [15] or artificial holes to overcome occlusion [16]. While coinciding cues should reinforce each other, contradictions are not as easily resolved, eventually leading to an increase in task completion time.

Future studies need to examine other screen types and sizes for hand-held stereoscopic AR, since the current prototype is based on a single device. Larger displays especially could increase the effect of stereoscopy. In addition, the impact of stereoscopy should be compared with that of other depth cues, e.g. color and illumination contrast; the background could also have an influence on perception.

ACKNOWLEDGEMENTS

This work was supported in part by the German Federal Ministry of Education and Research (BMBF) as part of the FRAGMENTS project (grant number 01IS12051).

REFERENCES

[1] K. Copic Pucihar, P. Coulton, and J. Alexander. Creating a stereoscopic magic-lens to improve depth perception in handheld augmented reality. In Proceedings of the 15th International Conference on Human-Computer Interaction with Mobile Devices and Services - MobileHCI ’13, page 448, New York, USA, Aug. 2013. ACM Press.

[2] J. E. Cutting. Reconceiving perceptual space. In Looking into Pictures: An Interdisciplinary Approach to Pictorial Space, pages 215–238. MIT Press, 2003.

[3] A. Dey, G. Jarvis, C. Sandor, and G. Reitmayr. Tablet versus phone: Depth perception in handheld augmented reality. In 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 187–196. IEEE, Nov. 2012.

[4] C. R. Guibal and B. Dresp. Interaction of color and geometric cues in depth perception: When does “red” mean “near”? Psychological Research, 69(1-2):30–40, 2004.

[5] J. A. Jones, J. E. Swan II, G. Singh, E. Kolstad, and S. R. Ellis. The effects of virtual reality, augmented reality, and motion parallax on egocentric depth perception. In Proceedings of the 5th Symposium on Applied Perception in Graphics and Visualization - APGV ’08, page 9, New York, USA, Aug. 2008. ACM Press.

[6] R. S. Kennedy, N. E. Lane, K. S. Berbaum, and M. G. Lilienthal. Simulator Sickness Questionnaire: An enhanced method for quantifying simulator sickness. The International Journal of Aviation Psychology, 3(3):203–220, July 1993.

[7] F. Kerber, P. Lessel, M. Mauderer, F. Daiber, A. Oulasvirta, and A. Kruger. Is autostereoscopy useful for handheld AR? In Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia - MUM ’13, pages 1–4, New York, USA, Dec. 2013. ACM Press.

[8] E. Kruijff, J. E. Swan II, and S. Feiner. Perceptual issues in augmented reality revisited. In 2010 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 3–12, 2010.

[9] M. Kyto, A. Makinen, J. Hakkinen, and P. Oittinen. Improving relative depth judgments in augmented reality with auxiliary augmentations. ACM Transactions on Applied Perception, 10(1):1–21, Feb. 2013.

[10] M. Kyto, A. Makinen, T. Tossavainen, and P. Oittinen. Stereoscopic depth perception in video see-through augmented reality within action space. Journal of Electronic Imaging, 23(1):011006, Mar. 2014.

[11] M. S. Landy, L. T. Maloney, E. B. Johnston, and M. Young. Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research, 35(3):389–412, Feb. 1995.

[12] S. Mangiat and J. Gibson. Disparity remapping for handheld 3D video communications. In 2012 IEEE International Conference on Emerging Signal Processing Applications, pages 147–150. IEEE, Jan. 2012.

[13] M. Mehrabi, E. M. Peek, B. C. Wuensche, and C. Lutteroth. Making 3D work: A classification of visual depth cues, 3D display technologies and their applications. In Proceedings of the Fourteenth Australasian User Interface Conference (AUIC2013), pages 91–100. Australian Computer Society, Inc., Jan. 2013.

[14] M. Mikkola, A. Boev, and A. Gotchev. Relative importance of depth cues on portable autostereoscopic display. In Proceedings of the 3rd Workshop on Mobile Video Delivery - MoViD ’10, page 63, New York, USA, Oct. 2010. ACM Press.

[15] T. Ogawa. Blur with depth: A depth cue method based on blur effect in augmented reality. In 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 1–6. IEEE, Oct. 2013.

[16] G. Schall, E. Mendez, E. Kruijff, E. Veas, S. Junghanns, B. Reitinger, and D. Schmalstieg. Handheld augmented reality for underground infrastructure visualization. Personal and Ubiquitous Computing, 13(4):281–291, June 2008.

[17] G. Singh, J. E. Swan II, J. A. Jones, and S. R. Ellis. Depth judgment measures and occluding surfaces in near-field augmented reality. In Proceedings of the 7th Symposium on Applied Perception in Graphics and Visualization - APGV ’10, page 149, New York, USA, July 2010. ACM Press.

[18] J. E. Swan II, A. Jones, E. Kolstad, M. A. Livingston, and H. S. Smallman. Egocentric depth judgments in optical, see-through augmented reality. IEEE Transactions on Visualization and Computer Graphics, 13(3):429–442, Jan. 2007.

[19] D. Wagner and D. Schmalstieg. ARToolKitPlus for pose tracking on mobile devices. In Computer Vision Winter Workshop 2007, St. Lambrecht, Austria, 2007.
