
Intl. Journal of Human–Computer Interaction, 29: 549–561, 2013
Copyright © Taylor & Francis Group, LLC
ISSN: 1044-7318 print / 1532-7590 online
DOI: 10.1080/10447318.2012.729455

Enabling Human–Machine Interaction in Projected Virtual Environments Through Camera Tracking of Imperceptible Markers

Cesare Celozzi, Fabrizio Lamberti, Gianluca Paravati, and Andrea Sanna
Politecnico di Torino, Torino, Italy

Existing tracking methods designed for interacting with projection-based displays generally require visible artifacts to be introduced in the environment in order to guarantee effective stability and accuracy. For instance, in optical-oriented approaches, either the camera sensor or the reference pattern used for tracking is often located within the user's sight (or interferes with it), thus occluding portions of the scene or altering the perception of the virtual environment. Several ways to tackle these issues have recently been explored. Proposed approaches basically aim at making the presence of tracking references in the virtual space transparent to the user. However, such solutions introduce possibly critical constraints on the required hardware or environment configuration. In this work, a novel tracking approach based on imperceptible fiducial markers is proposed. The approach relies on a hiding technique that allows digital images to be embedded in (and retrieved from) a projected scene by exploiting the properties of light polarization and additive color mixing. In particular, the virtual scene is obtained by overlapping the light beams of two projectors and by dealing with markers' hiding via color compensation. A prototype setup has been deployed, where interaction with a flat-surface projection environment has been evaluated in terms of tracking accuracy and artifact-avoidance performance by using a consumer camera equipped with a polarizing filter. Although the tests presented in this article represent only a preliminary and partial evaluation of the proposed approach, they provided encouraging results indicating that the proposed technique could be applied in more complex interaction scenarios while still requiring limited hardware.

1. INTRODUCTION

Virtual environments have shown considerable promise as natural ways to interact with computers, offering impressive control possibilities mixed with a compelling sense of presence. Over the past years, different interaction approaches have been proposed, experimenting with a wide spectrum of hardware devices. These techniques are often tailored to specific visualization scenarios (head-mounted displays, flat projection surfaces, fully enclosed environments, etc.) and mainly differentiate themselves in the technology used (e.g., optical, acoustic, magnetic, etc.; Meyer, Applewhite, & Biocca, 1992).

Address correspondence to Fabrizio Lamberti, Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino 10129, Italy. E-mail: [email protected]

Although such approaches proved to be adequate in the execution of a number of tasks, they often present some limitations in terms of flexibility or performance (Bowman et al., 2008). In fact, depending on the particular approach being considered, specific constraints may apply to both the displayed and the surrounding environments, and several factors may limit a user's experience while carrying out interaction tasks. For instance, when tracking data are embedded in (and extracted from) a projected environment by using high-frequency projectors (and synchronized cameras), critical limitations to the frame rate are imposed, thus preventing such systems from being used in active stereo scenarios. When magnetic trackers are used, tracking performance can be severely affected by transient variations in the magnetic field, for example, due to the presence of emitters as well as of objects moving in the environment. In the case of camera tracking, either in the visible or in the infrared light domain, sensors or markers are generally integrated in the working environment (especially when fully immersive systems are considered), thus breaking the continuity of the virtual space and contributing to impairing the user's sense of presence. When optical trackers are used in combination with other sensors (e.g., with depth cameras, as in body pose-based interaction systems), the need for embedded markers can be relaxed, at the cost of reduced tracking accuracy. Finally, being tailored to specific domains, the applicability of some techniques is sometimes limited by additional cost constraints.

Based on the aforementioned considerations, the aim of forthcoming interaction techniques should be to enable accurate tracking while maximizing system flexibility, limiting the constraints posed by technology on the richness of the environment, and preserving the user's freedom while performing heterogeneous navigation and manipulation tasks. Moreover, to extend the reach of such solutions, the goal of having affordable systems may additionally be taken into account. A possible way to meet the aforementioned objectives could be to find a design with reduced requirements in terms of ad hoc hardware, capable of ensuring high robustness and precision by exploiting state-of-the-art tracking techniques while relying solely on information available within the virtual environment itself, that is, basically, on the visualization surface.

The current work moves in the aforementioned direction by presenting a technique to embed tracking data directly in the projection space. By resorting to color compensation techniques, the system adjusts the projected images at run time to hide fiducial markers into them. Markers can then be extracted with a handheld consumer camera by relying on the properties of light polarization, and exploited for performing optical tracking of the camera's location and orientation. The designed technique succeeds in the goal of avoiding the presence of intrusive components or artifacts interfering with the user's view, while simultaneously removing the need for any specific hardware components (other than those required for creating common projection-based environments). Furthermore, multiple cameras can be used together, e.g., to design collaborative interaction scenarios or to track multiple objects moving in the 3D space. Although this work presents a rather conceptual interaction framework without providing concrete results concerning a real 3D interaction case study, the experimental results gathered so far showed the feasibility of the proposed interaction paradigm. However, to implement a real and effective interaction system, further test sessions aimed at evaluating the dynamic behavior of the proposed technique are still needed.

The remainder of this article is organized as follows. In section 2, existing techniques designed to enable user interaction with virtual environments are reviewed, focusing specifically on optical-based approaches and on solutions aimed at hiding the visual references exploited by the tracking algorithm from the user's sight. In section 3, the basic idea behind the designed solution is presented. In section 4, the software and hardware configurations of the proposed tracking architecture are discussed. In section 5, the results of experimental tests aimed at quantifying tracking performance and at identifying suitable operating conditions are analyzed in detail. Finally, in section 6, conclusions are drawn and future research directions are outlined.

2. RELATED WORK

In the literature, various solutions for allowing users to interact with virtual environments have been proposed. Though the use of a wide range of technologies has been investigated, many efforts have been specifically devoted to optical-based approaches, mainly because of their generally higher accuracy compared with alternative techniques of similar cost (Auer & Pinz, 1999; De Amici, Sanna, Lamberti, & Pralio, 2009). In optical-based systems, tracking is often accomplished by trying to identify some kind of known pattern, or marker, located in the virtual space. Markers can be either linked to a handheld object or body part that is moved in the virtual space and tracked with fixed cameras, or positioned in fixed locations of the projection environment and used to identify the position and orientation of a moving optical sensor.

The main differences are in the technologies used for implementing the markers. In some situations, physical markers are used. This is the case of Piekarski and Thomas (2002) and Woods, Mason, and Billinghurst (2003), where two slightly different virtual mouselike devices are created by attaching paper-based fiducial markers to the user's fingertip and hand, respectively. This is also the case of Foxlin and Naimark (2003), where fiducial markers positioned on the ceiling are tracked using a camera sensor facing upward. An alternative solution is represented by infrared tracking systems, where either reflective passive markers or active LED light sources can be used as either dynamic (Pintaric & Kaufmann, 2007) or static (Welch et al., 2001) references. An interesting work that presents some similarities with the approach proposed in this article is presented in Park and Park (2010), where invisible markers are created with infrared inks or powders. However, the main difference is that in Park and Park (2010) markers are statically printed on paper, whereas in the solution proposed in this article markers are projected. In some cases, ad hoc hardware has been exploited. As an example, a head-mounted device is used to project calibrated laser beams in a specific pattern on the walls of a fully enclosed virtual reality environment in Vorozcovs, Hogue, and Stuerzlinger (2005). The laser dots' positions are captured by cameras located outside the projection environment and used to track the user's position and orientation with high accuracy. Finally, there are situations where markers are projected directly on the visualization surface together with the representation of the virtual environment itself. For instance, a registration technique allowing users to interact with a virtual environment using a mobile phone camera is presented in Pears, Olivier, and Jackson (2008). Camera-display registration is achieved by framing some plain markers that dynamically change their position and size while partially hiding the virtual scene. Mobile phones can also be used to navigate virtual environments through remote rendering techniques (Paravati, Celozzi, Sanna, & Lamberti, 2010).

Although the feasibility of these approaches has been demonstrated in various scenarios, their flexibility is still limited by the fact that either the optical sensors or the reference patterns introduce visible artifacts in the environment, possibly resulting in occlusions or alterations in the user's view. Hence, significant efforts have recently been focused on the identification of solutions able to address the aforementioned constraints. An attractive approach that has been presented in Celozzi, Lamberti, Paravati, and Sanna (2011) tackles the issue of hiding artificial patterns by totally getting rid of them. In the designed methodology, natural feature tracking techniques, which are generally used to track targets in video sequences (Sanna, Pralio, Lamberti, & Paravati, 2009) or to estimate camera pose in scenarios that are mostly time invariant, are exploited for controlling time-variant virtual scenes framed by a consumer camera held by the user. In particular, features computed on the input scene are matched with features extracted from the pictures captured by the camera, and the resulting transformations are used to compute the user's position and orientation in the projection space. Although with the approach proposed in Celozzi et al. (2011) artificial patterns can be totally removed, the main limitation of such a system, besides computational complexity, is that tracking robustness is strongly related to the (dynamic) number of features that can be extracted from the scene. Similar considerations also apply to systems combining data from sensors to track the location of multiple points in the environment without using any marker. This is the case, for instance, of the research works in Hong and Woo (2006); Shen, Ong, and Nee (2011); Gross et al. (2003); and Moore, Duckworth, Aspin, and Rober (2010), where images gathered from one or more cameras are used to reconstruct body motion. This is also the case of recent products for home entertainment, where interaction is linked to body poses recognized starting from a user's skeleton. For instance, in the Microsoft Kinect device, a depth camera is combined with a color camera to determine the location in 3D space of up to 20 body joints (Engineering & Technology, 2011). A major drawback of these systems is that tracking accuracy is generally an order of magnitude lower than that of marker-based approaches (Khoshelham & Oude Elberink, 2012).

With the aim of preserving tracking stability and accuracy, a more promising direction could be to investigate strategies allowing markers to be hidden, rather than avoided. An interesting approach specifically designed for a six-sided fully enclosed space is illustrated in Hutson and Reiners (2011), where markers are removed from the user's sight by positioning them behind the user himself or herself. A small high-resolution camera is worn on the user's head, facing backward and framing the markers. In this way, the impact of the tracking system on the environment is minimized, though part of the virtual space is used for projecting references valuable only for tracking. This fact may prevent other possible concurrent users from seeing the whole virtual scene. Moreover, the system proposed in Hutson and Reiners could not be applied in scenarios where the environment does not natively rely on a fully immersive space. A different approach is discussed in Celozzi, Paravati, Sanna, and Lamberti (2010), where the markers and the scene are projected with different light polarizations. This way, by using a polarizing filter in front of the camera, markers can be easily separated. The main drawback of this approach is that the user is required to wear special (i.e., polarized) glasses as well. Thus, more complex systems where light polarization is natively exploited (e.g., in passive stereoscopy) could not be implemented. Other approaches are presented in Willis, Poupyrev, Hudson, and Mahler (2011); Chan et al. (2010); and Grundhofer, Seeger, Hantsch, and Bimber (2007), where fiducial markers are blended in the virtual scene by using infrared projectors. Although quantitative performance data are not yet available for these systems, the main limitations are related to the low resolution of the projection hardware and to the marked interference between natural and infrared light, which make such approaches suitable specifically for very small interaction environments.

An alternative solution exploiting specific properties of projector technology is reported in Cotting, Naef, Gross, and Fuchs (2004). In this work, a method to embed imperceptible binary patterns into DLP projections is introduced. To achieve this goal, the authors worked on the micromirror modulation scheme to modify the projected image in such a way that a camera can retrieve the hidden patterns. The major limitation of this technique is that it is strictly bound to the technology and to the specific implementation of the modulation algorithm. More general approaches still based on a tight relation between projector and camera are reported in Raskar et al. (1998) and Grundhofer et al. (2007). In these cases, markers are made invisible to the human eye via projector-camera synchronization. More sophisticated solutions concerning the exploitation of ad hoc image coding techniques that take into account perceptual aspects of the human vision system are discussed in Park, Lee, S. (2007); Park et al. (2008); and Park, Seo, and Park (2010). In these works, complementary patterns are alternated in a sequence of images to achieve imperceptibility of the structured light. According to Park et al. (2010), the main limitation of these solutions is that they require special hardware with extremely high performance. As an example, synchronization is often achieved through trigger signals or flash keying techniques, thus preventing the adoption of ordinary equipment. Moreover, the system setup often clashes with the requirements of traditional virtual environments, where high frame rates are used to handle, for instance, stereo frames.

By considering the strengths and weaknesses of the aforementioned approaches, this article is based on the assumption that a marker-based approach capable of keeping markers in front of the user, while removing them from his or her sight by means of standard hardware, could provide the conceptual framework to build an interaction system with higher flexibility in the configuration of the virtual space while still ensuring significant performance.

3. BASIC IDEA

In this work, the design of a marker-based tracking solution complying with the aforementioned requirements is presented. The proposed technique basically relies on the possibility of mixing multiple projection inputs to get a given projection output by exploiting the properties of additive color mixing. These properties are well known and, in domains close to the one tackled by this article, they are exploited, for instance, to manage multiprojector displays: In this context, ad hoc blending techniques are designed to handle artifacts generated where ray beams coming from different projectors overlap, which are due to color nonuniformity across the projection space and between projection devices (Majumder & Stevens, 2004).

However, in the designed tracking approach such techniques are exploited in a novel way to hide a given image directly in the projected scene and to make it visible only to one or more optical sensors. Two aligned projectors are used: The first projector displays the scene, whereas the second one displays the image to be embedded. Light coming from the embedded image projector alters, in the general case, both the luminance and the chrominance of the light coming from the scene projector that is reflected by the screen, thus making the presence of the superimposed image visible to the user. To compensate for this effect, the image displayed by the scene projector must be altered so that the projection resulting from the combination of the two light sources shows only the scene and appears homogeneous.

FIG. 1. Conceptual scheme of the designed image hiding (and retrieving) technique. Note. Hiding is obtained by using color compensation on complementary images. Extraction is achieved by exploiting light polarization (color figure available online).

The general scheme of the hiding technique the designed tracking method has been built upon is reported in Figure 1. As can be seen, to extract the embedded image from the resulting projection, light polarization has to be considered. In particular, two polarizers with different handedness (represented by filled and empty rectangles in Figure 1) have to be positioned in front of the projection sources. This way, by exploiting a polarizer with the same handedness as the one used for projecting the scene, camera sensors can retrieve the embedded image from the resulting projection. The polarizer in front of the projector displaying the scene is not strictly necessary, but it makes a clearer separation between scene and markers possible.

The proposed approach shares with Park et al. (2010) the idea of exploiting complementary images, though in the present work superimposition is achieved in the spatial domain rather than in the time domain. This way, the need for ad hoc synchronized hardware is definitely relaxed. Moreover, as in Celozzi et al. (2010), image extraction is achieved through the light polarizing filters commonly exploited in passive stereoscopy. However, in this case, the constraint for the user to wear ad hoc glasses is removed. Hence, the proposed method could be effectively used in more complex scenarios, e.g., in combination with active stereoscopy-based solutions.

Even though generic images could possibly be handled, for the specific purpose of performing camera tracking the designed hiding technique has been used in combination with fiducial markers. The markers that have been selected are binary, though the particular gray level used for drawing the background may vary to control hiding performance and tracking results. In this case, the superimposition of the embedded image projection does not significantly alter the chrominance of the scene's light. Hence, compensation can be performed by treating the color components separately and by altering their luminance as needed. Specifically, the red, green, and blue values of the scene's pixels corresponding to a black region in the embedded image have to be modified to deal with the superimposition of the gray level of the pattern image's background. This is achieved by exploiting so-called compensation maps that, for each color in the scene image, define the corresponding color capable of minimizing the perception of visual artifacts in the resulting projection, as sketched below.
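The following minimal Python sketch illustrates how such a compensation map could be applied per channel; the map layout (a 256-entry lookup per channel and background gray level, as produced in section 4.1) and all function and variable names are illustrative assumptions, not the authors' implementation.

import numpy as np

def embed_markers(scene_rgb, marker_img, comp_maps, gray_level):
    """Illustrative per-channel compensation (assumed structure, not the authors' code).

    scene_rgb  : uint8 array (H, W, 3), image for the scene projector (assumed
                 already normalized so that no channel falls in the saturation zone).
    marker_img : uint8 array (H, W); 0 marks the marker foreground (black offset),
                 gray_level marks the background.
    comp_maps  : assumed lookup comp_maps[c][gray_level][v] giving, for channel c,
                 the correction value that matches scene value v superimposed with
                 the chosen background gray level (cf. Figure 3).
    """
    compensated = scene_rgb.copy()
    black_mask = marker_img == 0  # pixels that only receive the black offset from Pm
    for c in range(3):            # R, G, B handled independently (additive mixing)
        curve = np.asarray(comp_maps[c][gray_level], dtype=np.uint8)
        channel = compensated[..., c]
        channel[black_mask] = curve[scene_rgb[..., c][black_mask]]
    return compensated

In this sketch, only the scene pixels facing the markers' black foreground are brightened through the lookup, so that their combined appearance matches the surrounding regions where the marker projector adds its gray background light.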

In the following, the architecture of the designed solution is analyzed in detail by considering both the hiding method and the procedure for the computation of the compensation maps, as well as the application of the maps in the overall tracking framework.

4. OVERVIEW OF THE TRACKING SYSTEM

The physical setup that has been exploited to evaluate the effectiveness of the proposed tracking technique is illustrated in Figure 2.

Two commercial DLP projectors (later referred to as Ps and Pm) were configured to project the virtual scene and the markers, respectively. The ray beams projected by Ps and Pm were filtered by light polarizers; specifically, a left-circular polarizer was positioned in front of Ps, whereas a right-circular polarizer was used for Pm. The projection area was obtained by using a flat screen able to preserve the polarization of the projectors' light (though more complex configurations could be adopted). Tracking was experimented with by using both a commercial webcam (Logitech C905 [1]) and a smartphone camera (on the Samsung Galaxy S device [2]). A polarizing film with the same handedness as the one used for Ps was positioned in front of the camera sensor to filter the scene projection beam out and extract only the markers.

[1] http://www.logitech.com/it-it/notebook-products/webcams/devices/5868

FIG. 2. Experimental configuration used in this work, consisting of a flat nondepolarizing projection surface and two DLP projectors. Note. Tracking performance has been evaluated by using consumer cameras (color figure available online).

Though, at run time, tracking basically consists in estimating camera pose based on the recovered virtual markers, an off-line preparation phase is actually required to configure the color compensation-based hiding technique. Hence, in the following, the procedure for the generation of the compensation maps is presented first. Then, the organization of the tracking method is discussed.

4.1. Compensation Maps

To determine the compensation maps, an ad hoc setup was prepared. In particular, a PIKE F032C camera [3] firmly attached to a tripod was put in front of the projection area. This camera was different from the ones used during tracking, and it was selected because of the sensor's quality and the acquisition speed. Projectors were calibrated with the Spyder 3 Elite tool [4] by adjusting their settings to get comparable white point temperature and gamut. An automatic measurement system capable of finding compensation values for tested colors was developed and interfaced to the experimental setup. One of the projectors was used to display the tested color in the left half of the projection area and the correction color in the right half. The other projector was configured to display, in the left half, a homogeneous gray level (corresponding to the embedded image's background) and, in the right half, the black level (corresponding to the embedded image's foreground). Although the black level would theoretically mean no light (i.e., an RGB tern with values [0,0,0]), as indicated in Majumder and Stevens (2004), with commercial projectors there is always some light leakage at the black level, referred to as black offset. Nonetheless, in the following, the terms black level and black offset are used interchangeably, as possible leakages are implicitly taken into account in the designed compensation approach.

[2] http://www.samsung.com/uk/consumer/mobile-devices/smartphones/android/GT-I9000HKDXEU
[3] http://www.alliedvisiontec.com/emea/products/cameras/firewire/guppy-pro/f-032bc.html
[4] http://spyder.datacolor.com/product-mc-s3elite.php

The task of the automatic measurement system was to find the correction color that, mixed with the black offset in the right half of the screen, was capable of compensating the chrominance and luminance of the color resulting from the superimposition of the tested color and the gray level in the left half of the projection area.

The proper correction color had to be determined for each gray level used to draw the background of the embedded image and for each tern of the RGB space possibly present in a generic scene. However, because of the properties of additive color mixing, this task was greatly simplified. In fact, it was sufficient to consider luminance variations for the three primary colors and compute three (independent) compensation maps, each made up of separate curves for each possible gray level of the embedded image's background. The steps for the computation of a single compensation curve in the map were as follows:

• the projected gray level was set to a specific value for the particular curve;

• the RGB component values of the tested color were set to zero, except for the component the map was being computed for, whose values were varied between 0 and 255; and

• for each shade of the tested color obtained from the previous step, all the possible correction color shades were projected, by changing the corresponding test component step by step in the range from 0 to 255 and by setting the other components to the value of the gray level selected in the first step.

At each iteration, two shades were displayed on the two halves of the projection area. To objectively evaluate the degree of similarity of the resulting shades, a snapshot of the projection area (at a resolution of 640 × 480 pixels) was taken with the camera, and a fitness function was used to compare the left-hand and right-hand sides. It is worth observing that, as underlined in Majumder and Stevens (2004), light is not uniformly distributed in the projection area. Moreover, the optical sensor of the camera applies a transformation on the chrominance and luminance values, which is not linear. To take into account the aforementioned phenomena, the fitness function d was defined over two extended regions of the snapshot (symmetrical with respect to the separation line) as a measure of the distance between the color components of the two regions. In particular, the following equation was used:

d = \sum_{i=1}^{n} |r_{l,i} - r_{r,i}| + \sum_{i=1}^{n} |g_{l,i} - g_{r,i}| + \sum_{i=1}^{n} |b_{l,i} - b_{r,i}|    (1)

where n is the number of pixels in the considered areas, whereas r_{l,i}, g_{l,i}, b_{l,i} and r_{r,i}, g_{r,i}, b_{r,i} are the color component values in the RGB space for pixels in the left- and right-half sides of the projection area, respectively. It is worth observing that other metrics could be used in this step, including subjective ones, which may provide improved results possibly at the expense of high computation times.
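For reference, a direct implementation of Equation 1 on the two regions extracted from a snapshot might look as follows; NumPy is used purely for illustration, and region extraction from the captured frame is outside the scope of this sketch.

import numpy as np

def fitness(left_region, right_region):
    """Equation 1: sum of absolute per-pixel, per-channel differences.

    left_region, right_region : arrays of shape (n, 3) holding the RGB values of
    the n pixels sampled from the two symmetric regions of the snapshot.
    Lower values mean the two half-projections look more alike.
    """
    left = np.asarray(left_region, dtype=np.int32)
    right = np.asarray(right_region, dtype=np.int32)
    return int(np.abs(left - right).sum())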

The shade minimizing the value of Equation 1 was elected as the correction color for the corresponding (scene) primary color and (embedded image's background) gray level in the compensation map. For the sake of brevity, only the compensation map for the blue component (which is specific to the considered setup) is reported in Figure 3. The x-axis gives the value of the tested color component, whereas the y-axis depicts the associated compensation value. Each curve corresponds to a different gray level displayed by the second projector. To improve readability, only a subset of curves is shown, corresponding to gray levels from 0 to 250 in steps of 10. Labels have been used to highlight curves obtained with some particular gray levels (viz., 0, 50, 100, 150, 200, and 250).
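The measurement loop for one compensation curve can then be sketched as below. The display_halves and capture_regions callbacks are hypothetical stand-ins for the projector control and camera capture steps of the automatic measurement system described above, and the exhaustive 0–255 sweep mirrors the three-step procedure; the sketch reuses the fitness function from the previous example.

def build_compensation_curve(channel, gray_level, display_halves, capture_regions):
    """Sketch of the sweep producing one curve of a compensation map.

    channel         : 0, 1, or 2 (the R, G, or B component being characterized).
    gray_level      : background gray level projected by the second projector.
    display_halves  : hypothetical callback(tested_rgb, correction_rgb) driving the
                      two projectors as described in section 4.1.
    capture_regions : hypothetical callback() returning the (left, right) pixel
                      regions extracted from a camera snapshot.
    """
    curve = []
    for tested in range(256):                  # shade of the tested component
        tested_rgb = [0, 0, 0]
        tested_rgb[channel] = tested
        best_value, best_d = 0, float("inf")
        for correction in range(256):          # candidate correction shades
            correction_rgb = [gray_level] * 3  # other components held at the gray level
            correction_rgb[channel] = correction
            display_halves(tested_rgb, correction_rgb)
            left, right = capture_regions()
            d = fitness(left, right)           # Equation 1 (see the sketch above)
            if d < best_d:
                best_d, best_value = d, correction
        curve.append(best_value)               # shade minimizing Equation 1
    return curve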

The curve for gray level 0 almost corresponds to the bisector of the quadrant, as the second projector is only displaying the black offset and the correction color is very similar to the tested one, whereas for gray level 250 the curve is almost a straight line parallel to the x-axis; this is due to the fact that the second projector is displaying a white light that saturates the light generated by the first projector for all the shades of the tested component. Small fluctuations around the ideal trend derive from errors introduced in the projectors' calibration process and in the estimation of the fitness function, whereas the nonlinearity is mainly due to the projector technology being considered. In fact, DLP projectors have a white segment that is used to produce brighter gray tones. The amount of light associated with this segment is a nonlinear function of the particular RGB component being projected. In the particular configuration used for generating the compensation maps, this phenomenon affects only the left half of the projection area (where the tested color and the gray level are superimposed). This asymmetry finally contributes to producing the hollows that are visible in Figure 3.

FIG. 3. Compensation map for the blue component obtained with the considered setup. Note. Curves (and labels) correspond to possible gray levels used for drawing the background of the embedded image (color figure available online).

4.2. Tracking Process

According to the methodology introduced in section 3, compensation maps are used to cope with the presence of dynamic virtual markers that are embedded at runtime by the tracking system while the user interacts with the scene. The marker-based tracking method has to deal with two main critical points: the lighting changes in the environment due to the projection of images with time-varying luminance, and the presence of occlusion-like phenomena linked to residual light coming from the scene projector in the markers image recovered by the camera, which are due to the nonideality of the polarizers and of the projection surface. Both phenomena prevent the system from retrieving fiducial markers with stable luminance and neat borders.

To address these issues, the ARTag system has been adopted (Fiala, 2005). ARTag is a fiducial marker-based solution that exploits digital coding theory to get a very low intermarker confusion and false positive rate by adopting an ad hoc edge detection method that provides impressive robustness in terms of immunity to lighting variations and occlusion, as well as significant tracking accuracy. For this reason, ARTag markers have already been used in a number of heterogeneous scenarios, ranging from augmented virtual reality (Billinghurst & Kato, 2002; Schmalstieg & Wagner, 2008) to synthetic environment-oriented human–machine interaction solutions (Celozzi et al., 2010; Hutson & Reiners, 2011).

In the designed architecture, ARTag markers are handled by means of the ARToolKitPlus libraries (Wagner & Schmalstieg, 2007), which are used both to generate and detect the markers and to estimate the camera's pose. At each frame, based on information obtained at the previous frame, the system first determines the grid size capable of maximizing tracking accuracy given the distance of the camera from the projection surface (section 5.1). Then, the appropriate gray level to be used for drawing the background of the markers image is selected.

This choice is of paramount importance: In fact, because the markers' foreground color is always set to the black offset, the background gray level determines the overall contrast of the markers' image that will be processed by the camera pose estimation algorithm (section 5.2). Experimental observations showed that the higher the contrast ratio, the more accurate the tracking results; hence, the goal would be to work with high gray levels.

However, this goal clashes with an intrinsic property of the designed hiding technique. Looking at the compensation map in Figure 3, it can be seen that all the curves show an asymptotic trend. In particular, after a given value of the tested color component, the correction value saturates to 255. This means that pixels of the scene having RGB component values falling in the saturation region would not be compensated correctly. In addition, as illustrated in Figure 4, experimental tests showed a less accurate compensation behavior for high gray levels. Hence, as the gray level increases, the saturation point moves to the left, resulting in a larger saturation region where visible artifacts may possibly be generated in the projection area.

Therefore, a trade-off between hiding performance and tracking accuracy has to be found. Experimental tests were carried out to characterize the relation between these factors, i.e., to evaluate tracking performance with respect to the perceived scene's quality. In this way, a suitable operative range (represented by a constrained set of gray levels) for the specific setup considered in this work was identified. The particular gray level in this range is determined by considering the user's priorities.

Based on the selected gray level, markers are generated and the color gamut of the complementary scene is stretched (normalized) in such a way that only nonsaturating colors are actually present in the image. Finally, for each pixel in the normalized image, the RGB components are corrected using the compensation curves for the specific gray level. The markers' embedding process is schematized in Figure 5.
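As a rough sketch of the normalization step, assuming the saturation point of each compensation curve has been summarized into a lookup (the saturation_points structure below is an assumption, not part of the described system), the scene gamut could be rescaled channel by channel so that every value stays below that point; the compensated image is then obtained as in the sketch of section 3.

import numpy as np

def normalize_scene(scene_rgb, saturation_points, gray_level):
    """Stretch the scene gamut so that no channel enters the saturation zone.

    saturation_points : assumed lookup where saturation_points[c][gray_level] is the
    largest scene value still correctly compensated for channel c at the chosen
    background gray level (the point where the curve in Figure 3 reaches 255).
    """
    normalized = np.empty_like(scene_rgb)
    for c in range(3):
        limit = saturation_points[c][gray_level]
        # Linearly remap the full 0-255 range into the non-saturating range [0, limit].
        scaled = scene_rgb[..., c].astype(np.float32) * (limit / 255.0)
        normalized[..., c] = np.round(scaled).astype(np.uint8)
    return normalized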

The software responsible for the camera sensor is designed to continuously deliver framed snapshots to the ARToolKitPlus-based module. Given the fact that image capture is mediated by a polarizing film, the ARToolKitPlus libraries can directly exploit incoming frames for estimating the sensor's position and orientation in world coordinates. Camera pose data obtained from individual frames are given in input to a Kalman filter, which provides the rendering subsystem with the information required for updating the scene view in a synchronized way.
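The article does not detail the filter model; a common minimal choice, shown below only as an assumption, is to smooth each pose component independently with a scalar constant-value Kalman filter before handing the pose to the renderer.

class ScalarKalman:
    """Minimal constant-value Kalman filter for one pose component (assumed model)."""

    def __init__(self, process_var=1e-3, measurement_var=1e-2):
        self.x = None              # filtered estimate
        self.p = 1.0               # estimate variance
        self.q = process_var       # process noise variance
        self.r = measurement_var   # measurement noise variance

    def update(self, z):
        if self.x is None:         # initialize with the first measurement
            self.x = z
            return self.x
        self.p += self.q                   # predict (state assumed constant)
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (z - self.x)         # correct with the new measurement
        self.p *= (1.0 - k)
        return self.x

# One filter per pose component (three translations and three rotation angles).
pose_filters = [ScalarKalman() for _ in range(6)]

def smooth_pose(raw_pose):
    """raw_pose: 6-element pose estimated by the marker detector for the current frame."""
    return [f.update(component) for f, component in zip(pose_filters, raw_pose)]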

5. EXPERIMENTAL RESULTS

Performance of the proposed tracking technique is strongly related to the configuration of the markers image embedded in the projected scene. On one hand, according to Wagner and Schmalstieg (2007), the stability and accuracy of ARTag-based tracking are influenced by the size and number of visible markers; moreover, as anticipated in the previous section, to ensure the best results in the detection process, the embedded image should be characterized by the highest contrast. On the other hand, a trade-off is established between tracking performance and the perceived scene's quality (in terms of hiding results). Therefore, experimental tests were carried out to evaluate the effectiveness of the designed tracking system and to identify operative conditions capable of maximizing the user's expectations.

5.1. Markers' Layout

As illustrated in section 4, tracking is performed by estimating the camera's pose based on virtual fiducial markers embedded in the projected scene. With the ARTag system, larger markers are recognized with higher accuracy. Nevertheless, by using a larger number of smaller markers, the pose estimation algorithm in the ARToolKitPlus libraries may be provided with a higher number of corners and improved stability can be achieved; furthermore, having more markers would increase the probability that, during interaction, the camera is able to see at least one whole marker (i.e., tracking is not lost). However, as illustrated in Figure 6, when markers become too small the system may be unable to detect them all: In this case, accuracy might decrease.

FIG. 4. Hiding performance for different gray levels used to draw markers' background (20, 40, 60, 80, 100, and 240) (color figure available online).

FIG. 5. Scheme of the process adopted for hiding markers in the virtual scene: For a given gray level used for drawing the markers, a normalized scene image is generated and color correction is used to compensate the contribution of bright regions in the embedded pattern (color figure available online).

FIG. 6. Number of markers detected as a function of the grid size, varying between 1 × 1 and 10 × 10 (detected/total: 1/1, 4/4, 9/9, 16/16, 25/25, 36/36, 49/49, 51/64, 45/81, 2/100) (color figure available online).

Because during interaction the markers' size actually depends on the distance between the camera and the projection surface, a number of experimental tests were carried out to measure the tracking error for different markers' layouts as a function of the position of the camera in the environment. In particular, grid layouts were considered, though alternative configurations may be investigated. Tests were performed with the Logitech webcam (configured for capturing images at a resolution of 640 × 480 pixels and calibrated through the Matlab Calibration Toolbox) by considering a 1.5 × 1.2 m projection area. The markers' background was set to white to have the maximum contrast in the embedded image. The camera was firmly attached to a tripod, oriented perpendicularly to the projection area, and positioned at three known distances measured with a laser meter. For each distance (viz., 1, 1.5, and 2 m), the mean static positional error was determined for increasing grid sizes (ranging from 1 to 10 markers per edge). Measures were expressed in millimeters and their accuracy was estimated in terms of RMSE. The trend of the RMSE for the z dimension (orthogonal to the projection surface) at the distance of 2 m is reported in Figure 7.

As can be seen, a local minimum corresponding to a particular grid size can be identified for the considered distance. For instance, at 2 m, the lowest RMSE (corresponding to 0.87 mm) was measured with a 4 × 4 grid. Based on this procedure, the grid sizes providing the highest accuracy were identified for the three considered distances. Results are reported in Table 1. It is worth observing that, at lower distances (i.e., when the whole camera view was mostly filled with markers), a higher accuracy was experienced. As an example, at a 1-m distance, an RMSE of 0.48 mm was obtained. The aforementioned trend was also confirmed by preliminary experimental observations on the mean rotational error, which, in the considered configurations, was approximately 1.59 degrees.

FIG. 7. RMSE in the direction orthogonal to the projection area for grids with 1 to 10 markers per edge (viewed from 2 m) (color figure available online).

TABLE 1
Grid Sizes Providing the Highest Accuracy Considering the Distance of the Camera From the Projection Surface

Distance (mm)    No. of Markers per Edge
1,000            8
1,500            7
2,000            4

As already said, based on the previous results, the system was configured to dynamically adjust the grid size to be projected by considering the position of the camera in the interaction environment.
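A straightforward way to encode Table 1, assuming the behavior between the measured distances is approximated by picking the nearest characterized distance (an assumption; the article does not specify the interpolation policy), is sketched below.

# (Distance to the projection surface in mm, markers per edge), from Table 1.
GRID_SIZE_BY_DISTANCE = [(1000, 8), (1500, 7), (2000, 4)]

def select_grid_size(distance_mm):
    """Pick the grid size characterized for the nearest measured distance."""
    nearest = min(GRID_SIZE_BY_DISTANCE, key=lambda entry: abs(entry[0] - distance_mm))
    return nearest[1]

# Example: at an estimated distance of 1.2 m, the 1,000 mm setting (8 markers per edge) is used.
assert select_grid_size(1200) == 8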

5.2. Markers' Gray Level

As anticipated, with the designed hiding technique the contrast of the embedded markers image introduces a trade-off between tracking accuracy and hiding performance. In fact, by using low gray levels for drawing the markers' background, a narrow saturation zone is obtained and the compensation technique is able to minimize the presence of visual artifacts in the displayed scene. However, when extremely low contrast images are used, extracting reliable markers becomes quite a hard task. This phenomenon is illustrated in Figure 8, where the number of detected markers as a function of the gray level is shown.

With the aim of quantifying the impact of the gray level selected for drawing the markers' background on tracking accuracy, experimental tests were performed to measure the RMSE while varying the gray-level percentage from 0% to 100%. Measurements were performed with the setup discussed in section 4.1, by enabling the auto-threshold option of the tracking system, disabling the automatic white balancing of the camera, and setting the gain and exposure time to fixed values. As was done for determining the optimal markers' layout, experiments were repeated at three known distances. The RMSE estimated in the z direction with the camera positioned at 1 m from the projection surface is reported in Figure 9. As can be seen, when contrast is under a certain threshold (generally corresponding to a gray-level percentage lower than 30%), significantly higher RMSE values are experienced. Moreover, with gray-level percentages larger than 90%, markers' detection is penalized by the high screen brightness. Hence, an interval is obtained for the gray level to be used for generating the markers image that is capable of providing tracking results comparable to those of alternative solutions.

FIG. 8. Number of markers detected as a function of the gray level used for drawing markers' background; gray-level percentages between 10% and 100% are shown (detected/total: 1/36, 6/36, 13/36, 20/36, 25/36, 29/36, 33/36, 34/36, 35/36, 36/36) (color figure available online).

FIG. 9. RMSE in the direction orthogonal to the projection area for background gray-level percentages varying between 10% and 100% (at a 1-m distance) (color figure available online).

[5] http://sipi.usc.edu/database/database.php?volume=misc

Taking into account the previous results, experimental tests were then carried out to assess the impact of the background's gray level on the ability of the designed method to hide virtual fiducial markers in the overlapped projection. The camera was placed at a 1-m distance from the screen. Several reference images taken from the literature (commonly known as Lena, Peppers, Boat, Mandrill, and House [5]) were used as virtual scenes to obtain comparable results. The gray-level percentage was varied again between 0% and 100%, and captured snapshots with and without embedded markers were compared to evaluate the visual impact of possible artifacts. To obtain a quantitative indication of hiding performance, a numerical metric based on the universal image quality index (UIQI) was used (Zhou & Bovik, 2002). For each gray level, three pictures were taken by the camera: Two pictures framed the superimposition of the projected scene with a uniform background corresponding to the selected gray level, and a third picture framed the scene after the application of the hiding technique. To take into account the noise introduced by the optical sensor, two different indexes were computed using these pictures:

• UIQI-H, measuring the differences between the second and the third picture due to the embedding of virtual markers; and

• UIQI-N, measuring the differences between the first and the second picture due to the noise introduced by the camera.

The performance of the hiding method was then evaluated by considering the UIQI-H/UIQI-N ratio. When the scene is not altered in a significant way, UIQI-H and UIQI-N should be very similar and a ratio close to 1 is expected. When visible artifacts are introduced, the value of UIQI-H is much smaller than UIQI-N and the ratio decreases. In Figure 10, the trend of the UIQI-H/UIQI-N ratio for three of the considered images is reported.

FIG. 10. UIQI-H/UIQI-N ratio on three test images (color figure available online).
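For reference, the universal image quality index on which UIQI-H and UIQI-N are based can be computed as below; the original index is usually evaluated over a sliding window and averaged, so this single-window version is a simplification, and the variable names in the usage comment are illustrative rather than taken from the article.

import numpy as np

def uiqi(x, y):
    """Universal image quality index computed over a single window.

    x, y : grayscale images (2-D arrays) of the same size; returns a value in
    [-1, 1], where 1 means the two images are identical.
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 4.0 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))

# Hiding quality is then judged through the ratio described above, e.g.:
# ratio = uiqi(background_shot, hidden_markers_shot) / uiqi(background_shot_a, background_shot_b)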

As the percentage of gray level increases, the value of the UIQI-H/UIQI-N ratio decreases. This is due to the fact that, for high gray levels, the compensation maps provide correction values that are less effective, because at high luminance levels slight errors introduced in the computation of the compensation maps imply a high divergence between the luminance values of the two projectors. Moreover, with high gray levels (i.e., with large saturation zones), the contrast of the resulting projection decreases; this implies a reduction of the high-frequency components in the image, which increases the perceptibility of luminance differences.

By comparing the quantitative indications obtained by applying the previous numerical method with perceptive results based on subjective observations, an operative interval for the UIQI metric able to ensure the absence of visible artifacts in the considered images was found. Specifically, when the UIQI-H/UIQI-N ratio was between 1 and 0.9, markers could not be visually detected by the selected subjects. This translated into an upper bound for the gray-level percentage roughly corresponding to 40%.

FIG. 11. Hiding performance on the Lena test image for increasing gray-level percentages (10% to 30% for the first three rows, 50% for the last row). Note. The virtual scene resulting from the two projections (as seen through the polarizing filter in front of the camera) is shown in the third column (color figure available online).

All these considerations are illustrated in Figure 11. The first column shows the markers image to be embedded. The second and the fifth columns report the images projected by Ps and Pm as seen by the camera (with polarizing filters of the suitable handedness). The third column shows the effect obtained by mixing the two beams. Finally, the fourth column reports the ideal image (i.e., with no artifacts) obtained by projecting the uncompensated scene with Ps and a uniform gray background with Pm. Each row reports images corresponding to a different gray level. The first three rows report images corresponding to gray levels that are below the upper bound identified above (specifically corresponding to 10%, 20%, and 30% gray-level percentages, respectively). In the fourth row, results obtained with a 50% gray-level percentage are depicted. As said, with this configuration embedded markers become visible.

It is worth observing that the images used in the experimental tests may be unable to re-create all kinds of situations that may occur in practical scenarios. Moreover, from Figure 11 it is also quite evident that, with high gray levels, the saturation zones in the compensation maps grow and the set of colors that can be used is reduced. Hence, a more conservative bound for the gray-level percentage, equal to 30%, should be considered with the particular setup used in this work.

In summary, by identifying a set of configurations (each one described in terms of the gray-level percentage to be used for drawing markers' background) ensuring effective hiding performance, a corresponding range for tracking accuracy is found. As an example, in the gray-level percentage range between 30% and 40% and at a 1-m distance, the RMSE in the z dimension for the considered setup varied between 5.93 mm and 3.62 mm.

In the designed architecture, the specific gray level to be used within a specific scenario can be selected based on the particular characteristics of the scene and on the priority assigned by the user to scene quality and tracking accuracy. It is worth observing that, depending on the specific scene being considered, higher gray levels could possibly be used without altering the scene's perception significantly, while achieving higher accuracy.
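One simple way to expose this trade-off, assuming the bounds identified above for this setup (roughly a 30%–40% gray-level percentage), is a single weight that slides the chosen gray level between the two ends of the admissible interval; the function below is only an illustration of such a policy, not the selection logic of the described system.

def select_gray_level(accuracy_priority, low=0.30, high=0.40):
    """Pick a background gray-level percentage inside the admissible interval.

    accuracy_priority : 0.0 favors hiding quality (lower gray level),
                        1.0 favors tracking accuracy (higher gray level).
    Returns the corresponding 8-bit value for the markers image background.
    """
    accuracy_priority = min(max(accuracy_priority, 0.0), 1.0)
    percentage = low + accuracy_priority * (high - low)
    return round(percentage * 255)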

6. CONCLUSIONS

In this work, a novel hiding technique relying on color compensation and light polarization is exploited to design a human–machine interaction approach based on camera tracking of imperceptible fiducial markers. A prototype setup has been designed using a pair of projectors, a webcam (or an equivalent device like a smartphone camera), and a flat projection surface. With respect to comparable optical-based tracking approaches, the designed technique aims at getting rid of any visible artifact in the scene to improve the user's sense of presence.


Compared to alternative solutions, the proposed architecture does not require any special hardware, and it can be exploited to design more complex environments with a higher degree of flexibility.

According to the experimental tests that have been carried out to evaluate the system's behavior, the use of the proposed hiding technique imposes a trade-off between the quality of the projected scene and the precision of the tracking results. With the considered configuration, an operative range has been identified where satisfactory tracking accuracy and robustness can be obtained together with optimal hiding performance.

Nonetheless, further activities will have to be devoted to assessing tracking performance under dynamic conditions by using an external reference. In this context, optimized strategies to tune the system's variables at run time, continuously providing the user with the best interaction experience in any scenario, will be investigated.

Moreover, there exist several factors that may be considered to further improve the system's effectiveness. A crucial factor is represented by the projectors' alignment. In fact, slight misalignments in the two overlapped projections could introduce visible artifacts in the resulting mixed image. In this work, ordinary projectors have been used and alignment has been achieved by using a simple minimization algorithm that considers point matches and line matches between the projected images. In the future, more sophisticated hardware setups and more accurate alignment techniques may be considered (Hereld, Judson, & Stevens, 2002; Majumder & Stevens, 2004). A further factor that could be investigated concerns interprojector differences and intraprojector variations. In this work, these two factors are taken into account in the computation of the compensation maps by adopting a preliminary calibration phase and by exploiting a region-based metric to compare projected shades. However, more accurate results could be achieved by exploiting different compensation maps that take into account spatial inhomogeneities in the projection light sources to correct pixels belonging to different regions of the projection surface; even a pixel-based correction may be considered in the future, by taking into account both perceptual and complexity constraints.
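By way of illustration, point-based alignment alone could be carried out as in the following sketch (Python with OpenCV), which estimates a homography from matched points and pre-warps the second projector's frame buffer. This is a generic formulation under assumed inputs; it does not include the line matches exploited by the procedure used in this work.

```python
import cv2
import numpy as np

def build_prewarp(points_master, points_slave, output_size):
    """Estimate the homography mapping the slave projector's image plane
    onto the master projection from matched point pairs, and return a
    function that pre-warps slave frames accordingly.

    points_master, points_slave: lists of (x, y) correspondences observed
    by the camera; output_size: (width, height) of the slave frame buffer.
    """
    pts_m = np.asarray(points_master, dtype=np.float32)
    pts_s = np.asarray(points_slave, dtype=np.float32)
    H, _ = cv2.findHomography(pts_s, pts_m, cv2.RANSAC, 3.0)

    def prewarp(frame):
        # Warp the slave frame so that its projection overlaps the master one.
        return cv2.warpPerspective(frame, H, output_size)

    return prewarp
```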

In summary, the approach presented in this article is still conceptual and requires more complete tests to prove that it can actually be used as an effective interaction system. In dynamic scenarios, further tests should also be performed to assess the precision and the accuracy of the tracking system.

REFERENCES
Auer, T., & Pinz, A. (1999). The integration of optical and magnetic tracking for multi-user augmented reality. Computers & Graphics, 23, 805–808.
Billinghurst, M., & Kato, K. (2002). Collaborative augmented reality. Communications of the ACM, 45(7), 64–70.
Bowman, D.A., Coquillart, S., Froehlich, B., Hirose, M., Kitamura, Y., Kiyokawa, K., & Stuerzlinger, W. (2008). 3D user interfaces: New directions and perspectives. IEEE Computer Graphics and Applications, 28(6), 20–36.
Celozzi, C., Lamberti, F., Paravati, G., & Sanna, A. (2011). Controlling generic visualization environments using handheld devices and natural feature tracking. IEEE Transactions on Consumer Electronics, 57, 848–857.
Celozzi, C., Paravati, G., Sanna, A., & Lamberti, F. (2010). A 6-DOF ARTag-based tracking system. IEEE Transactions on Consumer Electronics, 56(1), 203–210.
Chan, L.W., Wu, H.T., Kao, H.S., Ko, J.C., Lin, H.R., Chen, M.Y., . . ., Hung, Y.P. (2010). Enabling beyond-surface interactions for interactive surface with an invisible projection. Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, 263–272.
Cotting, D., Naef, M., Gross, M., & Fuchs, H. (2004). Embedding imperceptible patterns into projected images for simultaneous acquisition and display. Proceedings of the 3rd IEEE/ACM International Symposium on Mixed and Augmented Reality, 100–109.
De Amici, S., Sanna, A., Lamberti, F., & Pralio, B. (2009). A Wii remote-based infrared-optical tracking system. Entertainment Computing, 1(3–4), 119–124.
Fiala, M. (2005). ARTag, a fiducial marker system using digital techniques. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 590–596.
Foxlin, E., & Naimark, L. (2003). VIS-Tracker: A wearable vision-inertial self-tracker. Proceedings of the IEEE Conference on Virtual Reality, 199–206.
Gross, M., Würmlin, S., Naef, M., Lamboray, E., Spagno, C., Kunz, A., . . ., Staadt, O. (2003). Blue-c: A spatially immersive display and 3D video portal for telepresence. Proceedings of ACM SIGGRAPH 2003, 819–827.
Grundhöfer, A., Seeger, M., Hantsch, F., & Bimber, O. (2007). Dynamic adaptation of projected imperceptible codes. Proceedings of the 6th IEEE/ACM International Symposium on Mixed and Augmented Reality, 181–190.
Hereld, M., Judson, I.R., & Stevens, R. (2002). DottyToto: A measurement engine for aligning multi-projector display systems. Projection Displays IX, 5002, 73–86.
Hong, D., & Woo, W. (2006). A 3D vision-based ambient user interface. International Journal of Human-Computer Interaction, 20(3).
Hutson, M., & Reiners, D. (2011). JanusVF: Accurate navigation using SCAAT and virtual fiducials. IEEE Transactions on Visualization and Computer Graphics, 17, 3–13.
Khoshelham, K., & Oude Elberink, S. (2012). Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors, 12, 1437–1454.
Majumder, A., & Stevens, R. (2004). Color nonuniformity in projection-based displays: Analysis and solutions. IEEE Transactions on Visualization and Computer Graphics, 10, 177–188.
Meyer, K., Applewhite, H.L., & Biocca, F.A. (1992). A survey of position trackers. Presence, 1, 173–200.
Moore, C., Duckworth, T., Aspin, R., & Roberts, D. (2010). Synchronization of images from multiple cameras to reconstruct a moving human. Proceedings of the IEEE/ACM 14th International Symposium on Distributed Simulation and Real Time Applications, 53–60.
Paravati, G., Celozzi, C., Sanna, A., & Lamberti, F. (2010). A feedback-based control technique for interactive live streaming systems to mobile devices. IEEE Transactions on Consumer Electronics, 56(1), 190–197.
Park, H., Lee, M.-H., Seo, B.-K., Jin, Y., & Park, J.-I. (2007). Content adaptive embedding of complementary patterns for nonintrusive direct-projected augmented reality. Proceedings of the 2nd International Conference on Virtual Reality, 132–141.
Park, H., Lee, M.-H., Seo, B.-K., Park, J.-I., Jeong, M.-S., Park, T.-S., Lee, Y., & Kim, S.-R. (2008). Simultaneous geometric and radiometric adaptation to dynamic surfaces with a mobile projector-camera system. IEEE Transactions on Circuits and Systems for Video Technology, 18, 110–115.
Park, H., & Park, J.-I. (2010). Invisible marker-based augmented reality. International Journal of Human-Computer Interaction, 26(9), 829–848.
Park, H., Seo, B.-K., & Park, J.-I. (2010). Subjective evaluation on visual perceptibility of embedding complementary patterns for nonintrusive projection-based augmented reality. IEEE Transactions on Circuits and Systems for Video Technology, 20, 687–696.


Pears, N.E., Olivier, P., & Jackson, D. (2008). Display registration for device interaction. Proceedings of the 3rd International Conference on Computer Vision Theory and Applications, 446–451.
Piekarski, W., & Thomas, B. (2002). Using ARToolKit for 3D hand position tracking in mobile outdoor environments. Proceedings of the First IEEE International Workshop on Augmented Reality Toolkit.
Pintaric, T., & Kaufmann, H. (2007). Affordable infrared-optical pose tracking for virtual and augmented reality. Proceedings of the IEEE VR Workshop on Trends and Issues in Tracking for Virtual Environments, 44–51.
Raskar, R., Welch, G., Cutts, M., Lake, A., Stesin, L., & Fuchs, H. (1998). The office of the future: A unified approach to image-based modeling and spatially immersive displays. Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, 179–188.
Sanna, A., Pralio, B., Lamberti, F., & Paravati, G. (2009). A novel ego-motion compensation strategy for automatic target tracking in FLIR video sequences taken from UAVs. IEEE Transactions on Aerospace and Electronic Systems, 45, 723–734.
Schmalstieg, D., & Wagner, D. (2008). Mobile phones as a platform for augmented reality. Proceedings of the IEEE VR Workshop on Software Engineering and Architectures for Realtime Interactive Systems, 43–44.
Shen, Y., Ong, K., & Nee, Y.C. (2011). Vision-based hand interaction in augmented reality environment. International Journal of Human-Computer Interaction, 27(6).
The teardown: The Kinect for Xbox 360. (2011). Engineering & Technology, 6(3).
Vorozcovs, A., Hogue, A., & Stuerzlinger, W. (2005). The Hedgehog: A novel optical tracking method for spatially immersive displays. Proceedings of the IEEE Conference on Virtual Reality, 83–89.
Wagner, D., & Schmalstieg, D. (2007). ARToolKitPlus for pose tracking on mobile devices. Proceedings of the 12th Computer Vision Winter Workshop, 139–146.
Welch, G., Bishop, L., Vicci, L., Brumback, S., Keller, L., & Colucci, D. (2001). High-performance wide-area optical tracking: The HiBall tracking system. Presence: Teleoperators and Virtual Environments, 10, 1–21.
Willis, K.D.D., Poupyrev, I., Hudson, S.E., & Mahler, M. (2011). SideBySide: Ad-hoc multi-user interaction with handheld projectors. Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, 431–444.
Woods, E., Mason, P., & Billinghurst, M. (2003). MagicMouse: An inexpensive 6-degree-of-freedom mouse. Proceedings of the 1st International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, 285–286.
Zhou, W., & Bovik, A.C. (2002). A universal image quality index. IEEE Signal Processing Letters, 9, 81–84.

ABOUT THE AUTHORS

Cesare Celozzi received his M.Sc. and Ph.D. degrees in Computer Engineering from Politecnico di Torino, Italy, in 2004 and 2012, respectively. He is currently a research fellow at the Dipartimento di Automatica e Informatica, Politecnico di Torino. His research interests include image processing, virtual reality, and 3D user interfaces.

Fabrizio Lamberti received his Ph.D. degree in Computer Engineering in 2005 from Politecnico di Torino, Italy. Since 2006 he has been an assistant professor at Politecnico di Torino. He has published a number of technical papers in international journals and conferences in the areas of computer graphics, HCI, and visualization.

Gianluca Paravati received his Ph.D. degree in Computer Engineering from Politecnico di Torino, Italy, in 2011. He is a research assistant with the Dipartimento di Automatica e Informatica at Politecnico di Torino. His research interests include image processing, computer graphics, and distributed architectures.

Andrea Sanna received his Ph.D. degree in Computer Engineering in 1997 from Politecnico di Torino, Italy. Currently, he holds an associate professor position at the First Faculty of Architecture of Politecnico di Torino, Italy. He has authored and coauthored several papers in the areas of computer graphics and virtual reality.
