
Institut für Werkzeugmaschinen & Fertigung

Kinect Sensor Interference Measurement

Luzius Brogli

Bachelor Thesis
29.01.2016

Ali Alavi

PD Dr. habil. Andreas Kunz


Abstract

This thesis presents an investigation of possible interference effects on the calculated depth information from a Kinect for Xbox One that may arise when using multiple sensors simultaneously. Additionally, the working principles of the sensor are described, as well as its main properties, and an analysis of the accuracy and precision of the sensor's depth information in an undisturbed environment is carried out. The results show a linear relation between the error in the depth information and the distance from an object, with an accuracy of about one centimeter for distances up to 4.5 meters and good precision. An irregularly occurring interference between two sensors was discovered, which affected the depth information of the sensor but which was not consistently reproducible. This interference behaviour was exceptional; most of the time, no interference that affected the calculated depth information was observable when two sensors were active simultaneously.



Kinect Sensor Interference Measurement

Keywords: Depth camera; Computer Vision; Metrology

Content of the Thesis

For a European research project, these sensors should be used to track and identify users in a room. For this, multiple Kinects should be used in order to cover the whole interaction field. However, it is not yet clear whether multiple sensors can be used simultaneously, due to possible interference and hence a drop in the reliability of the acquired data. Within this work, the student should plan and realize a setup using two XBOX One Kinect sensors. Within this setup, different combinations of sensor-to-sensor and object-to-sensor positioning should be tested. Under each condition, multiple measurements should be taken, followed by a thorough statistical evaluation. The achieved results should then be compared to measurements with a single-sensor setup in order to make well-founded decisions on whether and to which extent a multiple-sensor setup affects the measurement accuracy and precision. Finally, the work should be presented in the form of a written report and an oral presentation.

Information & Administration

Ali Alavi, LEE L201 – [email protected]

Andreas Kunz, LEE L208 – [email protected]

Work Packages

• Literature study on depth resolution measurement

• Requirement analysis and planning of measurement setup

• Installation and verification of measurement setup

• Intermediate presentation

• Development of a data capturing & evaluation program

• Performing measurements using single and multiple sensors under various conditions

• Statistical evaluation of the achieved data

• Specifying the impact of multiple Kinects on measurement precision

• Written report

• Final presentation

Semester Thesis HS 2015 Publ. 28.09.2015

Introduction

In recent years, different depth cameras have been introduced to the market, mainly targeted towards entertainment applications. However, the precision and reliability of such hardware makes it suitable for use in other application fields. The goal of this thesis is a thorough investigation of the depth resolution and mutual interference of multiple XBOX One Kinect depth sensing cameras.

Requirements

To be considered for this thesis, you should have

• Programming skills, preferably in C#.

• Strong communication and interpersonal skills

• Experience in metrology, computer graphics or computer vision is a plus



Acknowledgement

Firstly, I would like to thank Mr. Ali Alavi for his support and guidance during the making of this thesis, not only for supporting me in scientific problems, but also for giving me excellent advice on how to structure and write a thesis. I would also like to thank PD Dr. habil. Andreas Kunz for the chance to work on this topic and for providing all the necessary facilities required for this thesis. I am grateful to my fellow students Lukas Matthys and Ivo Caduff for helping me during my experiments, reading my thesis and offering advice on any problems. Last, but not least, I also want to thank my family for the support and help they offered me during the work on this thesis.



Contents

List of Figures

List of Tables

1 Introduction

2 Literature Study
2.1 Three-Dimensional Imaging
2.1.1 Stereo Vision
2.1.2 Structured Light
2.1.3 Laser Scanner
2.1.4 Time-of-Flight Camera
2.2 Error Analysis
2.2.1 Accuracy And Precision
2.2.2 Systematic Error
2.2.3 Random Error

3 Measurement Setup
3.1 Tools and Preliminary Steps
3.1.1 Kinect for Xbox One
3.1.2 Calibration
3.1.3 Data Capturing Program
3.1.4 Depth Measuring Device
3.2 Single Sensor Setup
3.2.1 Determining The Reference Distance
3.2.2 Error Sources and Difficulties
3.3 Interference Measurement Setup

4 Results and Discussion
4.1 Single Sensor Measurement
4.2 Interference Measurement
4.3 Discussion

5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work

Bibliography


List of Figures

2.1 The Kinect for Xbox 360 Sensor [1]
2.2 Working principle of time-of-flight cameras. Left: continuous wave based. Right: pulse wave based. [2]
3.1 The Kinect for Xbox One Sensor. ifixit.com
3.2 Measurement target (left). Kinect and distance measuring device in single sensor setup (right).
3.3 Target used to determine reference distance.
3.4 Reflection of the infra-red emitter. Infra-red image from the Kinect 2 (left). Depth image from the Kinect 2 (right).
3.5 Interference measurement setup. Setup with sensors facing the same target, 0◦-90◦ (left). Setup with sensors facing each other, 90◦-180◦ (right).
3.6 Setup to determine the angle between the two sensors. Setup with sensors facing the same target, 0◦-90◦ (left). Setup with sensors facing each other, 90◦-180◦ (right).
4.1 Difference between the reference distance and the distance measured by the Kinect 2 as a function of the reference distance.
4.2 Standard deviation of the mean depth between individual frames calculated over 100 frames as a function of the reference distance.
4.3 Average standard deviation calculated over 100 frames.
4.4 First measurement - Difference between distance measured with and without a second sensor active, measured at different angles between the two sensors.
4.5 Second measurement - Difference between distance measured with and without a second sensor active, measured at different angles between the two sensors.
4.6 First measurement - Difference in standard deviation between individual frames with and without a second sensor active, measured at different angles between the two sensors.
4.7 Interference causing changing areas with no depth information around the second sensor.
4.8 Interference occurrence. Standard deviation over individual frames for different positions of the two sensors.


List of Tables

3.1 Comparison of specifications between the Kinect 1 and Kinect 2 [3],[4]
3.2 Technical data of the Bosch DLE 40 rangefinder. From device instructions.


1 Introduction

In 2010 the Kinect for Xbox 360 (from now on Kinect 1) was released by Microsoft as a consumer grade depth sensing technology, mainly aimed at interaction in a computer game environment. However, its low cost, reliability and precision attracted the attention of scientific researchers from different fields, since it offers an alternative to expensive laser based devices. In 2014 a new version of the Kinect technology was released, the Kinect for Xbox One (from now on Kinect 2). A European research team would like to use this device to track and identify users in a room. To accomplish this task, multiple sensors need to work together to produce a reliable position of an object in a 3D environment. However, since the Kinect technology is designed to function in a single sensor setup, it first has to be investigated how the use of multiple sensors affects the accuracy of the measurements, which is the main task of this thesis.

In order to understand the working principles of the Kinect 2, and thus be able to design a suitable measurement setup and understand any problems that arise, the first part of this thesis contains the results of literature studies in the fields of three-dimensional imaging, error analysis and measurement setups of similar experiments. This is followed by a description of the tools used in the measurements and the processes carried out prior to the measurements to make them as accurate as possible. This includes the development of a suitable program to capture and process the data from the Kinect 2, camera calibration, and determining, as accurately as possible, the point within the device from which the Kinect measures its distance.

A measurement setup using a single Kinect 2 was developed. Corresponding measurements were conducted to determine the precision and accuracy of the sensor without any interference. The errors arising in the chosen setup and the difficulties encountered are discussed. The measurement setup used for the interference measurement is described, followed by the results obtained and a discussion of the findings. Finally, a conclusion is made regarding the usability of the Kinect 2 in a multiple sensor setup without any special adjustments of the sensor.



2 Literature Study

In this chapter the results of my literature studies are summarized. The Kinect 2 is a new device on the market and there is not a lot of literature available. Starting the project, I had little experience in the field of depth cameras such as the Kinect 2. Therefore, the first step for this thesis was the study of related studies and literature on the topic [2, 5]. This includes papers on the topic of accuracy and interference measurements for similar cameras, which develop theoretical models for error analysis and suitable experimental setups [6, 1, 7, 8, 9].

2.1 Three-Dimensional Imaging

The field of 3D imaging describes devices which are able to generate three-dimensional information from an observed scene. The main problem in this field is the generation of depth information for a given point in the scene. Most of the developed technologies can be grouped into two main methods:

• Time-of-Flight: This method relies on an emitter sending out some kind of wave1 and then gaining information from the returning wave.

• Triangulation: Two sensors, or emitters, with a known three-dimensional orientation to each other can be used to determine the distance of a point relative to a given origin (e.g. a camera). The two sensors have to focus on the same point. The knowledge of the position of the sensors relative to each other and to the point in question can be used to determine the depth of that point.

Next, some technologies that apply the methods described above and are of interest in the frame of this thesis will be discussed, in order to become more familiar with 3D imaging.

1 This can be light at different wavelengths, i.e. visible light, ultra-violet, infra-red, etc.


2.1.1 Stereo Vision

The most intuitive approach is stereo vision. Stereo vision is closely related to how humans perceive depth. For this technology, two optical cameras are needed to generate correspondences for specific points between the images of each camera. Then, using the triangulation method, the depth of those points can be calculated. The main task in this approach is to identify corresponding points in the two images as precisely as possible, since the accuracy of the whole technology relies on this ability. Additionally, the position of the two cameras relative to each other is highly relevant, since the depth can only be calculated for points which appear in both images.
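As a brief illustration of the triangulation step (the standard relation for rectified stereo cameras, added here for context and not taken from this thesis): for two parallel cameras with baseline b and focal length f, a point observed with horizontal disparity d between the two images lies at depth

$$ Z = \frac{f \cdot b}{d}, $$

so the achievable depth accuracy is limited directly by how precisely the correspondence, and thus the disparity d, can be determined.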

2.1.2 Structured Light

Structured light cameras also apply a variant of triangulation to determine the depth of points in the image, but using only one camera. The camera has a projector (see figure 2.1) that projects a certain pattern onto the scene2. The sensor stores the information of how this pattern would look if it was projected onto a planar surface at a certain (known) distance. This is called the reference pattern. The camera then captures an image of the scene, which includes the pattern projected onto it. Since the camera and the projector are at an angle to each other, the pattern will be distorted compared to the reference pattern if the scene is at a different distance than the reference plane. This distortion can then be used to calculate the depth for each projected point. This technique has similar limitations to stereo vision: if the camera cannot see a projected point, no depth can be calculated. Additionally, the information density is limited by the number of projected points and, due to this, the points for which the depth is known get more spread out the further the observed scene is from the device.

Figure 2.1: The Kinect for Xbox 360 Sensor [1]

2.1.3 Laser Scanner

Another option for 3D imaging are time-of-flight based and triangulation based laser scanners. The time-of-flight based devices have a laser which emits pulses of light and then measures the time it takes

2 For the Kinect 1 the projector uses infra-red light, so it is not visible to the human eye.


the light to hit an object, be reflected and then return to the sensor. Knowing the speed at which the light moves and the time it took to return to the device, it is possible to calculate the distance of the hit object from the device. The triangulation based devices work similarly to the structured light approach. A laser shines on the object and then a camera is used to record the position of the laser point. Knowing the distance between the camera and the laser emitter and the angles between the emitter and the camera, it is possible to determine the depth of the observed object. The time-of-flight based devices can be used to measure distances up to the order of kilometers. Generally, they do not have great accuracy: due to the high speed of light, the times to be measured are very short, and therefore very accurate time measuring devices would be required. The triangulation based devices, on the other hand, only have a range of some meters, but have better accuracy than time-of-flight devices. The main drawback of both laser scanner variants is that they usually only measure one point at a time, so they cannot produce depth measurements of a whole scene in real time.

2.1.4 Time-of-Flight Camera

These devices consist of an illumination unit and an image sensor. The illumination unit is used to send out some kind of wave, which is then reflected by the scene and captured by the image sensor. There are two main types of time-of-flight cameras; one uses pulses of light and the other one uses continuous light (see figure 2.2).

Pulse based Cameras

The pulse based cameras send out waves in pulses and then measure the time the wave takes to hit an object and return to the camera. The time that the wave takes is directly proportional to the distance traveled, as described by:

$$ Z = \frac{t}{2} \cdot c \qquad (2.1) $$

where Z is the traveled distance, t the elapsed time and c the speed of light. This approach suffers from the same problems as the time-of-flight based laser scanners, namely that measuring the runtime requires high precision clocks to achieve accurate results. Due to this limitation, the technology is not a good fit for consumer-grade devices such as the Kinect 2.
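To make the clock requirement concrete, a rough illustrative calculation (numbers assumed, not taken from the thesis): resolving a depth difference of ΔZ = 1 cm requires, according to equation 2.1, resolving a round-trip time difference of

$$ \Delta t = \frac{2\,\Delta Z}{c} = \frac{2 \cdot 0.01\,\mathrm{m}}{3 \times 10^{8}\,\mathrm{m/s}} \approx 67\,\mathrm{ps}, $$

which is why pulse based systems need very fast and therefore expensive timing electronics.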

Figure 2.2: Working-principle of time-of-flight cameras. Left: Continuous Wave Based. Right: Pulse Wave based.[2]


Continuous Wave Based Cameras

The second variant of time-of-flight cameras is based on continuous waves. The emitted wave is modulated in amplitude to achieve a sinusoidal wave. The main difference to the pulse based method is that the runtime is not measured directly; instead, the phase shift of the reflected wave with reference to the one originally sent out is calculated. This is achieved by making four measurements C(τ0), C(τ1), C(τ2) and C(τ3) at different points in the wave. It is then possible to fully determine the properties of the returning wave from those four measurements as:

$$ \text{Offset:} \quad B_{reflected} = \frac{C(\tau_0) + C(\tau_1) + C(\tau_2) + C(\tau_3)}{4\,B_{emitted}} \qquad (2.2) $$

$$ \text{Amplitude:} \quad A_{reflected} = \frac{\sqrt{(C(\tau_3) + C(\tau_1))^2 + (C(\tau_0) + C(\tau_2))^2}}{2} \qquad (2.4) $$

$$ \text{Phase shift:} \quad \phi = \arctan\frac{C(\tau_3) - C(\tau_1)}{C(\tau_0) + C(\tau_2)} \qquad (2.6) $$

Using these values, the reflected signal can be exactly described as

$$ A \cdot \cos(\omega \cdot t + \phi) + B \qquad (2.7) $$

The knowledge of the characteristics of the emitted wave and the calculated phase shift can then directly be used to calculate the distance between the camera and a given point in the observed scene as follows:

$$ Z = \frac{c}{4\pi f_{mod}} \cdot \phi \qquad (2.8) $$

where Z is the traveled distance of the wave, c is the speed of light and fmod is the modulation frequency of the signal. One weakness every time-of-flight device has to take into account is that the phase shift alone is an ambiguous result. Two points in an image whose difference in depth corresponds to a phase difference of exactly 2π (i.e. half the modulation wavelength, because of the round trip) will result in the exact same phase shift, but are clearly not at the same distance from the sensor. The same is true if the difference in depth between two points is an exact multiple of this interval. This ambiguity can be resolved by using multiple waves with different wavelengths for a single measurement. Using the different phase shifts resulting from each wave, one can determine the exact distance of a point. This is also the principle the Kinect 2 uses to calculate its depth values.
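As a hedged numerical illustration of this ambiguity (derived from equation 2.8 with the phase restricted to one full period of 2π; the frequencies are the approximate Kinect 2 modulation frequencies mentioned in section 3.1.1): the unambiguous range of a single modulation frequency is

$$ Z_{max} = \frac{c}{2 f_{mod}} \approx 9.4\,\mathrm{m}\ (16\,\mathrm{MHz}), \quad 1.9\,\mathrm{m}\ (80\,\mathrm{MHz}), \quad 1.25\,\mathrm{m}\ (120\,\mathrm{MHz}), $$

so combining the phase shifts of all three frequencies allows unambiguous depths over the sensor's whole working range, while the higher frequencies provide the fine depth resolution.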

2.2 Error Analysis

Merriam-Webster defines error as "the difference between an observed or calculated value and a true value; specifically: variation in measurements, calculations, or observations of a quantity due to mistakes or to uncontrollable factors". For any experiment, the main goal then has to be to keep the error as small as possible, since it strives to find or reproduce the true value. However, as stated in the given definition, the error in some data is not only made up of mistakes that can be identified and corrected in subsequent experiments, but also of uncontrollable factors. Since these factors and their contribution to a measurement cannot be controlled by changing the experimental design, it is important to quantify


them and determine their behaviour in order to eliminate them from our results. This is the task of error analysis: to estimate the uncertainties in our results, which are called errors. The ideas discussed in this section are mainly based on the sources [10], [11] and [12].

2.2.1 Accuracy And Precision

When discussing error types, first some vocabulary that describes what kind of error is in question has to be defined. The most common distinction made is between accuracy and precision. Accuracy describes how close the measurements are to the true value of an observed quantity. Precision, on the other hand, is independent of the true value of a quantity. It describes the reproducibility and repeatability of a measurement, so it describes how likely it is to get similar results in subsequent measurements. In order to achieve good results in an experiment, both good accuracy and good precision have to be given. High precision but low accuracy describes an experiment which yields consistent results, but the data does not agree with the true value. High accuracy but low precision, on the other hand, means that the data is scattered around the true value, but the single measurements are rather imprecise. In error analysis a distinction is made between two main error types: systematic error and random error.

2.2.2 Systematic Error

Controlling the systematic error in an experiment yields increased accuracy in its results. Systematic errors mainly result from the measuring devices or the setup. They are hard to analyze statistically since every measurement is affected by the same amount. The main way to reduce or eliminate systematic error is a careful examination of the devices used, the setup and the methodology. Systematic errors arise from wrongly used or calibrated equipment, a poor measurement environment3, or bias on the part of the observer.

2.2.3 Random Error

Random errors result from random and unpredictable events in the experimental setup. Some examples are electrical fluctuations that may affect the measuring devices or, in the context of this thesis, different amounts of ambient light that the sensor may capture. Due to the random nature of this error type, it can often be modeled with a Gaussian normal distribution. Therefore, this error type can be analyzed using standard statistical approaches. Additionally, this enables the application of statistical theorems, such as the law of large numbers, which states that: "The average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed." [13] This means that the more data is used, the closer the average will get to the true value.
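A short, standard statistical remark (a textbook result, not specific to this thesis): if the depth reported for a single frame carries a random error with standard deviation σ, the mean over N independent frames has a standard deviation of

$$ \sigma_{\bar{Z}} = \frac{\sigma}{\sqrt{N}}, $$

so averaging over 100 frames, as done in the measurements later in this thesis, reduces the random error of the reported mean by roughly a factor of 10, provided the frame errors are independent.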

3 I.e. improper lighting, temperature.


3 Measurement Setup

The tools that have been used for the measurements and the steps taken to achieve accurate results will be discussed in this chapter. The setup for the determination of the accuracy and precision of an undisturbed sensor is discussed in detail. This was carried out using only one sensor. Finally, the interference measurement setup is described.

3.1 Tools and Preliminary Steps

3.1.1 Kinect for Xbox One

As discussed previously (see section 2.1.4), the Kinect 2 is a time-of-flight camera, measuring the phase shift of a modulated signal to determine the depth of a point. As mentioned in section 2.1.4, these types of sensors have the inherent property that the calculated distance is ambiguous. According to [4], the sensor acquires images at frequencies of approximately 120 MHz, 80 MHz and 16 MHz to eliminate this ambiguity. The sensor has three main components: a color (RGB) camera, an infra-red light camera and an infra-red light emitter (see figure 3.1). The Kinect 2 gives continuous access to six different data sources, namely three unprocessed data streams:

• Color image

• Infra-red image

• Audio source

and three data sources which are derived quantities:

• Depth map - In millimeters and measured from the sensor’s focal plane.

• Body index mask - Shows which pixels contain a tracked person and also which person it is.


• Body frame source - Collection of body objects, each one with several distinct joints with individual 3D orientation; it also supports a hand state functionality for up to 2 bodies (see table 3.1 for exact numbers).


Figure 3.1: The Kinect for Xbox One Sensor. ifixit.com

The specifications of the Kinect 2 are listed in table 3.1. The corresponding specifications of the first version of the Kinect are added to compare the improvements between the two versions. The Kinect 2 has a fixed focus lens. This means that the focus of the cameras used in the device is set when the lens is designed and cannot be changed. Thus, it has to be set so that objects in the largest possible range of distances are still acceptably focused. This distance is the hyperfocal distance. It is defined by Wikipedia as: "The hyperfocal distance is the closest distance at which a lens can be focused while keeping objects at infinity acceptably sharp. When the lens is focused at this distance, all objects at distances from half of the hyperfocal distance out to infinity will be acceptably sharp." [14] The hyperfocal distance depends on the focal length and the aperture1 of the camera. The aperture should be narrow, so only highly collimated2 rays are admitted, resulting in a sharper focus on the image plane. Additionally, a short focal length also helps to achieve a small hyperfocal distance. A short hyperfocal distance is desirable, since this directly relates to the closest distance that an object can have from a camera and still be in focus. The main drawback of this type of camera is that it is not possible to produce images that are as sharp as the ones from a camera with a changeable focus, which is set to the best focal point for a given scene. Mainly due to the relatively narrow aperture required, the amount of light that reaches the image sensor is rather small and thus the cameras are not suitable for (fast) moving objects.

1 An aperture is a hole of varying size located behind the lens in a camera. It determines how much light, and from which angle, can pass to the image plane.

2 Collimated light is light whose rays are parallel, and therefore will spread minimally as it propagates. [15]
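For reference, a standard photographic approximation of the hyperfocal distance H (a textbook formula added here for illustration, not taken from this thesis) is

$$ H \approx \frac{f^2}{N\,c} + f, $$

where f is the focal length, N the f-number of the aperture and c here denotes the diameter of the acceptable circle of confusion. A shorter focal length or a narrower aperture (larger N) both reduce H, which matches the design considerations described above.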



Table 3.1: Comparison of specifications between the Kinect 1 and Kinect 2 [3],[4]

                                      Kinect 1                 Kinect 2
RGB camera (pixel)                    640x480 or 1280x1024     1920x1080
Horizontal field of view (degrees)    57                       70
Vertical field of view (degrees)      43                       60
IR camera (pixel)                     640x480 or 1280x1024     512x424
Max framerate (fps)                   30                       30 or 15
Max depth distance (m)                4                        4.5 (8)
Min depth distance (m)                0.5                      0.5
Depth sensitivity (levels)            2048 (11-bit)            65536 (16-bit)
Max tracked skeletons                 2                        6
Joints per skeleton                   20                       26

3.1.2 Calibration

Camera calibration is the process of finding the internal quantities of a camera such as:

• Image center (principal point)

• Focal length - In the context of a camera, the focal length refers to the distance from the center ofthe lens to the image sensor.

• Distortion - Describes how a straight line in the real world might be bent in the image.

Using the notation introduced in [16], we denote a 2D point by m = [u, v]T and a 3D point by M = [X, Y, Z]T. The pinhole model relates a three-dimensional point in the real world and its two-dimensional projection on an image by the equation:

$$ s\,\tilde{m} = A\,[R\ \ t]\,\tilde{M} \qquad \text{with} \qquad A = \begin{bmatrix} \alpha & c & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (3.1) $$

where x̃ denotes an augmented vector with 1 added as the last element, s is a scale factor, and [R, t] are the rotation and translation relating the world coordinate system to the camera coordinate system; together they are called the extrinsic parameters. A is the camera intrinsic matrix, from which the parameters discussed above can be derived: α and β are the scale factors of the u and v axes in the image, c describes the skewness of the two image axes and (u0, v0) are the coordinates of the principal point. In order to accurately construct a point cloud from the image produced by a sensor camera, calibration has to be performed. For our purposes this is generally not necessary, since we want to investigate the accuracy of the data provided by the sensor and therefore do not need to investigate the underlying model.
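As a small worked example of equation 3.1 (purely illustrative numbers, assuming R = I, t = 0 and no skew, c = 0; the intrinsic values are assumptions, not calibration results from this thesis): a point M = [0.5, 0.2, 2.0]T, given in meters, projects to

$$ u = \alpha\frac{X}{Z} + u_0 = 365 \cdot 0.25 + 256 \approx 347, \qquad v = \beta\frac{Y}{Z} + v_0 = 365 \cdot 0.1 + 212 \approx 249, $$

with assumed α = β = 365 (in pixels) and an assumed principal point (u0, v0) = (256, 212), i.e. roughly the centre of a 512x424 depth image.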


However, the depth data is calculated from the infra-red camera's focal plane3, so in order to accurately compare these values to a real measurement, the exact point from where the sensor does its measurement has to be known. Standard camera calibration was performed using the toolkit described in [17]. The method applied in the toolkit is largely based on the approach described in [16]. The calibration technique only relies on the camera observing a planar pattern in at least two different orientations and can thus be performed without specialized equipment. The basic principles can be described as follows:

1. Two basic constraints on the intrinsic parameters can be derived from equation 3.1.

2. Using the n captured images of the pattern, the two constraints result in 2n equations, for which in general a unique solution for the intrinsic and extrinsic parameters can be found.

3. At this point, the radial distortion has not yet been taken into account. Using the results from step 2, the parameters for the radial distortion are approximated by the linear least squares solution, again using the fact that we have n images and m points on each image.

4. After correcting the parameters from step 2 with the radial distortion calculated in step 3, a final refinement is carried out using the maximum likelihood estimation method, which results in a nonlinear minimization problem whose solution gives the final parameters.

For later use, the most important quantity gained from the calibration is the Kinect's focal length. Two separate calibrations were performed with two sets of images. They resulted in focal lengths of 3.626 ± 0.027 mm and 3.603 ± 0.013 mm, which are plausible results when comparing them to the values presented in [4].

3.1.3 Data Capturing Program

In order to gather the required data for the measurements, a suitable program had to be developed, since there was no existing functionality that returned the data in the way it was required for the evaluation. A program was written on top of one of the examples in the Kinect SDK [18], the sole function of which was to display the depth frame captured by the sensor. The program was written in C# using the Microsoft Visual Studio development environment. The programming interface for the Kinect is contained in the Microsoft.Kinect namespace. The most important class is the KinectSensor class. An instance of this class represents a Kinect device and through it one can access the different data sources (see 3.1.1). A reader can then be opened for a specific data source, through which the individual frames can be accessed. The functionalities that needed to be added were:

• Data filter - There are pixels that do not contain any depth information, or which for some reason display faulty values. These pixels should be excluded from the statistical evaluation of depth values, so the filter ensures that only plausible data gets considered.

• Number of unusable pixels - These are the pixels which the filter does not allow into the calculations, either because they do not contain information or because they contain wrong information.

• Restrict area - Since the field of view of the Kinect 2 is rather large, the problem arises that at some point the floor and other surrounding objects appear in the frame and might influence the calculations. Therefore, the area of the whole frame is restricted to only the desired area.

• Calculation of the mean distance - After applying the filter and the area restriction, the mean value of the distance has to be calculated, using the data from each pixel in the selected area.

3 The plane that passes through the focal point, perpendicular to the axis of a lens.


• Calculation of the standard deviation - The standard deviation of the depth reported by each pixel has to be calculated.

• Extension to multiple frames - The program has to be able to calculate the mean and the standard deviation from multiple consecutive frames.

• Standard deviation between frames - The standard deviation of the mean depth calculated for consecutive frames has to be calculated.

Characteristics and Difficulties

One of the first problems that arose when working with the Kinect 2 was that some computers lost the connection to the sensor after closing an initial access to it and were not able to reconnect to the sensor without a restart of the whole computer. This, however, did not occur on all computers to which the sensor was connected. With the support of Mr. Alavi, we finally found out that certain hardware components in some computers do not work with the Kinect 2. We found a workaround for this problem by running the Kinect 2 configuration verifier program before the first use and then leaving it running while using the sensor. One characteristic that had to be dealt with was that when a frame from the Kinect is acquired, the whole source is blocked as long as the frame is held. This means that one has to get the information from a frame as fast as possible and then release it again. Additionally, the sensor does not return a frame twice if asked for the latest frame when the current latest frame has already been used. Instead, it returns a NULL reference if no new frame has arrived yet. In our implementation the program was set to sleep for some time after each frame was processed, after which the sensor would have a new frame ready, but this was a rather crude fix. For the calculation of the standard deviation between frames, the mean of each frame and the overall mean were used. This could instead be done per pixel, which would lead to some additional information on where exactly in the observed frame random errors occur.
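The following C# fragment is a minimal sketch of such a capture loop using the Kinect SDK 2.0 API from the Microsoft.Kinect namespace. It is an illustration of the polling approach with AcquireLatestFrame and a short sleep described above, not the actual thesis program; the frame count, the sleep interval and the zero-depth filter are illustrative assumptions, and the area restriction is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using Microsoft.Kinect;

class DepthCapture
{
    static void Main()
    {
        KinectSensor sensor = KinectSensor.GetDefault();
        sensor.Open();

        using (DepthFrameReader reader = sensor.DepthFrameSource.OpenReader())
        {
            var desc = sensor.DepthFrameSource.FrameDescription;
            ushort[] depthData = new ushort[desc.Width * desc.Height];
            var frameMeans = new List<double>();

            while (frameMeans.Count < 100)               // frames per measurement (assumed)
            {
                using (DepthFrame frame = reader.AcquireLatestFrame())
                {
                    if (frame == null) { Thread.Sleep(10); continue; }  // no new frame yet
                    frame.CopyFrameDataToArray(depthData);
                }                                         // release the frame as fast as possible

                // Filter: ignore pixels without depth information (value 0)
                var valid = depthData.Where(d => d > 0).Select(d => (double)d).ToList();
                if (valid.Count > 0)
                    frameMeans.Add(valid.Average());      // per-frame mean depth in millimetres
            }

            // Standard deviation of the per-frame means over all captured frames
            double mean = frameMeans.Average();
            double stdBetweenFrames = Math.Sqrt(
                frameMeans.Sum(m => (m - mean) * (m - mean)) / (frameMeans.Count - 1));

            Console.WriteLine($"Mean depth: {mean:F1} mm, std between frames: {stdBetweenFrames:F2} mm");
        }
        sensor.Close();
    }
}
```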

3.1.4 Depth Measuring Device

A distance measuring device was needed to generate a reference distance, which can be used as ground truth for the determination of the accuracy and precision of the Kinect 2. The digital laser rangefinder DLE 40 Professional from Bosch was used for this task. Since the distance data from the laser will be used as ground truth, the accuracy and precision that can be determined for the Kinect 2 is bounded by the accuracy of the laser rangefinder. The manufacturer reports the following technical data:

Table 3.2: Technical data of the Bosch DLE 40 rangefinder. From device instructions.

Measuring range                        0.05-40 m
Measuring accuracy (optimal)           ±0.05 mm/m
Measuring accuracy (typical)           ±1.5 mm
Measuring accuracy (unfavourable)      ±10 mm per 40 m
Lowest indication unit                 1 mm


How good the measurements are depends on the conditions of the setup. If the surface of the target reflects the laser light well, the result will be more accurate than if the surface scatters the light or has a low reflectiveness. The ambient light conditions can also influence the measurement quality, since intense ambient light leads to a lower accuracy. Therefore, no accuracy better than a few millimeters can be expected with this setup.

3.2 Single Sensor Setup

The research to determine an appropriate setup for the single sensor measurement was started by looking at related works that were done using the Kinect 1. There were two main techniques used in determining the accuracy of the sensor. The first approach is the comparison of the sensor to high-end laser scanning devices. For this purpose, the same scene is captured by the laser scanner and the Kinect sensor. Then, a point cloud is generated by both devices. To generate the point cloud for the Kinect sensor, it is important to perform an accurate calibration, as described in section 3.1.2. The point cloud of the laser scanner is taken as ground truth and the deviation of the Kinect point cloud is measured. Therefore, the quality of the result depends on the accurate description of the Kinect sensor's intrinsic properties. This experiment can be used to estimate the systematic error and make adjustments to the calibration and experimental conditions. In the second approach, the sensor is pointed at a planar surface whose exact distance from the sensor is known. The depth measurement is carried out at different distances and the results from the sensor are compared to the real distance at each interval. In the measurements the second approach was used, since it is sufficient to determine the accuracy and precision of the Kinect 2 and because there was no laser scanner available as reference. At each interval, the exact distance from the front of the sensor to the planar surface (a closet door, see figure 3.2) was measured using the laser rangefinder discussed in section 3.1.4. The reference distance was determined on both sides of the Kinect 2 and the sensor was adjusted so that the two distances match, to ensure that no rotational effects affect the measurements. The Kinect, as well as the distance measuring device, were adjusted using a level to also ensure that no rotation in the vertical direction affects the measurement. Both the laser and the Kinect are mounted on tripods to ensure a consistent measurement (see figure 3.2).

The area of data used for the calculations was restricted to the area marked in figure 3.2, using the feature implemented in the data capturing program (see 3.1.3), since when moving away from the target other objects at a similar distance would appear in the frame of the camera and interfere with the measurement. With this technique only the intended surface gets measured. For each measured interval, 100 frames were taken for the calculation of the mean and the standard deviation of the distance in the area used for the calculation. Using multiple frames instead of only one lets us decrease the random error in the measurement and gives us a better estimation of the sensor's accuracy (see section 2.2.2). Additionally, the standard deviation of the calculated mean distance of the individual frames is recorded, which gives us an estimate for the magnitude of the random error and lets us determine the precision of the Kinect 2 (see section 2.2.3).

3.2.1 Determining The Reference Distance

Since the reference distance is measured from the front of the case of the Kinect 2 to the wall, it is required to estimate the distance from the front of the case to the physical point within the camera where its measurements are made, in order to obtain accurate results.


Figure 3.2: Measurement target (left). Kinect and distance measuring device in single sensor setup (right).

The depth information from the Kinect 2 is measured from the focal plane of the sensor. In a digital camera this corresponds to the image plane, which is where the physical sensor is located. The quantity of interest is the distance from the front of the Kinect 2 to the physical sensor and thus the focal plane. This can be decomposed into two parts: the distance from the image sensor to the lens and the distance from the lens to the front of the case. The first distance corresponds to the focal length, which has already been determined in section 3.1.2. Therefore, what is left is to calculate the distance between the front of the Kinect and the lens. To estimate this distance, the ratio between the size of an object in the real world (see figure 3.3) and the distance of that object to the lens was used. This ratio has to be the same as the ratio between the size of that object in the picture and the distance of the image plane to the lens, which corresponds to the focal length. This is true because these two quantities are parts of two similar triangles. So:

$$ \frac{f}{s_i} = \frac{d}{s_o} \qquad (3.2) $$

with f the focal length, s_i the size of the object on the image, d the distance between the object and the lens and s_o the size of the object in the real world. The size of the object on the image s_i is known, since the pixel size of the infra-red camera is known from [4] and the number of pixels that the object occupies can be counted. Also known are the focal length f and the size of the object in the real world s_o. Thus the distance d between the object and the lens can be calculated.
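As a small worked example of equation 3.2 (all numbers purely illustrative; in particular, the assumed pixel pitch of 0.01 mm for the infra-red sensor is not a value taken from this thesis): an object of size s_o = 18.5 cm that spans 110 pixels in the image corresponds to s_i ≈ 110 · 0.01 mm = 1.1 mm on the sensor, so with f = 3.6 mm

$$ d = \frac{f \cdot s_o}{s_i} = \frac{3.6\,\mathrm{mm} \cdot 185\,\mathrm{mm}}{1.1\,\mathrm{mm}} \approx 605\,\mathrm{mm}. $$

Subtracting the distance measured with the rangefinder from d then yields the sought lens-to-case distance.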

Then, the distance between the front of the case of the Kinect 2 and the object, which was measured using the laser rangefinder, is subtracted from the calculated distance d, which yields the distance between the lens and the front of the case, the last unknown quantity. While doing the measurements and calculations described above, two main problems that greatly affected the accuracy of the result were encountered. The first problem was the resolution of the infra-red image. As discussed in section 3.1.1, the resolution is 512x424 pixels.


Figure 3.3: Target used to determine reference distance.

This means that objects in the image only get a very limited number of pixels to represent them, which results in very fuzzy edges. Additionally, each pixel in the image corresponds - depending on the distance of the object to the camera - to a rather large area in the real world. Combining these two facts has the result that the measurement points for the object in the real world cannot be identified precisely in the image and, because a pixel corresponds to a relatively large area, it makes a big difference for the results if one pixel more or less is included. For example, adding one pixel to the object size on the image of an object which is 18.5 cm in the real world, at a distance of 600 cm from the camera, results in a change of the calculated distance between the front of the case and the lens of almost 6 mm. Secondly, small changes in the calibrated focal length, which is used for the calculations, have a big impact on the results as well. Each calibration with a slightly different image set resulted in a different focal length (see section 3.1.2). These differences were in the order of tenths of a micrometer, with an uncertainty of some micrometers, but even such small differences had an influence of multiple millimeters on the calculated distance between the case and the lens. The resulting average distances for different focal lengths were values between 5 mm and 10 mm. In order to estimate the distance between the case and the lens of the sensor independently of the focal length and the error which this introduces, a second approach was followed. Equation 3.2 can be written as:

$$ \frac{f}{s_i} = \frac{d_m + c}{s_o} \qquad (3.3) $$

where d_m is the distance measured from the front of the case to the target using the laser rangefinder and c is the sought distance between the case and the lens. This can be rewritten as

$$ (d_m + c) \cdot s_i = f \cdot s_o \qquad (3.4) $$

The right side of equation 3.4 is constant for all measurements of the same target object. Therefore, taking two measurements with their respective s_i and d_m, we can write

$$ (d_{m1} + c) \cdot s_{i1} = (d_{m2} + c) \cdot s_{i2} \qquad (3.5) $$

which can be solved for c resulting in the estimate of the sought value

$$ c = \frac{d_{m2} \cdot s_{i2} - d_{m1} \cdot s_{i1}}{s_{i1} - s_{i2}} \qquad (3.6) $$

The results from this approach were worse than the first ones. The reason for this may be that there is still the problem with the coarse pixels described above, which may have more influence in this calculation, since the values are used much more often.



Figure 3.4: Reflection of the infra-red emitter. Infra-red image from the Kinect 2 (left). Depth image from the Kinect 2 (right).

Additionally, in both approaches, objects as close to the camera as possible (see figure 3.3) were measured in order to counteract the big influence of the discrete pixels. This way, a difference of a single pixel in the image size had less of an influence. But since the object now fills most of the image, effects such as radial distortion may have additional influence and increase the error in the measured pixel length on the image. As it was not possible to determine the distance between the case and the focal plane accurately enough, it was decided to use the measurement data including the error resulting from the different measuring points. Therefore, the data will show a constant offset when measuring the error between the data from the Kinect 2 and the measured reference distance. It can be said, however, that this offset is around one centimeter, and keeping this in mind it is still possible to get a good estimate of the precision of the new sensor.

3.2.2 Error Sources and Difficulties

One of the main difficulties was determining a suitable target for the measurement. The main problem was that the walls and doors of the room the measurements were carried out in were not flat enough to represent a good planar surface. The next possible targets were the closets scattered around the room, and a fitting closet with metal doors was found (see figure 3.2). After doing some test setups, it became evident that there was always a relatively big spot in the middle of the frame with no depth information, for no apparent reason.

After looking at the infra-red frame and some testing, it became clear that the disturbance was at the exact location of the reflection of the infra-red emitter on the metal door and that it disappeared when the surface did not reflect as much (see figure 3.4). Since this greatly influenced the measurement quality, the metal door was covered with paper. This solved the problem by creating a much less reflective surface. The two main error sources in the setup are the distance measuring device and the stand used for the Kinect. The floor of the room used for the measurements is relatively uneven and, since the stand rests on it with a flat base, the camera level had to be newly adjusted for each measurement interval. The second main error source is the device used to determine the reference distance and assure the parallel orientation of the Kinect to the planar surface. As discussed in section 3.1.4, the average accuracy of the rangefinder is 1.5 mm. Both of these points were counteracted by keeping the standard deviation of the depth values between pixels in the measurement as small as possible and by making small adjustments after setting up a measurement, using the level and the rangefinder, until the best position with the lowest standard deviation was found.


Figure 3.5: Interference measurement setup. Setup with sensors facing the same target, 0◦-90◦ (left). Setup with sensors facing each other, 90◦-180◦ (right).

The depth data calculated by the Kinect was also examined before making the final measurements, to determine whether one side of the measured area exhibited consistently larger or smaller distances than the other, in order to find rotational effects and correct them.

3.3 Interference Measurement Setup

To estimate the effect of multiple sensors being active at the same time on the accuracy and precision of the Kinect 2, a setup with two devices was used. First, a measurement with only one sensor active was performed. The resulting data was taken as ground truth, to which the results from measurements with a second active sensor were compared. The influence of the angle between the two sensors and of the distance of the two sensors to each other and to the observed target was studied. In order to get the most accurate measurements without the influence of changing conditions between different setups, the data was collected from a single sensor, which was kept stationary throughout all setups, while the second sensor was moved to generate the desired situations. An angle range of 180◦ between the two sensors was covered in the measurement. The whole range is divided into two parts. The first 90◦ describe the situation where the second sensor is located behind the measuring sensor and both observe the same scene. The second part, from 90◦ to 180◦, describes the situation where the second sensor is in front of the measuring sensor and faces the first sensor (see figure 3.5). The second sensor was moved in 10◦ intervals from 0◦ to 180◦.

The angle between the two sensors was determined using a meter ruler and a triangle ruler. For the two different setups the angle was measured at different places. For the setup where both sensors face the same scene, the triangle ruler was placed at the bottom of the closet so that it points exactly at the measuring sensor, and the first meter ruler was laid out along the axis between the sensor and the closet.


Figure 3.6: Setup to determine the angle between the two sensors. Setup with sensors facing the same target, 0◦-90◦ (left). Setup with sensors facing each other, 90◦-180◦ (right).

Then, the second meter ruler is laid at an angle to the first one, with one end on the triangle ruler. This way, it is possible to read off the angle between the two meter rulers on the triangle ruler. Then, the second sensor is placed along the second meter ruler, resulting in the desired angle between the sensors (see figure 3.6, left). For the setup where the sensors face each other, the angle is measured at the bottom of the measuring sensor's stand. The first meter ruler is left at the same position as for the first setup, but now the triangle ruler is placed at the bottom of the stand, so that its long side is parallel to the first meter ruler. The second meter ruler is then placed on the triangle ruler and, similar to the first setup, the angle can then be read off and adjusted. Again, placing the second sensor along the second meter ruler gives the desired angle between the two devices (see figure 3.6, right). To study the effect of different distances of the sensors to each other and to the observed scene, the fact that the angle between the sensors was determined using meter rulers, along which the sensors were placed, was used. Putting the start of both meter rulers at the base of the closet, or at the base of the measuring sensor's stand, made it possible to use the distance marked on the ruler as a reference to calculate the distance between the two devices.
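As a hedged geometric note (the relation is implied by the described setup but not written out in the thesis): in the first configuration, where both rulers start at the base of the closet, the separation between the two sensors follows from the law of cosines,

$$ d_{12} = \sqrt{r_1^2 + r_2^2 - 2 r_1 r_2 \cos\theta}, $$

where r_1 and r_2 are the distances of the two sensors from the closet read off the rulers and θ is the angle set on the triangle ruler. In the second configuration the second ruler starts at the measuring sensor's stand, so the distance read off that ruler is directly the sensor-to-sensor separation.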


4 Results and Discussion

4.1 Single Sensor Measurement

The main data used to estimate the accuracy of the Kinect 2 was the difference between the reference distance and the average distance measured by the Kinect 2, which can be seen in figure 4.1. It shows how the mean error committed by the Kinect behaves at different distances. The dotted line is the best-fitting first-degree polynomial (linear approximation), calculated with the linear least squares method, and shows a root mean square error of 0.0039 m. Keeping the considerations of section 3.2.1 in mind and thus subtracting about one centimeter from all the data, it can be seen that the error committed by the Kinect 2 is of the order of millimeters to centimeters and grows linearly with the distance to the measured object.

This result is supported by the theoretical error committed by a time-of-flight sensor using the phase shift determination approach. Recalling the general formula for depth calculation of such sensors:

Z = \frac{c}{4 \pi f_{\mathrm{mod}}} \, \varphi \qquad (4.1)

where Z is the measured distance to the object, c is the speed of light, f_mod is the modulation frequency of the signal and φ is the phase shift. There are two error sources in this calculation: an inaccurate phase shift measurement, or an error in the modulation frequency of the emitted signal. Therefore, an error in the calculated depth can be written as:

dZ = \frac{\partial Z}{\partial \varphi} \, d\varphi + \frac{\partial Z}{\partial f_{\mathrm{mod}}} \, df_{\mathrm{mod}} \qquad (4.2)

and applying this to equation 4.1 results in:

dZ = \frac{c}{4 \pi f_{\mathrm{mod}}} \, d\varphi - \frac{c \, \varphi}{4 \pi f_{\mathrm{mod}}^{2}} \, df_{\mathrm{mod}}
   = \frac{Z}{\varphi} \, d\varphi - \frac{Z}{f_{\mathrm{mod}}} \, df_{\mathrm{mod}} \quad \text{(using eq. 4.1)} \qquad (4.3)

So equation 4.3 gives us the theoretical result that the error in the depth measurement varies linearly with the distance between the sensor and the observed object.
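As a minimal sketch of how such a linear error model can be fitted to the measured data, assuming NumPy is available (the arrays below are illustrative placeholders, not the values measured in this thesis):

```python
import numpy as np

# Illustrative placeholder data: reference distances [m] and corresponding depth errors [mm].
reference_m = np.array([0.8, 1.2, 1.6, 2.0, 2.4, 2.8, 3.2, 3.6, 4.0, 4.4])
error_mm = np.array([8.0, 11.0, 13.5, 17.0, 20.0, 23.5, 26.0, 30.0, 33.0, 36.5])

# First-degree polynomial fitted in the linear least squares sense.
slope, intercept = np.polyfit(reference_m, error_mm, deg=1)
fitted_mm = slope * reference_m + intercept

# Root mean square error of the fit.
rmse_mm = np.sqrt(np.mean((error_mm - fitted_mm) ** 2))
print(f"error ≈ {slope:.1f} mm/m · distance + {intercept:.1f} mm, RMSE = {rmse_mm:.2f} mm")
```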


When comparing these results to the accuracy of the Kinect 1, where the error grows quadratically and reaches values of multiple centimeters ([6], [4]), the behaviour measured for the Kinect 2 represents a considerable improvement.

Figure 4.1: Difference between the reference distance and the distance measured by the Kinect 2 as a function of the reference distance.

(Plot: x-axis reference distance [m], 0–5 m; y-axis error of the Kinect [mm], 0–50 mm; series: measured error and linear approximation.)

The data used to estimate the precision of the Kinect 2 was the standard deviation of the per-frame mean depth across the frames captured at each distance. Figure 4.2 shows how this standard deviation changes with growing distance to the measured object. It starts at values well below one millimeter and then grows linearly at a rate of about 0.2 mm per 0.5 m of distance. These results indicate that the Kinect 2 makes very consistent measurements, with little change between frames, even at larger distances to the measured object. This is also reflected in the behaviour of the pixels with either no depth information or faulty information: at every distance interval, at least three separate measurements of 100 frames each were made, and in all cases the number of unusable pixels at a given distance stayed constant throughout all the measurements.

To control the quality of the measurements, the standard deviation over all frames was observed. As can be seen in figure 4.3, the average standard deviation fluctuates between 2 mm and 4 mm. The fluctuations can mainly be attributed to small changes in the measurement setup, as described in section 3.2.2. This magnitude of the standard deviation was considered acceptable for all measurements, since the given setup cannot reliably provide a reference better than a few millimeters anyway (see sections 3.1.4 and 3.2.2).
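A minimal Python sketch of the kind of per-frame statistics described above (mean depth per frame over a region of interest, standard deviation of these means across frames, and the count of pixels without usable depth). The array layout and the convention that invalid pixels are reported as zero are assumptions, not taken from the thesis code:

```python
import numpy as np

def frame_statistics(frames_mm: np.ndarray, roi=None) -> dict:
    """frames_mm: depth frames of shape (n_frames, height, width) in millimeters.
    Pixels without depth information are assumed to be reported as 0."""
    if roi is not None:
        top, bottom, left, right = roi
        frames_mm = frames_mm[:, top:bottom, left:right]

    valid = frames_mm > 0
    # Mean depth of each frame, computed over valid pixels only.
    masked = np.where(valid, frames_mm, np.nan)
    frame_means = np.nanmean(masked.reshape(frames_mm.shape[0], -1), axis=1)

    return {
        "mean_depth_mm": float(np.mean(frame_means)),
        "std_between_frames_mm": float(np.std(frame_means, ddof=1)),
        "unusable_pixels_per_frame": float(np.mean((~valid).sum(axis=(1, 2)))),
    }
```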


Figure 4.2: Standard deviation of the mean depth between individual frames calculated over 100 frames as a function of the reference distance.

(Plot: x-axis reference distance [m], 0–5 m; y-axis standard deviation between frames [mm], 0.2–1.4 mm; series: measured deviation and linear approximation.)

Figure 4.3: Average standard deviation calculated over 100 frames.

(Plot: x-axis reference distance [m], 0–4.5 m; y-axis standard deviation over frames [mm], 0–6 mm; series: standard deviation.)


4.2 Interference Measurement

Figure 4.4 and figure 4.5 show how the measured distance between the Kinect and the planar surface changes with a second active sensor at different angles and distances. They present the change in average distance compared to the reference measurement, where only one sensor was active.

Figure 4.4 presents the results of the first interference measurement. Different distances between the two sensors, or between the sensors and the observed scene, do not seem to impact the result, since all measured distances show the same behaviour. The first measurement does, however, show different behaviour in its two parts: in the first part, where the two sensors face the same scene (0◦–90◦), the measured difference is positive, whereas in the second part, where the two sensors face each other (90◦–180◦), the difference is negative. It is important to note that when turning off the second sensor at the end of the measurement, the measured depth stayed at the level measured at 180◦ and did not return to the starting value. This suggested that the observed effect may not be due to interference or the different positions of the sensors, but simply to changing conditions during the measurement: since the first measurement took multiple hours, the sensor may have warmed up considerably, which may have influenced the process.

To confirm this idea, a second measurement (see figure 4.5) was carried out. It did not consider different distances between the sensors, since the first measurement had already shown no observable change due to distance, and focused only on the changing angles. Additionally, while in the first measurement the data was collected starting from 0◦ and moving towards 180◦, the second measurement was started at 180◦ and ended at 0◦, so that if an effect similar to the one in the first measurement were observed, it could be excluded that it was caused by the measurement procedure. The figure shows no behaviour like the first experiment; the data stays consistent throughout the different angles and therefore supports the assumption that the effect observed in the first measurement was not due to interference.

Regarding the general influence of a second sensor on the measured distance, a distinction has to be made between two situations that occurred. Most of the time the results behaved as in the second measurement and in the larger part of the first one, where no interference can be observed: the data fluctuates by less than one millimeter, which corresponds to the random error of the undisturbed measurement. This applies not only to the measured distance, but also to the standard deviation as well as the standard deviation between individual frames, as can be seen in figure 4.6.
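The comparison underlying figures 4.4 and 4.5 can be sketched as follows; this is a minimal illustration with placeholder values, and the 1 mm threshold corresponds to the random error of the undisturbed measurement mentioned above:

```python
import numpy as np

angles_deg = np.arange(0, 181, 10)

# Placeholder values [mm]: mean depth of the reference (single sensor) measurement
# and mean depths with the second sensor active at each angle.
reference_mean_mm = 2000.0
with_second_sensor_mm = reference_mean_mm + np.random.normal(0.0, 0.4, size=angles_deg.size)

deviation_mm = with_second_sensor_mm - reference_mean_mm
for angle, dev in zip(angles_deg, deviation_mm):
    flag = "possible interference" if abs(dev) > 1.0 else "within random error"
    print(f"{angle:3d}°: {dev:+.2f} mm ({flag})")
```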

The second condition is an irregular occurrence which can be seen in the data of figures 4.4 and 4.6, at different distances when measured at the relative angles 20◦ and 30◦. The difference in measured distance in figure 4.4 shows a significant peak, but even more noticeable is the peak in figure 4.6, which represents the difference in the standard deviation between frames when comparing the interference measurement to a single sensor measurement. The values jump from under one millimeter to almost 16 mm, meaning that there is suddenly a large inconsistency in the calculated distance between frames. Figure 4.7 shows how the depth frames look during such phases. In section 3.2.2 it was discussed how a reflective surface shows an area with no depth information around the infra-red emitter of the sensor, resulting from the bright reflection of the emitter in the infra-red image. The four images in figure 4.7 all show a black circle of constant size in the center of the image (or to the right of the second black spot). This is exactly the effect discussed in section 3.2.2, and this part of the image always stays the same. What is different is that there is now a second black spot to the left of the one corresponding to the measuring sensor. This spot is due to the second active sensor. When the two sensors behave as described in the first part of this discussion, almost no trace of the second sensor appears in the image of the capturing sensor, or at most as much as is visible in the top left image. This means that the second sensor and its emitted infra-red light are not captured by the other sensor.


Figure 4.4: First measurement - Difference between distance measured with and without a second sensor active. Measured at different angles between the two sensors.

(Plot: x-axis angle [◦], 0–180; y-axis deviation from single sensor [mm], −4 to 8 mm; series: same distance, 50 cm apart, 100 cm apart, 150 cm apart.)

This effect is always symmetric for the two sensors: when one does not see the light of the other, neither does the other. However, sometimes they suddenly start to detect the other sensor's light, which results in the large additional black spots in figure 4.7.

The second sensor does not produce a consistent influence on the measuring sensor, but shows a changing, almost pulsating behaviour. This is the reason why the black spots in figure 4.7 show different patterns and sizes, and it is most likely also the cause of the large changes between frames observed in figure 4.6. In the first interference measurement, shown in figures 4.4 and 4.6, the period with strong interference suddenly disappeared again. Additionally, it has to be noted that in the second measurement no strong interference occurred at any angle, as can be seen in figure 4.5.

Figure 4.8 shows the standard deviation between frames from an experiment in which we specifically tried to provoke a strong interference. We were unable to force a strong interference consistently, but in this case it occurred while the two sensors were facing the same scene at a rather small angle. A build-up of the interference was observable in the first blue section of the measurement, where the effect gets consistently stronger; these measurements were taken within about 30 seconds. Moving the second sensor around while it still faced the same scene as the measuring sensor had no influence on the behaviour. When the second sensor was turned away from the measuring sensor, the effect vanished completely, as can be seen in the green section. The same was the case when the second sensor was pointed at the measuring sensor but with a large angle between them, so that the second sensor was not in the field of view of the measuring sensor; this is indicated by the yellow section. When the second sensor was turned back to face the same scene as the measuring sensor, the effect recurred, as can be seen in the last blue section.
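A simple way to flag such interference episodes in recorded data is to monitor the standard deviation of the per-frame mean depth over short windows and compare it to the undisturbed baseline. The following minimal sketch illustrates this; the window length and threshold are assumptions for illustration, not values used in the thesis:

```python
import numpy as np

def interference_episodes(frame_means_mm: np.ndarray, window: int = 30,
                          threshold_mm: float = 2.0) -> list:
    """Return (start, end) frame indices of windows whose standard deviation of the
    per-frame mean depth exceeds the threshold, indicating a possible interference episode."""
    episodes = []
    for start in range(0, len(frame_means_mm) - window + 1, window):
        segment = frame_means_mm[start:start + window]
        if np.std(segment, ddof=1) > threshold_mm:
            episodes.append((start, start + window))
    return episodes
```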


Figure 4.5: Second measurement - Difference between distance measured with and without a second sensor active. Measured at different angles between the two sensors.

(Plot: x-axis angle [◦], 0–180; y-axis deviation from single sensor [mm], −4 to 8 mm; series: facing sensor - 1 m distance, facing target - 1.5 m distance.)

Figure 4.6: First measurement - Difference in standard deviation between individual frames with and without a second sensor active. Measured at different angles between the two sensors.

(Plot: x-axis angle [◦], 0–180; y-axis deviation from single sensor [mm], 0–20 mm; series: same distance, 50 cm apart, 100 cm apart, 150 cm apart.)


Figure 4.7: Interference causing changing areas with no depth information around the second sensor.


Figure 4.8: Interference occurrence. Standard deviation over individual frames for different positions of the two sensors.

(Plot: x-axis position of the second sensor: facing the same target, turned away, facing each other, back facing the same target; y-axis standard deviation over frames [mm], 0–8 mm.)


4.3 Discussion

The data in section 4.1 shows that the Kinect 2 offers a substantial improvement over its predecessor, the Kinect 1, in many areas. The newer version shows linear error behaviour with increasing distance, compared to the quadratic behaviour of the Kinect 1. It also offers a great improvement in the precision of the measurement: the Kinect 1 showed rather large fluctuations in its depth measurements even for static scenes, while the results presented in this thesis show that the Kinect 2 yields very consistent data, which fluctuates only in the order of less than one millimeter.

Regarding the interference measurement, the Kinect 2 also shows very promising results. Most of the time, the sensor shows no interference in the presence of a second active sensor, over a range of different configurations of the two sensors with respect to each other. This is a vast improvement over the Kinect 1, which showed a lot of interference on its own and had to be dealt with using elaborate approaches [8]. The interference that occurred irregularly does pose a problem, since it influences single frames considerably, but it seems to fluctuate around the correct depth value, resulting in an error in the calculated average depth of only a few millimeters.

From the information gathered regarding this effect, the most plausible theory for its cause is a coincidental match of the modulation frequencies used by the two sensors. As discussed in section 3.1.1, the Kinect 2 uses different frequencies to eliminate the ambiguity inherent to phase shift based depth measurement devices. Therefore, if two sensors are not using the same frequency to capture their images, they may not detect the light emitted by the other sensor. On the other hand, if the sensors happen to use the same, or a similar, frequency, this could be the cause of the observed interference.
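For illustration, the unambiguous range of a phase-shift measurement is Z_max = c / (2 f_mod), which is the reason why several modulation frequencies are combined. The following minimal sketch computes this range for a set of assumed example frequencies; the values are illustrative only and were not measured in this thesis, although analyses in the literature report Kinect 2 modulation frequencies of roughly this magnitude. Two sensors that happened to run on the same frequency at the same moment would then see each other's light in their own demodulation, which is consistent with the hypothesis above.

```python
# Unambiguous range of a phase-shift time-of-flight measurement: Z_max = c / (2 * f_mod).
C = 299_792_458.0  # speed of light [m/s]

# Assumed example modulation frequencies [Hz], for illustration only.
modulation_frequencies_hz = [16e6, 80e6, 120e6]

for f_mod in modulation_frequencies_hz:
    z_max_m = C / (2.0 * f_mod)
    print(f"f_mod = {f_mod / 1e6:5.1f} MHz -> unambiguous range ≈ {z_max_m:5.2f} m")
```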


5 Conclusion and Future Work

5.1 Conclusion

In this thesis the basic principles of three-dimensional imaging were discussed, and several technologies and devices utilizing them were presented.

The general concepts of error analysis were presented, and it was discussed how to integrate them to achieve good measurements.

The working principles of the Kinect for Xbox One were presented, as well as its characteristics. A calibration was performed and the results were verified by comparing them to the values presented in other works.

A program was developed to gather the data from the Kinect 2. The program includes the ability to sort the data to exclude unreasonable values and to restrict the area of the frame for which calculations are performed. The mean and standard deviation of the depth data from the Kinect 2 can be calculated, and the process can be adjusted to calculate the mean depth over multiple frames.

A single sensor measurement setup was developed, suitable to estimate the accuracy and the precision of the Kinect 2. The measurements confirmed the predicted linear relation between the error in the depth values and the distance from the object. They showed an accuracy of the order of centimeters, which is an upper-bound estimate, since there is a constant offset of about one centimeter in the calculated data, caused by the measurement setup, which we were unable to determine precisely enough. The data showed excellent results for the precision of the Kinect 2, with very consistent measurements, fluctuations of less than one millimeter, and a linear growth with increasing distance from the measured object.

An interference measurement setup was developed using two sensors. The interference was measured for varying angles and distances between the sensors, as well as for the two sensors facing the same scene or facing each other. The results generally showed promising behaviour, with no observable interference between the sensors. An irregular interference effect with a clear influence on the measured data was observed, but could not be reproduced consistently. The effect was encountered multiple times, its source was studied, and a first hypothesis regarding its cause was formulated.


5.2 Future Work

The main focus for future work on the topic of interference measurement for the Kinect 2 should be to gain a better understanding of the cause and the source of the irregular interference effects described in this thesis. This knowledge can be used to identify sensor setups that do not produce interference and are therefore suitable for multi-sensor use, or to develop a strategy to avoid or counteract the effects completely.


Bibliography

[1] Jan Smisek, Michal Jancosek, and Tomas Pajdla. 3D with Kinect. In Consumer Depth Cameras for Computer Vision, pages 3–25. Springer, 2013.

[2] Victor Antonio Castaneda Zeman et al. Constructive Interference for Multi-view Time-of-Flight Acquisition. PhD thesis, Technische Universität München, 2012.

[3] Wikipedia. Kinect. https://en.wikipedia.org/wiki/Kinect#Kinect_for_Windows. Accessed: 21. January 2016.

[4] Diana Pagliari and Livio Pinto. Calibration of Kinect for Xbox One and comparison between the two generations of Microsoft sensors. Sensors, 15(11):27569–27589, 2015.

[5] Timo Kahlmann, Fabio Remondino, and H. Ingensand. Calibration for increased accuracy of the range imaging camera SwissRanger. Image Engineering and Vision Metrology (IEVM), 36(3):136–141, 2006.

[6] Kourosh Khoshelham. Accuracy analysis of Kinect depth data. In ISPRS Workshop Laser Scanning, volume 38, page W12, 2011.

[7] Nima Rafibakhsh, Jie Gong, Mohsin K. Siddiqui, Chris Gordon, and H. Felix Lee. Analysis of Xbox Kinect sensor data for use on construction sites: depth accuracy and sensor interference assessment. In Construction Research Congress, pages 848–857, 2012.

[8] D. Alex Butler, Shahram Izadi, Otmar Hilliges, David Molyneaux, Steve Hodges, and David Kim. Shake'n'Sense: reducing interference for overlapping structured light depth cameras. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1933–1936. ACM, 2012.

[9] Andrew Maimone and Henry Fuchs. Reducing interference between multiple structured light depth sensors using motion. In Virtual Reality Short Papers and Posters (VRW), 2012 IEEE, pages 51–54. IEEE, 2012.


[10] Philip R. Bevington and D. Keith Robinson. Data Reduction and Error Analysis for the Physical Sciences. McGraw–Hill, 2003.

[11] R. H. B. Exell. Error analysis. http://www.jgsee.kmutt.ac.th/exell/PracMath/ErrorAn.htm. Accessed: 21. January 2016.

[12] Wikipedia. Accuracy and precision. https://en.wikipedia.org/wiki/Accuracy_and_precision. Accessed: 21. January 2016.

[13] Wikipedia. Law of large numbers. https://en.wikipedia.org/wiki/Law_of_large_numbers. Accessed: 21. January 2016.

[14] Wikipedia. Hyperfocal distance. https://en.wikipedia.org/wiki/Hyperfocal_distance. Accessed: 21. January 2016.

[15] Wikipedia. Collimated light. https://en.wikipedia.org/wiki/Collimated_light. Accessed: 21. January 2016.

[16] Zhengyou Zhang. Flexible camera calibration by viewing a plane from unknown orientations. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference, volume 1, pages 666–673. IEEE, 1999.

[17] J.Y. Bouguet. Camera calibration toolbox. http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/parameters.html. Accessed: 21. January 2016.

[18] Microsoft. Kinect for Windows SDK 2.0. https://dev.windows.com/en-us/kinect/develop. Accessed: 21. January 2016.
