
jpineau/files/leigh-icra15.pdf

Person Tracking and Following with 2D Laser Scanners

Angus Leigh¹, Joelle Pineau¹, Nicolas Olmedo², and Hong Zhang²

Abstract— Having accurate knowledge of the positions of people around a robot provides rich, objective and quantitative data that can be highly useful for a wide range of tasks, including autonomous person following. The primary objective of this research is to promote the development of robust, repeatable and transferable software for robots that can automatically detect, track and follow people in their environment. The work is strongly motivated by the need for such functionality onboard an intelligent power wheelchair robot designed to assist people with mobility impairments. In this paper we propose a new algorithm for robust detection, tracking and following from laser data. We show that the approach is effective in various environments, both indoor and outdoor, and on different robot platforms (the intelligent power wheelchair and a Clearpath Husky). The method has been implemented in the Robot Operating System (ROS) framework and will be publicly released as a ROS package. We also describe and will release several datasets designed to promote the standardized evaluation of similar algorithms.

I. INTRODUCTION

Having accurate knowledge of the positions of people over time provides rich, objective and quantitative data that can be highly useful for a wide range of applications. More specifically, the capability to autonomously detect, track and follow a person has been identified as an important functionality for many assistive and service robot systems [1], [2]. Over the last decade, significant progress has been made in developing person detection and tracking algorithms, often with the aim of improving human-robot interaction or robot navigation in populated environments [3], [4].

Yet most of the work to date is not easily transferable to new applications: algorithms are tested on a single robot in a single environment (if at all, sometimes only in simulation under artificial conditions), in many cases the code has not been made publicly available, datasets collected during validation sessions are not shared, and quantitative comparisons to existing algorithms are not performed. Despite our best efforts, we have not been able to replicate many of the published results onboard the SmartWheeler platform [5]. In fact, we have yet to find a robust laser-based person-following system for a mobile robot that has demonstrated the capability to work indoors, outdoors and in cluttered and crowded areas.

In this paper we present a novel method for person tracking and following with 2D laser scanners. It builds on recent results in the literature [3], [4], extending them in several directions to improve accuracy and reduce the number of errors. One contribution of this work is a novel tracking method which tracks both legs, rather than an individual leg as in [3], [4], to improve reliability, especially in cases of self-occlusion. We also integrate local occupancy grid maps to improve data association by disallowing the initiation or continuation of people tracks in occupied space. However, unlike the approach in [6], which constructs occupancy grid maps from all scan points, ours uses only non-human scan points and therefore does not require people to move continuously to be tracked. Our approach also integrates the method of cluster tracking present in the Robot Operating System (ROS) leg detector package, in which all detected clusters are tracked in every frame, including non-human clusters, improving data association in cluttered environments. Finally, we incorporate a closed-loop control algorithm to allow the robot to autonomously follow tracked individuals.

1 Angus Leigh and Joelle Pineau are with the School of Computer Science at McGill University, QC, Canada {angus.leigh, jpineau}@cs.mcgill.ca
2 Nicolas Olmedo and Hong Zhang are with the Mechanical Engineering and Computer Science departments at the University of Alberta {olmedo, hzhang}@ualberta.ca

A major asset of our approach is its ability to work on multiple different robots, under various operating conditions. We present extensive empirical results, validating the performance of our approach on two different robot platforms over 40 minutes of data collected across different environments, both indoor and outdoor, with moving and stationary robots, and varied crowd and obstacle conditions. These results show the benefit of our tracking approach compared to the existing open-source implementation in ROS, especially in the application of person following. Another contribution of this work is a public release of all datasets and code used during our investigations as an open-source ROS package, to facilitate ongoing research in this area in future years. These datasets include, to the best of our knowledge, the first laser-based person tracking benchmarks collected on a moving robot.

While the work we report here focuses specifically on the problem of accurate tracking and following of people by social and assistive robots, we expect our work to have significant applications outside of this field, including for security (e.g. tracking intruders), entertainment (e.g. developing interactive exhibits), marketing (e.g. location-aware personalized advertisement), rehabilitation (e.g. assessment of patients' locomotion patterns following an injury [7]), and beyond.

II. PROBLEM DESCRIPTION

The problem of interest can be decomposed into three sub-problems: (1) detection, (2) tracking, and (3) following. For reasons of modularity and robustness, these problems are tackled with three separate (though connected) modules.


Such modularity is consistent with most of the literature on the topic, and allows development of a solution that is transferable to a large range of applications. For example, a security robot may require only detection and tracking; a social robot, on the other hand, may ask the user to manually specify which person to follow (e.g. from an image, or with a gesture command), and use autonomous behavior only for tracking and following.

For the purposes of this work, we limit our attention to systems with planar laser sensors. Such sensors are widely available on autonomous robots, especially those deployed in public and urban environments, due to their reliability and accuracy for mapping and navigation tasks. Laser sensing is also computationally cheap to process, lighting-invariant, and functional under diverse operating conditions. Further, the wide field of view of modern laser scanners allows the robot to follow in close proximity with less risk of losing people out of frame.

We aim to build and test a system that can achieve person tracking and following under diverse conditions: different robot platforms, indoors and outdoors, single or multiple people, cluttered with stationary or moving obstacles, without an a priori map of the environment. In support of this, we deploy and validate our system on two different robot platforms, each in a different environment.

The SmartWheeler robot, shown in Figure 1, is an intelligent powered wheelchair, designed in collaboration with engineers and rehabilitation clinicians, aimed at assisting individuals with mobility impairments. The robot is built upon a commercial power wheelchair base, to which we have added onboard computing, three Hokuyo UHG-08LX laser scanners, several sonars, an RGB-D camera, on-wheel odometry, and a touchscreen for two-way interaction. One of the important tasks for the robot is to automate navigation in challenging environments (crowded rooms, narrow spaces) to reduce the physical and cognitive load on the wheelchair user [2]. The task of moving around with another individual, whether a friend or caregiver, is also one that requires substantial concentration, and wheelchair users often comment that it is difficult for them to simultaneously control their wheelchair (presumably via joystick) and hold a conversation; thus, the ability to use autonomous navigation capabilities for walking side-by-side with (or behind) another person is highly desirable [2].

A Husky A200 from Clearpath Robotics (shown in Fig. 2) is used for the outdoor experiments in this work. This robot has been used for mapping, localization, route following and autonomous navigation on rugged terrain, such as sand, gravel and grass. An on-board computer is used to interface with low-level controllers and sensors, as well as to process visual and range measurements. Typically, an additional computer is mounted on top of the platform for applications with high computational requirements. A Hokuyo URG-04LX-UG01 laser sensor, mounted on the front of the robot, is used to collect the 2D scans used in this study.

Fig. 1. The SmartWheeler robot.

III. RELATED WORK

Most previous work focuses on only one or two of the three identified sub-problems: detection, tracking and following. Few papers present integrated systems tackling all three components. We focus in particular on work that uses depth sensors, such as laser or RGB-D, as those are the sensors available on our SmartWheeler robot, as well as on numerous other assistive robots.

Montemerlo et al. [8] were possibly the first to present a method for automatically locating and tracking people from laser data. One drawback of this work is the necessity for an a priori occupancy grid map of the operational environment, which is used for background subtraction to detect the person. The tracking is achieved via a conditional particle filter.

Schulz et al. [6] proposed to estimate the number of people in the current scan based on the number of moving local minima in the scan. Unfortunately, this requires that people move continuously to be tracked, and is susceptible to poor results in cluttered environments (where the number of local minima is misleading). They also introduced a Sample-Based Joint Probabilistic Data Association Filter (SJPDAF) over the observed local minima to improve tracking reliability.

Topp et al. [9] extended [6] by picking out shapes of legs and person-wide blobs in laser scans using hand-coded heuristics, to allow detection and tracking of both stationary and moving people. The approach was also combined with a person-following navigation algorithm, which combines the tracked person's position with the locations of nearby obstacles to determine suitable control. Gockley et al. [10] used a similar approach, with a few modifications, including using a Brownian motion model for the tracking component. This approach was further extended by Hemachandra [11], who improved the person-following component by proposing a navigation approach that accounts for personal space while avoiding obstacles. Unfortunately, these approaches report tracking difficulties in cluttered conditions, since they relied primarily on detecting clusters of a heuristically-determined size in the laser scan.

More recently, Arras et al. [3] reduced this limitation by proposing a method that detects legs by first clustering scan points and then using supervised learning to learn the shapes of leg clusters. Detected legs are tracked over time using constant-velocity Kalman filters and a multiple-hypothesis tracking (MHT) data association technique. This approach benefits from its ability to maintain (but not initiate) tracks of stationary people, and does not require an a priori occupancy grid map of the environment. Initial results for this method appear promising, but demonstrations on walking-speed robots in cluttered and crowded areas have yet to be performed. Thus, many questions remain about the robustness and generalizability of the approach.

Finally, Lu et al. [4] extended an existing ROS package originally developed at Willow Garage that had not been formally published.1 This method first matches scan clusters using nearest-neighbour (NN) data association, then determines which clusters are human legs using a supervised learning approach similar to [3]. In the data association literature, it is generally agreed that the NN filter is outperformed by more sophisticated methods, such as the global nearest-neighbour (GNN) filter, SJPDAF or MHT [12]. Furthermore, the data association method only considers absolute Euclidean distances while ignoring valuable uncertainty covariances that can improve tracking.

Navarro-Serment et al. [13] present the only published work we are aware of that aims to track people from a mobile robot in hilly, outdoor environments using only 2D laser scanners. The method's applicability to cluttered or crowded environments is questionable, since it uses a NN data association method and clusters scan points with a relatively large threshold of 80cm.

Person following with other sensors: Munaro et al. [14] proposed to track people in real-time with an RGB-D sensor on a mobile robot, demonstrating promising results with high update rates running on a CPU only. Similarly, Gritti et al. [15] demonstrated a method for detecting and tracking people's legs on a ground plane from a low-lying viewpoint with an RGB-D sensor. However, the RGB-D sensor's narrow field of view, minimum distance requirement, and inability to cope with sunlight limit this technique's applications for person following.

Cosgun et al. [1] presented a novel person-following navigation technique which is most similar to a Dynamic Window Approach [16]. The initial detection was acquired via a user indicating the desired person to follow in an RGB-D sensor's image. Tracking was then achieved via lasers using estimated leg positions. This approach can only track one person, and uses NN matching over a small number of laser segment features; thus it is not robust in situations with multiple individuals.

Kobilarov et al. [17] developed a person-following Segway robot using an omni-directional camera and a laser scanner. Nonetheless, variable environmental lighting, backgrounds and appearances of people are factors which can be difficult to control for and can mislead vision-reliant tracking systems.

1 http://www.ros.org/news/2009/12/person-following-and-detection-in-an-indoor-environment.html

IV. JOINT LEG TRACKER

Our proposed system, which we call the Joint Leg Tracker, includes several components. Autonomous person detection is achieved using clustering over laser detections; confidence levels are also assessed to help prune false positives. Autonomous tracking is achieved using a combination of a Kalman filter (for prediction over consecutive scans) and a GNN method (to resolve the scan-to-scan data association problem). Finally, a control algorithm is used to follow the tracked target.

A. Autonomous person detection

The laser scanner returns a vector of distance measurements taken on a plane, roughly 30cm from the ground, at a resolution of 1/3°. Scan points returned are first clustered according to a fixed distance threshold, such that any points within the threshold are grouped together as a cluster. The threshold is chosen to be small enough to often separate a person's two legs into two distinct clusters, but to rarely generate more than two clusters per person. To mitigate noise, clusters containing fewer than three scan points are discarded in low-noise environments and clusters containing fewer than five scan points are discarded in high-noise environments. To compensate for egomotion, the positions of all detected clusters are transformed to the robot's odometry frame.
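The fixed-threshold grouping step above can be sketched as a greedy Euclidean clustering pass. This is a minimal illustration, not the authors' implementation: the function name is ours, and the 0.13m threshold is borrowed from the annotation rule quoted later in the paper, while the three-point minimum matches the low-noise setting described here.

```python
import math

def cluster_scan(points, dist_thresh=0.13, min_points=3):
    """Greedy Euclidean clustering of 2D scan points.

    Any point within dist_thresh of a member of an existing cluster
    joins (and merges) that cluster; clusters with fewer than
    min_points points are discarded as noise.
    """
    clusters = []
    for p in points:
        matches = [c for c in clusters
                   if any(math.hypot(p[0] - q[0], p[1] - q[1]) <= dist_thresh
                          for q in c)]
        if matches:
            # Merge every cluster the new point bridges, then add the point.
            merged = [q for c in matches for q in c] + [p]
            clusters = [c for c in clusters if c not in matches]
            clusters.append(merged)
        else:
            clusters.append([p])
    return [c for c in clusters if len(c) >= min_points]
```

With a small enough threshold, two legs standing ~0.4m apart come out as two distinct clusters, as intended; an isolated stray return is dropped by the minimum-size rule.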

Detected clusters are used as observations for the tracker. The observations at timestep $k$ are denoted $z_k = \{z_k^1, z_k^2, ..., z_k^{M_k}\}$, where $M_k$ is the total number of detected clusters at time $k$.

Clusters are further classified as human or non-human, based on a set of geometric features of the clusters. The features are listed in Table I and extend the set proposed in [3] and [4]. The classification is done using a random forest classifier trained on a set of 1700 positive and 4500 negative examples [18]. Positive examples were obtained by setting up the laser scanner in an open area with significant pedestrian traffic; all clusters which lay in the open areas and met the threshold in Sec. IV-A were assumed to be the result of people and were used as positive training samples. Negative examples were obtained by moving the sensor around in an environment devoid of people; all clusters which met the threshold in Sec. IV-A were used as negative training samples.

One benefit of using an ensemble classification method is that a measure of confidence in the classification can be extracted by considering the number of individual predictors predicting each class.² Thus, rather than using the binary output of the classifier (human/non-human), we consider the confidence level (from 0-100%) from the classifier and pass this information (along with the cluster location) to the tracking module.
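The vote-fraction idea can be made concrete with a toy ensemble. The stump definitions below are purely illustrative placeholders (the thresholds are invented, though width, circularity and number of points do appear among the Table I features); a real random forest, e.g. OpenCV's, exposes the same per-tree vote fraction.

```python
def ensemble_confidence(trees, features):
    """Fraction of ensemble members voting 'human' for a cluster.

    `trees` is any iterable of callables mapping a feature dict to
    True (human) or False (non-human). The returned fraction is the
    confidence level passed on to the tracking module.
    """
    votes = sum(1 for t in trees if t(features))
    return votes / len(trees)

# Hypothetical decision stumps over a few Table I features.
stumps = [
    lambda f: f["width"] < 0.25,        # leg-like width (invented threshold)
    lambda f: f["circularity"] > 0.5,   # leg cross-sections are roughly round
    lambda f: f["width"] < 0.20,
    lambda f: f["n_points"] >= 3,
]

conf = ensemble_confidence(stumps, {"width": 0.15, "circularity": 0.8, "n_points": 6})
# All four stumps vote 'human' for this cluster, so conf is 1.0.
```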

² A function for extracting this confidence is included in the random forest classifier implementation in OpenCV [19].

TABLE I
FEATURES USED FOR CONFIDENCE CALCULATION OF SCAN CLUSTERS.

Number of points | Width | Length
Standard deviation | Avg. dist. from median | Occluded (boolean)
Linearity | Circularity | Radius of best-fit circle
Boundary length | Boundary regularity | Mean curvature
Mean angular diff. | Inscribed angular var. | Dist. from laser scanner

B. Autonomous tracking of multiple people

1) Kalman filter tracking: The position of each detected cluster (regardless of confidence level) is tracked individually over time using a Kalman filter [20]. We refer to each Kalman filter which tracks a scan cluster over time as a track, denoted $x_k^j$, and to the set of all active tracks as $X_k = \{x_k^1, x_k^2, ..., x_k^{N_k}\}$, where $N_k$ is the total number of tracks at time $k$. Further, the set of tracks $X_k$ also includes a single track for each tracked person (their initiation is described in Sec. IV-B.4).

The Kalman filter for each track has a state estimate $x_k^j = [x\ y\ \dot{x}\ \dot{y}]^T$ with the position and velocity of the cluster in 2D coordinates. New tracks are initialized with a velocity of zero and existing tracks are updated with a constant-velocity motion model. Process noise $w$ is assumed to be Gaussian white noise with diagonal covariance $Q = qI$.

The observation matrix $H$ includes only the position of the cluster, and the observation noise has covariance $R = rI$. In tuning the Kalman filter, it was found that a small observation noise covariance ($r = 0.1^2$) is sufficient because position measurements from the laser scanner are highly accurate.
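Because $H$ selects only position and both $Q = qI$ and $R = rI$ are diagonal, the 4-state filter above decouples into two independent 2-state (position, velocity) filters, one per axis. The sketch below exploits that; the class name is ours and $q$, $dt$ are illustrative values (only $r = 0.1^2$ is quoted in the text).

```python
class CVKalman1D:
    """Constant-velocity Kalman filter for one axis (state: [pos, vel])."""

    def __init__(self, pos, q=0.1, r=0.1 ** 2):
        self.x = [pos, 0.0]                 # new tracks start with zero velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q, self.r = q, r

    def predict(self, dt):
        p, v = self.x
        self.x = [p + v * dt, v]            # constant-velocity motion model
        a, b, c, d = self.P[0][0], self.P[0][1], self.P[1][0], self.P[1][1]
        # P <- F P F^T + Q  with  F = [[1, dt], [0, 1]], Q = qI
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def update(self, z):
        s = self.P[0][0] + self.r           # innovation covariance (H = [1, 0])
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s   # Kalman gain
        y = z - self.x[0]                   # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        a, b, c, d = self.P[0][0], self.P[0][1], self.P[1][0], self.P[1][1]
        self.P = [[(1 - k0) * a, (1 - k0) * b],     # P <- (I - K H) P
                  [c - k1 * a, d - k1 * b]]
```

A full track would hold one such filter for x and one for y; tracks without a matched detection simply skip `update` for that frame.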

2) Global nearest neighbour data association: Since the system is designed to track all detected scan clusters, uncertainty arises as to how detections $z_k$ should be matched to the tracks $X_{k-1}$ from the previous time to produce updated tracks $X_k$ for the current time. To address this, we use a GNN data association method which is solvable in polynomial time ($O(\max(N_k, M_k)^3)$) via the Munkres assignment algorithm [21].

First, all tracks $X_{k-1}$ are propagated to produce the state estimates $X_{k|k-1}$ for time $k$. Next, the Munkres cost matrix is populated with the cost of assignment between every propagated track and every detection. The cost metric used in our tracking system is the Mahalanobis distance³ between the detection $z_k^i$ and the propagated track $x_{k|k-1}^j$. The covariance used to calculate this distance is the innovation covariance, which, in the case of the linear Kalman filter, is $S_k^j = H P_{k|k-1}^j H^T + R$, where $P_{k|k-1}^j$ is the prediction covariance of the propagated track $x_{k|k-1}^j$. Here we use a higher noise covariance ($r = 0.5^2$) to allow more flexibility in associations.

A threshold is imposed such that detections outside the $p$th-percentile confidence bounds of the expected observation are given an arbitrarily high assignment cost, $cost_{max}$. Since the Munkres algorithm requires a square matrix and $N_k$ is often different from $M_k$, the cost matrix is padded with $cost_{max}$ to produce the necessary shape. When an assignment made between a detection and a track by the Munkres algorithm has cost $cost_{max}$, the assignment is ignored. Otherwise, the track is updated with the position and the confidence level of the detection. The position is used to perform an observation update in the track's Kalman filter to produce the updated track position $x_k^j$ for the current time $k$. Tracks without matched detections are propagated forward without observations. Detections without matched tracks spawn new tracks at their current positions. The confidence of each track, denoted $c_k^j$, is computed by the exponentially-weighted moving average $c_k^j = 0.95\,c_{k-1}^j + 0.05\,d_c(z_k^i)$, where $d_c(z_k^i)$ is the confidence of the $i$th detection which was assigned to the $j$th track.

³ Note that other GNN data association approaches have recommended minimizing the square of this distance [22]. We found best results minimizing the non-squared distance in our applications.
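The square-padding and gating logic can be sketched as follows. The input is a cost matrix of (e.g. Mahalanobis) distances that has already been gated, with infeasible pairings set to a large constant; for simplicity this sketch brute-forces the optimal assignment over permutations, which is only viable for a handful of tracks, whereas the paper uses the polynomial-time Munkres algorithm. Function and constant names are ours.

```python
import itertools

COST_MAX = 1e6  # arbitrarily high cost for gated-out pairings

def gnn_assign(cost):
    """Globally optimal (minimum total cost) track-to-detection assignment.

    `cost` is an N_tracks x M_detections matrix. It is padded with
    COST_MAX to make it square, and assignments landing on COST_MAX
    entries are discarded, mirroring the procedure in the text.
    """
    n = max(len(cost), max((len(r) for r in cost), default=0))
    padded = [[cost[i][j] if i < len(cost) and j < len(cost[i]) else COST_MAX
               for j in range(n)] for i in range(n)]
    best, best_perm = None, None
    for perm in itertools.permutations(range(n)):   # stand-in for Munkres
        total = sum(padded[i][perm[i]] for i in range(n))
        if best is None or total < best:
            best, best_perm = total, perm
    # Keep only real (track, detection) pairs that passed the gate.
    return [(i, best_perm[i]) for i in range(len(cost))
            if best_perm[i] < len(cost[i]) and padded[i][best_perm[i]] < COST_MAX]
```

Note how GNN differs from greedy NN: in the first test case below, track 0 would greedily grab detection 0, but the globally cheapest assignment still gives each track its correct detection.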

Further, since people are represented as a single track but people's legs can produce between zero and two observations (depending on the number of legs visible to the laser scanner and whether or not their legs have been clustered together), an identical temporary track is created for each person before the data association. If both of the person's tracks are matched to detected scan clusters, then the tracked person's position is updated with the mean of the detections' positions. If only one track is matched to a scan cluster, then the tracked person's position is updated with the mean of the detection's position and the propagated position of the person's track. After the person's original track has been updated, the temporary track is deleted.
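The three-case fusion rule above reduces to a small function (names are ours; positions are 2D points in the odometry frame):

```python
def update_person_position(predicted, leg_obs):
    """Fuse 0-2 matched leg detections into a person-position observation.

    Two matched legs -> mean of the two detections.
    One matched leg  -> mean of the detection and the propagated
                        (predicted) person position.
    No matched legs  -> keep the prediction.
    """
    if len(leg_obs) == 2:
        (x1, y1), (x2, y2) = leg_obs
        return ((x1 + x2) / 2, (y1 + y2) / 2)
    if len(leg_obs) == 1:
        (x1, y1), = leg_obs
        return ((x1 + predicted[0]) / 2, (y1 + predicted[1]) / 2)
    return predicted
```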

3) Local occupancy grid mapping: To improve robustness, an odometry-corrected local occupancy grid map is constructed and updated every iteration with any scan clusters not associated with tracked people, on the assumption that these are non-human detections. The map covers a 20m × 20m square area, at a resolution of 5cm × 5cm per cell, and is centred at the current position of the laser scanner. By default, all cells are assumed to be freespace until updated with a non-human observation.
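A minimal sketch of such a grid is shown below. The dimensions and the free-by-default convention come from the text; the set-of-cells representation, the class name, and the crude recentering (which simply forgets old cells rather than shifting them) are our simplifications.

```python
class LocalOccupancyGrid:
    """Rolling 20m x 20m grid of non-human scan clusters at 5cm resolution.

    Cells default to free space and are marked occupied only by
    clusters NOT associated with tracked people, so stationary people
    never get baked into the map.
    """

    def __init__(self, size_m=20.0, res_m=0.05):
        self.size, self.res = size_m, res_m
        self.center = (0.0, 0.0)   # laser position in the odometry frame
        self.occupied = set()      # indices of occupied cells

    def _cell(self, x, y):
        return (int((x - self.center[0] + self.size / 2) / self.res),
                int((y - self.center[1] + self.size / 2) / self.res))

    def recenter(self, x, y):
        # Re-center on the scanner. A full implementation would shift
        # cells into the new window instead of discarding them.
        self.center = (x, y)
        self.occupied.clear()

    def mark_non_human(self, cluster_points):
        for x, y in cluster_points:
            self.occupied.add(self._cell(x, y))

    def is_occupied(self, x, y):
        return self._cell(x, y) in self.occupied
```

The tracker then consults `is_occupied` both when pricing associations and when deciding whether a new person track may be initiated.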

The local map is used to assist with the data association by assigning the maximum cost, $cost_{max}$, to matchings between clusters in occupied space and human tracks. This helps prevent situations where a tracked obstacle cluster is split due to an occlusion and, simultaneously, a nearby person's leg is occluded, potentially resulting in a match between the person track and the obstacle cluster. The map is also used to disallow person track initiations in occupied space, which can occasionally arise from non-human objects.

4) Person track initiation and deletion: To keep the number of person tracks manageable, initiation conditions are applied using the following criteria: (1) a pair of tracks is detected which move a given distance (0.5m) without drifting apart; (2) both tracks maintain a confidence level above a threshold $c_{min}$; and (3) both are in freespace (as computed from the local occupancy grid map). Upon initiating a person track, the person's position is estimated to be the mean of the positions of the pair of associated leg tracks. The leg tracks are then deleted and only one track is kept, representing the position of the person.
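The three criteria can be expressed as a predicate over a candidate pair of leg tracks. Only the 0.5m travel distance is quoted in the text; $c_{min}$, the "without drifting apart" separation bound, and the dict-based track representation are assumptions of this sketch.

```python
import math

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def should_initiate_person(track_a, track_b, free_space, c_min=0.5,
                           min_travel=0.5, max_separation=0.8):
    """Check the three person-initiation criteria from the text.

    Each track is a dict with 'start' and 'pos' (2D points) and
    'confidence'; free_space is any callable answering the local
    occupancy-grid query for a position.
    """
    moved = (_dist(track_a["start"], track_a["pos"]) >= min_travel and
             _dist(track_b["start"], track_b["pos"]) >= min_travel)
    together = _dist(track_a["pos"], track_b["pos"]) <= max_separation
    confident = (track_a["confidence"] >= c_min and
                 track_b["confidence"] >= c_min)
    free = free_space(track_a["pos"]) and free_space(track_b["pos"])
    return moved and together and confident and free
```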

Finally, human tracks are deleted when their innovation covariance $S$ is greater than a threshold, or when the confidence of the track drops below the threshold $c_{min}$.


C. Autonomous Following: Navigation and Control

To perform person following, a single human track is selected as input to an Object Following Controller (OFC) [23]. The track can be manually selected by the robot operator, or automatically selected using a predefined decision criterion, such as picking the human closest to the robot. The OFC uses the track's latest state estimate to compute the difference (position error) from the desired position of the human with respect to the robot (position goal). The position error is then used to compute the robot's velocity setpoints for low-level actuator controllers. The generated motion decreases the position error over time and results in a trajectory that causes the robot to follow the human.

To minimize the position error of the tracked human, the OFC modulates the robot's angular and linear velocity setpoints independently. Two vectors are used for these calculations: a vector from the robot's center to the goal position, and a vector from the robot's center to the human position. The first vector is expected to be constant (as long as the goal position does not change), while the second vector changes as the human and robot move with respect to each other. The control actions aim to equalize the lengths of these vectors and to drive the angle between them to zero. A Proportional-Integral-Derivative (PID) controller was implemented to calculate the angular velocity setpoint using the angle between the vectors, while a second PID controller calculates the linear velocity setpoint using the difference in the lengths of the vectors. Both controllers were tuned using the classical Ziegler-Nichols method. A dead-band zone was defined to suppress oscillations in the control actions when the position error is very small.
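The two-loop structure can be sketched as follows. The gains, the dead-band radius, and the single shared dead-band threshold (applied loosely to both an angle in radians and a length in metres) are placeholders, not the Ziegler-Nichols-tuned values from the paper; class and parameter names are ours.

```python
import math

class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral, self.prev = 0.0, None

    def step(self, error, dt):
        self.integral += error * dt
        deriv = 0.0 if self.prev is None else (error - self.prev) / dt
        self.prev = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

class ObjectFollowingController:
    """Two independent PID loops: one on the angle between the
    robot-to-goal and robot-to-human vectors (angular velocity), one
    on the difference of their lengths (linear velocity)."""

    def __init__(self, goal, dead_band=0.05):
        self.goal = goal               # desired human offset, robot frame
        self.dead_band = dead_band
        self.ang_pid = PID(1.5, 0.0, 0.1)    # placeholder gains
        self.lin_pid = PID(0.8, 0.0, 0.05)

    def step(self, human, dt):
        gx, gy = self.goal
        hx, hy = human
        angle_err = math.atan2(hy, hx) - math.atan2(gy, gx)
        length_err = math.hypot(hx, hy) - math.hypot(gx, gy)
        if abs(length_err) < self.dead_band and abs(angle_err) < self.dead_band:
            return 0.0, 0.0            # dead-band: suppress jitter near the goal
        return (self.lin_pid.step(length_err, dt),
                self.ang_pid.step(angle_err, dt))
```

For example, with the human straight ahead but one metre beyond the goal offset, the controller commands a positive forward velocity and zero rotation; once the human sits at the goal offset, the dead-band holds both setpoints at zero.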

V. EXPERIMENTAL VALIDATION

A. Benchmark descriptions

Some prior benchmarks for the evaluation of the person detection and tracking module are available [24]; however, to the best of our knowledge, they were collected from a stationary laser scanner at a height of 0.8m, which is above the leg region of many pedestrians. The method of ground-truth annotation is also ambiguous, as it is unclear under what conditions a person is labelled as visible or occluded (an issue described in [25]). Further, the datasets are not ROS-enabled and the person-detection module used to detect people with the given laser scanner has not been released, making comparisons difficult on other platforms.

To palliate some of these gaps, and allow thorough eval-uation both of our own and future methods, we collectedand will be publicly sharing two new benchmarks for persondetection and tracking, detailed in Table II. These provide 40minutes of data recorded onboard moving robots, are fullyROS-enabled and include annotated people tracks labelled inthe laser scans using an objective, unambiguous procedure,with video data to corroborate ground-truth when necessary.

1) General multi-person tracking: The first benchmark, called General multi-person tracking, is designed for evaluating performance of general multi-person tracking of pedestrians in natural environments. The benchmark is composed of two datasets, one collected from a stationary robot and one collected from a moving robot. The stationary dataset includes 7 minutes of tracking data, while the moving dataset includes 5 minutes of data, and both are fully annotated, including 82 identified people tracks. The data was recorded onboard the SmartWheeler in the hallways of a university campus building during normal opening hours. Each recording includes odometry data published at 100Hz and laser data published at 7.5Hz. Video data was also captured, but was only used as a reference for annotating people in the laser scans, and was not included in the final curated benchmarks (though it is available on demand).

All ground-truth people positions were hand-labelled in each laser scan. To address the issue of whether or not a person should be labelled as visible when they are partially occluded (an issue raised in [25]), a consistent and objective rule was used: if at least three laser points can be clustered with a Euclidean distance threshold of 0.13m, and that cluster of points corresponds to a person, then the person is marked as visible in the annotations.
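The annotation rule above amounts to a single-pass Euclidean clustering over the angularly ordered scan points. A minimal sketch, with the thresholds taken from the rule and the function name ours:

```python
import math

def clusters_from_scan(points, min_points=3, max_gap=0.13):
    """Single-pass Euclidean clustering of angularly ordered scan points:
    a point starts a new cluster when its gap to the previous point
    exceeds max_gap (m); clusters with fewer than min_points are discarded."""
    clusters, current = [], []
    for p in points:
        if current and math.dist(p, current[-1]) > max_gap:
            if len(current) >= min_points:
                clusters.append(current)
            current = []
        current.append(p)
    if len(current) >= min_points:
        clusters.append(current)
    return clusters
```

Any surviving cluster that corresponds to a person is then marked visible; clusters of one or two points (e.g., a mostly occluded leg) are not.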

2) Tracking for following: The second benchmark, called Tracking for following, is designed to measure the performance of the detection and tracking modules when applied specifically to the task of tracking an individual person in the presence of other pedestrians. This benchmark is also composed of two separate datasets.

In the Following Indoor dataset, the person to be followed was asked to walk naturally and stop periodically to interact with objects in the environment while the SmartWheeler followed either from behind or side-by-side, simulating a person-following situation (for data collection in this benchmark, the robot was manually controlled). The environment is the same university building as for the General multi-person tracking benchmark, and includes scenes with natural crowds and clutter, as well as five instances of pedestrians walking between the SmartWheeler and the person it was following. Odometry was collected at a frequency of 100Hz and laser scans at 15Hz. The position of the person being followed was hand-labelled in every frame. Altogether, 21 minutes of data were gathered and annotated, including 4.5 minutes of side-by-side following and 16.5 minutes of following from behind.

The Following Outdoor dataset was collected with a Clearpath Husky robot at the Canadian Space Agency in Saint-Hubert. The laser scanner was mounted level with the ground, approximately 40cm high. The dataset contains 12 minutes of data, also fully annotated. Laser scans were collected at a frequency of 10Hz. Odometry data was collected as well but was ultimately not used, as it was found to be detrimentally inaccurate. The challenge in this dataset is dealing with a high degree of sensor noise, as the laser scanner used is indoor-rated and highly susceptible to interference from the sun. In this case, the robot was controlled autonomously to follow the person using the navigation system presented in Sec. IV-C and an earlier version of the Joint Leg Tracker presented in Sec. IV. During the experiment, the tracker failed on two separate occasions and steered the robot such that the person to be tracked was lost out of frame. Fig. 2 shows sample images from this dataset.

Fig. 2. The person tracking and following system implemented on a Clearpath Husky during the collection of the Following Outdoor dataset. The robot autonomously followed the participant in an approximately 500m loop on gravel and grass at the Canadian Space Agency in Saint-Hubert. Two tracking failures occurred, one of which is shown in (c); both were caused by extensive sensor noise, as the laser scanner used is indoor-rated and highly susceptible to interference from the sun.

B. Comparison of Person Tracking Approaches

Quantitatively comparing to existing methods from the literature is challenging due to the scarcity of publicly available code. In our search, we were only able to find one other open-source, ROS-enabled method that could be compared to ours: the ROS leg detector package used in [4]. To adapt the ROS leg detector to our benchmarks, all parameters were kept at their default values except the confidence threshold, which was lowered, as the tracker would commonly fail to initiate people tracks with its default value. Also, the clustering distance and minimum required points per cluster were set to be the same as for our method.

A variant of our algorithm, which we call the Individual Leg Tracker, is also included in the comparison. It is identical to the Joint Leg Tracker described in Sec. IV, except that all tracking is performed on an individual-leg level (not pairs of legs) and it does not use a local occupancy grid map for data association. In this case, person tracks are deleted when the two leg tracks separate beyond a distance threshold, one of the leg tracks has too low a confidence, or one of the leg tracks is deleted.
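The three deletion conditions for the Individual Leg Tracker can be written as a single predicate. The numeric thresholds and the leg-track representation below are illustrative placeholders, not the tuned values:

```python
import math

def should_delete_person_track(leg_a, leg_b, max_separation=0.8, min_conf=0.3):
    """Deletion rule for the Individual Leg Tracker variant: drop the person
    track if either leg track was deleted, the legs separate beyond a
    distance threshold, or either leg's confidence falls too low.
    Legs are dicts with a "pos" (x, y) tuple and a "conf" score."""
    if leg_a is None or leg_b is None:          # a leg track was deleted
        return True
    if math.dist(leg_a["pos"], leg_b["pos"]) > max_separation:
        return True                              # legs separated too far
    return min(leg_a["conf"], leg_b["conf"]) < min_conf
```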

Minor modifications were made to all methods to allow for repeatable results on the benchmarks. This was necessary because the tracking methods require coordinate transformations to be made in real-time, which can cause benchmark results to vary depending on the exact time these transformations are made. Each method was therefore set to use the scan header time to perform the transforms and, when transforms were not available, to wait for one second; if they were still not available, the current scan was skipped (although we found this happened only very rarely). An option was included in each method to never wait for transforms to become available but instead to always use the most recent ones. This is how the system is intended to be used in practice and was the variation used for runtime profiling.
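The wait-then-skip policy above can be sketched independently of the ROS tf API as a generic retry loop; `lookup`, the timeout and the poll interval are assumptions for illustration:

```python
import time

def transform_at_scan_time(lookup, stamp, timeout=1.0, poll=0.05):
    """Fetch the coordinate transform at the scan header time `stamp`;
    wait up to `timeout` seconds for it to become available, and return
    None (the caller then skips the scan) if it never does. `lookup` is
    any callable returning the transform at that time, or None."""
    deadline = time.monotonic() + timeout
    while True:
        tf = lookup(stamp)
        if tf is not None:
            return tf
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

The non-repeatable, real-time variant used for runtime profiling corresponds to bypassing this loop and always using the most recent transform.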

C. Evaluation metrics

1) General multi-person tracking: We use the CLEAR MOT metrics for quantitative evaluation of the tracking system [26]. They are commonly used metrics for multi-object tracking, and provide scorings of valid assignments, ID switches, misses, false positives (FPs) and precision of matchings, accumulated over every frame. These scores can be aggregated into a combined overall multi-object tracking accuracy (MOTA) score

$$\mathrm{MOTA} = 1 - \frac{\sum_k \left( \mathrm{ID}_k + \mathrm{Miss}_k + \mathrm{FP}_k \right)}{\sum_k g_k}$$

where $\mathrm{ID}_k$, $\mathrm{Miss}_k$, $\mathrm{FP}_k$ and $g_k$ are the number of ID switches, misses, FPs and ground truth annotations, respectively, at time $k$. However, the aggregation of the scorings into the combined MOTA score assumes that the three types of errors (ID switches, misses and FPs) are equally unfavorable, which is arguably not true in most applications, including ours. We therefore report the raw count of each error type alongside the MOTA score.

Another relevant CLEAR MOT metric is the multi-object tracking precision (MOTP), which is defined as

$$\mathrm{MOTP} = \frac{\sum_{i,k} d_{i,k}}{\sum_k c_k}$$

where $c_k$ is the number of matchings made between estimated people positions and ground truth positions at time $k$, and $d_{i,k}$ is the distance of the $i$th match at time $k$. Intuitively, it provides a measure of how precisely a target is tracked when it is being tracked properly. In the context of person-following, a lower MOTP score is beneficial because a more precise location of the person being followed should allow the following controller to track the person more precisely.

A threshold distance of 0.75m was used for the CLEAR MOT matching to determine whether a tracked person should be matched to an annotated ground truth person.
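As a concrete illustration, the per-frame accumulation of these scores might look like the sketch below. It uses greedy nearest-neighbour matching within the 0.75m gate; the full CLEAR MOT procedure [26] additionally carries matches over between frames and solves an optimal assignment, so this is a simplification:

```python
import math

def clear_mot(frames, match_dist=0.75):
    """frames: list of (ground_truth, hypotheses) pairs, each a dict
    mapping track id -> (x, y). Returns (MOTA, MOTP) accumulated over
    all frames, with greedy nearest-neighbour matching inside the gate."""
    id_sw = miss = fp = gt_total = matches = 0
    dist_sum = 0.0
    last_match = {}                      # ground-truth id -> hypothesis id
    for gt, hyp in frames:
        gt_total += len(gt)
        used = set()
        for gid, gpos in gt.items():
            best, best_d = None, match_dist
            for hid, hpos in hyp.items():
                if hid in used:
                    continue
                d = math.dist(gpos, hpos)
                if d < best_d:
                    best, best_d = hid, d
            if best is None:
                miss += 1                # visible but not tracked
            else:
                used.add(best)
                dist_sum += best_d
                matches += 1
                if gid in last_match and last_match[gid] != best:
                    id_sw += 1           # track identity changed
                last_match[gid] = best
        fp += len(hyp) - len(used)       # unmatched hypotheses
    mota = 1.0 - (id_sw + miss + fp) / gt_total
    motp = dist_sum / matches if matches else float("nan")
    return mota, motp
```

For example, two frames in which the same ground-truth person is matched first to hypothesis "a" and then to "b" yield one ID switch, hence MOTA = 1 - 1/2 = 0.5.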

2) Tracking for following: For evaluation in the Tracking for following scenarios, the same CLEAR MOT metrics are used. However, since we are only concerned with the tracker's ability to track a particular person (among several), only the ground-truth positions of the target person were labelled. Other pedestrians were ignored, and only person-tracking events corresponding to the target person are reported (i.e., FPs were ignored because they may have been due to unlabelled pedestrians).

Under such an evaluation framework, the relevant metrics are ID switches and misses. ID switches represent cases where the track of the target person was switched with someone or something else, and it would no longer be possible to autonomously follow them. Misses represent cases where the target person was visible in the laser scanner's field-of-view but was not tracked. In such cases, the robot user would not be able to lock onto the target person to initiate following.

TABLE II
OVERVIEW OF BENCHMARKS.

Benchmark                      Dataset            Duration  Avg. est. robot speed  Annotated      Avg. dist. to
                                                            when moving (m/s)      people tracks  person followed (m)
General multi-person tracking  Stationary Robot   7m11s     n/a                    45             n/a
                               Moving Robot       5m01s     0.9                    37             n/a
Tracking for following         Following Indoor   21m24s    0.9                    1              1.33 ± 0.57
                               Following Outdoor  12m00s    0.6                    1              1.68 ± 0.30

TABLE III
BENCHMARK RESULTS.

Dataset            Tracker           Valid  ID Switch  Miss  FP   MOTA    MOTP (m)  Runtime Worst/Avg (Hz)
Stationary Robot   leg detector [4]  427    51         1605  28   19.2%   0.28      19/25
                   Individual Leg    569    10         1504  61   24.4%   0.23      14/25
                   Joint Leg         703    8          1372  11   33.2%   0.16      15/25
Moving Robot       leg detector [4]  149    15         481   294  -22.5%  0.27      19/25
                   Individual Leg    160    2          483   88   11.2%   0.17      12/24
                   Joint Leg         163    2          480   97   10.2%   0.15      11/24
Following Indoor   leg detector [4]  15425  196        3425  n/a  n/a     0.15      18/25
                   Individual Leg    17295  73         1678  n/a  n/a     0.13      4/19
                   Joint Leg         18554  7          485   n/a  n/a     0.09      4/19
Following Outdoor  leg detector [4]  4835   172¹       1172  n/a  n/a     0.09      22/25
                   Individual Leg    5348   12¹        819   n/a  n/a     0.11      15/25
                   Joint Leg         6073   2¹         104   n/a  n/a     0.09      18/25

D. Results

Results from all datasets are shown in Table III. The Joint Leg Tracker achieves the best results in the Tracking for following tasks by a large margin. It suffers from only 7 ID switches in the 21 minutes of the Following Indoor dataset and makes virtually no ID switch errors in the Following Outdoor dataset (the 2 ID switches that occurred were caused by the person being tracked moving out of frame for a significant amount of time, and were not preventable). It also achieves the lowest MOTP on all datasets, meaning that when a person is being tracked properly, it provides the most precise estimate of their location. This should improve the performance of the following controller, which uses the location estimate to perform closed-loop control.

Its performance on the General multi-person tracking benchmark is also favourable, but by a smaller margin. It causes significantly fewer errors of every type on the Stationary Robot dataset compared to the other methods, and achieves similar results to the Individual Leg Tracker on the Moving Robot dataset.

The leg detector is outperformed in almost all respects by the other tracking methods and is generally unable to track anyone persistently in the challenging benchmark environments, as shown by its high number of ID switches in all cases.

Runtime: The average-case runtime of the Joint Leg Tracker is faster than the maximum scanner frequency in all cases when run on an Intel Core i7 CPU. In the Following Indoor dataset, which has frequent open areas containing many distinct objects to be tracked, the slowest single update is slower than the scan frequency by a factor of approximately four, though the average runtime over the entire dataset is still better than real-time. The bottleneck in these cases tends to be the tracking and data association step, which is implemented in Python. In the future we plan to test a C++ implementation and, if this does not achieve worst-case real-time performance, limit the number or distance of clusters which are tracked.
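The proposed mitigation (limiting the number or distance of tracked clusters) might be sketched as follows; the range and count limits are illustrative values, not tuned parameters:

```python
import math

def limit_clusters(clusters, robot_xy, max_tracked=20, max_range=8.0):
    """Possible worst-case runtime mitigation: keep only the clusters
    within max_range (m) of the robot, and at most the max_tracked
    nearest ones, before the data association step."""
    def dist(cluster):
        xs, ys = zip(*cluster)
        cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)  # cluster centroid
        return math.hypot(cx - robot_xy[0], cy - robot_xy[1])
    near = [c for c in clusters if dist(c) <= max_range]
    return sorted(near, key=dist)[:max_tracked]
```

Since GNN data association scales with the number of tracked clusters, capping that number bounds the per-scan update cost.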

VI. DISCUSSION

This work presents a novel method for detecting, tracking and following people using laser scanners at leg-height. The system integrates a joint leg tracker with local occupancy grid maps and a method of tracking all scan clusters, including non-human clusters, to improve tracking in cluttered areas. Empirically, it has been shown to be effective in the target application of autonomous person following and presents advantages in general multi-person tracking as well. The tracking method was tested on the SmartWheeler and also integrated with an autonomous following navigation system deployed on a Clearpath Husky. All datasets and code are publicly available on the first author's website.

Limitations: Since the Joint Leg Tracker was developed in the ROS framework, it should be relatively straightforward to transfer to other robots with laser scanners at similar heights. One caveat, however, is that the human-confidence learning algorithm may require re-training for optimal performance with laser scanners of different resolutions.

Another potential limitation of the presented system is the GNN data association method. While it is computationally faster than the MHT, quantitative experiments exist demonstrating better performance of more sophisticated data association methods, such as the JPDAF and the MHT, in other application areas. For example, meta-results reported by Blackman and Popoli [12] suggest that the JPDAF and the MHT generally track targets better in high-clutter environments with many false positive detections when used in the task of radar tracking. However, quantitative comparisons specifically in the area of people tracking in 2D laser scans would be beneficial because it is a fundamentally different domain. For example, false positive detections are produced systematically by objects in the scan, rather than randomly over the scanning area, as is often assumed in radar tracking, and can therefore be accounted for explicitly (e.g., as in our tracker or in Luber et al. [27]).

¹Due to the two cases where the person followed was lost out of frame for a significant amount of time, it is reasonable to expect a minimum of two ID switches in this dataset.

Evaluation metrics: The overall CLEAR MOT scores presented in the benchmarks are significantly lower than those presented in [24]. However, the results are not directly comparable because of the different sensor configurations and environments. For example, the laser sensor used in the General multi-person tracking benchmark has a lower scanning frequency (7.5Hz vs 37.5Hz) and a shorter range (8m vs 80m) than the laser sensor from the benchmarks in [24]. The shorter range naturally increased the number of misses, since pedestrians would enter and leave the field of view more quickly and with fewer detections, and would therefore spend proportionally more time being tracked as non-persons before person tracks were initiated.

Impact and future work: The system development and experiments completed thus far indicate that we have achieved a portable and reliable system. Moving forward, we intend to verify the usefulness of the system on the SmartWheeler with the target user population. One goal will be to determine if the system is capable of allowing a smart wheelchair user to comfortably carry on a conversation with a companion while autonomously following them side-by-side in busy environments. We are also investigating the potential of using this system to help automatically assess patients' locomotion patterns following an injury [7] in the context of rehabilitation therapy.

ACKNOWLEDGEMENTS

The authors would like to thank Benedicte Leonard-Cannon, Gheorghe Comanici, Michael Mills, Qiwen Zhang, Andrew Sutcliffe, Martin Gerdzhev and Chenghui Zhou for their gracious help.

This work was supported by the Natural Sciences and Engineering Research Council (NSERC) through the NSERC Canadian Field Robotics Network (NCFRN).

REFERENCES

[1] A. Cosgun, D. A. Florencio, and H. I. Christensen, "Autonomous person following for telepresence robots," in Int. Conference on Robotics and Automation (ICRA), 2013.

[2] D. Kairy, P. Rushton, P. Archambault, E. Pituch, C. Torkia, A. El Fathi, P. Stone, F. Routhier, R. Forget, L. Demers, J. Pineau, and R. Gourdeau, "Exploring powered wheelchair users and their caregivers' perspectives on potential intelligent power wheelchair use: A qualitative study," Int. Journal of Environmental Research and Public Health, pp. 1–7, 2014.

[3] K. O. Arras, B. Lau, S. Grzonka, M. Luber, O. M. Mozos, D. Meyer-Delius, and W. Burgard, "Range-based people detection and tracking for socially enabled service robots," in Towards Service Robots for Everyday Environments, pp. 235–280, 2012.

[4] D. V. Lu and W. D. Smart, "Towards more efficient navigation for robots and humans," in Int. Conference on Intelligent Robots and Systems (IROS), 2013.

[5] P. Boucher, A. Atrash, S. Kelouwani, W. Honore, H. Nguyen, J. Villemure, F. Routhier, P. Cohen, L. Demers, R. Forget, and J. Pineau, "Design and validation of an intelligent wheelchair towards a clinically-functional outcome," Journal of NeuroEngineering and Rehabilitation, vol. 10, pp. 1–16, 2013.

[6] D. Schulz, W. Burgard, D. Fox, and A. B. Cremers, "People tracking with mobile robots using sample-based joint probabilistic data association filters," Int. Journal of Robotics Research, 2003.

[7] A. Leigh and J. Pineau, "Laser-based person tracking for clinical locomotion analysis," IROS Workshop on Rehabilitation and Assistive Robotics, 2014.

[8] D. Montemerlo, S. Thrun, and W. Whittaker, "Conditional particle filters for simultaneous mobile robot localization and people-tracking," in Int. Conference on Robotics and Automation (ICRA), 2002.

[9] E. A. Topp and H. I. Christensen, "Tracking for following and passing persons," in Int. Conference on Intelligent Robots and Systems (IROS), 2005.

[10] R. Gockley, J. Forlizzi, and R. Simmons, "Natural person-following behavior for social robots," in Int. Conference on Human-Robot Interaction, 2007.

[11] S. Hemachandra, T. Kollar, N. Roy, and S. Teller, "Following and interpreting narrated guided tours," in Int. Conference on Robotics and Automation (ICRA), 2011.

[12] S. S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems. Artech House, 1999.

[13] L. E. Navarro-Serment, C. Mertz, N. Vandapel, and M. Hebert, "LADAR-based pedestrian detection and tracking," in IEEE Workshop on Human Detection from Mobile Platforms, 2008.

[14] M. Munaro and E. Menegatti, "Fast RGB-D people tracking for service robots," Autonomous Robots, 2014.

[15] A. P. Gritti, O. Tarabini, J. Guzzi, G. A. D. Caro, V. Caglioti, L. M. Gambardella, and A. Giusti, "Kinect-based people detection and tracking from small-footprint ground robots," in Int. Conference on Intelligent Robots and Systems (IROS), 2014.

[16] D. Fox, W. Burgard, and S. Thrun, "The dynamic window approach to collision avoidance," IEEE Robotics & Automation Magazine, 1997.

[17] M. Kobilarov, G. Sukhatme, J. Hyams, and P. Batavia, "People tracking and following with mobile robot using an omnidirectional camera and a laser," in Int. Conference on Robotics and Automation (ICRA), 2006.

[18] L. Breiman, "Random forests," Machine Learning, 2001.

[19] G. Bradski, "The OpenCV library," Dr. Dobb's Journal, vol. 25, no. 11, pp. 120–126, 2000.

[20] R. Kalman, "A new approach to linear filtering and prediction problems," Journal of Basic Engineering, 1960.

[21] H. W. Kuhn, "The Hungarian method for the assignment problem," Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955.

[22] P. Konstantinova, A. Udvarev, and T. Semerdjiev, "A study of a target tracking algorithm using global nearest neighbor approach," in Int. Conference on Computer Systems and Technologies, 2003.

[23] N. A. Olmedo, H. Zhang, and M. Lipsett, "Mobile robot system architecture for people tracking and following applications," in Int. Conference on Robotics and Biomimetics (ROBIO), 2014.

[24] M. Luber and K. O. Arras, "Multi-hypothesis social grouping and tracking for mobile robots," in Robotics: Science and Systems, 2013.

[25] A. Milan, K. Schindler, and S. Roth, "Challenges of ground truth evaluation of multi-target tracking," in Int. Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2013.

[26] B. Keni and S. Rainer, "Evaluating multiple object tracking performance: the CLEAR MOT metrics," EURASIP Journal on Image and Video Processing, 2008.

[27] M. Luber, G. D. Tipaldi, and K. O. Arras, "Place-dependent people tracking," Int. Journal of Robotics Research (IJRR), 2011.