Human Body Pose Estimation Using Silhouette Shape Analysis

    Abstract

We describe a system for human body pose estimation from multiple views that is fast and not dependent on a rigid 3D model. We make use of recent work on the decomposition of a silhouette into 2D parts. These 2D part primitives are matched across views to build assemblies in 3D. In order to search for the best assembly, we use a likelihood function that integrates information available from multiple views about body part locations. Occlusion is modeled in the likelihood function so that the algorithm is able to work in a crowded scene even when only part of the person is visible in each view. The algorithm has potential applications in surveillance, and promising results have been obtained.

1 Introduction

Determining the pose of humans is an important problem in vision and has many applications. In this paper, we target multi-camera surveillance applications where one wants to recognize the activities of people in a scene in the presence of occlusions and partial occlusions. One cannot assume that a person is visible in isolation, or in full, in any of the views. Nor can one assume that we have a model of the person, or that the initial body pose is known. Such a system should also be reasonably fast. However, very accurate body pose values are typically not required, and an answer close to the actual body pose might be adequate. We describe an algorithm that can form the basis of such a surveillance system.

Our system estimates the 3D pose of a human body from multiple views. We make use of recent work on the decomposition of a silhouette into 2D parts. These 2D part primitives are matched across views to build primitives in 3D, which are then assembled to form a human figure. In order to search for the best assembly, we use a likelihood function that integrates information available from multiple views about body part locations. Greedy search strategies are employed so as to find the best assembly quickly.

1.1 Related Work

Human body pose estimation has received considerable interest in the past few years, and several approaches have been tried for different applications.

There are many methods for incremental model-based body part tracking where a model of an articulated structure (person) is specified upfront [5, 20, 1, 21, 6]. Delamarre and Faugeras [5] try to align the projection of an articulated structure with the silhouettes of a person obtained in multiple views by calculating forces that need to be applied to the structure. Drummond and Cipolla [20] use Lie algebra to incrementally track articulated structures. Bregler and Malik [1] use twists and exponential maps to specify relationships between parts and to track an articulated structure incrementally. Sidenbladh [17] and Choo [4] use Monte Carlo particle filtering to incrementally update the posterior probabilities of pose parameters. These methods need both a 3D model of the human structure and a good initialization, and have potential applications in motion capture [6].

Another class of algorithms [12, 8, 18, 15] tries to detect body parts in 2D using template matching and then find the best assembly using some criteria. Other methods learn models of human motion. These models can be based on optical flow [7], exemplars [14, 19], feature vectors [18], support vector machines [15], or statistical mappings (SMA) [16]. These models can then be used to detect and estimate the pose of a human in an observed image.

Our work is most closely related to the work of Kakadiaris and Metaxas [11], who try to acquire 3D body part information from silhouettes extracted in orthogonal views. They employ a deformable human model so that a human of any size can be recognized. The distinguishing feature of our work is that it is able to work in a crowded scene, where the person may be fully or partially occluded in any of the views. This is accomplished by explicitly modeling occlusion and developing prior models for person shapes from the scene. This helps us to decouple the problems of pose estimation for multiple people, so that the degrees of freedom of the problem are decreased substantially.

The paper is organized as follows. Section 2 describes the method of extraction of silhouettes in a crowded scene. Section 3 describes shape analysis of silhouettes and matching parts across views to obtain 3D part primitives. Section 4 describes the likelihood function used for assembly evaluation. Section 5 describes the algorithm used to find the best assembly. We conclude with some preliminary results in Section 6.

2 Extracting Multiple Silhouettes in a Cluttered Scene

We use the method developed by Mittal and Davis [13] in their system M2Tracker for extracting silhouettes of people in a cluttered scene. The method is able to segment regions belonging to different people even when they are not visually isolated. Here, we provide a brief review of the method.

M2Tracker develops two types of models for each person.

2.1 Appearance Models

    2.1.1 Color Models

A probabilistic model for the color distribution at different heights of the person is developed using the method of non-parametric Gaussian kernel estimation.
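As a rough illustration of such a per-height color model, the sketch below keeps the pixel samples for each height slice of a person and evaluates a Gaussian kernel density estimate for a query color. The slice count, bandwidth, and data layout are assumptions made for this example, not details taken from M2Tracker [13].

```python
import numpy as np

def build_color_model(pixels_rgb, heights, n_slices=20):
    """Group a person's pixel colors into height slices; each slice keeps its
    samples for a kernel density estimate (slice count is an assumed value)."""
    bins = np.linspace(heights.min(), heights.max(), n_slices + 1)
    ids = np.clip(np.digitize(heights, bins) - 1, 0, n_slices - 1)
    return [pixels_rgb[ids == s] for s in range(n_slices)]

def color_likelihood(color, slice_samples, bandwidth=10.0):
    """P(color | person, height slice) from a Gaussian KDE over the stored samples."""
    if len(slice_samples) == 0:
        return 1e-6                      # no data for this slice: small constant
    d2 = np.sum((slice_samples - color) ** 2, axis=1)
    norm = (2.0 * np.pi) ** 1.5 * bandwidth ** 3
    return float(np.mean(np.exp(-0.5 * d2 / bandwidth ** 2)) / norm)
```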

2.1.2 "Presence" Probabilities

The other attribute modeled is the "presence" probability, denoted by $P_k(d, h)$, defined as the probability that person $k$ is present (i.e., occupies space) at height $h$ and distance $d$ from the vertical line passing through the person's center.

Figure 1: Sample presence probabilities of people.

These models are developed automatically from the scene and are used to segment images in the following way.

    2.2 Pixel Classification

Bayesian classification is used to classify each pixel as belonging to a particular person or to the background. The a posteriori probability that an observation $o(x)$ at pixel $x$ originated from person $k$ (or the background) is

$$P(k \mid o(x)) \;\propto\; P_{\mathrm{prior}}(k, x)\, P(o(x) \mid k) \qquad (1)$$

The pixel is then classified as

$$\text{Most likely class} \;=\; \arg\max_{k}\, P(k \mid o(x)) \qquad (2)$$

$P(o(x) \mid k)$ is given by the color model of the person at height $h$. For the background, a background model of the scene is used.

The priors include occlusion information and are determined using the following method. For each pixel $x$, a ray is projected in space passing through the optical center of the camera. The minimum distances $d_k$ of this ray from the vertical lines passing through the currently estimated centers of the people are calculated, along with the heights $h_k$ of the shortest line segments connecting these lines. Then, the prior probability that pixel $x$ is the image of person $k$ is set as

$$P_{\mathrm{prior}}(k, x) \;=\; P_k(d_k, h_k) \prod_{j:\; j \text{ occludes } k} \bigl(1 - P_j(d_j, h_j)\bigr), \qquad P_{\mathrm{prior}}(\mathrm{background}, x) \;=\; \prod_{\text{all } j} \bigl(1 - P_j(d_j, h_j)\bigr) \qquad (3)$$


where $P_j(d_j, h_j)$ is the "presence" probability described earlier. A person $j$ "occludes $k$" if the distance of $j$ to the optical center of the camera is less than the distance of $k$ to the center. The classification procedure helps to incorporate both the color profile of the people and the occlusion information available.

Figure 2: Some results from M2Tracker. The first two images show detection and tracking results and the last two show segmentation results.

The segmentation algorithm assumes knowledge of approximate person locations. These locations are obtained using a region-based stereo algorithm.
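To make equations (1)-(3) concrete, here is a minimal sketch of the per-pixel Bayesian classification with occlusion-aware priors. The helper functions (presence, color_likelihood, bg_likelihood) and the front-to-back ordering are assumptions standing in for the models described above; this is not the M2Tracker implementation.

```python
def classify_pixel(x, people_front_to_back, color_likelihood, presence, bg_likelihood):
    """Return the most likely label (person index or 'background') for pixel x,
    following eqs. (1)-(3): a person's prior is their own presence times the
    product of (1 - presence) over the people that occlude them."""
    posteriors = {}
    occlusion = 1.0                       # product of (1 - presence) of people in front
    for k in people_front_to_back:        # nearest person to the camera first
        prior_k = presence(k, x) * occlusion              # eq. (3), person term
        posteriors[k] = prior_k * color_likelihood(k, x)  # eq. (1), unnormalized
        occlusion *= 1.0 - presence(k, x)
    posteriors['background'] = occlusion * bg_likelihood(x)  # eq. (3), background term
    return max(posteriors, key=posteriors.get)               # eq. (2), argmax
```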

    2.3 Obtaining Multiple Segmentations

There are several parameters in the segmentation algorithm. Accurate extraction of different parts of the person requires different parameters. Therefore, it is essential to vary the parameters so as to obtain multiple segmentations. The parameters that we vary are (1) the relative weight given to the background model, (2) the relative weights given to the different foreground objects, so that different objects are highlighted, and (3) the threshold for determining whether a pixel is left unclassified. The silhouettes thus obtained are segmented using the method described in the next section.
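The parameter variation can be expressed as a simple sweep; the weight and threshold values below are placeholders, and segment() stands in for the (unspecified) segmentation routine.

```python
from itertools import product

def multiple_segmentations(image, segment):
    """Run the segmentation routine under several parameter settings
    (background weight, foreground weights, unclassified threshold) and
    collect all resulting silhouettes. The values are illustrative only."""
    bg_weights = [0.5, 1.0, 2.0]
    fg_weights = [0.5, 1.0, 2.0]
    thresholds = [0.3, 0.5]
    results = []
    for w_bg, w_fg, thr in product(bg_weights, fg_weights, thresholds):
        results.append(segment(image, bg_weight=w_bg, fg_weight=w_fg,
                               unclassified_threshold=thr))
    return results
```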

Figure 3: Multiple segmentations obtained for the image shown in the first image.

3 Computing Body-part Primitives

3.1 2D Silhouette Shape Analysis

In order to recover the pose of a person, we break the silhouette of the person into parts. According to human intuition about parts, a segmentation into parts occurs at negative minima of curvature, so that the decomposed parts are convex regions. Singh et al. noted that when boundary points can be joined in more than one way to decompose a silhouette, human vision prefers the partitioning scheme which uses the shortest cuts (a cut is the boundary between a part and the rest of the silhouette). They further restrict a cut to cross a symmetry axis in order to avoid short but undesirable cuts. However, most symmetry axes are very sensitive to noise and are expensive to compute. In contrast, we use a constraint on the salience of a part to avoid short but undesirable cuts. According to Hoffman and Singh's study [10], there are three factors that affect the salience of a part: the size of the part relative to the whole object, the degree to which the part protrudes, and the strength of its boundaries. Among these three factors, the computation of a part's protrusion (the ratio of the perimeter of the part, excluding the cut, to the length of the cut) is more efficient and robust to noise and partial occlusion of the object. Thus, we employ the protrusion of a part to evaluate its salience; the salience of a part increases as its protrusion increases.

Figure 4: Silhouette decomposition.

In summary, we combine the short-cut rule and the salience requirement to constrain the other end of a cut. For example, in Figure 5, let $S$ be a silhouette, $B$ be the boundary of $S$, $P$ be a point on $B$ at a negative minimum of curvature, and $P_m$ be a point on $B$ such that $P$ and $P_m$ divide the boundary $B$ into two curves $B_l$ and $B_r$ of equal arc length. Then two cuts $PP_l$ and $PP_r$ are formed passing through point $P$, such that points $P_l$ and $P_r$ lie on $B_l$ and $B_r$, respectively. The ends $P_l$ and $P_r$ of the two cuts are located as follows:

$$P_l = \arg\min_{P'} |PP'| \quad \text{s.t.}\;\; \frac{|\widehat{PP'}|}{|PP'|} \ge t_s,\;\; P' \in B_l,\;\; PP' \subset S \qquad (4)$$

$$P_r = \arg\min_{P'} |PP'| \quad \text{s.t.}\;\; \frac{|\widehat{PP'}|}{|PP'|} \ge t_s,\;\; P' \in B_r,\;\; PP' \subset S \qquad (5)$$

where $\widehat{PP'}$ is the smaller part of the boundary $B$ between $P$ and $P'$, $|\widehat{PP'}|$ is its arc length, and $|\widehat{PP_l}| / |PP_l|$ is the salience of the part bounded by the curve $\widehat{PP_l}$ and the cut $PP_l$.

Figure 5: Computing the cuts passing through point P.

Eq. (4) means that point $P_l$ is located so that the cut $PP_l$ is the shortest one among all cuts that share the same end $P$, lie within the silhouette with the other end on contour $B_l$, and result in a significant part whose salience is above a threshold $t_s$. The other point $P_r$ is located in the same way using Eq. (5).
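A sketch of the search for a cut endpoint defined by Eqs. (4) and (5) is given below. It assumes the silhouette boundary is available as a closed polygon, uses Shapely to test that a candidate cut stays inside the silhouette, and takes the salience threshold t_s as a free parameter; it is an illustrative reconstruction rather than the authors' code.

```python
import numpy as np
from shapely.geometry import Polygon, LineString

def arc_length(boundary, i, j):
    """Length of the shorter boundary arc between vertex indices i and j
    (boundary is a closed polygon given as an (N, 2) array)."""
    seg = np.linalg.norm(np.diff(boundary, axis=0, append=boundary[:1]), axis=1)
    n = len(boundary)
    forward = seg[np.arange(i, i + (j - i) % n) % n].sum()
    return min(forward, seg.sum() - forward)

def find_cut_end(boundary, p_idx, candidate_idxs, t_s=2.0):
    """Eq. (4)/(5): among candidate endpoints on one half of the boundary, pick
    the one giving the shortest cut that stays inside the silhouette and whose
    part salience (boundary arc length / cut length) is at least t_s."""
    poly = Polygon(boundary)
    p = boundary[p_idx]
    best, best_len = None, np.inf
    for q_idx in candidate_idxs:
        q = boundary[q_idx]
        cut_len = np.linalg.norm(p - q)
        if cut_len == 0 or cut_len >= best_len:
            continue                        # cannot beat the current shortest cut
        salience = arc_length(boundary, p_idx, q_idx) / cut_len
        if salience >= t_s and poly.contains(LineString([p, q])):
            best, best_len = q_idx, cut_len
    return best
```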

Since negative minima of curvature are obtained by local computation, their computation is not robust in real digital images. We take several computationally efficient strategies to reduce the effects of noise. First, a B-spline approximation is used to moderately smooth the boundary of a silhouette, since a B-spline representation is stable and easy to manipulate locally without affecting the rest of the silhouette. Second, negative minima of curvature with small magnitude of curvature are removed to avoid parts due to noise or small local deformations. However, curvature is not scale invariant (e.g., its value doubles if the silhouette shrinks by half). One way to transform curvature into a scale-invariant quantity is to first find the chord joining the two closest inflections which bound the point, then multiply the curvature at the point by the length of this chord. The resulting normalized curvature does not change with scale: if the silhouette shrinks to half its size, the curvature doubles but the chord halves, so their product is constant.
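The scale-invariant normalization might be implemented roughly as below; the finite-difference curvature estimate and the handling of boundary wrap-around are simplifications assumed for this sketch.

```python
import numpy as np

def curvature_closed(contour):
    """Signed curvature at each vertex of a 2D contour via finite differences:
    kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2)."""
    x, y = contour[:, 0], contour[:, 1]
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - dy * ddx) / np.power(dx ** 2 + dy ** 2, 1.5)

def normalized_curvature(contour, i):
    """Multiply the curvature at vertex i by the chord length between the two
    closest inflection points (sign changes of curvature) that bound it,
    making the quantity invariant to uniform scaling of the silhouette."""
    kappa = curvature_closed(contour)
    n = len(contour)
    sign_change = np.flatnonzero(np.sign(kappa) != np.sign(np.roll(kappa, -1)))
    if len(sign_change) == 0:
        return kappa[i]                 # no inflections: fall back to raw curvature
    # nearest inflection before and after vertex i along the (cyclic) contour
    before = sign_change[np.argmin((i - sign_change) % n)]
    after = sign_change[np.argmin((sign_change - i) % n)]
    chord = np.linalg.norm(contour[after] - contour[before])
    return kappa[i] * chord
```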

This analysis yields 2D body parts for a person in a single view. The torso is not found directly by this method, as the body part segmentation can only find protruded parts reliably. Since these protruded parts can overlap, there are a large number of torsos that can be formed from the remaining part of the silhouette. Therefore, we do not attempt to find the torso directly and simply infer it from the other body parts.

Zhao [22] has used a similar method to develop a system for body part identification from a single view. However, body part identification from a single view is very difficult and labelings are often incorrect, especially in the case of partial occlusions and self-occlusions, where some body parts are not visible. The problem is also underconstrained since depth information is not available. Another difficulty is that such a system requires the extraction of good silhouettes, which are not easy to obtain in a dense scene. As opposed to Zhao's work, we use multiple cameras and identify body pose in 3D using a global analysis.

Figure 6: Multiple body parts obtained using the segmentations shown in Fig. 3.

3.2 Computing Body-part Primitives in 3D

2D part primitives obtained using silhouette analysis are used to obtain part primitives in 3D. First, parts that are relatively close to each other are combined with each other. Second, the decomposed parts are matched across views using epipolar geometry to yield 3D body parts. The two endpoints of a part in one view are matched to the corresponding endpoints in the other view. The matching is based simply on lying on the corresponding epipolar line. An additional constraint that can be used is the color profile of the body parts. The disadvantage is that if the viewpoints are substantially different, the color profiles can vary significantly. Also, the color profiles of different body parts can be very similar (e.g., the two legs can have very similar color profiles).

Once matching is done, a certain number of body parts are selected based on their matching score, and their endpoints are projected in space to yield 3D body parts.
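A sketch of the cross-view matching and reconstruction step, assuming calibrated 3x4 projection matrices P1 and P2, a fundamental matrix F between the two views, and part endpoints in pixel coordinates; the epipolar-distance threshold is an illustrative parameter, and the optional color-profile constraint mentioned above is omitted.

```python
import numpy as np

def epipolar_distance(x1, x2, F):
    """Distance of point x2 (view 2) from the epipolar line F @ x1 of point x1 (view 1)."""
    x1h, x2h = np.append(x1, 1.0), np.append(x2, 1.0)
    line = F @ x1h
    return abs(x2h @ line) / np.hypot(line[0], line[1])

def triangulate(x1, x2, P1, P2):
    """Linear (DLT) triangulation of a single point from two views."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def match_parts(parts1, parts2, F, P1, P2, max_dist=3.0):
    """Match 2D parts (each an array of two endpoints) across two views by
    requiring each endpoint to lie near the corresponding epipolar line,
    then triangulate the endpoints to produce a 3D part with a match score."""
    parts3d = []
    for a in parts1:
        for b in parts2:
            d = epipolar_distance(a[0], b[0], F) + epipolar_distance(a[1], b[1], F)
            if d < 2 * max_dist:
                parts3d.append((triangulate(a[0], b[0], P1, P2),
                                triangulate(a[1], b[1], P1, P2),
                                d))          # keep the score for later selection
    return parts3d
```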

4 Assembly Evaluation Using the Observation Likelihood

Labelings are assigned to these 3D parts by building an assembly that has the maximum likelihood according to an appropriate likelihood function. From the set of 3D body parts, we form sets of possible heads, hands and legs based on size constraints. Additional knowledge, if available, can be used. Such information might consist of the knowledge that the legs are close to the floor, or that the person is standing (a constraint on head and hand positions). Then, the problem reduces to finding a head, two hands (or one or no hands, if not found) and two legs (or one or no legs), such that the assembly has the highest likelihood. The likelihood function we use is described in the next section.

4.1 Observation Likelihood

In order to evaluate a particular assembly $A$, we determine the observation likelihood $P(I_1, I_2, \ldots, I_n \mid A)$, which is the likelihood of observing images $I_1, I_2, \ldots, I_n$ given the particular assembly $A$. Assuming that assemblies have equal priors, the assembly having the highest likelihood is also the assembly with the highest posterior. Since we do not know the body pose of other people in the scene, the observation likelihood cannot be determined unless the problems of body pose determination of different people are coupled with one another. This leads to an exponential increase in the complexity of the algorithm.

We can decouple the problem, however, if we make some simplifying assumptions. Specifically, we can use the method developed in M2Tracker [13] to determine priors using presence probabilities. Then, the general formula for the observation probability at a particular pixel $x$ can be written as:

$$P(o(x)) = \sum_{k} P_{\mathrm{prior}}(k, x)\, P(o(x) \mid k) \qquad (6)$$


Figure 7: Determining the projection of an assembly.

where the summation is done over all persons $k$ and the background, and $o(x)$ is the observation at pixel $x$. If the location of the assembly is given, the function $P_k(x)$ (the presence probability defined in section 2.1.2) for the person under consideration changes from a probabilistic to a fixed function, so that:

$$P_k(x) = \begin{cases} 1 & \text{if the assembly projects to pixel } x \\ 0 & \text{if it does not project to pixel } x \end{cases} \qquad (7)$$

Using this definition, one can redetermine the priors for all people using equation (3) and calculate the observation probability using equation (6). This would be the conditional probability $P(o(x) \mid A)$. Assuming that observations at different pixels are independent, the overall observation probability is then simply the product of the observation probabilities at each pixel in each view:

$$P(I_1, I_2, \ldots, I_n \mid A) = \prod_{i=1}^{n} \;\prod_{\text{all pixels } x} P(o_i(x) \mid A) \qquad (8)$$

In order to determine the projection of the assembly on an image, we model the hands and legs as cylinders with approximate widths and the head as a sphere, and determine their projections onto a view (Figure 7). The torso is built by filling in the polygon formed by taking the joint locations of the (five) parts as the vertices. A more accurate projection can be formed by building a 3D structure based on the joint locations and finding its projection onto the views. That will, however, add to the running time of the algorithm.
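In log form, equations (6)-(8) can be evaluated along the following lines. render_assembly, presence, color_likelihood, and bg_likelihood are placeholders for the components described earlier, and the explicit pixel loop is written for clarity rather than speed.

```python
import numpy as np

def assembly_log_likelihood(views, assembly, person_k, render_assembly,
                            people_front_to_back, presence,
                            color_likelihood, bg_likelihood):
    """Log of eq. (8): sum over views and pixels of log P(o(x) | A), where
    P(o(x) | A) follows eq. (6) with person_k's presence probability replaced
    by the 0/1 projection mask of the candidate assembly (eq. (7))."""
    total = 0.0
    for v, image in enumerate(views):
        mask = render_assembly(assembly, view=v)          # boolean mask, eq. (7)
        for (y, x), inside in np.ndenumerate(mask):
            p, occlusion = 0.0, 1.0
            for k in people_front_to_back(v):             # nearest person first
                pres = float(inside) if k == person_k else presence(k, v, (x, y))
                p += pres * occlusion * color_likelihood(k, v, image[y, x])  # eq. (6)
                occlusion *= 1.0 - pres                    # occlusion prior, eq. (3)
            p += occlusion * bg_likelihood(v, image[y, x])
            total += np.log(max(p, 1e-12))                 # guard against log(0)
    return total
```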

M2Tracker determines the probability $P(o(x) \mid k)$ used in equation (6) using color models at different height slices. This puts only occupancy constraints on the likelihood. However, apart from the hypothesis that the given assembly projects to a particular pixel, we also have information as to which part of the assembly projects to the pixel. Using this information, we can improve results by including in the likelihood function information available from the views about possible body part locations. For example, we might be able to find the head using a face detector. If we have a skin detector, we might want to exclude the torso from the set of body parts that can give rise to it. In the present work, we include an additional term in the likelihood $P(o(x) \mid k)$.

First, we determine the probability $P_{\mathrm{ar}}$ that a particular body part has a particular aspect ratio. This probability is modeled as a 1D Gaussian, with its mean and standard deviation learnt using training data. Now, we consider body parts detected from the silhouettes extracted for the person and find their aspect ratios. Finding the value of the function $P_{\mathrm{ar}}$, we assign this value to all pixels belonging to the part in the silhouette. Since the torso is not observed directly, we cannot determine this probability for pixels belonging to it, and hence they are assigned a constant value. Since we have multiple silhouettes and hence multiple probability estimates for the aspect ratio at a given pixel, we average them to yield a single result. For pixels lying outside any silhouette, the probability is zero. This yields a function $P_{\mathrm{ar}}(x, bp)$ for each pixel $x$ and each body part $bp$. During the evaluation of an assembly, we can compute the value of this function since we know the projections of the body parts onto the image. This probability value can be multiplied with the color likelihood to yield the likelihood function $P(o(x) \mid k)$ used in equation (6).
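The aspect-ratio term might be assembled as in the sketch below; the per-part Gaussian parameters are assumed to have been learnt offline, and the constant used for the torso is an arbitrary placeholder.

```python
import numpy as np

def aspect_ratio_probability(ratio, mean, std):
    """1D Gaussian model P(aspect ratio | body part)."""
    return np.exp(-0.5 * ((ratio - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def aspect_ratio_map(shape, detected_parts, part_models, torso_value=0.5):
    """Per-pixel, per-part probability map P_ar(x, part), averaged over the
    multiple silhouettes. detected_parts is a list of (part_name, pixel_mask,
    aspect_ratio) tuples, one per detected part per silhouette; pixels outside
    every silhouette stay at zero. part_models maps part name to (mean, std)."""
    sums = {name: np.zeros(shape) for name in part_models}
    counts = {name: np.zeros(shape) for name in part_models}
    for name, mask, ratio in detected_parts:
        mean, std = part_models[name]
        sums[name][mask] += aspect_ratio_probability(ratio, mean, std)
        counts[name][mask] += 1
    maps = {name: np.divide(sums[name], counts[name],
                            out=np.zeros(shape), where=counts[name] > 0)
            for name in part_models}
    maps['torso'] = np.full(shape, torso_value)   # torso not observed directly: constant
    return maps
```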

5 Searching for the Optimal Assembly

We believe that the best assembly can only be found by an exhaustive search in $O(n^5)$ time (where $n \approx 10$ is the number of possible primitives for each part). However, in practice, we have found that the same result can be obtained in $O(n)$ time if we have a good initial estimate of the body part positions, and in $O(n^2)$ time during the initialization phase. We first describe the incremental scheme.


Figure 8: Schematic for the initialization procedure.

5.1 Incremental Algorithm

If we have a sufficiently good estimate of the current body part locations, we use a greedy approach. The idea is to try to replace each part, in turn, with candidate parts. If the assembly with the original part has a higher likelihood than the ones with any of the new primitives, we keep the original one. This is repeated for the different parts. We have found that, apart from being very fast, this method yields the best results (better than the initialization method), since it is often the case that some body parts have no good candidates at a particular time step, in which case we can keep the old estimate.
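The incremental scheme reduces to a simple greedy loop; replace_part and likelihood are placeholders for the assembly representation and equation (8), respectively.

```python
def refine_assembly(assembly, candidates_per_part, likelihood, replace_part):
    """Greedy O(n) refinement: for each body part in turn, try every candidate
    primitive and keep the replacement only if it raises the assembly
    likelihood; otherwise the previous estimate for that part is retained."""
    best_score = likelihood(assembly)
    for part_name, candidates in candidates_per_part.items():
        for candidate in candidates:
            trial = replace_part(assembly, part_name, candidate)
            score = likelihood(trial)
            if score > best_score:
                assembly, best_score = trial, score
    return assembly, best_score
```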

    5.2 Initialization

Figure 9: Results of the algorithm for a person at a particular time instant from multiple perspectives. Note how the person's body parts are correctly detected even though he is partially occluded from some views.

In order to find an initial solution, or to reinitialize the method if the incremental method fails, we use the following approach. First, we try to find good leg pairs. We find the $K$ best pairs (in $O(K^2)$ time) based on the likelihood function by building an assembly of just the two legs. Similarly, we find the $K$ best pairs of hands. Next, we find the $K$ best assemblies consisting of two hands and two legs using the hand and leg pairs found earlier (Figure 8). For this step, we construct the torso using the four joint locations. Finally, the head is added and the best assembly is found. Although this method does not find the optimal assembly, we have found that it is extremely effective in practice and, with the right choice of $K$, yields results very close to an exhaustive search. If computational resources permit, we can run both algorithms and take the assembly with the higher likelihood as the answer.
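A sketch of this initialization procedure, under the assumption of a hypothetical assembly constructor build(legs=..., hands=..., head=...) and a likelihood function implementing equation (8); the pair enumeration and selection follow the description above.

```python
from itertools import combinations, product
import heapq

def k_best(items, score, k):
    """Return the k highest-scoring items under the given score function."""
    return heapq.nlargest(k, items, key=score)

def initialize(legs, hands, heads, k, likelihood, build):
    """Keep the K best leg pairs and K best hand pairs (each scored by building
    a partial assembly from just that pair), combine them into the K best
    four-limb assemblies (torso filled in from the four joints), then try each
    candidate head and return the highest-likelihood full assembly."""
    leg_pairs = k_best(list(combinations(legs, 2)),
                       lambda p: likelihood(build(legs=p)), k)
    hand_pairs = k_best(list(combinations(hands, 2)),
                        lambda p: likelihood(build(hands=p)), k)
    limb_sets = k_best(list(product(leg_pairs, hand_pairs)),
                       lambda lh: likelihood(build(legs=lh[0], hands=lh[1])), k)
    head_options = heads if heads else [None]      # allow a missing head candidate
    candidates = [build(legs=lg, hands=hd, head=h)
                  for (lg, hd), h in product(limb_sets, head_options)]
    return max(candidates, key=likelihood)
```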

    6 Results

We have obtained promising results for the algorithm. We tested our algorithm on a 5-perspective sequence with two people partially occluding each other in several views. We were able to correctly identify the body parts of the people when they were extended from the body. When the parts were close to the body, the algorithm labeled the part as missing and correctly identified the other parts. Figure 9 shows the result obtained at a particular time instant for the sequence. Figure 10 shows the results over time from a particular view. No initialization was done, nor was any exact 3D model of the person specified. The algorithm took about 10 s/frame on a dual 933 MHz Pentium III processor, where most of the time was spent in evaluating different assemblies.

    7 Summary and Conclusions

We have presented an algorithm for body pose estimation that does not require any initialization or models to be specified upfront, and is able to work in a crowded scene where occlusions, both full and partial, are present. These features make it especially useful for many surveillance applications. In the future, we wish to investigate more cues for body parts in an image (other than the silhouettes), such as edge maps and texture regions, which might help us to reduce the number of cameras required to obtain a certain quality of results.

Figure 10: Results for five frames of the sequence.

    References

[1] C. Bregler and J. Malik. Tracking people with twists and exponential maps. In IEEE Conference on Computer Vision and Pattern Recognition, 1998.
[2] Q. Cai and J. K. Aggarwal. Tracking human motion in structured environments using a distributed-camera system. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(11):1241–1247, November 1999.
[3] G. K. M. Cheung, T. Kanade, J.-Y. Bouguet, and M. Holler. A real-time system for robust 3D voxel reconstruction of human motions. In CVPR, pages 714–720, 2000.
[4] K. Choo and D. J. Fleet. People tracking using hybrid Monte Carlo filtering. In International Conference on Computer Vision, 2001.
[5] Q. Delamarre and O. D. Faugeras. 3D articulated models and multi-view tracking with silhouettes. In ICCV (2), pages 716–721, 1999.
[6] D. DiFranco, T.-J. Cham, and J. M. Rehg. Reconstruction of 3-D figure motion from 2-D correspondences. In IEEE Computer Vision and Pattern Recognition, Kauai, Hawaii, December 2001.
[7] R. Fablet and M. J. Black. Automatic detection and tracking of human motion with a view-based representation. In ECCV02, page I: 476 ff., 2002.
[8] P. Felzenszwalb and D. Huttenlocher. Efficient matching of pictorial structures. In Computer Vision and Pattern Recognition, pages 66–75, 2000.
[9] D. Gavrila and L. Davis. 3D model-based tracking of humans in action: a multi-view approach. In CVPR, pages 73–80, 1996.
[10] D. D. Hoffman and M. Singh. Salience of visual parts. Cognition, 63:29–78, 1997.
[11] I. A. Kakadiaris and D. Metaxas. Three-dimensional human body model acquisition from multiple views. International Journal of Computer Vision, 30(3):191–218, 1998.
[12] S. Ioffe and D. Forsyth. Human tracking with mixtures of trees. In International Conference on Computer Vision, pages 690–695, 2001.
[13] A. Mittal and L. S. Davis. M2Tracker: A multi-view approach to segmenting and tracking people in a cluttered scene using region-based stereo. In European Conference on Computer Vision, page I: 18 ff., 2002.
[14] G. Mori and J. Malik. Estimating human body configurations using shape context matching. In ECCV02, page III: 666 ff., 2002.
[15] R. Ronfard, C. Schmid, and B. Triggs. Learning to parse pictures of people. In ECCV02, page IV: 700 ff., 2002.
[16] R. Rosales, M. Siddiqui, J. Alon, and S. Sclaroff. Estimating 3D body pose using uncalibrated cameras. In IEEE International Conference on Computer Vision and Pattern Recognition, 2001.
[17] H. Sidenbladh, M. J. Black, and D. J. Fleet. Stochastic tracking of 3D human figures using 2D image motion. In ECCV (2), pages 702–718, 2000.
[18] Y. Song, X. Feng, and P. Perona. Towards detection of human motion. In CVPR, pages 810–817, 2000.
[19] J. Sullivan and S. Carlsson. Recognizing and tracking human action. In ECCV02, page I: 629 ff., 2002.
[20] T. Drummond and R. Cipolla. Real-time tracking of multiple articulated structures in multiple views. In European Conference on Computer Vision, 2000.
[21] M. Yamamoto, A. Sato, S. Kawada, T. Kondo, and Y. Osaki. Incremental tracking of human actions from multiple views. In IEEE International Conference on Computer Vision and Pattern Recognition, 1998.
[22] L. Zhao. Dressed Human Modeling, Detection, and Parts Localization. PhD thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, July 2001.
