![Page 1: Scene Understanding perception, multi-sensor fusion, spatio-temporal reasoning](https://reader036.vdocuments.net/reader036/viewer/2022070416/5681505d550346895dbe5e95/html5/thumbnails/1.jpg)
1
Scene Understanding: perception, multi-sensor fusion, spatio-temporal reasoning
and activity recognition
Francois BREMOND
PULSAR project-team,
INRIA Sophia Antipolis, FRANCE
http://www-sop.inria.fr/pulsar/
Key words: Artificial intelligence, knowledge-based systems,
cognitive vision, human behavior representation, scenario recognition
2
• ETISEO: a French initiative for algorithm validation and knowledge acquisition: http://www-sop.inria.fr/orion/ETISEO/
• Approach: 3 critical evaluation concepts
  • Selection of test video sequences
    • Follow a specified characterization of problems
    • Study one problem at a time, at several levels of difficulty
    • Collect long sequences for statistical significance
  • Ground-truth definition
    • Up to the event level
    • Give clear and precise instructions to the annotators (e.g., annotate both the visible and the occluded parts of objects)
  • Metric definition
    • A set of metrics for each video processing task
    • Performance indicators: sensitivity and precision
Video Understanding: Performance Evaluation (V. Valentin, R. Ma)
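The two performance indicators above can be computed from the counts of true positives (TP), false negatives (FN) and false positives (FP) against the ground truth. A minimal sketch (the function names are ours, not ETISEO's):

```python
# Precision: fraction of detections that match the ground truth.
# Sensitivity (recall): fraction of ground-truth items that were detected.
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def sensitivity(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

# Example: 40 events detected correctly, 5 missed, none spurious.
print(precision(40, 0))    # 1.0
print(sensitivity(40, 5))  # 0.888...
```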
3
Evaluation: current approach (A.T. Nghiem)

• ETISEO limitations:
  • The selection of video sequences according to difficulty levels is subjective
  • The generalization of evaluation results is subjective
  • One video sequence may contain several video processing problems, at many difficulty levels
• Approach: treat each video processing problem separately
  • Define a measure to compute the difficulty level of input data (e.g., video sequences)
  • Select video sequences containing only the current problem, at various difficulty levels
  • For each algorithm, determine the highest difficulty level at which the algorithm still has acceptable performance
• Approach validation: applied to two problems
  • Detecting weakly contrasted objects
  • Detecting objects mixed with shadows
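The per-algorithm selection step can be sketched as follows (a minimal illustration, with hypothetical names and an assumed acceptance threshold):

```python
# Sketch: given an algorithm's performance score at each difficulty
# level, report the highest level at which performance is still
# acceptable. A failure at an easier level invalidates harder ones.
def highest_acceptable_level(scores_by_level, threshold=0.8):
    """scores_by_level: {difficulty_level: performance_score}."""
    best = 0
    for level in sorted(scores_by_level):
        if scores_by_level[level] >= threshold:
            best = level
        else:
            break
    return best

print(highest_acceptable_level({1: 0.95, 2: 0.85, 3: 0.60}))  # 2
```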
4
Evaluation: conclusion

• A new evaluation approach to generalize evaluation results.
• The approach has been implemented for 2 problems.
• Limitation: it only detects the upper bound of an algorithm's capacity.
• The difference between this upper bound and the real performance may be significant if:
  • the test video sequence contains several video processing problems;
  • the same set of parameters is tuned differently to adapt to several concurrent problems.
• Ongoing evaluation campaigns:
  • PETS at ECCV 2008
  • TRECVid (NIST) with the i-LIDS videos
• Benchmarking databases:
  • http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG363
  • http://www.hitech-projects.com/euprojects/cantata/datasets_cantata/dataset.html
5
Video Understanding: Program Supervision
6
Supervised Video Understanding: Proposed Approach

Goal: easy creation of reliable supervised video understanding systems

Approach
• Use of a supervised video understanding platform
  • A reusable software tool composed of three separate components: program library, control, knowledge base
• Formalize a priori knowledge about the video processing programs
• Make the control of the video processing programs explicit

Issues
• Which video processing programs can be supervised?
• A friendly formalism to represent knowledge about programs
• A general control engine to implement different control strategies
• A learning tool to adapt system parameters to the environment
7
Proposed Approach

[Architecture diagram: an application-domain expert and a video processing expert provide three knowledge bases (application domain, scene environment, video processing programs); a control engine, supported by learning and evaluation components, drives the video processing program library to build a particular system.]
8
Supervised Video Understanding Platform: Operator Formalism

• Use of an operator formalism [Clément and Thonnat, 93] to represent knowledge about video processing programs
• Composed of frames and production rules
  • Frames: declarative knowledge
    • Operators: abstract models of video processing programs
      – primitive: a particular program
      – composite: a particular combination of programs
  • Production rules: inferential knowledge
    • Choice and optional criteria
    • Initialization criteria
    • Assessment criteria
    • Adjustment and repair criteria
9
Program Supervision: Knowledge and Reasoning

Primitive operator
• Functionality, characteristics
• Input data, parameters, output data
• Preconditions, postconditions, effects
• Calling syntax
• Rule bases: parameter initialization rules, parameter adjustment rules, result evaluation rules, repair rules

Composite operator
• Functionality, characteristics
• Input data, parameters, output data
• Preconditions, postconditions, effects
• Decomposition into sub-operators (sequential, parallel, alternative), data flow
• Rule bases: parameter initialization rules, parameter adjustment rules, choice rules, result evaluation rules, repair rules
10
• Objective: a learning tool to automatically tune algorithm parameters from experimental data
• Used for learning the segmentation parameters with respect to the illumination conditions
• Method
  • Identify a set of parameters for a task
    • 18 segmentation thresholds
    • depending on an environment characteristic: the image intensity histogram
  • Study the variability of the characteristic
    • Histogram clustering → 5 clusters
  • Determine the optimal parameters for each cluster
    • Optimization of the 18 segmentation thresholds
Video Understanding: Learning Parameters (B.Georis)
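The run-time side of this method can be sketched as a lookup: match the current frame's intensity histogram to the nearest off-line cluster and use that cluster's optimized thresholds. A toy illustration (the 3-threshold sets, centroids and distance are hypothetical stand-ins for the real 18 thresholds and 5 clusters):

```python
# Pick the cluster whose centroid histogram is closest (squared
# Euclidean distance) to the current frame's histogram.
def nearest_cluster(hist, centroids):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: dist(hist, centroids[i]))

# One set of segmentation thresholds per cluster (toy values).
thresholds_by_cluster = {0: (10, 40, 90), 1: (25, 60, 120)}
centroids = [(0.8, 0.1, 0.1), (0.2, 0.3, 0.5)]  # toy normalized histograms

hist = (0.7, 0.2, 0.1)  # current frame's histogram
params = thresholds_by_cluster[nearest_cluster(hist, centroids)]
print(params)  # (10, 40, 90)
```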
11
Video Understanding: Learning Parameters
Camera View
12
Learning Parameters: Clustering the Image Histograms

[Figure: image intensity histograms (X: pixel intensity 0–255, Z: number of pixels [%]) stacked along time (Y); each X–Z slice represents one image histogram. The histograms form 5 clusters, each with its own optimal parameter set β_i^opt1 … β_i^opt5.]
13
Video Understanding: Knowledge Discovery (E. Corvee, J.L. Patino Vilchis)

• CARETAKER: an FP6 IST European initiative to provide an efficient tool for the management of large multimedia collections.
• Applications to surveillance and safety issues, urban/environment planning, resource optimization, and disabled/elderly person monitoring.
• Currently being validated on large underground video recordings (Torino, Roma).

[Processing chain: multiple audio/video sensors → audio/video acquisition and encoding (raw data) → generic event recognition (primitive events and metadata) → knowledge discovery (simple events, complex events).]
14
Event detection examples
15
Data Flow

Object/event detection → information modelling

Object detection:
• Id
• Type
• 2D info
• 3D info

Event detection:
• Id
• Type (inside_zone, stays_inside_zone)
• Involved mobile object
• Involved contextual object

The results are stored in three tables: a mobile object table, an event table, and a contextual object table.
16
Table Contents

Mobile objects: people characterised by
• trajectory
• shape
• significant events in which they are involved
• …

Contextual objects: interactions found between mobile objects and contextual objects
• interaction type
• time
• …

Events: model the normal activities in the metro station
• event type
• involved objects
• time
• …
17
Knowledge Discovery: trajectory clustering

Objective: clustering of trajectories into k groups matching people's activities
• Feature set
  • Entry and exit points of an object
  • Direction, speed, duration, …
• Clustering techniques
  • Agglomerative hierarchical clustering
  • K-means
  • Self-organizing (Kohonen) maps
• Evaluation of each cluster set against the ground truth
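The clustering step can be sketched with a tiny K-means over flattened trajectory features (entry point, exit point, mean speed). The feature encoding and data here are toy assumptions, not the project's actual pipeline:

```python
import random

# Minimal K-means on fixed-length trajectory feature vectors.
def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:  # assign each trajectory to its nearest center
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            groups[i].append(p)
        # recompute centers (keep the old one if a group is empty)
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# Two obvious groups of (entry_x, entry_y, exit_x, exit_y, speed):
trajs = [(0, 0, 1, 1, 1.0), (0, 0, 1, 1, 1.1),
         (5, 5, 9, 9, 2.0), (5, 5, 9, 9, 2.1)]
centers, groups = kmeans(trajs, 2)
print(sorted(len(g) for g in groups))  # [2, 2]
```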
20
Results on the Torino subway (45 min), 2052 trajectories
21
Trajectory: Analysis

[Figure: trajectory clusters obtained with SOM, K-means and agglomerative clustering; some groups show mixed overlap.]
22
Trajectory: Semantic characterisation

SOM cluster 14 / K-means cluster 12 / agglomerative cluster 21
The clusters are consistent across algorithms. Semantic meaning: walking towards the vending machines.
23
Trajectory: Analysis

Intraclass & interclass variance
• The SOM algorithm has the lowest intraclass variance and the highest interclass separation.
• Parameter tuning: which clustering technique?

With v_ij the j-th trajectory of cluster i, v̄_i the centroid of cluster i, and v̄ the global centroid:

  J_intra = Σ_i Σ_j ‖ v_ij − v̄_i ‖²
  J_inter = Σ_i ‖ v̄_i − v̄ ‖²
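The two separation criteria can be computed directly from the cluster contents. A plain-Python sketch on toy 2D points (a compact, well-separated clustering should score low intraclass and high interclass variance):

```python
# Intraclass variance: spread of points around their own cluster
# centroid. Interclass variance: spread of cluster centroids around
# the global centroid, weighted by cluster size.
def centroid(points):
    return tuple(sum(xs) / len(xs) for xs in zip(*points))

def intraclass(clusters):
    return sum(sum((a - b) ** 2 for a, b in zip(p, centroid(c)))
               for c in clusters for p in c)

def interclass(clusters):
    g = centroid([p for c in clusters for p in c])
    return sum(len(c) * sum((a - b) ** 2 for a, b in zip(centroid(c), g))
               for c in clusters)

tight = [[(0.0, 0.0), (0.2, 0.0)], [(5.0, 5.0), (5.2, 5.0)]]
print(intraclass(tight) < interclass(tight))  # True: compact and well separated
```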
25
Mobile Objects
26
Mobile Object Analysis

Building statistics on objects:

[Chart: number of persons per 5-minute interval over time, on a 0–250 scale.]

There is an increase in the number of people after 6:45.
27
Contextual Object Analysis

[Charts: percentage of use per 5-minute interval over time for Vending Machine 1 (0–25%) and Vending Machine 2 (0–30%).]

With the increase of people, there is an increase in the use of the vending machines.
30
Results: Trajectory Clustering

| | Cluster 38 | Cluster 6 |
| --- | --- | --- |
| Number of objects | 385 | 15 |
| Object types | {'Unknown'}, freq: 385 | {'Person'}, freq: 15 |
| Start time (min) | [0.1533, 48.4633] | [28.09, 46.79] |
| Duration (sec) | [0.04, 128.24] | [2.04, 75.24] |
| Trajectory types | {'4' '3' '7'}, freq: [381 1 3] | {'13' '12' '19'}, freq: [13 1 1] |
| Significant event | {'void'}, freq: 385 | {'inside_zone_Platform'}, freq: 15 |
31
• Semantic knowledge extracted by an off-line, long-term analysis of the on-line interactions between moving objects and contextual objects:
  • 70% of people come from the north entrance
  • Most people spend 10 s in the hall
  • 64% of people go directly to the gates without stopping at the ticket machine
  • At rush hours, people are 40% quicker to buy a ticket, …
• Issues:
  • At which level(s) should the clustering techniques be designed: low level (image features), middle level (trajectories, shapes), or high level (primitive events)?
  • Learn what: visual concepts, scenario models?
  • Uncertainty (noise, outliers, rare events): what are the activities of interest?
  • Parameter tuning (e.g., distance, clustering technique)
  • Performance evaluation (criteria, ground truth)
Knowledge Discovery: achievements
32
Video Understanding: Learning Scenario Models (A. Toshev)
or: Frequent Composite Event Discovery in Video Event Time Series
33
• Why unsupervised model learning in video understanding?
  • Complex models containing many events
  • Large variety of models
  • Different parameters for different models
⇒ The learning of models should be automated.
Learning Scenarios: Motivation
Video surveillance in a parking lot
34
Learning Scenarios: Problem Definition

• Input: a set of primitive events from the vision module, e.g.:
  object-inside-zone(Vehicle, Entrance) [5, 16]
• Output: frequent event patterns.
• A pattern is a set of events:
  object-inside-zone(Vehicle, Road) [0, 35]
  object-inside-zone(Vehicle, Parking_Road) [36, 47]
  object-inside-zone(Vehicle, Parking_Places) [62, 374]
  object-inside-zone(Person, Road) [314, 344]
• Goals:
  • Automatic, data-driven modeling of composite events
  • Reoccurring patterns of primitive events correspond to frequent activities
⇒ Find classes with large size and similar patterns.

[Figure: the zones of the scene.]
35
• Approach:
  • An iterative data-mining method for the efficient discovery of frequent patterns in large datasets
  • A PRIORI: sub-patterns of frequent patterns are also frequent (Agrawal & Srikant, 1995)
  • At the i-th step, consider only those i-patterns whose (i−1)-sub-patterns are frequent ⇒ the search space is pruned
• A PRIORI property for activities represented as classes:

  size(C_{m−1}) ≥ size(C_m)

  where C_m is a class containing patterns of length m and C_{m−1} is a sub-activity of C_m.
Learning Scenarios: A PRIORI Method
36
Learning Scenarios: A PRIORI Method
Merge two i-patterns with (i-1) primitive events in common to form an (i+1)-pattern:
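This candidate-generation step can be sketched as follows; the event labels are hypothetical shorthand, and real patterns would also carry time intervals:

```python
# A PRIORI candidate generation: two frequent i-patterns sharing
# (i-1) primitive events merge into one (i+1)-pattern candidate.
# Patterns built from an infrequent sub-pattern are never generated.
def merge(p, q):
    """p, q: tuples of primitive-event labels (i-patterns)."""
    shared = set(p) & set(q)
    if len(shared) == len(p) - 1:          # (i-1) events in common
        return tuple(sorted(set(p) | set(q)))  # the (i+1)-pattern
    return None

a = ("veh_in_Road", "veh_in_Parking_Road")
b = ("veh_in_Parking_Road", "veh_in_Parking_Places")
print(merge(a, b))
# ('veh_in_Parking_Places', 'veh_in_Parking_Road', 'veh_in_Road')
```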
37
Two types of similarity measures between event patterns:
• similarities between event attributes
• similarities between pattern structures

A generic similarity measure should:
• rely on generic properties where possible ⇒ easy usage in different domains,
• incorporate domain-dependent properties ⇒ relevance to the concrete application.
Learning Scenarios: Similarity Measure
38
Learning Scenarios: Attribute Similarity

Attributes: the corresponding events in two patterns should have similar (or the same) attributes (duration, names, object types, …).

• Comparison between corresponding events (of the same type).
• For numeric attributes: G(x, y) = e^(−((x − y)/(x + y))²)
• attr(p_i, p_j) = average of all event attribute similarities.
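A minimal sketch of a Gaussian-type attribute similarity and its averaging over corresponding events. The exact normalisation used in the original work is unclear from the extracted slides, so the (x − y)/(x + y) form and positive-valued attributes are assumptions here:

```python
import math

# Gaussian-type similarity: 1 for equal numeric attributes, decaying
# towards 0 as they differ (assumes positive-valued attributes).
def gaussian_sim(x, y):
    if x == y:
        return 1.0
    return math.exp(-((x - y) / (x + y)) ** 2)

def attr_sim(events_p, events_q):
    """Average similarity over corresponding event attributes."""
    sims = [gaussian_sim(a, b) for a, b in zip(events_p, events_q)]
    return sum(sims) / len(sims)

print(gaussian_sim(10.0, 10.0))  # 1.0
print(attr_sim([10, 30], [12, 30]))
```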
39
Test data:
• Video surveillance of a parking lot,
• 4 hours of recordings from 2 days, split into 2 test sets,
• Each test set contains approx. 100 primitive events.
Learning Scenarios: Evaluation
Results: in both test sets the following event pattern was recognized:
object-inside-zone(Vehicle, Road)
object-inside-zone(Vehicle, Parking_Road)
object-inside-zone(Vehicle, Parking_Places)
object-inside-zone(Person, Parking_Road)
⇒ A parking manoeuvre!
43
Conclusion:
• Application of a data-mining approach,
• Handling of uncertainty without losing computational effectiveness,
• General framework: only a similarity measure and a primitive-event library must be specified.

Future work:
• Other similarity measures,
• Handling of different aspects of uncertainty,
• Qualification of the learned patterns,
  • Does frequent equal interesting?
• Different applications: different event libraries or features.
Learning Scenarios: Conclusion & Future Work
44
HealthCare Monitoring (N. Zouba)

GERHOME (CSTB, INRIA, CHU Nice): the ageing population
http://gerhome.cstb.fr/

Approach:
• Multi-sensor analysis based on sensors embedded in the home environment
• Detect any alarming situation in real time
• Identify a person's profile (his/her usual behaviors) from the global trends of life parameters, and then detect any deviation from this profile
45
Monitoring of Activities of Daily Living for the Elderly

• Goal: increase independence and quality of life:
  • Enable the elderly to live longer in their preferred environment.
  • Reduce costs for public health systems.
  • Relieve family members and caregivers.
• Approach:
  • Detecting alarming situations (e.g., falls)
  • Detecting changes in behavior (missing activities, disorder, interruptions, repetitions, inactivity)
  • Calculating the degree of frailty of elderly people

Example of a normal activity:
  Meal preparation (in the kitchen) (11h–12h)
  Eating (in the dining room) (12h–12h30)
  Resting, TV watching (in the living room) (13h–16h)
  …
46
Gerhome laboratory

• GERHOME (Gerontology at Home): a homecare laboratory
  http://www-sop.inria.fr/orion/personnel/Francois.Bremond/topicsText/gerhomeProject.html
• Experimental site at CSTB (Centre Scientifique et Technique du Bâtiment) in Sophia Antipolis
  http://gerhome.cstb.fr
• Partners: INRIA, CSTB, CHU-Nice, Philips-NXP, CG06, …
47
Gerhome laboratory

Position of the sensors in the Gerhome laboratory:
• Video cameras installed in the kitchen and in the living room, to detect and track the person in the apartment.
• Contact sensors mounted on many devices, to determine the person's interactions with them.
• Presence sensors installed in front of the sink and the cooking stove, to detect the presence of people near them.
48
Sensors installed in the Gerhome laboratory:
• in the kitchen,
• video camera in the living room,
• pressure sensor underneath the legs of the armchair,
• contact sensor in the window,
• contact sensor in the cupboard door.
49
We have modelled a set of activities using an event recognition language developed in our team. Here is an example for the "Meal preparation" event.

Composite Event (Prepare_meal_1, "detected by a video camera combined with contact sensors"
  Physical Objects ((p: Person), (Microwave: Equipment), (Fridge: Equipment), (Kitchen: Zone))
  Components ((p_inz: PrimitiveState Inside_zone (p, Kitchen))        "detected by video camera"
              (open_fg: PrimitiveEvent Open_Fridge (Fridge))          "detected by contact sensor"
              (close_fg: PrimitiveEvent Close_Fridge (Fridge))        "detected by contact sensor"
              (open_mw: PrimitiveEvent Open_Microwave (Microwave))    "detected by contact sensor"
              (close_mw: PrimitiveEvent Close_Microwave (Microwave))) "detected by contact sensor"
  Constraints ((open_fg during p_inz)
               (open_mw before_meet open_fg)
               (open_fg Duration >= 10)
               (open_mw Duration >= 5))
  Action (AText ("Person prepares meal")
          AType ("NOT URGENT")))
Event modellingEvent modelling
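The temporal constraints in such a model can be checked over the detected time intervals. A minimal sketch of an Allen-style "during" test plus a duration test, with illustrative intervals in seconds (not the engine's actual implementation):

```python
# Intervals are (start, end) pairs.
def during(a, b):
    """True if interval a occurs entirely during interval b."""
    return b[0] <= a[0] and a[1] <= b[1]

def duration(iv):
    return iv[1] - iv[0]

# Toy detections: person in the kitchen for 2 min, fridge open 15 s.
person_in_kitchen = (0, 120)
open_fridge = (20, 35)

prepare_meal = (during(open_fridge, person_in_kitchen)
                and duration(open_fridge) >= 10)
print(prepare_meal)  # True
```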
50
Multi-sensor monitoring: results and evaluation

• We have validated and visualized the recognized events with a 3D visualization tool.

| Activity | # Videos | # Events | TP | FN | FP | Precision | Sensitivity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| In the kitchen | 10 | 45 | 40 | 5 | 0 | 1 | 0.888 |
| In the living room | 10 | 35 | 40 | 0 | 5 | 0.888 | 1 |
| Open microwave | 8 | 15 | 15 | 0 | 0 | 1 | 1 |
| Open fridge | 8 | 24 | 24 | 0 | 0 | 1 | 1 |
| Open cupboard | 8 | 30 | 30 | 0 | 0 | 1 | 1 |
| Preparing meal 1 | 8 | 3 | 3 | 0 | 0 | 1 | 1 |

• We have studied and tested a range of activities in the Gerhome laboratory, such as: using the microwave, using the fridge, preparing a meal, …
51
Recognition of the "Prepare meal" event

• The person is recognized with the posture "standing with one arm up", "located in the kitchen", and "using the microwave".

Visualization of a recognized event in the Gerhome laboratory
52
Recognition of the "Resting in living-room" event

• The person is recognized with the posture "sitting in the armchair" and "located in the living room".

Visualization of a recognized event in the Gerhome laboratory
53
End-users

• There are several end-users in homecare:
  • Doctors (gerontologists):
    • Frailty measurement (depression, …)
    • Alarm detection (falls, gas, dementia, …)
  • Caregivers and nursing homes:
    • Cost reduction: no false alarms, reduced employee involvement
    • Employee protection
  • Persons with special needs, including young children, disabled and elderly people:
    • Feeling safe at home
    • Autonomy: at night, lighting up the way to the bathroom
    • Improving life: smart mirror; summaries of the user's day, week and month in terms of walking distance, TV, water consumption
  • Family members and relatives:
    • Elderly safety and protection
    • Social connectivity
54
Social problems and solutions

| Problems | Solutions |
| --- | --- |
| Privacy, confidentiality and ethics: video (and other data) recording, processing and transmission | No video recording or transmission; only textual alarms |
| Acceptability for the elderly | User empowerment |
| Usability | Easy, ergonomic interface (no keyboard, large screen), friendly usage of the system |
| Cost effectiveness | The right service for the right price; a large variety of solutions |
| Legal issues, no certification | Robustness, benchmarking, on-site evaluation |
| Installation, maintenance, training, interoperability with other home devices | Adaptability, X-Box integration, wireless, standards (OSGi, …) |
| Research financing? | France (no money, lobbies), Europe (delays), US, Asia |
55
Conclusion

A global framework for building video understanding systems:

• Hypotheses:
  • mostly fixed cameras
  • a 3D model of the empty scene
  • predefined behavior models
• Results:
  • Real-time video understanding systems for individuals, groups of people, vehicles, crowds, animals, …
  • Knowledge structured within the different abstraction levels (i.e., processing worlds)
    • Formal description of the empty scene
    • Structures for algorithm parameters
    • Structures for object detection rules, tracking rules, fusion rules, …
    • An operational language for event recognition (more than 60 states and events), a video event ontology
  • Tools for knowledge management
    • Metrics and tools for performance evaluation and learning
    • Parsers and formats for data exchange
    • …
56
Conclusion: perspectives

Object and video event detection
• Finer human shape description: gesture models
• Video analysis robustness: reliability computation

Knowledge Acquisition
• Design of learning techniques to complement a priori knowledge:
  • visual concept learning
  • scenario model learning

System Reusability
• Use of program supervision techniques: dynamic configuration of programs and parameters
• Scaling issue: managing large networks of heterogeneous sensors (cameras, microphones, optical cells, radars, …)