Seminar Multimodale Räume (Multimodal Rooms), SS 2003 – Introduction, 7 May – Rainer Stiefelhagen


Multimodal Rooms

“Smart Rooms”, “Intelligent Environments”

Seminar SS 03


User Interfaces

• In the beginning: Wimpy Computing
– Windows, Icons, Menus, Pointing


2nd Generation: Human-Machine Interaction

“Please show me… hm… all Hotels in THIS area.. er.. part of the city”

• Speaking
• Pointing
• Gesturing
• Hand-Writing
• Drawing
• Presence/Focus of Attention
• Combination
– Speech + Handwriting + Gesture
– Repair
• Multimodal NLP & Dialog


“Perceptual” User Interfaces

• Perceptive
– Human-like perceptual capabilities (what is the user saying, who is the user, where is the user, what is he doing?)
• Multimodal
– People use multiple modalities to communicate (speech, gestures, facial expressions, …)
• Multimedia
– Text, graphics, audio and video

(Matthew Turk (Ed.), Proceedings of the 1998 Workshop on Perceptual User Interfaces)


Next: Pervasive Computing

Human-Computer Interaction not the Only Exchange

Humans Want to Interact with Other Humans

– Computers in the Human Interaction Loop (CHIL)

– The Transparent, Invisible Computer

– Computers Need to be Context Aware

– Should Require little or no Learning or Attention

– Should be proactive rather than command driven

– Produce Little or No Distraction

– Permit a HCI and CHIL Mix


Smart/Intelligent Rooms

• Use of computation to enhance everyday activity
• Integrate computers seamlessly into the real world (e.g. offices, homes)
• Use “natural” interfaces for communication (voice, gesture, etc.)
• Computer should adapt to the human, not vice versa!


Perception

• In order to respond appropriately, objects and rooms need to pay attention to
– People
– Context
• Machines have to be aware of their environment:
– Who, What, When, Where and Why?
• Interfaces must be adaptive to
– Overall situation
– Individual user


Intelligent Environments

• Classroom 2000 (Georgia Tech)
• Mozer’s Adaptive House
• Enhanced Meeting Rooms
• Kids Room (MIT)
• …
• Enhanced Objects such as Whiteboards, Desks, Chairs, …
• See also the Intelligent Environments Resource Page (http://www.research.microsoft.com/ierp/)


Intelligent Rooms, Univ. California, San Diego


Classroom 2000

• Capturing activity in a classroom
– Speaker’s voice
– Video
– Slides
– Handwritten notes


Classroom 2000

Presenting (recorded) lectures through a web-based interface

• Integration of slides, notes, audio, video
• Searching
• Adding additional material


Microsoft Easy Living Project

• XML-based distributed agent system
• Computer vision for person-tracking and visual user interaction
• Multiple sensor modalities combined
• Use of a geometric model of the world to provide context
• Automatic or semi-automatic sensor calibration and model building
• Fine-grained events and adaptation of the user interface
• Device-independent communication and data protocols
• Ability to extend the system in many ways


Mozer’s Adaptive House

• Operated as an ordinary home
– Usual light switches, thermostats, doors etc.
• Adjustments are measured and used to train the house to
– automatically adjust temperature
– adjust lighting
– choose music or TV channel
• The house infers the users’ desires from their actions and behaviours


Adaptive House (Mozer)

Sensors:

• Light Level

• Sound Level

• Temperature

• Motion

• Door status

• Window status

• Light settings

• Fan

• Heaters

• …

(M. Mozer, Univ. of Colorado, Boulder)


Issues in Perception

• Visual
– Face Detection / Tracking
– Body Tracking
– Face Recognition
– Gesture Recognition
– Action Recognition
– Gaze Tracking / Tracking Focus of Attention
• Auditory
– Speech Recognition
– Speaker Tracking
– Auditory Scene Analysis
– Speaker Identification
• Other: Haptic, Olfactory, …?


Enhanced Meeting Rooms

Capturing of Meetings

• Transcription
• Summarization
• Dialog Processing
• Who was there?
• Who talked to whom?


Work at ISL

• Face Tracking
• Facial Feature Tracking (Eyes, Nose, Mouth)
• Head Pose Estimation / Gaze Tracking
• Lip-Reading (Audio-Visual Speech Recognition)
• 3D Person Tracking
• Pointing Gesture Tracking
• Other Modalities: Speech (!!!, see John), Dialogue, Translation, Handwriting, ...


Tracking of Human Faces

• A face provides different functions:
– identification
– perception of emotional expressions
• Human-Computer Interaction requires tracking of faces:
– lip-reading
– eye/gaze tracking
– facial action analysis / synthesis
• Video conferencing / video telephony applications:
– tracking the speaker
– achieving low bit rate transmission


Demo: FaceTracker


Color Based Face Tracking

Human skin colors:
• cluster in a small area of a color space
• skin colors of different people mainly differ in intensity!
• variance can be reduced by color normalization
• distribution can be characterized by a Gaussian model

Chromatic colors: $r = \frac{R}{R+G+B}$, $g = \frac{G}{R+G+B}$
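The color normalization and Gaussian skin model can be illustrated with a minimal Python sketch. It assumes chromatic coordinates r = R/(R+G+B) and g = G/(R+G+B) and a single 2D Gaussian fitted to a handful of sample pixels. The function names and the training pixels are made up for illustration and are not taken from the FaceTracker system.

```python
import math

def chromatic(r, g, b):
    """Map an RGB pixel to chromatic (r, g) coordinates; intensity is divided out."""
    total = r + g + b
    if total == 0:
        return (1.0 / 3.0, 1.0 / 3.0)  # assumption: treat black as neutral gray
    return (r / total, g / total)

def fit_gaussian(samples):
    """Fit the mean and the 2x2 covariance (crr, crg, cgg) to (r, g) samples."""
    n = len(samples)
    mr = sum(s[0] for s in samples) / n
    mg = sum(s[1] for s in samples) / n
    crr = sum((s[0] - mr) ** 2 for s in samples) / n
    cgg = sum((s[1] - mg) ** 2 for s in samples) / n
    crg = sum((s[0] - mr) * (s[1] - mg) for s in samples) / n
    return (mr, mg), (crr, crg, cgg)

def skin_likelihood(pixel, mean, cov):
    """Evaluate the 2D Gaussian density at the chromatic color of an RGB pixel."""
    r, g = chromatic(*pixel)
    crr, crg, cgg = cov
    det = crr * cgg - crg * crg
    dr, dg = r - mean[0], g - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix
    m = (cgg * dr * dr - 2.0 * crg * dr * dg + crr * dg * dg) / det
    return math.exp(-0.5 * m) / (2.0 * math.pi * math.sqrt(det))

# Hypothetical training pixels (made-up RGB values, for illustration only)
SKIN_PIXELS = [(220, 170, 140), (200, 150, 120), (180, 130, 110),
               (240, 190, 160), (150, 105, 90), (120, 85, 70)]
MEAN, COV = fit_gaussian([chromatic(*p) for p in SKIN_PIXELS])
```

Because the intensity is divided out, a pixel and a brighter copy of it map to the same chromatic color, which is why the model needs only a small cluster in (r, g) space.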


Color Model

Advantages:
• very fast
• orientation invariant
• stable object representation
• not person-dependent
• model parameters can be quickly adapted

Disadvantages:
• environment dependent (light sources heavily affect the color distribution)


Tracking Gaze and Focus of Attention

• In meetings:
– to determine the addressee of a speech act
– to track the participants’ attention
– to analyse who was in the center of focus
– for meeting indexing / retrieval
• Interactive rooms:
– to guide the environment’s focus to the right application
– to suppress unwanted responses
• Virtual collaborative workspaces (CSCW)
• Human-Robot Cooperation
• Cars (driver monitoring)


Tracking a User’s Focus of Attention

• Focus of Attention tracking:
– To detect a person’s interest
– To know what a user is interacting with
– To understand his actions/intentions
– To know whether a user is aware of something
• In meetings:
– to determine the addressee of a speech act
– to understand the dynamics of interaction
– for meeting indexing / retrieval
• Other areas:
– Smart environments
– Video-conferencing
– Human-Robot Interaction


Head Pose Estimation

• Model-based approaches:
– Locate and track a number of facial features
– Compute head pose from 2D to 3D correspondences (Gee & Cipolla '94, Stiefelhagen et al. '96, Jebara & Pentland '97, Toyama '98)
• Example-based approaches:
– Estimate new pose with a function approximator such as an ANN (Beymer et al. '94, Schiele & Waibel '95, Rae & Ritter '98)
– Use a face database to encode images (Pentland et al. '94)


Model-based Head Pose estimation

[Figure: feature tracking in the image, 3D model, and pose estimation in the real world (axes X, Y, Z)]

• Find correspondences between points in a 3D model and points in the image
• Iteratively solve a linear equation system to find the pose parameters (rx, ry, rz, tx, ty, tz)


Demo: Facial Feature Tracking


Demo: Model-based Head Pose


Model-based Head Pose

• Pose estimation accuracy depends on correct feature localization!

• Problems:
– Choice of good features
– Occlusion due to strong head rotation
– Fast head movement
– Detection of tracking failure / re-initialization
– Requires good image resolution

• Video


Estimating Head Pose with ANNs

• Train neural network to estimate head orientation

• Preprocessed image of the face used as input


Network Architecture

Output: Pan (Tilt)

Hidden Layer: 40 to 150 units

Input Retina: up to 3 × 20×30 pixels = 1,800 units


Tracking People in a Panoramic View

[Figure: camera view, panoramic view, perspective view]


Training

• Separate nets for pan and tilt
• Trained with standard backpropagation with a momentum term
• Datasets:
– Training on 6100 images from 12 users
– Cross-evaluation on 750 images from the same users
– Tested on 750 images from the same users
• Additional user-independent test set:
– 1500 images from two new users
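For reference, backpropagation with a momentum term is commonly written as the weight update below. The symbols (learning rate $\eta$, momentum coefficient $\alpha$, error $E$, weight $w_{ij}$) are notation assumed here; the slides do not state the values or the exact variant that was used.

```latex
\Delta w_{ij}(t) = -\eta \,\frac{\partial E}{\partial w_{ij}} + \alpha \,\Delta w_{ij}(t-1),
\qquad
w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)
```

The momentum term reuses a fraction of the previous step, which damps oscillations and speeds up progress along directions in which the gradient keeps the same sign.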


Results

Average error in degrees for pan / tilt:

Input    Training set    Test set      New users
histo    6.6 / 5.0       9.4 / 6.9     11.3 / 9.1
edges    6.0 / 2.6       10.8 / 7.1    13.3 / 10.8
both     1.4 / 1.5       7.8 / 5.4     9.9 / 10.3

histo: histogram-normalized image used as input

edges: horizontal and vertical edge images used as input

both: histogram-normalized image plus edge images used as input


Demo


Spatial-Awareness in Smart Rooms

Tracking people indoors

• To focus sensors on people
• To resolve spatial relationships
• To avoid bumping into humans
• To analyze activity


Person Tracking

Vision-based localization of people/objects:

• Single perspective: Pfinder, W3S, Hydra, etc.
• Multiple perspectives: AVIARY, Easy Living


Person Tracking in the ISL Smart Room

[Figure: cameras Cam0 to Cam3 → feature extractor → features → tracking agent → people]


Person Tracking with Multiple Cameras

Goal: 3D tracking of people in rooms

• Segmentation of foreground objects in each image
• “3D intersection” of the rays through the object centers
• Kalman filter
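The ray intersection step can be sketched as a small least-squares problem. With noisy measurements the rays from different cameras rarely meet exactly, so this sketch returns the 3D point with the smallest summed squared distance to all rays. The formulation, the function names and the camera positions in the usage note are assumptions for illustration, not the method used in the ISL smart room.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def intersect_rays(origins, directions):
    """Return the point closest (in the least-squares sense) to all rays."""
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        d = normalize(d)
        for i in range(3):
            for j in range(3):
                # (I - d d^T) projects onto the plane perpendicular to the ray
                p_ij = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += p_ij
                b[i] += p_ij * o[j]
    det = det3(a)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no unique intersection")
    point = []
    for k in range(3):
        # Cramer's rule: replace column k of the matrix with the right-hand side
        a_k = [[b[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]
        point.append(det3(a_k) / det)
    return point
```

As a usage example, three hypothetical cameras at (0, 0, 2.5), (5, 0, 2.5) and (0, 6, 2.5) that all look at a person at (2, 3, 1.7) yield that same point: `intersect_rays(cams, [[2 - c[0], 3 - c[1], 1.7 - c[2]] for c in cams])`.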


Adaptive Silhouette Extraction

Background subtraction:

• Adaptive Multi-Gaussian background model [Stauffer et al., CVPR 1998]

• Morphological operators smooth foreground output

• Connected components form silhouettes
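The pipeline above can be sketched in a simplified form. Note the substitutions: instead of the adaptive multi-Gaussian background model, this sketch thresholds the difference to a single static background image, and a minimum-area filter stands in for morphological smoothing. Images are plain nested lists of gray values, and all names and thresholds are hypothetical.

```python
from collections import deque

def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose gray value differs clearly from the background."""
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def connected_components(mask, min_area=2):
    """Group 4-connected foreground pixels; drop blobs smaller than min_area."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:  # breadth-first flood fill of one blob
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                components.append(pixels)
    return components

def centroid(pixels):
    """Mean (row, column) of a silhouette, usable as its reference point."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)
```

Each returned component is one silhouette; its centroid is one possible reference point for the localization step.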


Locating people

• Extract reference point: Centroid
• Use calibrated sensors to calculate absolute position
• Create list of location hypotheses

[Figure: location hypotheses i) (X,Y) and ii) (X,Y); labels 1, 2, 3 and a, b]


Tracking people

Best Hypothesis Tracking:

• Match location hypotheses to tracks
• Smooth tracks with Kalman filter

[Figure: hypotheses i) (X,Y) and ii) (X,Y) matched to Track 1 and Track 2]
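The two steps, matching location hypotheses to tracks and smoothing with a Kalman filter, can be sketched as follows. The greedy nearest-neighbour matching, the constant-position motion model and all noise values are assumptions chosen for brevity; the slides do not specify how the best hypothesis is selected.

```python
import math

class Track:
    """One tracked person: a 2D position smoothed by a simple Kalman filter."""

    def __init__(self, x, y, variance=1.0):
        self.pos = [x, y]
        self.var = variance  # one estimate variance shared by both axes

    def update(self, measurement, process_noise=0.05, measurement_noise=0.2):
        # Predict: constant-position model, so only the uncertainty grows
        self.var += process_noise
        # Correct: blend prediction and measurement using the Kalman gain
        gain = self.var / (self.var + measurement_noise)
        self.pos = [p + gain * (m - p) for p, m in zip(self.pos, measurement)]
        self.var *= (1.0 - gain)

def match_and_update(tracks, hypotheses, max_distance=1.0):
    """Give each track its nearest unused hypothesis; return the leftovers."""
    unused = list(hypotheses)
    for track in tracks:
        if not unused:
            break
        best = min(unused, key=lambda h: math.dist(track.pos, h))
        if math.dist(track.pos, best) <= max_distance:
            track.update(best)
            unused.remove(best)
    return unused
```

Hypotheses that remain unmatched could start new tracks or be discarded as false alarms; that policy is left open here.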


Tracking Problems

Imperfect and merged silhouettes

Counterstrategies:

Better vision algorithm

Probabilistic multi-hypothesis tracking

Reference point: Head


Reference point: Head

• Use head as reference point instead of centroid
• Head tracker has significantly lower tracking error and false alarm rate

[Bar chart: tracking error and false alarm rate for head vs. centroid as reference point; vertical axis from 0 to 0.1]


Demo


Recognition of Pointing Gestures

• Goals:
– Recognize human pointing gestures
– Extract the pointing direction in 3D
• Applications:
– Human-robot interaction
– Smart rooms
• Requirements:
– Person-independent
– Real-time operation
– Camera movement possible


Recognition of Pointing Gestures

[Figure: stereo camera; left/right image]


3D Tracker: Processing Steps

[Figure: camera image, skin color, disparity]

3D clustering of skin-colored pixels provides cues to the positions of the head and hands.


Gesture Recognition: Movement Phases

• Pointing gestures consist of three intuitively distinguishable movement phases:
– Begin
– Hold
– End
• Precise localization of the hold phase is important for determining the pointing direction

Mean duration of the movement phases:

Phase               μ [sec]    σ [sec]
Complete gesture    1.75       0.48
Begin               0.52       0.17
Hold                0.76       0.40
End                 0.47       0.12


Gesture Recognition: Models

• The 3 phases are modeled with separate models
• Continuous HMMs with 2 Gaussians per state
• Null model as threshold for the phase models
• Training on hand-labeled data


Gesture Recognition: Detection

• A pointing gesture is detected if 3 time points $t_B < t_H < t_E$ are found such that
– $P_E(t_E) > P_B(t_E)$ and $P_E(t_E) > 0$
– $P_B(t_B) > P_E(t_B)$ and $P_B(t_B) > 0$
– $P_H(t_H) > 0$
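The detection rule can be written down directly. In this minimal sketch the per-frame scores of the begin, hold and end models are assumed to be expressed relative to the null model, so that a value above 0 means the phase model beats the null model. The exhaustive search and the function name are illustrative only.

```python
def detect_pointing_gesture(p_begin, p_hold, p_end):
    """Return the first (t_B, t_H, t_E) that satisfies the rule, or None.

    p_begin, p_hold, p_end: per-frame scores of the three phase models,
    assumed to be measured relative to the null model (positive = better).
    """
    n = len(p_hold)
    for t_b in range(n):
        # Begin phase: begin model beats the end model and the null model
        if not (p_begin[t_b] > p_end[t_b] and p_begin[t_b] > 0):
            continue
        for t_h in range(t_b + 1, n):
            # Hold phase: hold model beats the null model
            if not p_hold[t_h] > 0:
                continue
            for t_e in range(t_h + 1, n):
                # End phase: end model beats the begin model and the null model
                if p_end[t_e] > p_begin[t_e] and p_end[t_e] > 0:
                    return (t_b, t_h, t_e)
    return None
```

The triple loop is cubic in the number of frames, which is acceptable for a short sliding window; a real-time system would more likely track the three conditions incrementally.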


Gesture Recognition: Features

• Feature vector: (r, Δθ, Δy)
• Experiments: cylindrical coordinates work better than spherical and Cartesian coordinates
• Hand position relative to the head: independent of the position in the room
• Δθ, Δy: no adaptation to the pointing targets seen in training
• Spline interpolation of the feature sequences to a constant 40 Hz
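A possible way to compute such a feature sequence is sketched below. It assumes that y is the vertical axis, that the hand position is expressed in cylindrical coordinates around the head, and that Δθ and Δy are frame-to-frame differences. These are interpretations of the slide, not confirmed details, and the function names are hypothetical.

```python
import math

def cylindrical(head, hand):
    """Hand position relative to the head as (radius, angle, height)."""
    dx, dy, dz = (hand[i] - head[i] for i in range(3))
    return math.hypot(dx, dz), math.atan2(dz, dx), dy

def feature_sequence(heads, hands):
    """Turn per-frame head/hand positions into (r, delta_theta, delta_y) vectors."""
    coords = [cylindrical(h, k) for h, k in zip(heads, hands)]
    features = []
    for (_, th0, y0), (r1, th1, y1) in zip(coords, coords[1:]):
        # Wrap the angle difference into (-pi, pi] to avoid jumps at +/- pi
        d_theta = math.atan2(math.sin(th1 - th0), math.cos(th1 - th0))
        features.append((r1, d_theta, y1 - y0))
    return features
```

Because only relative positions and differences enter the features, moving the whole person to another place in the room leaves the sequence unchanged.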


Pointing Direction

• Head-hand line
– Line of sight from eye to hand
– Easy to measure
• Forearm line
– Potentially superior when the arm is bent
– More difficult to measure


Audio-Visual Speech Recognition


Lip Tracking Module

• Feature based

• detects localization failures and automatically recovers from them

• tracks facial features (pupils, nostrils, lips)


Audio-Visual Recognition

$hyp_c = \lambda_a \, hyp_a + \lambda_v \, hyp_v$

$\lambda_a + \lambda_v = 1$

Combination methods:

• SNR weights
• Entropy weights
• Trained weights
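The weighted combination of the acoustic and the visual hypothesis can be illustrated with a few lines of Python. The sketch assumes that both recognizers output a score per unit (for example per phoneme) and that the two weights sum to 1. How the weight is derived from SNR, entropy or training is not shown, and the scores in the usage note are invented.

```python
def combine(hyp_a, hyp_v, lambda_a):
    """Combine acoustic and visual scores with weights lambda_a and 1 - lambda_a."""
    if not 0.0 <= lambda_a <= 1.0:
        raise ValueError("lambda_a must lie in [0, 1]")
    lambda_v = 1.0 - lambda_a  # the two weights sum to 1
    return {unit: lambda_a * hyp_a[unit] + lambda_v * hyp_v[unit]
            for unit in hyp_a}

def best_unit(scores):
    """Return the unit with the highest combined score."""
    return max(scores, key=scores.get)
```

For example, with made-up acoustic scores `{"b": 0.5, "d": 0.4, "m": 0.1}` and visual scores `{"b": 0.45, "d": 0.1, "m": 0.45}`, calling `combine` with `lambda_a=0.5` and then `best_unit` selects "b": the acoustic channel separates "b" from "m", and the visual channel separates "b" from "d".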


Fusion Levels

• Word Level (Vote, Decide based on A and V score)
• Phoneme Level (Combine by Different Weighting Schemes)
• Feature Level (Combine Features)


Audio-Visual Speech

[Bar chart: acoustic, visual and combined recognition under clean, 16 dB SNR and 8 dB SNR conditions; vertical axis from 0 to 100]


Possible Topics

• Person Tracking
• Gesture Recognition
• Attentive Interfaces
• Face Detection
• Lip Reading (Audio-Visual Speech Recognition)
• Audio-Visual Tracking
• Emotion Recognition
• Person Identification
• Microphone Arrays
• Sensor Fusion
• Smart Room Infrastructure
• Intelligent Camera Control
• Self-Calibration
• Other Smart Room Projects (MIT, Georgia Tech, IM2)
• Other Sensors: Pressure, IR, etc.
• Speech Recognition
– in Meetings
– Far-Field
– Efficient
• Microphone Arrays
