Video Analysis
Mei-Chen Yeh
May 29, 2012
Outline
• Video representation
• Motion
• Actions in Video
Videos
• A natural video stream is continuous in both spatial and temporal domains.
• A digital video stream samples pixels in both domains.
Video processing
YCbCr
Video signal representation (1)
• Composite color signal
– R, G, B
– Y, Cb, Cr
• Why Y, Cb, Cr?
– Backward compatibility (black-and-white to color TV)
– The eye is less sensitive to changes in the Cb and Cr components
$Y = 0.299R + 0.587G + 0.114B$
$C_b = B - Y$
$C_r = R - Y$
Luminance (Y)
Chrominance (Cb + Cr)
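The conversion above can be sketched in a few lines of Python. This is a minimal illustration that uses the unscaled color differences exactly as written on the slide; broadcast standards such as ITU-R BT.601 additionally scale and offset Cb and Cr, which is not shown here. The function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to (Y, Cb, Cr) with unscaled colour differences."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    cb = b - y                              # blue colour difference
    cr = r - y                              # red colour difference
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Invert rgb_to_ycbcr: recover R and B first, then solve the Y equation for G."""
    r = cr + y
    b = cb + y
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


# A grey pixel carries no chrominance: Cb and Cr are (numerically) zero.
print(rgb_to_ycbcr(128, 128, 128))
# A saturated colour survives the round trip up to floating-point error.
print(ycbcr_to_rgb(*rgb_to_ycbcr(200, 50, 10)))
```

Because the three weights sum to 1, any pixel with R = G = B maps to Cb = Cr = 0, which is why a black-and-white receiver can simply ignore the chrominance channels.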
Video signal representation (2)
• Y is the luma component and Cb and Cr are the blue and red chroma components.
Y Cb Cr
Sampling formats (1)
4:4:4 4:2:2 (DVB) 4:1:1 (DV)
Slide from Dr. Ding
Sampling formats (2)
4:2:0 (VCD, DVD)
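As a rough sketch of what the sampling formats mean in practice, the Python below reduces a chroma plane to 4:2:0 by averaging each 2×2 block and counts the samples each format keeps per frame. Averaging is only one possible choice (real codecs differ in filtering and sample siting), and both function names are hypothetical.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane given as a list of equal-length rows."""
    h, w = len(plane), len(plane[0])
    if h % 2 or w % 2:
        raise ValueError("this sketch assumes even width and height")
    return [
        [(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def samples_per_frame(width, height, fmt):
    """Total number of Y + Cb + Cr samples in one frame for a sampling format."""
    # Fraction of chroma samples kept per chroma plane, relative to luma.
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:1:1": 0.25, "4:2:0": 0.25}[fmt]
    return int(width * height * (1 + 2 * chroma_fraction))


cb_plane = [[1, 3, 5, 7],
            [5, 7, 9, 11]]
print(subsample_420(cb_plane))
for fmt in ("4:4:4", "4:2:2", "4:1:1", "4:2:0"):
    print(fmt, samples_per_frame(352, 288, fmt))
```

With 8-bit samples, 4:2:2 averages 16 bits per pixel and 4:2:0 averages 12 bits per pixel.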
TV encoding system (1)
• PAL
– Phase Alternating Line, a color encoding system used in broadcast television systems in large parts of the world.
• SECAM
– (French: Séquentiel Couleur Avec Mémoire), an analog color television system first used in France.
• NTSC
– National Television System Committee, the analog television system used in most of North America, South America, Burma, South Korea, Taiwan, Japan, the Philippines, and some Pacific island nations and territories.
TV encoding system (2)
Uncompressed bitrate of videos
Video Type | Pixels per Frame (width × height) | Image Aspect Ratio | Frames per Second | Bits/pixel | Uncompressed Bitrate
NTSC | 480 × 483 | 4:3 | 29.97 | 16 | 111.2 Mb/s
PAL | 576 × 576 | 4:3 | 25 | 16 | 132.7 Mb/s
CIF | 352 × 288 | 4:3 | 14.98 | 12 | 18.2 Mb/s
QCIF | 176 × 144 | 4:3 | 9.99 | 12 | 3.0 Mb/s
HDTV | 1280 × 720 | 16:9 | 59.94 | 12 | 662.9 Mb/s
HDTV | 1920 × 1080 | 16:9 | 29.97 | 12 | 745.7 Mb/s
Slide from Dr. Chang
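The bitrate column follows from the other columns: width × height × frames per second × bits per pixel. A minimal Python sketch, with a hypothetical function name and 1 Mb taken as 10^6 bits:

```python
def uncompressed_bitrate_mbps(width, height, fps, bits_per_pixel):
    """Raw video bitrate in megabits per second (1 Mb = 1e6 bits)."""
    return width * height * fps * bits_per_pixel / 1e6


formats = {
    "NTSC": (480, 483, 29.97, 16),
    "PAL": (576, 576, 25, 16),
    "CIF": (352, 288, 14.98, 12),
    "QCIF": (176, 144, 9.99, 12),
}
for name, params in formats.items():
    print(f"{name}: {uncompressed_bitrate_mbps(*params):.1f} Mb/s")
```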
Outline
• Video representation
• Motion
• Actions in Video
Motion and perceptual organization
• Sometimes, motion is the foremost cue
Motion and perceptual organization
• Even poor motion data can evoke a strong percept
Uses of motion
• Estimating 3D structure
• Segmenting objects based on motion cues
• Learning dynamical models
• Recognizing events and activities
• Improving video quality (motion stabilization)
• Compressing videos
• ……
Motion field
• The motion field is the projection of the 3D scene motion into the image
Motion field
• P(t) is a moving 3D point
• Velocity of scene point: V = dP/dt
• p(t) = (x(t), y(t)) is the projection of P in the image
• Apparent velocity v in the image:
– vx = dx/dt
– vy = dy/dt
• These components are known as the motion field of the image
[Figure: scene point P(t) moving to P(t+dt) with velocity V; its image projection p(t) moving to p(t+dt) with velocity v]
Motion estimation techniques
• Based on temporal changes in image intensities
• Direct methods
– Directly recover image motion at each pixel from spatio-temporal image brightness variations
– Dense motion fields, but sensitive to appearance variations
– Suitable when image motion is small
• Feature-based methods
– Extract visual features (corners, textured areas) and track them over multiple frames
– Sparse motion fields, but more robust tracking
– Suitable when image motion is large
Optical flow
• The velocity of observed 2-D motion vectors
• Can be caused by
– object motions
– camera movements
– illumination condition changes
Optical flow ≠ the true motion field
• Motion field exists but no optical flow
• No motion field but shading changes
Problem definition: optical flow
How to estimate pixel motion from image I(x, y, t) to image I(x, y, t+dt)?
• Solve pixel correspondence problem
– given a pixel in I(x, y, t), look for nearby pixels of the same color in I(x, y, t+dt)
Key assumptions
• color constancy: a point in I(x, y, t) looks the same in I(x, y, t+dt)
– For grayscale images, this is brightness constancy
• small motion: points do not move very far
This is called the optical flow problem.
[Figure: two consecutive frames, I(x, y, t) and I(x, y, t+dt)]
Optical flow constraints (grayscale images)
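Let’s look at these constraints more closely:
• brightness constancy: $I(x+u, y+v, t+dt) = I(x, y, t)$, i.e. $I(x+u, y+v, t+dt) - I(x, y, t) = 0$
• small motion: (u and v are small)
– using Taylor’s expansion: $I(x+u, y+v, t+dt) \approx I(x, y, t) + \frac{\partial I}{\partial x}u + \frac{\partial I}{\partial y}v + \frac{\partial I}{\partial t}dt$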
Optical flow equation
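• Combining these two equations: $\frac{\partial I}{\partial x}u + \frac{\partial I}{\partial y}v + \frac{\partial I}{\partial t}dt = 0$
– (u, v): displacement vector
• Dividing both sides by dt: $\nabla I \cdot (v_x, v_y) + \frac{\partial I}{\partial t} = 0$
– Known as the optical flow equation
– $\nabla I = \left[\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right]^T$: spatial gradient vector
– $(v_x, v_y) = (u/dt, v/dt)$: velocity vector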
• Q: how many unknowns and equations per pixel?
– 2 unknowns, one equation: $\nabla I \cdot (v_x, v_y) + \frac{\partial I}{\partial t} = 0$
• What does this constraint mean?
– The component of the flow perpendicular to the gradient (i.e., parallel to the edge) is unknown
– If $(v_x, v_y)$ satisfies the equation, so does $(v_x + u', v_y + v')$ if $\nabla I \cdot (u', v') = 0$
[Figure: an edge and its gradient, with the flow vectors $(v_x, v_y)$, $(u', v')$, and $(v_x + u', v_y + v')$]
This explains the Barber Pole illusion
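The ambiguity can be checked numerically. The Python sketch below uses made-up derivative values at a single pixel, computes the one flow component the constraint does determine (the component along the gradient, often called normal flow), and shows that adding any vector parallel to the edge leaves the optical flow equation satisfied. The function names and numbers are illustrative assumptions.

```python
def flow_residual(ix, iy, it, vx, vy):
    """Left-hand side of the optical flow equation: Ix*vx + Iy*vy + It."""
    return ix * vx + iy * vy + it


def normal_flow(ix, iy, it):
    """Flow component along the gradient: the only part a single pixel determines."""
    mag2 = ix * ix + iy * iy
    if mag2 == 0:
        raise ValueError("zero gradient: the constraint says nothing about the flow")
    scale = -it / mag2
    return scale * ix, scale * iy


# Hypothetical spatial and temporal derivatives at one pixel.
ix, iy, it = 2.0, 1.0, -4.0
vx, vy = normal_flow(ix, iy, it)

# (-iy, ix) is perpendicular to the gradient, i.e. parallel to the edge.
for k in (0.0, 1.0, -2.5):
    candidate = (vx - k * iy, vy + k * ix)
    print(candidate, flow_residual(ix, iy, it, *candidate))
```

Every candidate printed has a residual of zero up to rounding, so one pixel cannot tell these flows apart.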
The aperture problem
Perceived motion
The aperture problem
Actual motion
The barber pole illusion
http://en.wikipedia.org/wiki/Barberpole_illusion
To solve the aperture problem…
• We need more equations for a pixel.
• Example
– Spatial coherence constraint: pretend the pixel’s neighbors have the same (vx, vy)
– Lucas & Kanade (1981)
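The spatial coherence idea can be sketched as a least-squares problem: every pixel in a window contributes one optical flow equation, and the shared (vx, vy) is the solution of the resulting 2×2 normal equations. The Python below is a single-step, dependency-free illustration in that spirit, not the full published method; practical implementations typically iterate and use image pyramids. The synthetic test pattern, the function names, and the window size are assumptions made for the example.

```python
import math


def lucas_kanade(frame1, frame2, cx, cy, half_window=2):
    """Estimate one (vx, vy) for the window centred at column cx, row cy."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half_window, cy + half_window + 1):
        for x in range(cx - half_window, cx + half_window + 1):
            ix = (frame1[y][x + 1] - frame1[y][x - 1]) / 2.0  # dI/dx, central difference
            iy = (frame1[y + 1][x] - frame1[y - 1][x]) / 2.0  # dI/dy, central difference
            it = frame2[y][x] - frame1[y][x]                  # dI/dt with dt = 1 frame
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has no 2-D texture; the aperture problem remains")
    # Solve [[sxx, sxy], [sxy, syy]] @ [vx, vy] = [-sxt, -syt] by Cramer's rule.
    vx = (-sxt * syy + syt * sxy) / det
    vy = (-syt * sxx + sxt * sxy) / det
    return vx, vy


def make_frame(shift_x, shift_y, size=20):
    """Smooth synthetic pattern translated by (shift_x, shift_y) pixels."""
    return [[math.sin(0.3 * (x - shift_x)) + math.cos(0.25 * (y - shift_y))
             for x in range(size)] for y in range(size)]


frame1 = make_frame(0.0, 0.0)
frame2 = make_frame(0.2, -0.1)   # true motion: 0.2 px right, 0.1 px up
print(lucas_kanade(frame1, frame2, cx=10, cy=10, half_window=6))
```

The estimate is close to the true shift only because the motion is small and the window contains gradients in more than one direction; on a window that sees a single straight edge the determinant vanishes and the sketch raises an error instead of guessing.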
Outline
• Video representation
• Motion
• Actions in Video
– Background subtraction
– Recognition of actions based on motion patterns
Using optical flow: recognizing facial expressions
Recognizing Human Facial Expression (1994) by Yaser Yacoob, Larry S. Davis
Example use of optical flow: visual effects in films
http://www.fxguide.com/article333.html
Slide credit: Birgi Tamersoy
Background subtraction
• Simple techniques can do ok with static camera
• …But hard to do perfectly
• Widely used:
– Traffic monitoring (counting vehicles, detecting & tracking vehicles, pedestrians),
– Human action recognition (run, walk, jump, squat),
– Human-computer interaction
– Object tracking
Slide credit: Birgi Tamersoy
Frame differences vs. background subtraction
• Toyama et al. 1999
Slide credit: Birgi Tamersoy
Pros and cons
Advantages:
• Extremely easy to implement and use
• Fast
• Background models need not be constant; they change over time
Disadvantages:
• Accuracy of frame differencing depends on object speed and frame rate
• Median background model: relatively high memory requirements
• Setting global threshold Th…
Slide credit: Birgi Tamersoy
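The two techniques compared above can be sketched in a few lines of Python. Frames are assumed to be grayscale images stored as lists of rows, and the function names are hypothetical. The median model has to keep every buffered frame in memory, which is the memory cost mentioned above.

```python
from statistics import median


def frame_difference_mask(prev_frame, frame, threshold):
    """Foreground where the absolute change between two frames exceeds the threshold."""
    return [[abs(cur - prev) > threshold for prev, cur in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(prev_frame, frame)]


def median_background(frames):
    """Per-pixel median over a buffer of frames (all frames must be kept in memory)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def background_subtraction_mask(frame, background, threshold):
    """Foreground where the frame differs from the background model by more than the threshold."""
    return frame_difference_mask(background, frame, threshold)


# Tiny 1x3 example: a bright object passes through the middle pixel once.
history = [[[10, 10, 10]], [[10, 200, 10]], [[10, 10, 10]]]
background = median_background(history)
new_frame = [[10, 10, 180]]
print(background)
print(background_subtraction_mask(new_frame, background, threshold=25))
print(frame_difference_mask(history[1], history[2], threshold=25))
```

A single global threshold is used for every pixel here, which is exactly the weakness listed among the disadvantages.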
Background subtraction with depth
How can we select foreground pixels based on depth information?
Leap: http://www.leapmotion.com/
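One simple possibility, offered only as a sketch, is to keep pixels that are closer to the sensor than some cut-off distance. The Python below assumes a depth map in consistent units where a reserved value (0 here) marks pixels with no reading; both the convention and the function name are assumptions.

```python
def depth_foreground_mask(depth, max_distance, invalid=0):
    """Foreground = pixels with a valid depth reading closer than max_distance."""
    return [[d != invalid and d < max_distance for d in row] for row in depth]


depth_map = [[0.0, 0.4, 2.5],
             [0.8, 3.0, 0.9]]   # hypothetical depths in metres
print(depth_foreground_mask(depth_map, max_distance=1.0))
```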
Outline
• Video representation
• Motion
• Actions in video
– Background subtraction
– Recognition of action based on motion patterns
Motion analysis in video
• “Actions”: atomic motion patterns -- often gesture-like, single clear-cut trajectory, single nameable behavior (e.g., sit, wave arms)
• “Activity”: series or composition of actions (e.g., interactions between people)
• “Event”: combination of activities or actions (e.g., a football game, a traffic accident)
Modified from Venu Govindaraju
Surveillance
http://users.isr.ist.utl.pt/~etienne/mypubs/Auvinetal06PETS.pdf
Human activity in video: basic approaches
• Model-based action/activity recognition:
– Use human body tracking and pose estimation techniques, relate to action descriptions
– Major challenge: accurate tracks in spite of occlusion, ambiguity, low resolution
• Activity as motion, space-time appearance patterns
– Describe overall patterns, but no explicit body tracking
– Typically learn a classifier
– We’ll look at a specific instance…
• Recognize actions at a distance [ICCV 2003]
– Low resolution, noisy data, not going to be able to track each limb.
– Moving camera, occlusions
– Wide range of actions (including non-periodic)
[Efros, Berg, Mori, & Malik 2003] http://graphics.cs.cmu.edu/people/efros/research/action/
The 30-Pixel Man
Approach
• Motion-based approach
– Non-parametric; use large amount of data
– Classify a novel motion by finding the most similar motion from the training set
• More specifically,
– A motion description based on optical flow
– An associated similarity measure used in a nearest neighbor framework
[Efros, Berg, Mori, & Malik 2003] http://graphics.cs.cmu.edu/people/efros/research/action/
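The nearest neighbor framework can be sketched generically in Python. The slides do not specify the descriptor or the similarity measure, so the example below assumes each motion is already summarized as a flat list of numbers and uses normalized correlation as a stand-in similarity. The labels and values are made up.

```python
import math


def normalized_correlation(a, b):
    """Correlation coefficient between two equal-length descriptors (0.0 if either is constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def classify_nearest(query, training_set):
    """Return the label of the training descriptor most similar to the query."""
    _, best_label = max(training_set,
                        key=lambda item: normalized_correlation(query, item[0]))
    return best_label


# Hypothetical training set: (motion descriptor, action label) pairs.
training = [
    ([1.0, 0.0, -1.0, 0.0], "run"),
    ([0.0, 1.0, 0.0, -1.0], "walk"),
]
print(classify_nearest([0.9, 0.1, -1.1, 0.0], training))
```

Being non-parametric, this approach needs no training step beyond storing labeled examples, which is why it relies on a large amount of data.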
Motion description matching results
More matching results: video 1, video 2
Action classification result
• demo video
Gathering action data
• Tracking
– Simple correlation-based tracker
– User-initialized
Figure-centric representation
• Stabilized spatio-temporal volume
– No translation information
– All motion caused by person’s limbs, indifferent to camera motion!
Extract optical flow to describe the region’s motion.
Using optical flow: action recognition at a distance
[Efros, Berg, Mori, & Malik 2003] http://graphics.cs.cmu.edu/people/efros/research/action/
Input Sequence
Matched Frames
Use nearest neighbor classifier to name the actions occurring in new video frames.
Using optical flow: action recognition at a distance
[Efros, Berg, Mori, & Malik 2003] http://graphics.cs.cmu.edu/people/efros/research/action/
Football Actions: classification
[.67 .58 .68 .79 .59 .68 .58 .66]
(8 actions, 4500 frames, taken from 72 tracked sequences)
Application: motion retargeting
[Efros, Berg, Mori, & Malik 2003] http://graphics.cs.cmu.edu/people/efros/research/action/
SHOW VIDEO
Summary
• Background subtraction:
– Essential low-level processing tool to segment moving objects from static camera’s video
• Action recognition:
– Increasing attention to actions as motion and appearance patterns
– For constrained environments, relatively simple techniques allow effective gesture or action recognition
Closing remarks
• Thank you all for your attention and participation in the class!
• Please be well prepared for the final project (06/12 and 06/19). Come to class on time. Start early!