TRANSCRIPT
Smart Surveillance System
Tai-Pang Wu
Principal Researcher
Enterprise & Consumer Electronics Group
ASTRI
Fact
We have too many videos, and they last too long
◦ ~500,000 cameras in London, ~4,200,000 in the UK
◦ ~1,000,000 cameras deployed in Shenzhen (SZ)
◦ Many surveillance videos are captured 24 hours a day, 7 days a week
Current Situation (excerpt from Wikipedia, CCTV)
◦ There is little evidence that CCTV deters crime; in fact, there is considerable evidence that it does not. According to a Liberal Democrat analysis, in London "Police are no more likely to catch offenders in areas with hundreds of cameras than in those with hardly any." A 2008 report by UK Police Chiefs concluded that only 3% of crimes were solved by CCTV. In London, a Metropolitan Police report showed that in 2008, only one crime was solved per 1000 cameras.
◦ Full text here: http://en.wikipedia.org/wiki/Closed-circuit_television
2
An Example
3
This video is playing (length of the video: 2:06)
The first subject will appear at 0:70
Challenges
Most of the videos are not monitored
◦ In many places, police still rely on human eyes, together with fast-forwarding, to chase suspects/targets
◦ There is not enough human labor
It is difficult to search the "sea of video" even when we know something happened
Solution?
◦ Find cues in the massive volume of video
4
A System in Demand
How to improve?
◦ We want short videos
Where all the useful content is preserved
◦ We need something to replace tedious human labor
Help us find the suspects/targets
Chase the suspects across camera views
Analyze the videos and extract the useful information needed to improve decision making
5
Our Solution: Smart Surveillance System
Existing Techniques
◦ Scattered over different research topics, each focusing on a different component
E.g., Tracking, Pedestrian Detection, Face Detection, Re-identification, Trajectory Grouping, Camera Topology Estimation, etc.
Our solution
◦ A synergy of selected cutting-edge technologies that focuses on the demands mentioned on the previous slide
6
Our Solution: Smart Surveillance System
Solution
◦ Video Summarization
Reduce the length of a video
Decompose the video into components and save them as a database for analysis
◦ Object and Face Detection
Detect the type of object
Face recognition and verification
Identification
◦ Re-identification
Find the related videos in which the same object appears
Analyze the paths and activities of the objects
◦ Clustering/Grouping and Searching
Classification
Query
7
Video Summarization
Objective
◦ Shorten the video
◦ Retain the content
Side product
◦ ‘Primitives’ called Tubes
How?
◦ Derived from [Pritch et al., TPAMI 2008]
◦ But we are using a different algorithm
8
Video Summarization: An Example
9
Original Video (2:07) Video Summary (0:16)
Video Summarization: How
The whole process can be done in either offline or online mode
◦ Offline mode: the videos are captured in advance
◦ Online mode: video streams are input directly
10
Background Modeling → Foreground Extraction → Tube Extraction → Tube Reordering → Video Summary Generation
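Read as a pipeline, the five stages above chain naturally. The following pure-Python toy shows how they fit together; all function names and the deliberately tiny stand-in algorithms (median background, interval tubes, shift-to-zero reordering) are illustrative, not the actual system.

```python
# Toy sketch of the five-stage summarization pipeline; names and the tiny
# stand-in algorithms are illustrative only. A "video" is a list of frames;
# a frame is a list of pixel values.

def model_background(frames):
    # Background Modeling: per-pixel median over time.
    return [sorted(f[i] for f in frames)[len(frames) // 2]
            for i in range(len(frames[0]))]

def extract_foreground(frames, bg, thresh=10):
    # Foreground Extraction: flag pixels that deviate from the background.
    return [[abs(p - b) > thresh for p, b in zip(f, bg)] for f in frames]

def extract_tubes(masks):
    # Tube Extraction: group consecutive foreground frames into
    # (start_frame, end_frame) intervals.
    tubes, start = [], None
    for t, m in enumerate(masks):
        if any(m) and start is None:
            start = t
        elif not any(m) and start is not None:
            tubes.append((start, t - 1))
            start = None
    if start is not None:
        tubes.append((start, len(masks) - 1))
    return tubes

def reorder_tubes(tubes):
    # Tube Reordering: shift every tube to start at t = 0.
    return [(0, e - s) for s, e in tubes]

def generate_summary(bg, tubes):
    # Video Summary Generation: summary length = longest reordered tube.
    return max((e + 1 for s, e in tubes), default=0)

video = [[0]*4, [0, 50, 50, 0], [0, 50, 50, 0], [0]*4,
         [0, 0, 80, 0], [0]*4, [0]*4]
bg = model_background(video)
tubes = extract_tubes(extract_foreground(video, bg))
short = generate_summary(bg, reorder_tubes(tubes))
print(tubes, short)  # [(1, 2), (4, 4)] 2 -- 7 frames summarized into 2
```

The two events (frames 1-2 and frame 4) become tubes that, once reordered to play concurrently, compress seven frames of video into two.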
Background Modeling
We have several background modeling methods for different situations
◦ Running Average
◦ GMM [Stauffer and Grimson, CVPR'98]
◦ Adaptive GMM [Zivkovic and Heijden, PRL'06]
◦ Adaptive Median [McFarlane and Schofield, MVA'95]
◦ SILTP [Liao et al., CVPR'10]
◦ Reliable Background Suppression for Complex Scenes [Calderara et al., VSSN'06]
◦ Foreground Object Detection from Videos Containing Complex Background [Li et al., ACM MM'03]
11
Background Modeling
The following three are selected
◦ Indoor scenes: SILTP
◦ Outdoor natural scenes: Adaptive GMM
◦ Dynamic backgrounds: Adaptive GMM / ACM MM'03
12
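As a concrete example, the simplest method on the list, Running Average, maintains a per-pixel estimate B ← (1−α)·B + α·F and flags pixels far from it as foreground. A minimal pure-Python sketch (the parameter values are illustrative; for the adaptive GMM, OpenCV's `cv2.createBackgroundSubtractorMOG2` implements Zivkovic's method):

```python
# Running-average background model: a minimal pure-Python sketch.
# alpha controls adaptation speed; thresh flags foreground pixels.
# Real systems operate on image arrays (e.g. NumPy/OpenCV).

def update_background(bg, frame, alpha=0.05):
    # Blend the new frame into the running background estimate.
    return [(1 - alpha) * b + alpha * p for b, p in zip(bg, frame)]

def foreground_mask(bg, frame, thresh=25):
    # A pixel is foreground if it deviates strongly from the background.
    return [abs(p - b) > thresh for p, b in zip(frame, bg)]

frames = [[10, 10, 10]] * 20 + [[10, 200, 10]]  # static scene, then an object
bg = frames[0]
for f in frames[:-1]:
    bg = update_background(bg, f)
mask = foreground_mask(bg, frames[-1])
print(mask)  # [False, True, False]
```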
Foreground Extraction
Background Cut [Sun et al., ECCV'06]
13
Tube Extraction
Stack the extracted foregrounds together
The same object / event is considered a single tube
14
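One common way to decide that foreground boxes in consecutive frames belong to the same tube is bounding-box overlap (intersection over union). A hedged sketch, with illustrative names and a made-up IoU threshold:

```python
# Sketch: link per-frame foreground boxes into tubes by IoU overlap
# between consecutive frames. Boxes are (x1, y1, x2, y2); the names
# and the 0.3 threshold are illustrative assumptions.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def link_tubes(dets, min_iou=0.3):
    # dets[t] = list of boxes in frame t; a tube = list of (t, box).
    tubes, open_tubes = [], []
    for t, boxes in enumerate(dets):
        next_open = []
        for box in boxes:
            match = None
            for tube in open_tubes:  # extend an overlapping open tube
                if iou(tube[-1][1], box) >= min_iou:
                    match = tube
                    break
            if match is not None:
                match.append((t, box))
                open_tubes.remove(match)
            else:                    # otherwise start a new tube
                match = [(t, box)]
                tubes.append(match)
            next_open.append(match)
        open_tubes = next_open       # unmatched tubes are closed
    return tubes

dets = [[(0, 0, 10, 10)], [(1, 0, 11, 10)], [], [(50, 50, 60, 60)]]
linked = link_tubes(dets)
print(len(linked), len(linked[0]))  # 2 tubes; the first spans 2 frames
```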
Tube Reordering
Re-arrange the tubes along the temporal axis so that the amount of empty space is minimized
Different from [Pritch et al., TPAMI 2008], which uses pixel-wise optimization
Our method re-arranges the tubes using polygon collision detection, which is much faster
15
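The idea can be approximated with axis-aligned boxes instead of polygons: greedily shift each tube to the earliest start time at which it does not collide, in both time and space, with any already-placed tube. A simplified stand-in for the polygon collision detection mentioned above, with illustrative names:

```python
# Greedy tube-reordering sketch: place each tube as early as possible
# without spatio-temporal collision. Axis-aligned boxes are a crude
# stand-in for the polygon collision detection the slides describe.

def boxes_overlap(a, b):
    # Axis-aligned overlap test for (x1, y1, x2, y2) boxes.
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def reorder(tubes):
    # tubes: list of (duration, box); returns list of (start, duration, box).
    placed = []
    for dur, box in tubes:
        start = 0
        # Advance until no placed tube overlaps in both time and space.
        while any(start < s + d and s < start + dur and boxes_overlap(box, b)
                  for s, d, b in placed):
            start += 1
        placed.append((start, dur, box))
    return placed

packed = reorder([(3, (0, 0, 10, 10)),    # tube A
                  (2, (5, 5, 15, 15)),    # tube B: overlaps A spatially
                  (4, (50, 50, 60, 60))]) # tube C: far away
print(packed)  # B waits for A to finish; C plays concurrently from t=0
```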
Video Summary Generation
Stitch the reordered tubes back to the
estimated background to produce the video
summary
16
The Side Product
A database of background models and tubes
from different videos
18
A database, ready for analysis
Second Phase
Tube / BG Database → Tracking → Feature Extraction → Classification → Clustering → Final Database
◦ Tube / BG Database: the database which contains all the extracted tubes, extracted background models, and original videos. A kind of raw data for analysis.
◦ Tracking: responsible for object tracking. At this stage, we may have no (or limited) information about the type/class of the object being tracked.
◦ Feature Extraction: designed to extract the high-level features from the objects obtained in the previous stage.
◦ Classification: used to classify the objects into three main types: i) human, ii) face, and iii) vehicle.
◦ Clustering: clusters the objects in some convenient form (using some level of features) so that mining or searching can be done easily and efficiently.
◦ Final Database: the final database in which all the necessary information has been extracted, clustered, and analyzed. It is ready for query.
Feature Extraction
Pedestrian Detection
◦ "Histograms of Oriented Gradients (HOG) for Human Detection" [Dalal and Triggs, CVPR'05]
◦ Speed up the search by utilizing the geometry of the environment, which is obtained from "Single View Metrology" [Criminisi et al., IJCV'00]
20
[Figure: the human HOG descriptor, showing its positive and negative components.]
Assumptions: the ground plane is known, and the range of human height is known.
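The heart of the HOG descriptor is a per-cell histogram of gradient orientations, weighted by gradient magnitude. A minimal pure-Python illustration of one cell; real detectors (e.g. OpenCV's `cv2.HOGDescriptor` with `getDefaultPeopleDetector()`) add block normalization and a sliding-window linear SVM on top:

```python
# One HOG cell in pure Python: a histogram of unsigned gradient
# orientations (0..180 degrees), weighted by gradient magnitude.
import math

def cell_hog(patch, bins=9):
    # patch: 2D list of grayscale values; border pixels are skipped
    # because central differences need both neighbors.
    h, w = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180
            hist[int(ang // (180 / bins)) % bins] += mag
    return hist

# A vertical edge: every gradient is horizontal, so all the weight
# lands in the first orientation bin.
hist = cell_hog([[0, 0, 100, 100]] * 4)
print(hist)  # 400.0 in bin 0, zeros elsewhere
```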
Feature Extraction
Shape and Appearance Modeling
◦ "Shape and Appearance Context Modeling" [Wang et al., ICCV'07]
◦ Computes the similarity between deformable objects of a given class
◦ For person re-identification
We can search for a person across a network of cameras
21
[Figure: shape labels and appearance labels computed for the same person in View 1 and View 2.]
Feature Extraction
Face Recognition
◦ "Face Recognition with Learning-based Descriptor" [Cao et al., CVPR'10]
◦ A learning-based approach
Learning-based encoding and histogram representation
Pose-adaptive matching
22
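The encoding-plus-histogram idea can be illustrated with a fixed, hand-crafted code standing in for the learned encoder of [Cao et al., CVPR'10]: code each pixel by thresholding its 8 neighbors against it (LBP-style), histogram the codes over the face, and compare faces by histogram similarity. All names here are illustrative:

```python
# Illustrative encode-then-histogram face descriptor. The fixed 8-neighbor
# binary code is a stand-in for the *learned* encoding in the cited paper.

def encode(img):
    # img: 2D grayscale list; each inner pixel becomes a code in [0, 255]
    # built from which of its 8 neighbors are >= the center value.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            c = sum(1 << i for i, (dy, dx) in enumerate(offs)
                    if img[y + dy][x + dx] >= img[y][x])
            codes.append(c)
    return codes

def histogram(codes, bins=256):
    h = [0] * bins
    for c in codes:
        h[c] += 1
    return h

def similarity(h1, h2):
    # Histogram intersection in [0, 1]; identical faces score 1.0.
    return sum(min(a, b) for a, b in zip(h1, h2)) / max(1, sum(h1))

face = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = histogram(encode(face))
print(similarity(h, h))  # 1.0
```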
Feature Extraction
23
Feature Extraction
Vehicle Detection
◦ Possible work: "Real-time Recognition and Re-identification of Vehicles from Video Data with high Re-identification Rate" [Woesler, SCI'04]
◦ Uses 3D proxies/models to detect vehicles
Classification is also done
◦ License plate and container detection
24
[Figure: the 3D models used for vehicle detection; region of interest where the vehicle is searched.]
Clustering and Retrieval
Detailed Features
◦ Human: trajectory / moving direction, size, dressing (color/texture/logo), activity
◦ Vehicle: trajectory / moving direction, type, speed, dominant color
◦ Face: normal / abnormal (e.g., sheltered), expression, gender
25
Clustering and Retrieval
Retrieve video content by inputting text, images, or sketches
◦ E.g., "Object-based Surveillance Video Retrieval System with Real-Time Indexing Methodology" [Yuk et al., ICIAR'07]
26
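Once tubes carry feature vectors, query-by-example reduces to nearest-neighbor search over the database. A toy sketch with made-up tube IDs and features; the cited system uses real-time object-based indexing rather than this linear scan:

```python
# Toy query-by-example retrieval: rank indexed tubes by cosine
# similarity to a query feature vector. IDs and vectors are invented.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def query(index, feat, k=2):
    # index: {tube_id: feature_vector}; return the k best-matching IDs.
    ranked = sorted(index, key=lambda tid: cosine(index[tid], feat),
                    reverse=True)
    return ranked[:k]

index = {"tube_red_car": [1, 0, 0],
         "tube_blue_car": [0, 1, 0],
         "tube_person": [0, 0, 1]}
print(query(index, [0.9, 0.1, 0.0], k=1))  # ['tube_red_car']
```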
Possible Services
A complete system allows us to
◦ Minimize the time spent watching surveillance videos
◦ Index the original video based on the video summary
◦ Find a person by inputting a face image
◦ Find the associated videos in which the subject appeared
◦ Extract different target objects based on information of interest (e.g., moving direction, dressing, etc.)
27
Possible Services
Some examples:
28
◦ Add Video Summarization to existing software
◦ Monitor traffic: detect abnormal activities, record vehicle information
◦ Monitor the entrance: record human faces