xiuwen liu department of computer science florida state university

Research Activities at Center for Applied Vision and Imaging Sciences and Florida State Vision Group

Florida State University

Xiuwen Liu

Department of Computer Science

Florida State University

http://cavis.fsu.edu & http://fsvision.fsu.edu

Research Statement

My research goal is to create machines that can “see” with human-like performance

• This seems a trivial problem as each of us can do this without any effort

• Computer + Camera = “A See Machine” ?

Visual Pathway

Visual Illusion

Outline

Motivations

• Some applications of computer vision and pattern recognition techniques

Some of the research projects

Related Courses

Contact information

Computer Vision Applications

No hands across America

• Sponsored by Delco Electronics, AssistWare Technology, and Carnegie Mellon University

• Navlab 5 drove from Pittsburgh, PA to San Diego, CA, using the RALPH computer program.

• The trip was 2849 miles of which 2797 miles were driven automatically with no hands

– Which is 98.2%

http://cart.frc.ri.cmu.edu/users/hpm/project.archive/reference.file/ralph.html

Computer Vision Applications – continued

Human-Computer Interactions

Sign Language Recognition

CyberKnife

CyberKnife – Cont.

Image-Guided Neurosurgery

Intelligent Transportation Systems

http://dfwtraffic.dot.state.tx.us/dal-cam-nf.asp

Computer Vision Applications – cont.

Military applications

• Automated target recognition

Computer Vision Applications – continued

Biometrics – cont.

Iris code can achieve zero false acceptance

Computer Vision in Sports

How was the yellow created?

Generic Image Modeling

How can we characterize all these images perceptually?

Spectral Histogram Representation

Spectral histogram

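• Given a bank of filters $F^{(\alpha)}$, $\alpha = 1, \ldots, K$, a spectral histogram is defined as the marginal distribution of filter responses

$$I^{(\alpha)}(v) = F^{(\alpha)} * I(v)$$

$$H_I^{(\alpha)}(z) = \frac{1}{|I|} \sum_{v} \delta\left(z - I^{(\alpha)}(v)\right)$$

$$H_I = \left(H_I^{(1)}, H_I^{(2)}, \ldots, H_I^{(K)}\right)$$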
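The definition can be illustrated with a minimal pure-Python sketch. It assumes a small grayscale image stored as a list of rows, "valid" convolution without padding, and a fixed number of equal-width histogram bins standing in for the delta function; the function names, the bin range, and the two toy filters are illustrative assumptions, not the group's implementation.

```python
import math
from typing import List

Image = List[List[float]]


def filter_response(image: Image, kernel: Image) -> Image:
    """Valid-mode 2D convolution (kernel flipped, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Flipping the kernel makes this a convolution, not a correlation.
                    acc += kernel[kh - 1 - i][kw - 1 - j] * image[r + i][c + j]
            row.append(acc)
        out.append(row)
    return out


def marginal_histogram(response: Image, lo: float, hi: float, bins: int) -> List[float]:
    """Normalized histogram of all responses; out-of-range values go to the end bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    total = 0
    for row in response:
        for value in row:
            idx = math.floor((value - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
            total += 1
    return [count / total for count in counts]


def spectral_histogram(image: Image, kernels: List[Image],
                       lo: float, hi: float, bins: int) -> List[List[float]]:
    """One marginal histogram per filter, in the order the filters are given."""
    return [marginal_histogram(filter_response(image, k), lo, hi, bins)
            for k in kernels]


# Toy example: an intensity filter and a horizontal gradient filter.
image = [[0.0, 1.0, 2.0, 3.0] for _ in range(4)]
kernels = [[[1.0]], [[-1.0, 1.0]]]
H = spectral_histogram(image, kernels, lo=-4.0, hi=4.0, bins=8)
```

Each entry of `H` sums to one, so histograms of images with different sizes can be compared directly.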

Spectral Histogram Representation - continued

Choice of filters

• Laplacian of Gaussian filters

• Gabor filters

• Gradient filters

• Intensity filter

LoG filter Gabor filter

Spectral Histogram Representation - continued

Texture Synthesis Examples - continued

An image with periodic structures

Observed image Synthesized image

Object Synthesis Examples - continued

Performance Comparison

Face Detection Based On Spectral Representations

Face detection is the task of detecting all instances of faces in a given image

Each image window is represented by its spectral histogram

• A support vector machine is trained on training faces

• The trained support vector machine is then used to classify each image window in an input image

More results at http://fsvision.fsu.edu/face-detection

Face detection - continued

Rotation Invariant Face Detection

Rotation Invariant Face Detection - continued

Linear Representations

Linear representations are widely used in appearance-based object recognition and other applications

• Simple to implement and analyze

• Efficient to compute

• Effective for many applications

$$\alpha(I, U) = U^{T} I \in \mathbb{R}^{d}$$

Standard Linear Representations

Principal Component Analysis

• Designed to minimize the reconstruction error on the training set

• Obtained by calculating eigenvectors of the covariance matrix

Fisher Discriminant Analysis

• Designed to maximize the separation between the class means

• Obtained by solving a generalized eigenvalue problem

Independent Component Analysis

• Designed to maximize the statistical independence among coefficients along different directions

• Obtained by solving an optimization problem with an objective function such as mutual information or negentropy

Standard Linear Representations - continued

Standard linear representations are suboptimal for recognition applications

• Evidence in the literature

• A toy example

– Standard representations give the worst recognition performance

Optimal component analysis

Performance Measure - continued

Suppose there are C classes to be recognized

• Each class has k_train training images

• Each class has k_cross cross-validation images

• We used h(x) = 1/(1 + exp(-2x))

Performance Measure - continued

F(U) depends only on the span of U and is invariant to a change of basis

• In other words, F(U) = F(UO) for any d x d orthonormal matrix O

• The search space of F(U) is the set of all the subspaces, which is known as the Grassmann manifold

– It is not a flat vector space and gradient flow must take the underlying geometry of the manifold into account
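The invariance F(U) = F(UO) can be checked numerically under one assumption: that F is built from Euclidean distances between projected coefficients U^T I, as in nearest-neighbor recognition. The definition of F is not reproduced in this text, so that is an assumption of this sketch. The helper names and the test vectors below are illustrative.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def project(U, x):
    """Coefficients U^T x for an n x d matrix U and an n-vector x."""
    n, d = len(U), len(U[0])
    return [sum(U[i][k] * x[i] for i in range(n)) for k in range(d)]


def distance(a, b):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def rotation(theta):
    """A 2 x 2 orthonormal matrix (plane rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


# U = J, the first d = 2 columns of the 3 x 3 identity matrix.
U = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
O = rotation(0.7)
UO = matmul(U, O)  # same subspace, different basis

x = [0.3, -1.2, 2.0]
y = [1.5, 0.4, -0.7]

d_U = distance(project(U, x), project(U, y))
d_UO = distance(project(UO, x), project(UO, y))
print(abs(d_U - d_UO) < 1e-12)
```

Because every pairwise distance is unchanged, any performance measure computed only from such distances takes the same value at U and UO, which is why the search can be restricted to subspaces rather than individual bases.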

Deterministic Gradient Flow - continued

Gradient at [J] (first d columns of n x n identity matrix)

Deterministic Gradient Flow - continued

Gradient at U: Compute Q such that QU=J

Deterministic gradient flow on Grassmann manifold

Stochastic Gradient and Updating Rules

Stochastic gradient is obtained by adding a stochastic component

Discrete updating rules

MCMC Simulated Annealing Optimization Algorithm

Let X_0 be any initial condition and set t = 0

1. Calculate the gradient matrix A(X_t)

2. Generate d(n-d) independent realizations of the w_ij's

3. Compute a candidate Y for X_{t+1} according to the updating rules

4. Compute F(Y) and F(X_t), and set dF = F(Y) - F(X_t)

5. Set X_{t+1} = Y with probability min{exp(dF/D_t), 1}; otherwise set X_{t+1} = X_t

6. Set D_{t+1} = D_t / γ, where γ > 1 is the cooling ratio, and set t = t + 1

7. Go to step 1
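The loop structure of the algorithm can be sketched on a toy problem. This sketch works in a flat Euclidean space rather than on the Grassmann manifold, uses a numerical gradient, and uses a Langevin-style proposal (gradient step plus Gaussian noise scaled by the temperature). All three are assumptions made for illustration, since the slide's updating rules are not reproduced here. The names `anneal`, `gamma`, and `step` are hypothetical.

```python
import math
import random


def numerical_gradient(F, x, eps=1e-5):
    """Central-difference gradient of F at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((F(xp) - F(xm)) / (2 * eps))
    return grad


def anneal(F, x0, D0=1.0, gamma=1.05, step=0.05, iters=300, seed=0):
    """Maximize F by stochastic gradient search with Metropolis acceptance."""
    rng = random.Random(seed)
    x, D = list(x0), D0
    best, best_val = list(x0), F(x0)
    for _ in range(iters):
        A = numerical_gradient(F, x)                      # step 1: gradient
        w = [rng.gauss(0.0, 1.0) for _ in x]              # step 2: random draws
        y = [xi + step * ai + math.sqrt(2 * step * D) * wi
             for xi, ai, wi in zip(x, A, w)]              # step 3: candidate
        dF = F(y) - F(x)                                  # step 4: change in F
        if dF >= 0 or rng.random() < math.exp(dF / D):    # step 5: accept or reject
            x = y
        if F(x) > best_val:                               # remember the best state seen
            best, best_val = list(x), F(x)
        D /= gamma                                        # step 6: cool down
    return best, best_val


def toy_objective(x):
    """Concave toy function with its maximum (value 0) at (1, -2)."""
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)


best, best_val = anneal(toy_objective, [0.0, 0.0])
```

Tracking the best state is an addition of this sketch; the listed algorithm simply continues from X_{t+1}.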

ORL Face Dataset

Performance Comparison – cont.

Brain Curve Classification

Brain Curve Classification – cont.

Real-time Scene Interpretation

Object detection and recognition problem

• Given a set of images, find regions in these images which contain instances of relevant objects

• Here the number of relevant objects is assumed to be large

– For example, the system should be able to handle 30,000 different kinds of objects, an estimate of the human brain’s capacity for basic level visual categorization [I. Biederman, Psychological Review, vol. 94, pp. 115-147, 1987]

Global Monitoring Through High-resolution Satellite Images

Problem Statement for Scene Interpretation

Object detection and recognition problem

• Given a set of images, find regions in these images which contain instances of relevant objects

• Here the number of relevant objects is assumed to be large

– For example, the system should be able to handle 30,000 different kinds of objects, an estimate of the human brain’s capacity for basic level visual categorization [I. Biederman, Psychological Review, vol. 94, pp. 115-147, 1987]

Goal

• Develop a system that can achieve real-time detection and recognition for images of size 640 x 480 with high accuracy

– Say, at a frame rate of 15 frames per second

Existing Approaches

Fast methods but low accuracy

• One can, for example, classify one pixel at a time

• However, it is difficult to identify airplanes with high accuracy due to high false positives and negatives

Existing Approaches – cont.

Fast methods but low accuracy

• One can, for example, classify one pixel at a time

• However, it is difficult to identify airplanes with high accuracy

Methods with good accuracy but slow

• One can in theory use deformable template matching to locate instances of airplanes

• It may need several hours to process one image

Proposed Framework

Specifications and Requirements

We want to detect and recognize at least 30,000 object classes in images

• At four different scales

• Using exhaustive search of local windows, that is, we do not assume segmentation or other pre-processing

• If we assume objects fit in fixed-size (e.g. 21 x 21) windows, there are 18,432,000 local windows to be classified per second (640 x 480 positions x 4 scales x 15 frames per second)

• We want to do this on a 3.6 GHz Dell Precision workstation with an estimated performance of 28,665.4 MIPS

• This leaves about 1555 instructions to process each 21 x 21 local window
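The window count and the per-window budget follow from simple arithmetic. The sketch below assumes one local window per pixel position at each scale and the 15 frames per second target stated in the goal; the function names are illustrative.

```python
def windows_per_second(width: int, height: int, scales: int, fps: int) -> int:
    """One local window per pixel position, per scale, per frame."""
    return width * height * scales * fps


def instructions_per_window(mips: float, windows: int) -> float:
    """Average instruction budget per window on a machine rated at `mips`."""
    return mips * 1e6 / windows


windows = windows_per_second(640, 480, scales=4, fps=15)
budget = instructions_per_window(28_665.4, windows)
print(windows, int(budget))
```

The budget is an average over all windows, so it counts every instruction the machine can issue, including feature computation and classification.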

Requirements – cont.

To achieve the specifications, we need two critical components

• A classifier that can reduce the average classification time effectively

– Note that on average we have 1555 instructions per window; if we can process 90% of the windows using only 100 instructions each, we have on average 14,650 instructions for each of the remaining 10% of local windows

• Features that can discriminate a large number of objects and can be computed using a few instructions

– Do such features exist?
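The 14,650 figure comes from splitting the average budget between cheap and expensive windows. A small sketch of that calculation, with illustrative parameter names:

```python
def remaining_budget(average: float, easy_fraction: float, easy_cost: float) -> float:
    """Instructions available per hard window when easy windows are cheap.

    Solves: average = easy_fraction * easy_cost + (1 - easy_fraction) * hard_cost
    """
    return (average - easy_fraction * easy_cost) / (1.0 - easy_fraction)


hard_cost = remaining_budget(average=1555, easy_fraction=0.9, easy_cost=100)
print(round(hard_cost))
```

This is the usual argument for early rejection: the cheaper the common case, the more computation is left for the windows that are hard to decide.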

Topological Local Spectral Histograms

We introduce a new class of features, which we call TLSH features

• They are defined relative to a chosen set of filters

• For a given filter, it is defined as a histogram of a local window of the filtered image

• One bin of the histogram is given by
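The slide's formula for a bin is not recoverable from this text. As a hedged illustration only, the sketch below assumes that one bin is the count of pixels in a rectangular window whose filter response falls in a given range, and computes it with an integral image of the range indicator. The integral image is a choice made for this sketch, not necessarily the method used by the group, and all names are hypothetical.

```python
def indicator_integral(response, lo, hi):
    """Integral image of the indicator lo <= response < hi.

    S[r][c] holds the count over rows [0, r) and columns [0, c).
    """
    rows, cols = len(response), len(response[0])
    S = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            inside = 1 if lo <= response[r][c] < hi else 0
            S[r + 1][c + 1] = inside + S[r][c + 1] + S[r + 1][c] - S[r][c]
    return S


def bin_count(S, top, left, bottom, right):
    """Count for the window covering rows [top, bottom) and columns [left, right)."""
    return S[bottom][right] - S[top][right] - S[bottom][left] + S[top][left]


# Toy filtered image and the bin covering responses in [2, 6).
response = [[0, 1, 2],
            [3, 4, 5],
            [6, 7, 8]]
S = indicator_integral(response, lo=2, hi=6)
whole = bin_count(S, 0, 0, 3, 3)      # responses 2, 3, 4, 5
corner = bin_count(S, 0, 1, 2, 3)     # window holds 1, 2, 4, 5
```

With the integral image precomputed, each bin of any window costs four table look-ups, which fits the requirement that features use only a few instructions.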

Topological Local Spectral Histogram Example

Convolution is implemented using FPGAs

Local Spectral Histogram Features

Field Programmable Gate Arrays

• Two primary methods for computation

– Hard-wired Application Specific Integrated Circuit (ASIC)

– Software-programmed microprocessors

• New approach

– Programmable hardware

• Field Programmable Gate Arrays (FPGAs) represent a breakthrough in computing technology

– Especially for intrinsically parallel applications

μP/ ASIC / FPGA Comparison Summary

| μP | ASIC | FPGA |
|----|------|------|
| Programmable (flexible) | Fixed design functionality (inflexible) | Programmable (flexible) |
| Relatively slow serial computation | Very fast, highly parallelized computation | Fast, parallel computation |
| Floating and fixed point | Fixed point / floating | Fixed point / floating |
| Relatively inexpensive design cycle (software) | Expensive design cycle (requires chip design) | Relatively inexpensive design cycle |
| Limited bandwidth | Very high bandwidth | Near ASIC bandwidth |
| Standard high-level languages: C/C++ or Assembly | Hardware description language for design / simulation: VHDL / Verilog | Hardware description language for design / simulation: VHDL / Verilog |

Hardware vs. Software

$$y(n) = \sum_{k=0}^{L-1} x(n-k)\, h_k$$

• Software implementation:

Sum = 0.0
i = 0
while (i < L)
    tmp = x(i) * h(i)
    Sum = Sum + tmp
    i = i + 1
end

A typical software implementation takes 4*L instructions to compute one convolution output (per tap: one multiply, one add, one increment, and one loop test)
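A runnable version of the software loop, written as a minimal Python sketch. It computes one output sample of an L-tap filter and assumes the input is zero outside its stored range; the function name and the boundary handling are illustrative choices.

```python
from typing import Sequence


def fir_output(x: Sequence[float], h: Sequence[float], n: int) -> float:
    """Compute y(n) = sum over k of x(n - k) * h[k] for k = 0 .. L-1."""
    total = 0.0
    k = 0
    while k < len(h):              # loop test
        idx = n - k
        if 0 <= idx < len(x):      # samples outside the signal count as zero
            total += x[idx] * h[k]  # multiply and accumulate
        k += 1                     # increment
    return total


x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.0, -1.0]
y = [fir_output(x, h, n) for n in range(len(x))]
print(y)
```

The loop performs the L multiply/accumulate steps one after another, which is the serial cost that a parallel hardware implementation avoids.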

Hardware vs. Software

A custom hardware implementation

• Multiply/accumulate operations are performed in parallel

• Can be done in one clock cycle

Convolution Timing Diagram

[Figure: timing diagram showing the convolution start signal and the clock. All nine response values finish together, and nine new response values are produced every 7 clock cycles.]

Topological Local Spectral Histograms – cont.

Why TLSH features?

• They provide a very rich set of over-complete features

– For example, with 22 filters there are 1,173,942 different TLSH features within a 21 x 21 region, considering different windows and different filters

– TLSH features are more effective than the Haar features used by Viola and Jones [P. Viola and M. Jones, International Journal of Computer Vision, vol. 57, pp. 137-154, 2004]
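The feature count can be reproduced if every axis-aligned rectangular sub-window of the 21 x 21 region, paired with every filter, counts as one feature. That counting rule is an assumption of this sketch, although it does yield the quoted figure. The function names are illustrative.

```python
def count_subwindows(width: int, height: int) -> int:
    """Number of axis-aligned rectangles inside a width x height pixel grid.

    A rectangle is fixed by choosing a column span and a row span.
    """
    column_spans = width * (width + 1) // 2
    row_spans = height * (height + 1) // 2
    return column_spans * row_spans


def count_subwindows_brute_force(width: int, height: int) -> int:
    """Same count by enumeration, for checking small cases."""
    return sum(1
               for left in range(width) for right in range(left, width)
               for top in range(height) for bottom in range(top, height))


def count_tlsh_features(num_filters: int, width: int, height: int) -> int:
    """One feature per (filter, sub-window) pair."""
    return num_filters * count_subwindows(width, height)


print(count_tlsh_features(22, 21, 21))
```

There are 231 x 231 = 53,361 sub-windows, and multiplying by 22 filters gives the total; histogram bins are not counted separately here.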

ORL Face Dataset

Comparison Between Haar and TLSH Features

COIL Dataset

Texture Dataset

Mixed Dataset

Classifier

To achieve the specification, we also need a classifier that takes only a few instructions to make a decision on average

• At the same time, we need to achieve high accuracy

We propose to use a look-up table tree classifier

• I.e., a decision tree classifier where each node is implemented by a look-up table

Look-up Table Tree Classifier

An Example Path in a Decision Tree

Constructing Look-up Table Decision Tree

Joint optimization of clustering, TLSH features, and optimal linear projections

• We want to maximize the separations between marginal distributions of different clusters

• We can do the optimization iteratively

– We can do clustering first using current TLSH features and projections to maximize the separations

– We can find optimal TLSH features given linear projections

– Then we can find optimal linear projections given updated TLSH features


RCT – Rapid Classification Tree, implemented by Keith Haynes

Detection and Recognition

Shape Theory

We want to quantify the difference between two shapes in a principled way

• We do this by constructing a shape space and then using the geodesic distance between two shapes on the shape manifold as the metric

Shape Clustering

Clustering Dendrogram

Sulcal Curves

Sulcal curves are important for characterizing brain functions

Clustering of Sulcal Curves

Modeling Mathematical Abilities and Disabilities

As it is possible to acquire detailed surfaces of the human brain, one may ask how characteristics of the brain structure affect mathematical abilities and disabilities

• The U.S. Department of Education wants to know so that it can understand and find solutions to the mathematical problems young children have

Corpus callosum examples of young children without mathematical disabilities (a) and with (b)

SurfaVision – A Surface-based Vision System

One of the challenges is how to build a machine vision system that is robust

• This has proven to be very difficult after several decades of computer vision research

We may now have a solution for applications in an indoor environment

Multi-Camera Multi-Projector Scanning

Surface Parametrization

Geodesic Interpolation Between Surfaces

Robust Visual Inference

With a common domain for surface representations, we can pose the visual inference in the Bayesian framework by building probability models

Human-Robot Collaborative Interaction

The goal is to let robots be aware of the positions, poses, expressions, moods, and other factors of the humans so that robots can interact with humans collaboratively

In collaboration with Prof. Emmanuel Collins at the College of Engineering

Automated 3D Phenotype Measurement

The central problem in biology is to understand the relationship between genotype and phenotype

• With the availability of genomes of humans and model organisms, the central problem becomes how to measure phenotype at a large scale

3D Urban Models

Courses

Most Relevant Courses

• CAP 5638 Pattern Recognition

• CAP 5415 Principles and Algorithms of Computer Vision

• CAP 6417 Theoretical Foundations of Computer Vision

• STA 5106 Computational Methods in Statistics I

• STA 5107 Computational Methods in Statistics II

• Seminars and advanced studies

Related Courses

• CAP 5615 Artificial Neural Networks

• CAP 5600 Artificial Intelligence

• CAP 5xxx Machine Learning

Funding of the Group

National Science Foundation

• DMS

• CISE IIS

• FRG

• ACT

• CCF

NGA – National Geospatial-Intelligence Agency

Army Research Office

• DURIP

• Research grant

Companies

• Next Century and others under negotiation

Summary

CAVIS group and FSvision group offer interesting research topics/projects

• Efficient representations for generic images

• Real-time detection and recognition

• Computational models for object recognition and image classification

• Medical image analysis

• Motion/video sequence analysis and modeling

• They are challenging

• They are interesting

• They are exciting

Contact Information

• Name: Xiuwen Liu

• Web sites: http://cavis.fsu.edu, http://fsvision.fsu.edu, http://www.cs.fsu.edu/~liux

• Email: [email protected]

• Offices: LOV 166 and 118 North Woodward Ave.

• Phones: 644-0050 and 645-2257

Thank you!

Any questions?
