Object Recognizing

Recognition -- topics

• Features

• Classifiers

• Example ‘winning’ system

Object Classes

http://images.google.com/imgres?imgurl=http://www.ecomagic.org/fruition/trees-1.jpg&imgrefurl=http://www.ecomagic.org/fruition/friends.html&h=375&w=500&sz=99&tbnid=Crq2ZBkq7-kJ:&tbnh=95&tbnw=127&hl=en&start=10&prev=/images%3Fq%3Dtrees%26svnum%3D10%26hl%3Den%26lr%3D

http://images.google.com/imgres?imgurl=http://pinker.wjh.harvard.edu/photos/cambridge_boston/images/trees%2520in%2520Cambridge%2520Common.jpg&imgrefurl=http://pinker.wjh.harvard.edu/photos/cambridge_boston/pages/trees%2520in%2520Cambridge%2520Common.htm&h=600&w=900&sz=148&tbnid=aCzG9fGgmJAJ:&tbnh=96&tbnw=145&hl=en&start=1&prev=/images%3Fq%3Dtrees%26svnum%3D10%26hl%3Den%26lr%3D

http://images.google.com/imgres?imgurl=http://www.museum.state.il.us/isas/trees/whiteoak.jpeg&imgrefurl=http://www.museum.state.il.us/isas/trees/&h=344&w=515&sz=37&tbnid=bH2nsyp9QZYJ:&tbnh=85&tbnw=128&hl=en&start=14&prev=/images%3Fq%3Dtrees%26svnum%3D10%26hl%3Den%26lr%3D

http://images.google.com/imgres?imgurl=http://upload.wikimedia.org/wikipedia/en/thumb/3/3c/Birchandmaple.jpg/180px-Birchandmaple.jpg&imgrefurl=http://en.wikipedia.org/wiki/Tree&h=232&w=180&sz=29&tbnid=oYewFL9I-NcJ:&tbnh=103&tbnw=79&hl=en&start=20&prev=/images%3Fq%3Dtrees%26svnum%3D10%26hl%3Den%26lr%3D

http://images.google.com/imgres?imgurl=http://www.stellenboschwriters.com/araucaria.jpg&imgrefurl=http://www.stellenboschwriters.com/trees.html&h=400&w=283&sz=146&tbnid=F9JwGV-Zsc8J:&tbnh=120&tbnw=84&hl=en&start=37&prev=/images%3Fq%3Dtrees%26start%3D20%26svnum%3D10%26hl%3Den%26lr%3D%26sa%3DN

http://images.google.com/imgres?imgurl=http://www.jmadden.info/trees/Frosty%2520trees%25203.jpg&imgrefurl=http://www.jmadden.info/trees/trees.htm&h=416&w=312&sz=65&tbnid=nLwh0UJ6peMJ:&tbnh=122&tbnw=91&hl=en&start=51&prev=/images%3Fq%3Dtrees%26start%3D40%26svnum%3D10%26hl%3Den%26lr%3D%26sa%3DN

http://images.google.com/imgres?imgurl=http://www.uri.edu/personal/jsch5838/shoes.jpg&imgrefurl=http://www.uri.edu/personal/jsch5838/pics.html&h=564&w=451&sz=25&tbnid=845fGieS5oUJ:&tbnh=131&tbnw=104&hl=en&start=4&prev=/images%3Fq%3Dshoes%26svnum%3D10%26hl%3Den%26lr%3D%26rls%3DGGLD,GGLD:2005-13,GGLD:en

http://images.google.com/imgres?imgurl=http://www.top-trendy.com/images/DC%2520Shoes%2520Womens%2520Lunamm3.jpg&imgrefurl=http://www.top-trendy.com/images/&h=500&w=500&sz=39&tbnid=KbkKC9kiL_wJ:&tbnh=127&tbnw=127&hl=en&start=6&prev=/images%3Fq%3Dshoes%26svnum%3D10%26hl%3Den%26lr%3D%26rls%3DGGLD,GGLD:2005-13,GGLD:en

http://images.google.com/imgres?imgurl=http://www.ameliacaruso.com/pinkdotshoesweb.jpg&imgrefurl=http://www.ameliacaruso.com/shoe.htm&h=247&w=325&sz=78&tbnid=UBGvZqY30iAJ:&tbnh=86&tbnw=114&hl=en&start=27&prev=/images%3Fq%3Dshoes%26start%3D20%26svnum%3D10%26hl%3Den%26lr%3D%26rls%3DGGLD,GGLD:2005-13,GGLD:en%26sa%3DN

http://images.google.com/imgres?imgurl=http://www.muffys.com/images/500.JPG&imgrefurl=http://www.muffys.com/modern_traditional.html&h=480&w=640&sz=32&tbnid=gznXlksZIK8J:&tbnh=101&tbnw=135&hl=en&start=34&prev=/images%3Fq%3Dshoes%26start%3D20%26svnum%3D10%26hl%3Den%26lr%3D%26rls%3DGGLD,GGLD:2005-13,GGLD:en%26sa%3DN

http://images.google.com/imgres?imgurl=http://www.mydivashop.com/Shoes%2520-%2520sexy%2520black%2520open%2520toe%2520sandal.jpg&imgrefurl=http://www.mydivashop.com/winter_clearance.htm&h=320&w=320&sz=11&tbnid=Ub88RLfQ0V0J:&tbnh=113&tbnw=113&hl=en&start=57&prev=/images%3Fq%3Dshoes%26start%3D40%26svnum%3D10%26hl%3Den%26lr%3D%26rls%3DGGLD,GGLD:2005-13,GGLD:en%26sa%3DN

http://images.google.com/imgres?imgurl=http://www.onesmallchild-accessories.com/Shoes-maryjanes.jpg&imgrefurl=http://www.onesmallchild-accessories.com/Shoes-girl.asp&h=720&w=1200&sz=35&tbnid=K7P2imhtuk8J:&tbnh=90&tbnw=150&hl=en&start=60&prev=/images%3Fq%3Dshoes%26start%3D40%26svnum%3D10%26hl%3Den%26lr%3D%26rls%3DGGLD,GGLD:2005-13,GGLD:en%26sa%3DN

http://images.google.com/imgres?imgurl=http://www.ncbi.nlm.nih.gov/genome/guide/img/tasha_image.jpg&imgrefurl=http://www.ncbi.nlm.nih.gov/genome/guide/dog/&h=1536&w=1024&sz=237&tbnid=mqZeA11z--0J:&tbnh=150&tbnw=100&hl=en&start=17&prev=/images%3Fq%3Ddog%2B%26svnum%3D10%26hl%3Den%26lr%3D%26sa%3DG

http://images.google.com/imgres?imgurl=http://www.dogart.net/images/index.1.gif&imgrefurl=http://www.dogart.net/&h=375&w=298&sz=73&tbnid=KLRZCl5XkUEJ:&tbnh=118&tbnw=93&hl=en&start=6&prev=/images%3Fq%3Ddog%2B%26svnum%3D10%26hl%3Den%26lr%3D%26sa%3DG

http://images.google.com/imgres?imgurl=http://i7.photobucket.com/albums/y295/RachelMorris/DC01.jpg&imgrefurl=http://www.suite101.com/discussion.cfm/mixed_breed_dogs/103260&h=500&w=445&sz=30&tbnid=Hkh6KsJq4jQJ:&tbnh=127&tbnw=113&hl=en&start=34&prev=/images%3Fq%3Ddog%2B%26start%3D20%26svnum%3D10%26hl%3Den%26lr%3D%26sa%3DN

http://images.google.com/imgres?imgurl=http://animals.timduru.org/dirlist/dog/dog-Trucker.jpg&imgrefurl=http://animals.timduru.org/dirlist/dog/&h=256&w=384&sz=13&tbnid=3au8AbG1xAQJ:&tbnh=79&tbnw=119&hl=en&start=59&prev=/images%3Fq%3Ddog%2B%26start%3D40%26svnum%3D10%26hl%3Den%26lr%3D%26sa%3DN

http://images.google.com/imgres?imgurl=http://www.ezthemes.com/previews/d/dog.jpg&imgrefurl=http://rinnan.net/dog.htm&h=187&w=250&sz=40&tbnid=8_xeHYxVqTwJ:&tbnh=79&tbnw=106&hl=en&start=16&prev=/images%3Fq%3Ddog%2B%26svnum%3D10%26hl%3Den%26lr%3D%26sa%3DN

http://flickr.com/photos/turniptopia/63553871/

http://flickr.com/photos/35618275@N00/59100598/

http://flickr.com/photos/cadeva/54970227/

Individual Recognition

Object parts: automatic, or query-driven

[Figure: car images with labeled parts: headlight, window, door knob, back wheel, mirror, front wheel, bumper]

Variability of Airplanes Detected

[Figure: class and non-class examples]

Features and Classifiers

• Same features with different classifiers

• Same classifier with different features

Generic Features: the same for all classes

Simple (wavelets) to complex (geons)

Class-specific Features: Common Building Blocks

Optimal Class Components?

• Large features are too rare

• Small features are found everywhere

Find features that carry the highest amount of information

Entropy

Entropy:

$$H = -\sum_i p(x_i) \log_2 p(x_i)$$

p(x=0)   p(x=1)   H
0.5      0.5      ?
0.1      0.9      0.47
0.01     0.99     0.08
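The entropy values listed for the binary variable can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms (entropy in bits); the function name `entropy` is chosen here for illustration only.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
    # Terms with p == 0 are skipped: the limit of p * log(p) as p -> 0 is 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Reproduce the two filled-in rows of the table.
for p0 in (0.1, 0.01):
    print(p0, round(entropy([p0, 1 - p0]), 2))   # 0.1 -> 0.47, 0.01 -> 0.08
```

The more skewed the distribution, the lower the entropy: a variable that is almost always 1 carries little uncertainty.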

Mutual information

I(C;F) = H(C) - H(C|F)

$$H(C) = -\sum_c P(c) \log_2 P(c)$$

[Figure: H(C) over all examples, compared with H(C) when F=1 and H(C) when F=0]

Mutual Information I(C,F)

Class:   1 1 0 1 0 1 0 0

Feature: 1 0 0 1 1 1 0 0

I(F,C) = H(C) – H(C|F)
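For the eight-example Class/Feature sequences above, the mutual information can be computed directly from counts. This is a sketch under the assumption that each position in the two bit strings is one training image; the helper names are illustrative, not from the slides.

```python
import math
from collections import Counter

def entropy_of(labels):
    """Empirical entropy (bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def mutual_information(classes, features):
    """I(C;F) = H(C) - H(C|F) for two equal-length label sequences."""
    n = len(classes)
    h_c_given_f = 0.0
    for f in set(features):
        # Class labels of the examples where the feature takes value f.
        subset = [c for c, g in zip(classes, features) if g == f]
        h_c_given_f += (len(subset) / n) * entropy_of(subset)
    return entropy_of(classes) - h_c_given_f

C = [1, 1, 0, 1, 0, 1, 0, 0]
F = [1, 0, 0, 1, 1, 1, 0, 0]
print(round(mutual_information(C, F), 3))   # 0.189
```

Here H(C) = 1 bit (four class and four non-class examples), and knowing F leaves about 0.811 bits of uncertainty, so the feature delivers roughly 0.19 bits about the class.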

Optimal classification features

• Theoretically: maximizing delivered information minimizes classification error

• In practice: informative object components can be identified in training images

Mutual Info vs. Threshold

[Plot: mutual information (y-axis) against detection threshold (x-axis), one curve per face fragment: forehead, hairline, mouth, eye, nose, nosebridge, long_hairline, chin, twoeyes]

Selecting Fragments

Horse-class features

Car-class features

Pictorial features, learned from examples

Star model

Detected fragments ‘vote’ for the center location

Find location with maximal vote

In its variations, a popular state-of-the-art scheme
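The voting step of the star model can be sketched as follows. This is a simplified illustration, assuming each fragment type stores a single learned offset to the object center and that votes are counted on exact pixel positions; practical systems typically bin or smooth the vote map. The fragment names and offsets are made up for the example.

```python
from collections import Counter

def vote_for_center(detections, offsets):
    """
    detections: list of (fragment_id, x, y) for fragments found in the image.
    offsets:    fragment_id -> (dx, dy), the learned displacement to the center.
    Returns ((cx, cy), votes) for the location with the maximal vote.
    """
    votes = Counter()
    for frag, x, y in detections:
        dx, dy = offsets[frag]
        votes[(x + dx, y + dy)] += 1
    return votes.most_common(1)[0]

# Hypothetical offsets and detections; the last window detection is a false alarm.
offsets = {"front_wheel": (20, -10), "back_wheel": (-20, -10), "window": (0, 10)}
detections = [("front_wheel", 30, 60), ("back_wheel", 70, 60),
              ("window", 50, 40), ("window", 90, 15)]
print(vote_for_center(detections, offsets))   # ((50, 50), 3)
```

Three consistent fragments agree on the same center, while the isolated false detection gathers only one vote.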

Bag of words

[Figure: an object and its bag of ‘words’]

Bag of visual words: a large collection of image patches

1. Feature detection and representation

• Regular grid

– Vogel & Schiele, 2003

– Fei-Fei & Perona, 2005

Generate a dictionary using K-means clustering

Recognition by Bag of Words (BoW): each class has its own word histogram
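The two steps above (a K-means dictionary, then a word histogram per image) can be sketched in a few lines. This is a toy illustration, assuming patch descriptors are short tuples of numbers and using plain Lloyd iterations; the function names and the 2-D data are invented for the example, and a real system would use an optimized clustering library.

```python
import random

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest(words, d):
    """Index of the dictionary word closest to descriptor d."""
    return min(range(len(words)), key=lambda k: sq_dist(words[k], d))

def kmeans(descriptors, k, iters=10, seed=0):
    """Plain Lloyd iterations; returns k cluster centers (the visual 'words')."""
    words = random.Random(seed).sample(descriptors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(words, d)].append(d)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        words = [tuple(sum(col) / len(g) for col in zip(*g)) if g else w
                 for g, w in zip(groups, words)]
    return words

def word_histogram(words, descriptors):
    """Normalized count of how often each word is the nearest one."""
    hist = [0] * len(words)
    for d in descriptors:
        hist[nearest(words, d)] += 1
    return [h / len(descriptors) for h in hist]

training_patches = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
words = kmeans(training_patches, k=2)
hist = word_histogram(words, [(0, 0), (10, 10), (11, 11)])
print(sorted(hist))
```

An image is then classified by comparing its histogram with the histograms of each class; all spatial arrangement of the patches is discarded.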

Limited or no geometry

Simple and popular, no longer state-of-the-art.

HoG Descriptor

Dalal, N. & Triggs, B. Histograms of Oriented Gradients for Human Detection

Shape context

Recognition Class II:

SVM Example Classifiers

SVM – linear separation in feature space

Separating line: w ∙ x + b = 0

Far line: w ∙ x + b = +1

Their distance: w ∙ ∆x = +1

Separation: |∆x| = 1/|w|

Margin: 2/|w|

[Figure: the margin, bounded by the lines w ∙ x + b = -1, 0, +1]
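The relation between |w| and the margin can be checked with a small numeric example. This is a sketch with an arbitrarily chosen w and b; the function names are illustrative.

```python
import math

def margin(w):
    """Width of the band between w.x + b = -1 and w.x + b = +1, i.e. 2/|w|."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def signed_distance(w, b, x):
    """Signed distance of point x from the separating line w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

w, b = [3.0, 4.0], -5.0          # |w| = 5
print(margin(w))                 # 0.4
# The point (2, 0) lies on the far line w.x + b = +1, at distance 1/|w| = 0.2.
print(signed_distance(w, b, [2.0, 0.0]))
```

Making |w| smaller widens the band, which is why maximizing the margin amounts to minimizing |w|.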

Max Margin Classification

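Maximize the margin 2/|w| subject to yi (w ∙ xi + b) ≥ 1 for every example i

(Equivalently, and as usually used: minimize ½ |w|² under the same constraints)

How can such a constrained optimization problem be solved?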

The examples are vectors xi

The labels yi are +1 for class, -1 for non-class

Solving the SVM problem

• Duality

• Final form

• Efficient solution

• Extensions

Using Lagrange multipliers: Minimize

$$L_P = \tfrac{1}{2} |w|^2 - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]$$

With αi ≥ 0 the Lagrange multipliers

Minimizing the Lagrangian

Minimize L_P: set the derivatives with respect to w and b to 0:

$$w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0$$

Also for the derivative w.r.t. αi

Dual formulation: maximize the Lagrangian w.r.t. the αi, subject to the above two conditions.

Solved in ‘dual’ formulation

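Maximize w.r.t. αi:

$$L_D = \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)$$

With the conditions:

$$\sum_i \alpha_i y_i = 0, \qquad \alpha_i \ge 0$$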

Dual formulation

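Mathematically equivalent formulation: we can maximize the Lagrangian with respect to the αi

After manipulations, a concise matrix form:

$$L_D = \sum_i \alpha_i - \tfrac{1}{2} \alpha^T H \alpha, \qquad H_{ij} = y_i y_j \, (x_i \cdot x_j)$$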
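A two-point toy problem makes the dual concrete. The sketch below assumes the standard dual objective (sum of the αi minus half the quadratic form in yi yj xi ∙ xj) and finds its maximum by a coarse grid scan rather than a real quadratic-programming solver; all names are illustrative.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def dual_objective(alpha, X, y):
    """L_D = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

def primal_from_dual(alpha, X, y):
    """Recover w = sum_i alpha_i y_i x_i, and b from any support vector."""
    w = [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X))
         for d in range(len(X[0]))]
    s = next(i for i, a in enumerate(alpha) if a > 0)
    b = y[s] - dot(w, X[s])      # from y_s (w . x_s + b) = 1 with y_s = +1 or -1
    return w, b

X, y = [(1.0, 0.0), (-1.0, 0.0)], [+1, -1]
# The condition sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = a here,
# so the dual reduces to 2a - 2a^2; scan a coarse grid for its maximum.
best = max((k / 100 for k in range(101)),
           key=lambda a: dual_objective([a, a], X, y))
w, b = primal_from_dual([best, best], X, y)
print(best, w, b)                # 0.5 [1.0, 0.0] 0.0
```

The recovered w has |w| = 1, giving a margin of 2, which is exactly the distance between the two points; the dual optimum 0.5 also equals the primal value ½ |w|².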

Summary points

• Linear separation with the largest margin, f(x) = w∙x + b

• Dual formulation

• Natural extension to non-separable classes

• Extension through kernels, f(x) = ∑ αi yi K(xi, x) + b
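The kernel form of the decision function in the last bullet can be sketched directly. This illustration assumes a Gaussian (RBF) kernel with an arbitrary width, and the multipliers are hand-picked rather than trained; it only shows how f(x) is evaluated once the αi are known.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian kernel K(u, v) = exp(-gamma * |u - v|^2); gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b; the class is given by the sign of f."""
    return sum(a * y * kernel(sv, x)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b

# XOR-like layout that no single line separates; alphas chosen by hand for illustration.
svs = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = [+1, +1, -1, -1]
alphas = [1.0, 1.0, 1.0, 1.0]
print([decision(p, svs, alphas, labels, b=0.0) > 0 for p in svs])
# [True, True, False, False]
```

With a linear kernel K(xi, x) = xi ∙ x the same function reduces to f(x) = w ∙ x + b.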

Felzenszwalb

• Felzenszwalb, McAllester, Ramanan CVPR 2008. A Discriminatively Trained, Multiscale, Deformable Part Model

• Many implementation details; we will describe the main points.

Using patches with HoG descriptors and classification by SVM

Person model HoG orientations with w > 0

Object model using HoG

A bicycle and its ‘root filter’

The root filter is a patch of HoG descriptors.

The image is partitioned into 8x8 pixel cells.

In each cell we compute a histogram of gradient orientations.
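The per-cell histogram can be sketched as follows. This is a simplified illustration, assuming 9 unsigned orientation bins over 0 to 180 degrees, central-difference gradients, magnitude-weighted votes, and no block normalization or interpolation between bins; these choices are assumptions of the sketch, not details given in the slides.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Gradient-orientation histogram for one cell, given as a list of pixel rows."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    # Border pixels are skipped because central differences need both neighbors.
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0   # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# An 8x8 cell containing a vertical edge: dark left half, bright right half.
cell = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(8)]
print(cell_histogram(cell))
```

All gradient energy of the vertical edge falls into the first bin (horizontal gradient direction); a cell with mixed texture would spread its votes over several bins.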

The filter is searched on a pyramid of HoG descriptors, to deal with unknown scale

Dealing with scale: multi-scale analysis

A part Pi = (Fi, vi, si, ai, bi).

Fi is the filter for the i-th part, vi is the center of a box of possible positions for part i relative to the root position, and si is the size of this box.

ai and bi are two-dimensional vectors specifying the coefficients of a quadratic function that scores each possible placement of the i-th part. That is, ai and bi are two numbers each, and the penalty for a deviation (∆x, ∆y) from the expected location is a1 ∆x + a2 ∆y + b1 ∆x² + b2 ∆y²

Adding Parts

Bicycle model: root, parts, spatial map

Person model

The full score of a potential match is:

$$\sum_i F_i \cdot H_i + \sum_i \left( a_{i1} x_i + a_{i2} y_i + b_{i1} x_i^2 + b_{i2} y_i^2 \right)$$

Fi ∙ Hi is the appearance part

xi, yi are the deviations of part pi from its expected location in the model. This is the spatial part.
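The score of one candidate match, as the sum of an appearance term and a spatial term, can be sketched as below. The data layout (flattened filters, one deformation tuple per part) and the numbers are assumptions made for the example.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def match_score(filters, features, deform, displacements):
    """
    filters[i], features[i]: flattened F_i and H_i (equal-length lists), root first.
    deform[i]:        (a_i1, a_i2, b_i1, b_i2) for each part.
    displacements[i]: (x_i, y_i), deviation of each part from its expected location.
    """
    appearance = sum(dot(F, H) for F, H in zip(filters, features))
    spatial = sum(a1 * x + a2 * y + b1 * x * x + b2 * y * y
                  for (a1, a2, b1, b2), (x, y) in zip(deform, displacements))
    return appearance + spatial

# Root filter plus one part; the part is displaced by (1, 2) and penalized quadratically.
filters = [[1.0, 0.0], [0.0, 2.0]]
features = [[3.0, 4.0], [5.0, 6.0]]
deform = [(0.0, 0.0, -1.0, -1.0)]
displacements = [(1, 2)]
print(match_score(filters, features, deform, displacements))   # 15 - 5 = 10.0
```

Negative b coefficients make the spatial term a penalty that grows with the squared displacement, so a part is allowed to move only if its appearance score improves enough.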

Match Score

The score of a match can be expressed as the dot-product of a vector β of coefficients, with the image:

Score = β∙ψ

Using the vectors ψ to train an SVM classifier:

β∙ψ > 1 for class examples

β∙ψ < -1 for non-class examples

However, ψ depends on the placement z, that is, the values of ∆xi, ∆yi

We need to take the best ψ over all placements. In their notation:

$$f_\beta(x) = \max_z \, \beta \cdot \psi(x, z)$$

Classification then uses f_β(x) > 1

Search with gradient descent over the placement; this also includes the levels in the hierarchy. Start with the root filter and find the places where it scores highly. For these high-scoring locations, search for the optimal placement of the parts, at a level with twice the resolution of the root filter, using gradient descent.

Final decision: β∙ψ > θ implies class

Recognition

Essentially, maximize

$$\sum_i F_i \cdot H_i + \sum_i \left( a_{i1} x_i + a_{i2} y_i + b_{i1} x_i^2 + b_{i2} y_i^2 \right)$$

over placements (xi, yi)
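For a fixed root position, each part can be placed independently of the others, so the maximization splits into one small search per part. The sketch below uses an exhaustive search over a box of displacements instead of the gradient descent mentioned earlier; the response table, the treatment of missing entries as zero response, and all names are assumptions of this example.

```python
def best_placement(response, deform, box):
    """
    response: dict (dx, dy) -> F_i . H_i, the part filter response at each
              candidate displacement (missing entries count as 0.0 here).
    deform:   (a1, a2, b1, b2) coefficients of the quadratic spatial term.
    box:      largest |dx| and |dy| to try.
    Returns (best_score, (dx, dy)).
    """
    a1, a2, b1, b2 = deform
    best = None
    for dx in range(-box, box + 1):
        for dy in range(-box, box + 1):
            score = (response.get((dx, dy), 0.0)
                     + a1 * dx + a2 * dy + b1 * dx * dx + b2 * dy * dy)
            if best is None or score > best[0]:
                best = (score, (dx, dy))
    return best

# A strong response far away loses to a moderate one nearby.
response = {(0, 0): 1.0, (2, 0): 4.0, (3, 3): 5.0}
print(best_placement(response, deform=(0.0, 0.0, -0.5, -0.5), box=3))
# (2.0, (2, 0))
```

The total score of a root position is then the root response plus the best score of every part.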

• Training -- positive examples with bounding boxes around the objects, and negative examples.

• Learn root filter using SVM

• Define fixed number of parts, at locations of high energy in the root filter HoG

• Use these to start the iterative learning

Hard Negatives

The set M of hard negatives for a known β and data set D: these are support vectors (y ∙ f = 1) or misses (y ∙ f < 1)

Optimal SVM training does not need all the examples; hard examples are sufficient.

• For a given β, use the positive examples + C hard examples

• Use this data to compute β by standard SVM

• Iterate (with a new set of C hard examples)
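The selection of hard examples and the surrounding iteration can be sketched as follows. This is an outline only: `train` stands for any standard SVM solver supplied by the caller (hypothetical here), `limit` plays the role of C in the text, and each example is a (ψ, y) pair with the placement already chosen.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def hard_examples(beta, examples, limit):
    """Keep examples with y * (beta . psi) <= 1, hardest first, at most `limit`."""
    scored = [(y * dot(beta, psi), psi, y) for psi, y in examples]
    hard = sorted((t for t in scored if t[0] <= 1.0), key=lambda t: t[0])
    return [(psi, y) for _, psi, y in hard[:limit]]

def mine(beta, positives, negatives, train, limit, rounds=3):
    """Alternate between picking hard negatives and retraining beta."""
    for _ in range(rounds):
        cache = hard_examples(beta, negatives, limit)
        beta = train(positives + cache)   # caller-supplied SVM training step
    return beta

beta = (1.0, 0.0)
negatives = [((1.0, 0.0), -1), ((-3.0, 0.0), -1), ((0.5, 0.0), -1)]
print(hard_examples(beta, negatives, limit=5))
# [((1.0, 0.0), -1), ((0.5, 0.0), -1)]
```

The negative at (-3, 0) already scores well below -1, so it would not change the solution and is left out of the training cache.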

All images contain at least 1 bike

Future challenges:

• Dealing with a very large number of classes

– ImageNet: 15,000 categories, 12 million images

• To consider: human-level performance for at least one class
