Grammar of Image. Zhaoyin Jia, 03-30-2009.
Problems
Enormous amount of vision knowledge:
  Computational complexity
  Semantic gap
  ……
[Figure: Classification, Recognition]
Objectives in this paper
Framework for vision: And-Or Graph
Algorithm for this framework: top-down/bottom-up computation
Generalization from small samples: use Monte Carlo simulation to synthesize more configurations
Fill the semantic gap
Grammar
Language: co-occurrence of s is more than chance
Image: Parallel; T-junction
$\frac{p(s \mid A B)}{p(s \mid A)\, p(s \mid B)} > 1$
CONSTANTINOPLE
Formulation of grammar
Start symbol: $S$
Non-terminal nodes: $V_N$
Reproduction rules: $R$
Terminal nodes: $V_T$
Example reproduction rules:
  S → NP VP
  VP → VP PP
  VP → V NP
  ……
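For each $A \in V_N$, we have reproduction rules: $A \to \beta_1 \mid \beta_2 \mid \dots \mid \beta_{n(A)}$
with a probability associated with each one: $p(A \to \beta_i)$, $\sum_{i=1}^{n(A)} p(A \to \beta_i) = 1$
Probability of a parse tree: $p(pt) = \prod_{(A \to \beta) \in pt} p(A \to \beta)$
Probability of a sentence: $p(W) = \sum_{pt \,:\, W(pt) = W} p(pt)$, summing over all parse trees whose leaves spell $W$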
Stochastic Context Free Grammar
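A minimal sketch of how a stochastic context-free grammar scores a parse tree: the tree probability is the product of the probabilities of the rules it uses, and a sentence probability sums over its parse trees. Only the rules S → NP VP, VP → VP PP and VP → V NP come from the slides; the other rules, all the probability values, and the function names are made up for illustration (Python 3.8+ for `math.prod`).

```python
from math import prod

# Hypothetical rule probabilities; for each left-hand side they sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("she",)): 0.5,
    ("NP", ("stars",)): 0.5,
    ("V", ("sees",)): 1.0,
}

def tree_probability(tree):
    """Product of the probabilities of all rules used in the tree.

    A tree is (symbol, [children]); a terminal word is a plain string.
    """
    if isinstance(tree, str):
        return 1.0
    symbol, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return RULES[(symbol, rhs)] * prod(tree_probability(c) for c in children)

def sentence_probability(parse_trees):
    """Sum of the probabilities of all parse trees of one sentence."""
    return sum(tree_probability(t) for t in parse_trees)

tree = ("S", [("NP", ["she"]), ("VP", [("V", ["sees"]), ("NP", ["stars"])])])
p_tree = tree_probability(tree)  # 1.0 * 0.5 * 0.7 * 1.0 * 0.5
```

The sentence here has a single parse tree, so `sentence_probability([tree])` equals `p_tree`; an ambiguous sentence would contribute one term per tree.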
Stochastic Grammar with Context
From left to right: bi-gram model (Markov chain); a sentence with n words: $p(w_1, \dots, w_n) = p(w_1) \prod_{i=1}^{n-1} p(w_{i+1} \mid w_i)$
Non-local relations: tree model
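The bi-gram (Markov chain) model can be sketched as follows. This is an illustration under assumptions, not something shown on the slides: the three-sentence corpus, the `<s>` start marker, and the function names are hypothetical, and probabilities are plain relative frequencies without smoothing.

```python
from collections import Counter

def train_bigram(sentences):
    """Estimate p(w_i | w_{i-1}) by relative frequency; '<s>' marks the start."""
    pair_counts, context_counts = Counter(), Counter()
    for words in sentences:
        prev = "<s>"
        for w in words:
            pair_counts[(prev, w)] += 1
            context_counts[prev] += 1
            prev = w
    return {pair: c / context_counts[pair[0]] for pair, c in pair_counts.items()}

def sentence_probability(model, words):
    """Chain the conditional probabilities left to right; unseen pairs get 0."""
    p, prev = 1.0, "<s>"
    for w in words:
        p *= model.get((prev, w), 0.0)
        prev = w
    return p

corpus = [["the", "dog", "runs"], ["the", "cat", "runs"], ["a", "dog", "sleeps"]]
model = train_bigram(corpus)
p = sentence_probability(model, ["the", "dog", "runs"])  # (2/3) * (1/2) * (1/2)
```

Each word depends only on its left neighbour, which is why this model cannot express the non-local relations that the tree model is meant to capture.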
Building the image grammar
Visual vocabulary: primitives, sketch graph, textons…
Relations and configurations: co-occurrence, attached, hinged, supported, occluded…
And-Or Graph representation: embedding the image grammar
Learning/testing the parse graph: find the possible inference
Database
Lotus Hill Institute Dataset
636,748 images, 3,927,130 physical objects
A few hundred are free
Benjamin Yao, Xiong Yang, and Song-Chun Zhu, “Introduction to a large scale general purpose ground truth dataset: methodology, annotation tool, and benchmarks.” EMMCVPR, 2007
http://www.imageparsing.com/
Free Data
6 categories, 145 subsets:
  Manmade Object: 75
  Nature Object: 40
  Objects in Scene: 6
  Transportation: 9
  UCLA Aerial Image: 5
  UIUC Sport Activity: 10
Outline & segmentation of the object
http://yoshi.cs.ucla.edu/yao/data/
Free Data: segmentation of a scene (street)
Free Data: physical parts of the object
OBJECT1:truck
  PART1:truck:body
  PART2:truck:windshield
  PART3:truck:headlight
  PART4:truck:headlight
  PART5:truck:headlight
  PART6:truck:headlight
  PART7:truck:rearview mirror
  PART8:truck:rearview mirror
  PART9:truck:rear light
  PART10:truck:window
  PART11:truck:frontal left wheel
  PART12:truck:frontal right wheel
  PART13:truck:back wheel
  PART14:truck:back wheel
  PART15:truck:carriage
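The object/part labels above form a small hierarchy, which can be recovered from the label strings. This is only a sketch: it assumes the `OBJECTn:<object>` and `PARTn:<object>:<part>` pattern seen on the slide, which is not necessarily the dataset's actual file format, and `parse_labels` is a hypothetical helper.

```python
import re

# A label runs until the next "OBJECTn:" / "PARTn:" marker or the end of the
# text, so part names that contain spaces ("rearview mirror") stay intact.
LABEL = re.compile(r"(OBJECT|PART)(\d+):(.+?)(?=\s+(?:OBJECT|PART)\d+:|\s*$)")

def parse_labels(text):
    """Return {object name: [part names in order of appearance]}."""
    hierarchy = {}
    for kind, _index, body in LABEL.findall(text):
        if kind == "OBJECT":
            hierarchy.setdefault(body, [])
        else:
            obj, part = body.split(":", 1)
            hierarchy.setdefault(obj, []).append(part)
    return hierarchy

text = ("OBJECT1:truck PART1:truck:body PART7:truck:rearview mirror "
        "PART11:truck:frontal left wheel")
hierarchy = parse_labels(text)
```

Repeated part names (four headlights, two back wheels) are kept as separate list entries, matching the one-label-per-instance annotation on the slide.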
Visual Vocabulary
$\varphi_i$: function of image primitives
$\alpha_i$: a) geometric transformation
            b) appearance
$\beta_i$: bonds between primitives
Visual Vocabulary: Sketch and Texture
$\Lambda = \Lambda_{sk} \cup \Lambda_{nsk}$, $I_\Lambda = (I_{\Lambda_{sk}}, I_{\Lambda_{nsk}})$
S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy principle and its applications to texture modeling,” Neural Computation, vol. 9, no. 8, pp. 1627–1660, November 1997
Primal sketch model
[Figure panels: input image, sketch graph, texture pixels]
C. E. Guo, S. C. Zhu, and Y. N. Wu, “Primal sketch: Integrating texture and structure,” in Proceedings of International Conference on Computer Vision, 2003.
High level visual vocabulary
Cloth: collar, left/right sleeves, hands
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006
Relations and configurations
Definition of relation:
  $E = \{(s, t)\} \subset S \times S$
  With attributes: $E = \{(s, t; \gamma, \rho) : s, t \in S\}$
  $\gamma$: structure, $\rho$: compatibility
Three types of relations:
  Bonds and connections
  Joints and junctions
  Object interactions/semantics
Definition of configuration:
  $C = \langle V, E \rangle$, $V = \{A_i : A_i = (\varphi_i(x, y; \alpha_i), \beta_i)\}$
Relations: bonds and connections
$S = \{\beta_{ij} : i = 1, 2, \dots, n;\ j = 1, 2, \dots, n(i)\}$
$E_{bond}(S) = \{(\beta_{ij}, \beta_{i'j'}; \gamma, \rho)\}$
$\gamma = (x, y, \theta)$: connects primitives into bigger graphs
$\rho$: intensity/color compatibility
Configuration
Spatial layout of entities at a certain level
Primal sketch – parts – object – scene
$C = \langle V, E \rangle$, $V = \{A_i : A_i = (\varphi_i(x, y; \alpha_i), \beta_i)\}$
Inference of the configuration
1. Have the primal sketch of the image
2. Detect the ‘T-junctions’
3. Use simulated annealing to infer the connections suggested by the Gestalt laws
R. X. Gao and S. C. Zhu, “From primal sketch to 2.1D sketch,” Technical Report, Lotus Hill Institute, 2006
Red dot: connect region
Black line: known edge
Green line: inferred connection
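Simulated annealing for inferring connections can be illustrated with a toy problem: pair up detected junction points so that the total connection length is small. This is only a sketch. The energy used here (sum of Euclidean lengths), the swap proposal, the geometric cooling schedule, and all names are assumptions; the slides do not give the actual energy of the cited report (Python 3.8+ for `math.dist`).

```python
import math
import random

def pairing_cost(order, points):
    """Total length of the connections (order[0], order[1]), (order[2], order[3]), ..."""
    return sum(math.dist(points[order[k]], points[order[k + 1]])
               for k in range(0, len(order) - 1, 2))

def anneal_pairing(points, steps=5000, t_start=1.0, t_end=0.01, seed=0):
    """Metropolis search over pairings; returns the best pairing seen and its cost."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    cost = pairing_cost(order, points)
    best, best_cost = order[:], cost
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        a, b = rng.sample(range(len(order)), 2)
        order[a], order[b] = order[b], order[a]               # propose a swap
        new_cost = pairing_cost(order, points)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost                                   # accept the proposal
            if cost < best_cost:
                best, best_cost = order[:], cost
        else:
            order[a], order[b] = order[b], order[a]           # reject: undo the swap
    return best, best_cost

# Four hypothetical junctions: the short vertical connections are the good pairing.
points = [(0, 0), (10, 0), (0, 1), (10, 1)]
initial_cost = pairing_cost([0, 1, 2, 3], points)
best, best_cost = anneal_pairing(points)
```

Accepting some uphill moves at high temperature is what lets the search leave a poor initial pairing; as the temperature drops, the procedure behaves more and more like greedy descent.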
Reconfigurable graphs
Ru-Xin Gao, Tian-Fu Wu, Song-Chun Zhu, and Nong Sang, “Bayesian Inference for Layer Representation with Mixed Markov Random Field”
[Figure panels: source image, T-junction, inferred connection, layer extraction]
Reconfigurable graphs
R. X. Gao and S. C. Zhu, “From primal sketch to 2.1D sketch,” Technical Report, Lotus Hill Institute, 2006
And-Or Graph
Parse graph of the image: $pg = (pt, E)$
  pt: parse tree of vocabulary
  E: relations
Inferring the parse graph: $pg^* = \arg\max_{pg} p(pg \mid I)$
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom up algorithm for object recognition,” Technical Report, Lotus Hill Research Institute, 2007.
And-Or Graph
Contains all the valid parse graphs
And node, Or node, leaf node
Relations between the children of an And node
Parse tree: assigning a label at each Or node
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom up algorithm for object recognition,” Technical Report, Lotus Hill Research Institute, 2007.
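How an And-Or graph contains all valid parse trees can be sketched with a toy graph: an And node keeps all of its children, an Or node selects exactly one. The `clock` graph, its node names, and the helper functions below are hypothetical and serve only as an illustration; relations between the children of And nodes are left out.

```python
from itertools import product

# Hypothetical toy And-Or graph. "and" nodes list all their parts, "or" nodes
# list alternatives; any name that is not a key is a leaf (terminal) node.
GRAPH = {
    "clock": ("and", ["frame", "hands"]),
    "frame": ("or", ["round frame", "square frame"]),
    "hands": ("or", ["two hands", "three hands"]),
}

def parse_trees(node):
    """Enumerate every parse tree rooted at `node` as nested (name, children) tuples."""
    if node not in GRAPH:
        return [node]
    kind, children = GRAPH[node]
    if kind == "or":  # the switch at an Or node selects exactly one child
        return [(node, [t]) for child in children for t in parse_trees(child)]
    # And node: combine one sub-tree for each child
    options = [parse_trees(child) for child in children]
    return [(node, list(combo)) for combo in product(*options)]

def leaves(tree):
    """Terminal configuration of a parse tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1] for leaf in leaves(child)]

trees = parse_trees("clock")
configurations = [leaves(t) for t in trees]
```

Two Or nodes with two alternatives each give four parse trees, so a compact graph already represents a combinatorial set of configurations.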
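Definition of the And-Or Graph:
$G_{and\text{-}or} = \langle S, V_N, V_T, R, \Sigma, P \rangle$
$V_N = V^{and} \cup V^{or}$
Or node (alternatives): $V^{or} \to V^{and}_1, V^{and}_2, \dots$
And node (components, or a terminal node): $V^{and} \to V^{or}_1, V^{or}_2, \dots \mid V_T$
$V_T = \{(\varphi(x, y; \alpha), \beta)\}$: image primitives
$R = \bigcup_m E_m = \{(v_s, v_t; \gamma_{st}, \rho_{st})\}$: relations at all levels
$P$: probability model defined on the And-Or graph
$\Sigma$: valid configurations of terminal nodes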
Stochastic Model on And-Or graph
Terminal (leaf) nodes: $T(pg)$
And-Or nodes: $V^{or}(pg)$, $V^{and}(pg)$
Set of links: $E(pg)$
Switch variable at Or-node: $\omega(v)$
Attributes of primitives: $\alpha(t)$
$p(pg; \Theta, R, \Delta) = \frac{1}{Z(\Theta)} \exp(-\mathcal{E}(pg))$
$\mathcal{E}(pg) = \sum_{v \in V^{or}(pg)} \lambda_v(\omega(v)) + \sum_{t \in V^{and}(pg) \cup T(pg)} \lambda_t(\alpha(t)) + \sum_{(i,j) \in E(pg)} \lambda_{ij}(v_i, v_j, \gamma_{ij}, \rho_{ij})$
First term of the energy, $\sum_{v \in V^{or}(pg)} \lambda_v(\omega(v))$ (SCFG): weighs the frequency of the children chosen at Or-nodes
Second term of the energy, $\sum_{t \in V^{and}(pg) \cup T(pg)} \lambda_t(\alpha(t))$: weighs the local compatibility of primitives (geometric and appearance)
Third term of the energy, $\sum_{(i,j) \in E(pg)} \lambda_{ij}(v_i, v_j, \gamma_{ij}, \rho_{ij})$: spatial and appearance relations between primitives (parts or objects)
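The three energy terms can be made concrete with a small numeric sketch. Everything specific below is an assumption for illustration: the dictionary layout of the parse graph, the potential functions, and their values are hypothetical, and the normalizing constant Z is not computed, so only an unnormalized probability is returned.

```python
import math

def energy(pg, lam_or, lam_attr, lam_rel):
    """Sum of Or-node switch terms, node attribute terms, and pairwise relation terms."""
    or_term = sum(lam_or[v][choice] for v, choice in pg["switch"].items())
    attr_term = sum(lam_attr(a) for a in pg["attributes"].values())
    rel_term = sum(lam_rel(pg["attributes"][i], pg["attributes"][j])
                   for i, j in pg["links"])
    return or_term + attr_term + rel_term

# Hypothetical parse graph: one Or-node decision, two nodes with attributes, one link.
pg = {
    "switch": {"frame": "round"},
    "attributes": {"frame": {"x": 0.0, "size": 1.0},
                   "hands": {"x": 0.5, "size": 2.0}},
    "links": [("frame", "hands")],
}
lam_or = {"frame": {"round": 0.25, "square": 1.5}}   # lower energy = more frequent choice
lam_attr = lambda a: (a["size"] - 1.0) ** 2          # penalize deviation from unit size
lam_rel = lambda a, b: (a["x"] - b["x"]) ** 2        # penalize horizontal misalignment

e = energy(pg, lam_or, lam_attr, lam_rel)            # 0.25 + (0.0 + 1.0) + 0.25
unnormalized_p = math.exp(-e)
```

Dropping the relation term leaves only independent per-node terms, which corresponds to the SCFG-like special case without context.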
Learning And-Or Graph
Learning the vocabulary $\Delta$ and the hierarchic And-Or Graph
Learning the relation set $R$, given $\Delta$
Learning the parameters $\Theta$, given $R$ and $\Delta$
The last two are discussed in the paper.
$p(pg; \Theta, R, \Delta) = \frac{1}{Z(\Theta)} \exp(-\mathcal{E}(pg))$
$\mathcal{E}(pg) = \sum_{v \in V^{or}(pg)} \lambda_v(\omega(v)) + \sum_{t \in V^{and}(pg) \cup T(pg)} \lambda_t(\alpha(t)) + \sum_{(i,j) \in E(pg)} \lambda_{ij}(v_i, v_j, \gamma_{ij}, \rho_{ij})$
Learning And-Or Graph
Learning and pursuing the relation set R:
  Start from a Stochastic Context Free Grammar (a)
  Learn the relations that maximally reduce the KL divergence to the observation (b-e)
Observation: $f(I, pg)$
Learning model: $p(pg; \Theta, R, \Delta)$
J. Porway, Z. Y. Yao, and S. C. Zhu, “Learning an And–Or graph for modeling and recognizing object categories,” Technical Report, Department of Statistics,2007
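One step of relation pursuit can be sketched as: compare, for every candidate relation, the statistics observed in the data with the statistics produced by the current model, and add the relation where they disagree most. As a simplifying assumption, the KL divergence between the two histograms is used here as a stand-in for the reduction in KL divergence; the relation names and histogram values are invented.

```python
import math

def kl_divergence(f, p):
    """KL(f || p) for two discrete distributions given as equal-length lists."""
    return sum(fi * math.log(fi / pi) for fi, pi in zip(f, p) if fi > 0)

def pursue_relation(observed, model):
    """Pick the candidate relation that the current model explains worst."""
    gains = {name: kl_divergence(observed[name], model[name]) for name in observed}
    return max(gains, key=gains.get), gains

# Hypothetical histograms of two candidate relations (three bins each).
observed = {
    "relative position": [0.7, 0.2, 0.1],
    "relative scale": [0.4, 0.3, 0.3],
}
# A model without relations is taken to predict uniform statistics (assumption).
uniform = [1.0 / 3.0] * 3
model = {"relative position": uniform, "relative scale": uniform}

best, gains = pursue_relation(observed, model)
```

After a relation is added, the model statistics would be re-estimated (for instance by sampling from the updated model) before the next pursuit step; that part is omitted here.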
Learning graph parameters
Approximating $p(pg; \Theta, R, \Delta)$ to $f(I, pg)$
Similar to texture synthesis
S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy principle and its applications to texture modeling,” Neural Computation, vol. 9, no. 8, pp. 1627–1660, November 1997
Learning And-Or Graph
Case I: Rectangle
Nodes: rectangles
Two vanishing points, four edge directions
Rules:
F. Han and S. C. Zhu, “Bottom-up/top-down image parsing by attribute graph grammar”. Proceedings of International Conference on Computer Vision, Beijing,China, 2005.
Case I: Rectangle
1. Get the primal sketch of the scene
2. Find the ‘strong’ rectangles (bottom-up, red)
3. Weigh (score) the different hypotheses (top-down, blue); the weight is the compatibility of the image with the proposed rectangle (primal sketch)
4. Accept the best one
5. Repeat steps 2-4 until all the weights are small (negative)
F. Han and S. C. Zhu, “Bottom-up/top-down image parsing by attribute graph grammar”. Proceedings of International Conference on Computer Vision, Beijing, China, 2005.
Weight $\sim \exp\left(-\sum_{(x,y) \in \Lambda_k} \frac{(I(x,y) - B(x,y))^2}{2\sigma^2}\right)$
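The accept-the-best-hypothesis loop can be sketched as a greedy procedure. The scoring used here is a deliberately simple stand-in (one unit of gain per newly explained pixel minus a fixed cost per hypothesis), not the image-compatibility weight of the cited paper; the candidate names, pixel sets, and parameters are hypothetical.

```python
def greedy_parse(candidates, pixel_gain=1.0, cost=2.5):
    """Repeatedly accept the best-scoring hypothesis until no weight is positive.

    `candidates` maps a hypothesis name to the set of pixels it would explain.
    """
    explained, accepted = set(), []
    remaining = dict(candidates)
    while remaining:
        # Top-down step: re-score every hypothesis against what is still unexplained.
        weights = {name: pixel_gain * len(pixels - explained) - cost
                   for name, pixels in remaining.items()}
        best = max(weights, key=weights.get)
        if weights[best] <= 0:   # all weights are small (negative): stop
            break
        accepted.append(best)
        explained |= remaining.pop(best)
    return accepted

# Hypothetical bottom-up proposals; "C" mostly overlaps what "A" and "B" explain.
candidates = {
    "A": {1, 2, 3, 4, 5, 6},
    "B": {5, 6, 7, 8, 9},
    "C": {1, 2, 9},
}
accepted = greedy_parse(candidates)
```

Re-scoring after every acceptance is what makes overlapping hypotheses lose weight once a competitor has explained their pixels.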
Case II: Human Cloth
Use the And-Or graph to generate a matching model
Vocabulary (training dataset)
Matching using the And-Or Graph
Case II: Human Cloth
The And-Or Graph
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006.
Novel Configuration
Inference process
Case II: Human Cloth
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006.
Localize face, then estimate the parts of the body
Bottom-up: a coarse matching of the parts
Top-down: refine the matching using the relation
Case II: Human Cloth Inference result
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006.
Hands are not exactly the same: find the best matching in the dataset
Case III: Recognition
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom-up algorithm for object recognition,” Technical Report, Lotus Hill Research Institute, 2007.
Conclusion
Computational complexity: scheduling the bottom-up/top-down procedure remains open
Semantic gap:
  Learning the And-Or Graph
  Learning the vocabulary and its attributes
After all, we are not supposed to define so many things:
  ideal vision words:
  what we have now: