Petacat: Applying ideas from Copycat to image understanding
How Streetscenes Works (Bileschi, 2006)
1. Densely tile the image with windows of different sizes.
2. HMAX C2 features are computed in each window.
3. The features in each window are given as input to each of five trained support vector machines (“pedestrian”, “car”, “bicycle”, “building”, “tree”).
4. If any return a classification with score above a learned threshold, that object is said to be “detected”.
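The four steps above amount to an exhaustive sliding-window search. A minimal sketch, assuming square windows and treating the feature extractor and the per-category scoring functions as hypothetical callables (`features`, `classifiers`, and `thresholds` are illustrative names, not part of Streetscenes or HMAX):

```python
from typing import Callable, Dict, List, Sequence, Tuple

Window = Tuple[int, int, int]  # (x, y, size) of a square window; an assumed layout


def tile_windows(width: int, height: int, sizes: Sequence[int],
                 stride_frac: float = 0.5) -> List[Window]:
    """Step 1: densely tile the image with windows of different sizes."""
    windows: List[Window] = []
    for size in sizes:
        step = max(1, int(size * stride_frac))  # neighbouring windows overlap
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                windows.append((x, y, size))
    return windows


def detect(windows: Sequence[Window],
           features: Callable[[Window], object],
           classifiers: Dict[str, Callable[[object], float]],
           thresholds: Dict[str, float]) -> List[Tuple[str, Window, float]]:
    """Steps 2-4: score every window with every classifier."""
    detections = []
    for window in windows:
        feats = features(window)  # stand-in for HMAX C2 features
        for label, score_fn in classifiers.items():  # stand-ins for trained SVMs
            score = score_fn(feats)
            if score > thresholds[label]:  # learned threshold per category
                detections.append((label, window, score))
    return detections
```

The work grows with the number of windows times the number of categories, since every window is scored by every classifier.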
…
Object detection (here, “car”) with HMAX model (Bileschi, 2006)
Limitations of Streetscenes approach for “image understanding”
• Exhaustive search – not scalable
• Does not recognize spatial and abstract relationships among objects for whole scene understanding
• Has no prior knowledge about object categories and their place in “conceptual space”
• HMAX model is completely feed-forward; no feedback to allow context to aid in scene understanding. – Where should feedback come in?
Representation of High-Level Knowledge: A Simple Semantic Network (or “Ontology”)
[Figure: semantic network for “Dog walking”. Nodes: Person, Dog, leash. Link labels: holds, attached to, action (twice). Action node: walking.]
But...
Modified Ontology: allowing “conceptual slippage”
[Figure: the “Dog walking” network (Person, Dog, leash; links: holds, attached to, action; action node: walking) extended over several slide builds, each followed by “But...”. Added in turn: Dog Group; running; Cat and Iguana; Bicycle, Car, and Helicopter.]
But...
[Figure: a growing cloud of concepts: Person, Dog, Leash, Outside, Inside, Ground, Walking, Running, Standing, Tree, Stick, Close to, Far from, Beach, Sidewalk, Attached to, Grass, Lawn mower, Gasoline, Runway, Airplane, Helicopter, Above, Left of, Holding, Dog walking, Dog grooming, Car, Sky, Army, Track, Fanny pack, Backpack.]
Need a dynamical process for constructing the representation.
Information gained during the unfolding of perception feeds back to guide the directions the perceptual process takes.
– Ongoing perception of “context” brings in appropriate concepts and conceptual slippages, and avoids exhaustive search.
– Prior, higher-level knowledge interacts with lower-level vision in both directions (bottom-up and top-down).
– Concepts are “fluid”, allowed to “slip” in certain contexts.
• This allows perception of essential similarity in the face of superficial differences, i.e., analogy-making.
Active Symbol Architecture (Hofstadter et al., 1995)
• Basis for
– Copycat (analogy-making), Hofstadter & Mitchell
– Tabletop (analogy-making), Hofstadter & French
– Metacat (analogy-making and self-awareness), Hofstadter & Marshall
and many others…
[Figure: components of the architecture: Semantic network, Workspace, Temperature, Perceptual agents (codelets).]
Petacat:
(Descendant of Copycat)
Integration of Active Symbol Architecture and HMAX
Initial task: Decide if image is an instance of “taking a dog for a walk”, and if so, how good an instance it is.
Semantic Network
[Figure: semantic network for the situation “taking a dog for a walk”, shown over three slide builds that highlight property links, slip links, and properties of nodes.
Node types: Object, Action, Spatial Relation.
Nodes: outdoors, indoors, person, dog, cat, horse, leash, rope, belt, string, sidewalk, road, beach, trail, grass, walks (twice), runs, drives, flies, swims, stands, sits.
Link labels: has location, has action, has component, is on, is touching, is in front of, is behind, is next to.]
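One way to picture the network on this slide is as typed nodes plus two kinds of links. The following is a minimal sketch, not Petacat's actual data structure: the class and method names are hypothetical, the node types “Situation” and “Location” are assumed for illustration, and the particular slip links (dog/cat, leash/rope, walks/runs) are guesses at what the diagram connects.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass
class SemanticNetwork:
    node_types: Dict[str, str] = field(default_factory=dict)
    # Property links are directed and labelled: (source, label, target).
    property_links: List[Tuple[str, str, str]] = field(default_factory=list)
    # Slip links are undirected pairs of concepts that may stand in for each other.
    slip_links: Set[FrozenSet[str]] = field(default_factory=set)

    def add_node(self, name: str, node_type: str) -> None:
        self.node_types[name] = node_type

    def add_property(self, source: str, label: str, target: str) -> None:
        self.property_links.append((source, label, target))

    def add_slip(self, a: str, b: str) -> None:
        self.slip_links.add(frozenset((a, b)))

    def can_slip(self, a: str, b: str) -> bool:
        return a == b or frozenset((a, b)) in self.slip_links

    def properties(self, source: str) -> List[Tuple[str, str]]:
        return [(label, target)
                for s, label, target in self.property_links if s == source]


net = SemanticNetwork()
situation = "taking a dog for a walk"
for name, kind in [(situation, "Situation"), ("outdoors", "Location"),
                   ("person", "Object"), ("dog", "Object"), ("cat", "Object"),
                   ("leash", "Object"), ("rope", "Object"),
                   ("walks", "Action"), ("runs", "Action")]:
    net.add_node(name, kind)

for component in ("person", "dog", "leash"):
    net.add_property(situation, "has component", component)
net.add_property(situation, "has location", "outdoors")
net.add_property("person", "has action", "walks")
net.add_property("dog", "has action", "walks")

# Assumed slip links, for illustration only.
net.add_slip("dog", "cat")
net.add_slip("leash", "rope")
net.add_slip("walks", "runs")
```

With this layout, `net.can_slip("cat", "dog")` answers whether one concept may substitute for another when matching a scene against the prototype.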
[Figure: the Workspace shown alongside the Semantic network.]
Perceptual Agents (Codelets)
Codelets as active symbols
[Figure: the “taking a dog for a walk” semantic network, repeated over several slide builds.]
Illustration of what we plan to have happen – not a real run of Petacat
[Figure: scout codelets are placed on windows of the image, one more per slide build, labeled: Dog?, Dog?, Person?, Sidewalk?, Dog?, Outdoors?]
Scout codelets: Send C1 features in window to corresponding SVM. If positive result, post builder codelet with urgency equal to SVM’s confidence.
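A scout's behavior can be sketched in a few lines. This is only an illustration under stated assumptions: the SVM is modeled as a hypothetical callable returning `(is_positive, confidence)` with confidence in [0, 1], `c1_features` stands in for the HMAX C1 computation, and the coderack is a plain list.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height); an assumed layout
SvmResult = Tuple[bool, float]      # (is_positive, confidence); an assumed interface


@dataclass
class BuilderCodelet:
    label: str
    window: Window
    urgency: float


def run_scout(label: str,
              window: Window,
              c1_features: Callable[[Window], Sequence[float]],
              svm: Callable[[Sequence[float]], SvmResult],
              coderack: List[BuilderCodelet]) -> Optional[BuilderCodelet]:
    """Send C1 features to the SVM; on a positive result, post a builder."""
    is_positive, confidence = svm(c1_features(window))
    if not is_positive:
        return None  # negative result: nothing is posted
    builder = BuilderCodelet(label, window, urgency=confidence)
    coderack.append(builder)  # urgency equals the SVM's confidence
    return builder
```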
[Figure: scout results per window: Dog? negative; Dog? negative; Sidewalk? positive: 0.4; Person? negative; Outdoors? positive: 0.7; Dog? positive: 0.8.]
Builder codelets: Ask HMAX to compute C2 features using prototypes specific to the object (or scene), and send them to corresponding SVM. If positive, decide to build structure with probability equal to SVM confidence. Break competing structures if necessary.
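The builder's probabilistic decision can be sketched as follows. The SVM interface `(is_positive, confidence)`, the `c2_features` callable, and the workspace layout are all hypothetical. The rule used here for competing structures (a new structure replaces a different-label structure in the same window only if it is stronger) is an assumption made for the example, since the slide does not specify how competition is resolved.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height); an assumed layout
Structure = Tuple[str, float]       # (label, strength)


def run_builder(label: str,
                window: Window,
                c2_features: Callable[[Window, str], Sequence[float]],
                svm: Callable[[Sequence[float]], Tuple[bool, float]],
                workspace: Dict[Window, Structure],
                rng: random.Random) -> bool:
    """Try to build a structure for `label` in `window`; return True if built."""
    # Features are computed with prototypes specific to the label being sought.
    is_positive, confidence = svm(c2_features(window, label))
    if not is_positive:
        return False
    # Build with probability equal to the SVM confidence.
    if rng.random() >= confidence:
        return False
    rival = workspace.get(window)
    if rival is not None and rival[0] != label and rival[1] >= confidence:
        return False  # assumed rule: an equally strong or stronger rival survives
    workspace[window] = (label, confidence)  # breaks any weaker rival
    return True
```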
[Figure: after the builder codelets run, the structures Outdoors and Dog are built in the workspace.]
[Figure: the semantic network is shown again. With Dog and Outdoors built, new scouts are labeled: Dog?, Leash?, Leash?, Sidewalk?, Person?, Person?]
[Figure: two candidate Person structures appear (strength 0.75 and strength 0.6) alongside Dog, Outdoors, and Sidewalk; the next build shows Dog, Person, Outdoors, Sidewalk.]
[Figure: the semantic network is shown again. With Dog, Person, Outdoors, and Sidewalk built, new scouts are labeled: Leash?, Leash?, Dog?, Sidewalk?, Dog?, Rope?]
[Figure: Leash is built; a second Dog structure appears, marked first as weak, then as strong; the workspace then holds Dog, Person, Outdoors, Sidewalk, Leash, Dog.]
[Figure: the semantic network is shown again, with the workspace holding Dog, Person, Outdoors, Sidewalk, Leash, Dog.]
Once objects begin to be built, relation and grouping codelets can run on them.
[Figure: relations “is next to” and “is in front of” are proposed between built objects, and the two dogs are grouped into a Dog group; the last build shows Dog, Dog, Dog group, Person, Leash, Outdoors, and Sidewalk linked by “is next to” relations.]
How codelets decide where to look
System starts out with weak segmentation (e.g., “normalized cuts” algorithm).
System creates “heat maps” for location and scale of objects in general (at each pixel, the probability of finding an object at this location and at a particular height/width of bounding box).
Object scout codelets choose location and scale probabilistically from these heat maps.
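Choosing a location and scale probabilistically from a heat map can be sketched as weighted sampling. This assumes, for illustration only, that the heat map is stored as a dictionary from candidate windows `(x, y, width, height)` to non-negative weights; the slides do not say how Petacat represents it.

```python
import random
from collections import Counter
from typing import Dict, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height); an assumed layout


def sample_window(heat_map: Dict[Window, float], rng: random.Random) -> Window:
    """Draw one window with probability proportional to its heat-map weight."""
    windows = list(heat_map)
    weights = [heat_map[w] for w in windows]
    if sum(weights) <= 0:
        raise ValueError("heat map has no probability mass")
    return rng.choices(windows, weights=weights, k=1)[0]


# Toy heat map with two candidate windows (values are made up).
heat_map = {(40, 60, 32, 32): 0.9, (200, 10, 64, 64): 0.1}
rng = random.Random(0)
counts = Counter(sample_window(heat_map, rng) for _ in range(1000))
```

The unlikely window is still visited occasionally, so low-probability regions are not ruled out entirely.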
How codelets decide where to look
When codelets look for individual object categories (e.g., dog), object-specific heat maps are created.
As codelets build structure, heat maps are continually updated to reflect prior (learned) expectations about location and scale as a function of location and scale of “built” objects (as well as original weak segmentation).
[Figure: a built Dog and the resulting Person heat map, with candidate windows labeled Person?]
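The update step can be sketched as reweighting each candidate window by a prior that depends on an already built object, then renormalizing. Everything here is an assumption for illustration: `relative_prior` stands in for the learned expectations, and `near_prior` is a toy function that simply favors windows near the built object.

```python
from typing import Callable, Dict, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height); an assumed layout


def update_heat_map(heat_map: Dict[Window, float],
                    built: Window,
                    relative_prior: Callable[[Window, Window], float]
                    ) -> Dict[Window, float]:
    """Reweight candidate windows given one built object, then renormalize."""
    updated = {w: p * relative_prior(w, built) for w, p in heat_map.items()}
    total = sum(updated.values())
    if total <= 0:
        return dict(heat_map)  # prior ruled everything out; keep the old map
    return {w: p / total for w, p in updated.items()}


def near_prior(candidate: Window, built: Window) -> float:
    """Toy stand-in for a learned prior: favors candidates near the built object."""
    cx = candidate[0] + candidate[2] / 2
    cy = candidate[1] + candidate[3] / 2
    bx = built[0] + built[2] / 2
    by = built[1] + built[3] / 2
    return 1.0 / (1.0 + abs(cx - bx) + abs(cy - by))


# Two equally likely person windows; a dog has been built near the first one.
person_map = {(50, 20, 40, 100): 0.5, (300, 20, 40, 100): 0.5}
dog = (60, 90, 40, 30)
person_map = update_heat_map(person_map, dog, near_prior)
```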
How Petacat makes a final decision
[Figure: a Temperature gauge; the workspace structures (Dog, Dog, Dog group, Person, Leash, Outdoors, Sidewalk, linked by “is next to” relations) are matched against the prototypical situation “taking a dog for a walk” (person, dog, leash, outdoors; links: has component, has location, is next to, is in front of).]
Illustration of what we plan to have happen – not a real run of Petacat
“Situation” codelet is more likely to run when temperature is low.
Situation codelet tries to match prototypical situation with existing workspace structures, possibly allowing slippages.
If resulting temperature is low enough, classify scene as positive.
If situation codelet fails enough times or does not run for a long time, program has increasing chance of ending with negative classification.
If Petacat classifies the picture as positive, the temperature at the end of the run gives a measure of how good an instance the picture is (e.g., of the “dog walking” situation).
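The decision rules above can be sketched as three small functions. The numbers are placeholders, not Petacat's: temperature is assumed to lie in [0, 100], and the linear forms, the threshold of 30, and the give-up rate are made up for the example.

```python
from typing import Optional, Tuple

MAX_TEMPERATURE = 100.0  # assumed temperature range [0, 100]


def situation_run_probability(temperature: float) -> float:
    """The situation codelet is more likely to run when temperature is low."""
    return 1.0 - temperature / MAX_TEMPERATURE


def give_up_probability(failures: int, idle_steps: int, rate: float = 0.01) -> float:
    """Chance of ending with a negative classification grows with failed
    matches and with time since the situation codelet last ran."""
    return min(1.0, rate * (failures + idle_steps))


def classify(temperature: float, threshold: float = 30.0) -> Tuple[bool, Optional[float]]:
    """Positive if the temperature after matching is low enough.

    For a positive result, also return a goodness score in [0, 1] derived
    from the final temperature (lower temperature, better instance).
    """
    if temperature <= threshold:
        return True, 1.0 - temperature / MAX_TEMPERATURE
    return False, None
```

For example, `classify(20.0)` reports a positive instance with a goodness score near 0.8, while `classify(80.0)` is negative.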
Summary: How does Petacat avoid exhaustive search?
Recall Streetscenes system, which, given an image, does exhaustive search over:
• Window size and location in the image
In Petacat, codelets choose window size and location based on learned expectations and perceived context, with probabilities continually changing as more information is obtained.
• C1, C2 features in windows
Codelets request C2 features only in “relevant” windows, and request only C2 features that are relevant to what the codelet is looking for.
• Object categories (e.g., car, pedestrian, tree, etc.)
Codelets look for object categories that are activated by context, based on prior expectations and currently perceived information.
Summary: How does Petacat avoid exhaustive search?
• Petacat effects a parallel terraced scan (Hofstadter, 1995):
Codelets build structures at a rate (urgency) based on their perceived promise, which is continually updated as new information is perceived.
Temperature allows this (continually changing) rate to depend on the global state of the system.
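The interaction between urgency and temperature can be sketched as temperature-dependent weighted selection from the coderack. The exponent formula below is an illustrative choice, not something stated in these slides: it sharpens the preference for urgent codelets as temperature drops and flattens it as temperature rises. Temperature is assumed to lie in [0, 100] and urgencies to be positive.

```python
import random
from typing import List, Sequence, Tuple

Codelet = Tuple[str, float]  # (name, urgency); an assumed layout


def selection_weights(urgencies: Sequence[float], temperature: float) -> List[float]:
    """Reweight urgencies by temperature: low temperature exaggerates
    differences in urgency, high temperature evens them out."""
    exponent = (100.0 - temperature) / 30.0 + 0.5  # illustrative formula
    return [u ** exponent for u in urgencies]


def choose_codelet(coderack: Sequence[Codelet], temperature: float,
                   rng: random.Random) -> Codelet:
    """Pick the next codelet to run, favoring promising ones without
    ever fully excluding the others."""
    weights = selection_weights([u for _, u in coderack], temperature)
    return rng.choices(list(coderack), weights=weights, k=1)[0]


coderack = [("build dog", 0.8), ("build sidewalk", 0.2)]
hot = selection_weights([0.8, 0.2], temperature=100.0)
cold = selection_weights([0.8, 0.2], temperature=0.0)
chosen = choose_codelet(coderack, temperature=50.0, rng=random.Random(0))
```

At high temperature the more urgent codelet here is only about twice as likely to run as the other; at low temperature the preference becomes much stronger, so many avenues are explored early and the most promising ones dominate later.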
Relation to neuroscience/psychophysics
– Gilbert & Sigman (2007): Emphasis on the role of top-down processing in vision.
• “V1 and V2 may work as ‘active blackboards’ that integrate and sustain the result of computations performed in higher areas.”
– Kahneman, Treisman, and Gibbs (1992): Notion of “object files”: temporary and modifiable perceptual structures, created on the fly in working memory, which interact with a permanent network of concepts.
– Churchland, Ramachandran, and Sejnowski: Theory of “interactive vision”
– Treisman and colleagues: Shift between parallel, random, “pre-attentive” bottom-up processing and more deterministic, focused, serial, “attentive” top-down processing.
Does Petacat understand pictures?
Understanding (MM’s definition):
- Ability to appropriately use one’s knowledge and make appropriate conceptual slippages in a wide variety of environments/contexts.
- Ability to use one’s existing concepts to learn new concepts.