image-grain comparison of core object recognition behavior...
TRANSCRIPT
Image-grain comparison of core object recognition behavior in humans, monkeys and machinesRishi Rajalingham*, Elias B. Issa*, Kohitij Kar, Kailyn Schmidt, Pouya Bashivan, James J. DiCarloDept. of Brain and Cognitive Sciences1, McGovern Institute for Brain Research2, MIT
Methods
ResultsIntroduction
Summary
Humans can rapidly and accurately recognize objects in spite of variation in viewing pa-rameters (e.g. position, pose, and size) and background conditions. To uncover the algo-rithms underlying this ability, quantitative benchmarks of human behavior can be used to directly compare animal, neural and computational vision systems. Previously, we showed that on one such benchmark, the pattern of object-level confu-sions, monkeys behavior was indistinguishable from human behavior. This pattern was also shared with high-performing object recognition models (convolution neural networks, CNNs) but not models of lower-level visual representations. Here, we extended on our previous work by systematically comparing the core object recognition behavior of humans against that of monkeys and machines at the image-level using the pattern of image-by-image di�culties.
Fixation(200ms)
Test Image Presentation(100ms)
Choice Image Presentation (<1500ms) & Hold (700ms)
Behavioral paradigm: We tested object recognition performance on a two alter-native forced choice (2AFC) match-to-sample task using brief (100ms) presentations of naturalistic syn-thetic images of basic-level objects.
High-throughput psychophsyics: Over a million behavioral trials across hundreds of humans and �ve monkeys were ag-gregated to characterize each species on each image. To col-lect such large behavioral datasets, we used Amazon Me-chanical Turk for humans, and a novel home-cage touch-screen behavioral system (”MonkeyTurk”) for monkeys.
MO
NKE
Y
V1 H
MA
X
ALE
XNET
(fc6
)
ALE
XNET
(fc7
)
ALE
XNET
(fc8
)
VG
G(f
c6)
VG
G(f
c7)
VG
G(f
c8)
GO
OG
LEN
ET
RESN
ET
GO
OG
LEN
ET (v
3)
GO
OG
LEN
ET (v
3)sy
nthe
tic
trai
ned
GO
OG
LEN
ET (v
3)re
tina
trai
ned
High-resolution, image-grain behavior is a stronger constraint than object-level be-havior for distinguishing between alternative models of human object recognition.
The models’ gap in consistency at the image level, though small, could not be easily explained by image viewing parameters (position, pose and size).
Comparing Humans and Monkeys:
Comparing Humans and Machines (AlexNet):
Behavioral Benchmark:
Our results show that monkeys are highly consistent with the pooled human in their pattern of image di�culties. In contrast, all tested CNN models were signi�cantly less consistent with the pooled human.
• Our results show that monkeys are highly consistent with humans in their pat-tern of image di�culties, suggesting that they are a good animal model for studying human object recognition abilities.
• In contrast, all tested CNN models were signi�cantly less consistent with humans at the image grain even though they passed at the object grain. This small gap in image-level consistency to humans could not be easily attributed to the viewing parameters (position, size, pose) of the object, classi�er choice, images used in model training, or retinal sampling of images.
• Therefore, high-resolution, image-grain behavioral measures serve as stron-ger constraints than object-level measures for distinguishing between alterna-tive mechanistic models of human visual object recognition.
Humans Monkeys GoogLeNet(v3)
GoogLeNet (v3)synthetic trained
Monkey
V1HMAX
V1HMAX
Monkey GoogLeNet (v3)synthetic trained
Object grain Image grainObject grain Image grain
200 trainin
g im
ages/o
bject
100 testing
imag
es/ob
ject
Wrench Rhino LegCamelTankBirdShortsElephantHouseHangerPenHammer
ZebraForkGuitarBearWatchDogSpiderTruckTableCalculatorGunKnife
Stimuli: To enforce true object recognition behavior, we generated several thousand nat-uralistic images, each with one foreground object (by rendering a 3D model of each object with random viewing parameters) on a random natural image background.
Each object can produce myriad images under variations in viewing parameters, with some images being more challenging than others.
HARD EASYrh
inoca
lculat
or
hous
edo
gbir
d
gun
hang
er
eleph
ant
truck leg
table
watchbearfork
knife
cam
el
guita
r
tank
wren
ch
spide
r
ham
mer
zebr
a
shor
ts
pen