Unsupervised Learning of Visual Taxonomies
IEEE Conference on CVPR 2008
Evgeniy Bart – Caltech
Ian Porteous – UC Irvine
Pietro Perona – Caltech
Max Welling – UC Irvine
Introduction
Recent progress in visual recognition has scaled to as many as 256 categories, yet the current organization is an unordered 'laundry list' of category names and associated category models.
A tree structure, in contrast, describes not only the 'atomic' categories but also higher-level, broader categories in a hierarchical fashion.
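As a concrete illustration of such a tree, a taxonomy can be stored as nested nodes whose leaves are the atomic categories. The category names below are invented for the example and are not from the paper:

```python
# A toy taxonomy: inner nodes are broader categories, leaves are 'atomic'
# ones. All names here are hypothetical.
taxonomy = {
    "animals": {
        "land": ["dog", "horse"],
        "water": ["dolphin", "shark"],
    },
    "vehicles": {
        "road": ["car", "bicycle"],
        "air": ["airplane", "helicopter"],
    },
}

def leaves(node):
    """Collect the atomic categories under a node."""
    if isinstance(node, list):
        return node
    return [leaf for child in node.values() for leaf in leaves(child)]

print(leaves(taxonomy))  # the 8 atomic categories
```

Broader categories such as "animals" correspond to inner nodes, so shared structure can be attached to them rather than repeated at every leaf.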
Why worry about taxonomies?
TAX Model
Images are represented as bags of visual words.
Each visual word is a cluster of visually similar image patches; it is the basic unit of the model.
A topic represents a set of words that co-occur in images. Typically, this corresponds to a coherent visual structure, such as sky or sand.
A category is represented as a multinomial distribution over all the topics.
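The bag-of-visual-words step can be sketched as follows. This is a minimal illustration with made-up descriptors and a made-up vocabulary, not the paper's actual pipeline:

```python
import numpy as np

def to_bag_of_words(descriptors, vocabulary):
    """Quantize patch descriptors to their nearest visual word and
    return a word-count histogram (the bag-of-words vector)."""
    # Squared Euclidean distance from every descriptor to every word.
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = dists.argmin(axis=1)  # nearest visual word per patch
    return np.bincount(words, minlength=len(vocabulary))

# Toy example: 5 two-dimensional descriptors, vocabulary of 3 words.
vocab = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
desc = np.array([[0.1, 0.0], [0.9, 1.1], [5.0, 4.9], [0.0, 0.2], [1.0, 0.8]])
print(to_bag_of_words(desc, vocab))  # prints [2 2 1]
```

The resulting histogram is the image representation the topic model operates on.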
TAX Model
Shared information is represented at the nodes of the tree. The model's quantities are:
- the topic distribution of category c, with a uniform Dirichlet prior
- topic t (a distribution over visual words), with a uniform Dirichlet prior
- the level in the taxonomy assigned to detection d in image i
- the topic assigned to detection d in image i
- the l'th node on the path of image i
Inference
The goal is to learn the structure of the taxonomy and to estimate the parameters of the model.
Gibbs sampling is used, which allows drawing samples from the posterior distribution of the model's parameters given the data.
The taxonomy structure and other parameters of interest can then be estimated from these samples.
Inference
To perform sampling, we calculate the conditional distributions from the following count statistics:
- the number of detections assigned to a given node and topic, excluding the current detection d
- the number of detections assigned to topic z and a given word, excluding the current detection d
- the number of detections assigned to a given node and topic t, excluding the current image i
- the number of images whose path goes through node c in the tree, excluding the current image i
- the number of detections in image i assigned to level l and topic t
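The paper's sampler additionally samples taxonomy paths; as a simplified stand-in, the sketch below is a collapsed Gibbs sampler for flat LDA, which shows how count tables like the ones above drive the sampling. It is an illustration, not the authors' code:

```python
import numpy as np

def gibbs_lda(docs, n_topics, n_words, alpha=1.0, beta=0.1, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA (a simplified stand-in for
    TAX's sampler, which also samples a path through the taxonomy)."""
    rng = np.random.default_rng(seed)
    # Count tables: doc-topic counts, topic-word counts, topic totals.
    ndz = np.zeros((len(docs), n_topics))
    nzw = np.zeros((n_topics, n_words))
    nz = np.zeros(n_topics)
    z = [rng.integers(n_topics, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            ndz[d, z[d][i]] += 1; nzw[z[d][i], w] += 1; nz[z[d][i]] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]  # remove the current assignment from the counts
                ndz[d, t] -= 1; nzw[t, w] -= 1; nz[t] -= 1
                # Conditional p(topic | everything else) from the counts.
                p = (ndz[d] + alpha) * (nzw[:, w] + beta) / (nz + n_words * beta)
                t = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = t
                ndz[d, t] += 1; nzw[t, w] += 1; nz[t] += 1
    # Topic distributions can be estimated by normalizing these tables.
    return ndz, nzw
```

Each sweep removes one assignment from the counts, resamples it from the resulting conditional, and adds it back, exactly the "excluding the current detection" bookkeeping listed above.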
Experiment 1: Corel
Pick 300 color images from the Corel dataset.
Use 'space-color histograms' to define the visual words (2048 visual words in total).
500 pixels were sampled from each image and encoded using the space-color histograms.
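One plausible reading of a 'space-color' vocabulary is a joint quantization of pixel position and color. The binning below (2×2 spatial cells × 8³ colors = 2048 words, matching the slide's vocabulary size) is an assumption for illustration, not the paper's exact construction:

```python
import numpy as np

def sample_space_color_words(image, n_samples=500, spatial_bins=2,
                             color_bins=8, seed=0):
    """Sample pixels and map each to a visual word by jointly quantizing
    position and RGB color. Binning is a guess: 2x2 spatial cells times
    8^3 color cells gives 2048 words."""
    rng = np.random.default_rng(seed)
    h, w, _ = image.shape
    ys = rng.integers(h, size=n_samples)
    xs = rng.integers(w, size=n_samples)
    # Quantize position into a spatial_bins x spatial_bins grid.
    sy = ys * spatial_bins // h
    sx = xs * spatial_bins // w
    # Quantize each RGB channel (0..255) into color_bins levels.
    c = image[ys, xs].astype(int) * color_bins // 256
    color_id = (c[:, 0] * color_bins + c[:, 1]) * color_bins + c[:, 2]
    return (sy * spatial_bins + sx) * color_bins**3 + color_id
```

Each sampled pixel thus yields one of 2048 word indices, and an image becomes the histogram of its 500 sampled words.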
Experiment 1: Corel
4 levels, 40 topics; run Gibbs sampling for 300 iterations.
Experiment 1: Corel
[Figure: the learned taxonomy for the Corel images]
Experiment 2: 13 scenes
Use 100 examples per category to train the model.
Extract 500 patches of size 20×20 at random from each image.
Pick 100,000 patches out of the 650,000 total and run k-means with 1000 clusters.
The 500 patches of each image are then assigned to the closest visual word.
Run Gibbs sampling for 300 iterations.
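The patch-quantization steps above can be sketched with a minimal k-means, a toy stand-in for the 1000-cluster run on 100,000 patch vectors:

```python
import numpy as np

def build_vocabulary(patches, k, n_iter=20, seed=0):
    """Minimal k-means: cluster patch vectors into k visual words,
    then assign every patch to its closest word."""
    rng = np.random.default_rng(seed)
    centers = patches[rng.choice(len(patches), size=k, replace=False)]
    labels = np.zeros(len(patches), dtype=int)
    for _ in range(n_iter):
        # Assign each patch to the nearest center.
        d = ((patches[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # Recompute centers; keep a center as-is if its cluster is empty.
        for j in range(k):
            if (labels == j).any():
                centers[j] = patches[labels == j].mean(axis=0)
    return centers, labels
```

In the experiment the clustering is fit on the 100,000-patch subsample, and the remaining patches are then assigned to the nearest resulting center.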
Experiment 2: 13 scenes
Classification is based on the probability of a new test image j given a training image i, computed from:
- the mean of each topic
- the estimated distribution over topics at level l of the path for image i
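A simplified version of this rule can be sketched as scoring the test image's words under each training image's topic mixture and taking the best-matching training image. The variable names and the nearest-neighbor reduction here are illustrative assumptions, not the paper's exact computation:

```python
import numpy as np

def classify(test_word_hist, train_topic_mix, train_labels, topic_word):
    """Score a test image's word histogram under each training image's
    topic mixture; return the label of the most likely training image."""
    # p(word | image i) = sum_t p(word | topic t) * p(topic t | image i)
    word_probs = train_topic_mix @ topic_word   # (n_train, n_words)
    log_lik = (np.log(word_probs) * test_word_hist).sum(axis=1)
    return train_labels[int(log_lik.argmax())]

# Toy usage: 2 topics, 2 words, 2 training images with opposite mixtures.
topic_word = np.array([[0.9, 0.1], [0.1, 0.9]])
train_mix = np.array([[1.0, 0.0], [0.0, 1.0]])
print(classify(np.array([10, 0]), train_mix,
               ["indoor", "outdoor"], topic_word))  # prints indoor
```

In TAX the per-image topic mixture comes from the image's path through the taxonomy, so similar images share mixtures at the higher levels.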
Experiment 2: 13 scenes
Evaluation of Experiment 2
Conclusion
Supervised TAX outperforms supervised LDA, which suggests that a hierarchical organization better fits the natural structure of image patches.
The main limitation of TAX is the speed of training: for example, with 1300 training images, learning took 24 hours.