
The spatial histograms use a variety of layouts that exploit the head location and body segmentation in different ways.

Classification accuracies for different layouts

Conclusions on the various layouts:

•  Using features on body parts improves performance over spatial BoW methods.

•  Better localization of the pet body improves performance.

Species and breed are predicted by combining the head detector scores and the appearance features in two ways:

•  Hierarchical: The head detector scores are used to decide between cat and dog; the appearance features are then fed to a linear SVM for breed classification.

•  Flat: The head detector responses and appearance features are concatenated to jointly decide species and breed.
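The two combination strategies can be contrasted with a minimal sketch. It is not the authors' implementation: the four-breed setup, the weight vectors, and the function names are all hypothetical, and restricting the breed candidates to the chosen species in the hierarchical case is an assumption about how the two stages connect.

```python
from typing import Dict, List, Tuple

# Hypothetical toy setup: four breeds instead of the dataset's 37.
SPECIES_OF: Dict[str, str] = {
    "Bengal": "cat", "Persian": "cat", "Pug": "dog", "Beagle": "dog",
}


def dot(w: List[float], x: List[float]) -> float:
    """Linear score w . x, as a linear classifier would compute it."""
    return sum(wi * xi for wi, xi in zip(w, x))


def hierarchical_predict(head_scores: Dict[str, float],
                         appearance: List[float],
                         breed_weights: Dict[str, List[float]]) -> Tuple[str, str]:
    """Pick the species from the head scores, then the best breed within it."""
    species = max(head_scores, key=lambda s: head_scores[s])
    # Assumption: only breeds of the chosen species are considered.
    candidates = [b for b in breed_weights if SPECIES_OF[b] == species]
    breed = max(candidates, key=lambda b: dot(breed_weights[b], appearance))
    return species, breed


def flat_predict(head_scores: Dict[str, float],
                 appearance: List[float],
                 joint_weights: Dict[str, List[float]]) -> Tuple[str, str]:
    """Score every breed on the concatenated [head scores, appearance] vector."""
    x = [head_scores["cat"], head_scores["dog"]] + list(appearance)
    breed = max(joint_weights, key=lambda b: dot(joint_weights[b], x))
    return SPECIES_OF[breed], breed


# Made-up numbers, chosen so that the two strategies disagree.
head_scores = {"cat": 0.2, "dog": 0.3}
appearance = [1.0, 0.0]
breed_weights = {
    "Bengal": [2.0, 0.0], "Persian": [0.0, 1.0],
    "Pug": [0.5, 0.0], "Beagle": [0.0, 0.5],
}
joint_weights = {
    "Bengal": [1.0, -1.0, 2.0, 0.0], "Persian": [1.0, -1.0, 0.0, 1.0],
    "Pug": [-1.0, 1.0, 0.5, 0.0], "Beagle": [-1.0, 1.0, 0.0, 0.5],
}

print(hierarchical_predict(head_scores, appearance, breed_weights))
print(flat_predict(head_scores, appearance, joint_weights))
```

In this toy case the head scores narrowly favour "dog", so the hierarchical rule commits to a dog breed, while the flat rule lets strong appearance evidence outweigh the weak species cue.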

Classification accuracies for different combination strategies

Classification accuracies for feature combinations

Conclusions:

•  Combining shape with appearance improves accuracy significantly in both species as well as breed classification

•  Flat classification is more accurate than the hierarchical method

1 Department of Engineering Science, University of Oxford

2 International Institute of Information Technology, Hyderabad, India

E-mail: {omkar,vedaldi,az}@robots.ox.ac.uk, [email protected]

This research is funded by UKIERI, EU Project AXES ICT-269980 and ERC grant VisRec no. 228180.

Cats and Dogs

Omkar M Parkhi 1,2, Andrea Vedaldi 1, Andrew Zisserman 1, C. V. Jawahar 2

Layout panels: Image Layout; Image + Head Layout; Image + Head + Body Layout

Confusion matrix for the 37-class classification problem (Image+Head+Body layout with the flat classification method).

Class indices: 1 Abyssinian, 2 Bengal, 3 Birman, 4 Bombay, 5 British Shorthair, 6 Egyptian Mau, 7 Maine Coon, 8 Persian, 9 Ragdoll, 10 Russian Blue, 11 Siamese, 12 Sphynx, 13 Am. Bulldog, 14 Am. Pit Bull Terrier, 15 Basset Hound, 16 Beagle, 17 Boxer, 18 Chihuahua, 19 Eng. Cocker Spaniel, 20 Eng. Setter, 21 German Shorthaired, 22 Great Pyrenees, 23 Havanese, 24 Japanese Chin, 25 Keeshond, 26 Leonberger, 27 Miniature Pinscher, 28 Newfoundland, 29 Pomeranian, 30 Pug, 31 Saint Bernard, 32 Samoyed, 33 Scottish Terrier, 34 Shiba Inu, 35 Staff. Bull Terrier, 36 Wheaten Terrier, 37 Yorkshire Terrier.

Percentages printed on the figure, in extraction order: 35.7%, 39.0%, 77.0%, 81.8%, 69.0%, 71.1%, 60.0%, 64.0%, 51.0%, 46.0%, 70.0%, 82.0%, 52.0%, 4.0%, 62.0%, 33.0%, 38.4%, 20.0%, 29.0%, 43.0%, 80.0%, 70.0%, 51.0%, 82.0%, 75.8%, 53.0%, 39.0%, 82.0%, 28.0%, 85.0%, 59.0%, 91.0%, 66.7%, 57.0%, 37.1%, 53.0%, 50.0%.

Segmentation: Qualitative Results (Oxford IIIT Pet Dataset)

Comparison with other Datasets

Dataset Examples

Multi-class classification accuracy

Layout                            Cats vs. Dogs   Cats (12)   Dogs (25)   Combined (37)
Image                             82.56%          52.01%      40.59%      39.64%
Image+Head                        85.06%          60.37%      52.10%      51.23%
Image+Head+Body                   87.78%          64.27%      54.31%      54.05%
Image+Head+Body (Ground Truth)    88.68%          66.12%      57.29%      56.60%

Classification accuracy

Layout                            Cats vs. Dogs   Hierarchical (37)   Flat (37)
Image                             94.88%          42.29%              43.30%
Image+Head                        95.07%          52.78%              54.03%
Image+Head+Body                   94.89%          55.26%              56.68%
Image+Head+Body (Ground Truth)    95.37%          57.77%              59.21%

•  Asirra (Animal Species Image Recognition for Restricting Access)

•  Introduced by Microsoft Research to provide alternatives to text-based CAPTCHA

•  3 million pictures of cats and dogs from Petfinder.com

•  Test: given a number of such images, separate cats from dogs

•  25,000 images are available to evaluate the system

Dataset                    Classification Accuracy
UCSD-Caltech Birds         6.91%
Oxford-IIIT Pet Dataset    38.45%
Oxford Flowers 102         53.71%

•  Multi-class classification framework from software package VLFeat (www.vlfeat.org) evaluated on 3 different datasets.

•  SIFT-BoW spatial histograms features with kernel approximations and linear SVM in 1 Vs. All classification setting

ASIRRA Challenge III.

Combining Models: II.2c

Spatial Histogram Layouts

•  Introducing a new annotated dataset covering 37 different breeds of cats and dogs

•  Fine-grained categorization of cats and dogs

•  State-of-the-art results on the MSR ASIRRA challenge

•  7,349 images of cats and dogs

•  Collected from various sources on the Internet

•  37 different categories: 12 cat breeds and 25 dog breeds

•  Approx. 200 images/category with manual annotations

•  Annotations for an image include:

    •  Species (cat or dog)

    •  Breed

    •  Tight bounding box around the pet head

    •  Pixel-level foreground/background masks (trimaps)
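The four annotation types listed above could be held in a record like the one below. This is only an illustrative sketch: the class name, field names, box convention, and trimap label values are assumptions and do not describe the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical trimap label values; the dataset's real encoding may differ.
FOREGROUND, BACKGROUND, AMBIGUOUS = 1, 2, 3


@dataclass
class PetAnnotation:
    """One image's annotations (field names are illustrative)."""
    species: str                          # "cat" or "dog"
    breed: str                            # e.g. "Bengal"
    head_box: Tuple[int, int, int, int]   # assumed (xmin, ymin, xmax, ymax)
    trimap: List[List[int]]               # per-pixel labels, row by row

    def foreground_fraction(self) -> float:
        """Share of pixels labelled as foreground in the trimap."""
        pixels = [label for row in self.trimap for label in row]
        return pixels.count(FOREGROUND) / len(pixels) if pixels else 0.0


# Tiny made-up example: a 2 x 4 trimap with four foreground pixels.
example = PetAnnotation(
    species="cat",
    breed="Bengal",
    head_box=(1, 0, 3, 2),
    trimap=[[2, 1, 1, 2],
            [2, 1, 1, 3]],
)
print(example.foreground_fraction())
```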

Method                      Classification Accuracy   Break-in Probability
Golle et al.                82.7%                     9.2%
This paper (Shape Only)     92.7%                     42%
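How per-image accuracy relates to break-in probability can be illustrated with an idealized estimate: if a challenge shows n images and errors on different images are independent, a classifier with per-image accuracy p passes with probability p^n. Both the independence and the challenge size (12 images in the example call) are assumptions not stated on the poster, and the figures in the table are the reported values, which need not coincide with this estimate.

```python
def break_in_probability(per_image_accuracy: float, num_images: int) -> float:
    """Chance of labelling all images correctly, assuming independent errors."""
    if not 0.0 <= per_image_accuracy <= 1.0:
        raise ValueError("per_image_accuracy must lie in [0, 1]")
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    return per_image_accuracy ** num_images


# Assumed challenge size of 12 images; accuracies taken from the table.
for accuracy in (0.827, 0.927):
    print(accuracy, break_in_probability(accuracy, 12))
```

The exponent makes the estimate very sensitive to p, which is why a ten-point gain in per-image accuracy changes the break-in probability several-fold.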

Method                        Segmentation Accuracy
All Foreground                45%
Parkhi et al. (ICCV 2011)     61%
This paper                    65%
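The poster does not define segmentation accuracy. A common choice, assumed here, is the intersection over union of the predicted and ground-truth foreground regions. Under that assumption, the "All Foreground" baseline scores the fraction of the image covered by the pet. The function name and the toy masks are illustrative.

```python
from typing import List


def foreground_iou(predicted: List[List[int]], truth: List[List[int]]) -> float:
    """Intersection over union of two binary masks (1 = foreground)."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


# Toy 2 x 3 masks: the pet covers four of six pixels.
truth = [[0, 1, 1],
         [0, 1, 1]]
all_foreground = [[1, 1, 1],
                  [1, 1, 1]]
print(foreground_iou(all_foreground, truth))
print(foreground_iou(truth, truth))
```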

Failure Cases:

Top row: Bengal cats (right) classified as Egyptian Mau (left)

Bottom row: English Setter (right) classified as English Cocker Spaniel (left)

Segmentation: Quantitative Results (Oxford IIIT Pet Dataset)

I.

Problem and Contributions

Example Annotations: Cat (Bengal), Dog (Pug)

Previously on Cats and Dogs..

•  Our previous work [Parkhi et al. ICCV 2011] investigated the problem of detecting deformable animals.

•  Central idea was to detect a stable, distinctive part of the animal and localize the body using the clues from that part.

•  Deformable parts model was used to detect the distinctive part and GrabCut segmentation was used to localize the object.

•  In this work, we release a dataset helpful for evaluating performance of such methods and tackle the problem of multiclass classification.

The Truth About Cats and Dogs, ICCV 2011

Model for Pet Classification II.

Dataset can be downloaded from: http://www.robots.ox.ac.uk/~vgg/data/pets.html

The Oxford-IIIT Pet Dataset I.

•  The breed of a pet affects its shape, size, fur type and color

•  These attributes are modeled by combinations of shape and appearance features

•  The heads of the pets are captured by deformable part models

•  Constellation of HOG + LBP parts

•  Two head models, for cats and dogs, trained separately

•  Detection scores used for species classification

•  Cat vs. dog classification accuracy of 94.21% achieved

The texture of the fur is captured by a bag of words model:

•  Multi-scale dense SIFT features

•  Vocabulary of 4000 visual words using K-Means

•  Spatial histograms with varied layouts

•  Features computed on entire image as well as body parts of the animal obtained by automatic segmentation
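A spatial histogram of visual words can be sketched as follows, assuming each dense SIFT descriptor has already been quantized to a visual word index at a known image position. The 2 x 2 image grid, the single-cell head region, and the L1 normalization are illustrative choices, not the poster's exact configuration, and the function names are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[float, float, int]           # (x, y, visual word index)
Box = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def grid_histogram(words: List[Word], box: Box,
                   grid: Tuple[int, int], vocab_size: int) -> List[float]:
    """Concatenate one word histogram per grid cell inside `box`."""
    x0, y0, x1, y1 = box
    cols, rows = grid
    hist = [0.0] * (cols * rows * vocab_size)
    for x, y, word in words:
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # the word lies outside this region
        cx = min(int((x - x0) * cols / (x1 - x0)), cols - 1)
        cy = min(int((y - y0) * rows / (y1 - y0)), rows - 1)
        hist[(cy * cols + cx) * vocab_size + word] += 1.0
    total = sum(hist)
    # L1 normalization (an assumption); empty regions stay all zero.
    return [h / total for h in hist] if total else hist


def image_head_layout(words: List[Word], image_size: Tuple[int, int],
                      head_box: Box, vocab_size: int) -> List[float]:
    """Image-level grid histogram followed by a histogram over the head box."""
    width, height = image_size
    image_part = grid_histogram(words, (0, 0, width, height), (2, 2), vocab_size)
    head_part = grid_histogram(words, head_box, (1, 1), vocab_size)
    return image_part + head_part


# Toy input: four quantized descriptors, a vocabulary of 3 words.
words = [(10, 10, 0), (90, 10, 1), (90, 90, 1), (95, 95, 2)]
feature = image_head_layout(words, (100, 100), (80, 80, 100, 100), 3)
print(len(feature), feature)
```

A body layout would follow the same pattern with the segmented foreground in place of the head box.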

•  The pet body (foreground) is segmented using GrabCut

•  GrabCut is initialized from superpixels of an image obtained from Berkeley UCM

•  Superpixels seed GMMs depending upon classification scores [Chai et al. ICCV'11]

•  SIFT-BoW, size and location of a superpixel used as features

•  Head detection also assists GMM seeding [Parkhi et al. ICCV'11]

•  Berkeley Edge Detector response provides pairwise potentials
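The seeding step can be pictured with a small sketch: each superpixel carries a foreground classification score, and confident superpixels (or those overlapping the detected head) become seeds for the foreground or background colour models. The thresholds, the three-way labelling, and the function name are hypothetical. The GrabCut optimization itself is not shown.

```python
from typing import Dict, Set


def seed_superpixels(scores: Dict[int, float], in_head: Set[int],
                     fg_thresh: float = 0.5,
                     bg_thresh: float = -0.5) -> Dict[int, str]:
    """Label each superpixel as a foreground seed, background seed, or unknown.

    scores:  superpixel id -> classifier score (higher means more pet-like).
    in_head: ids of superpixels overlapping the detected head box.
    """
    seeds: Dict[int, str] = {}
    for sp_id, score in scores.items():
        if sp_id in in_head or score >= fg_thresh:
            seeds[sp_id] = "fg"        # the head detection overrides the score
        elif score <= bg_thresh:
            seeds[sp_id] = "bg"
        else:
            seeds[sp_id] = "unknown"   # left for the segmentation to resolve
    return seeds


# Made-up scores for four superpixels; superpixel 3 overlaps the head.
seeds = seed_superpixels({0: 0.9, 1: -0.8, 2: 0.1, 3: -0.9}, in_head={3})
print(seeds)
```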

Shape Model: II.1

Appearance Model: II.2

Automatic Segmentation: II.2a

Spatial Layouts: II.2b
