
Page 1:

Presenter: 崔振, 2010.9.17

Supervised Translation-Invariant Sparse Coding

[Jianchao Yang, Kai Yu, Thomas Huang]

Page 2:

Outline

• Author information
• Paper information
• Problem addressed
• The proposed method
• Experiments
• Conclusion

Page 4:

Jianchao Yang

jyang29@ifp.uiuc.edu

Image Formation & Processing Group (IFP), University of Illinois at Urbana-Champaign (UIUC)

Ph.D. candidate (06-present, ECE, UIUC); Ph.D. adviser: Prof. Thomas S. Huang

B.Eng. (02-06, EEIS, USTC)

First-author publications: CVPR: 4 (2 orals); TIP: 2; ECCV'10: 1; ICIP: 1

Homepage: http://www.ifp.illinois.edu/~jyang29/

Page 5:

Kai Yu

Machine learning researcher and Head of the Media Analytics Department at NEC Laboratories America, Inc.

Ph.D. in Computer Science, University of Munich, Germany, January 2001 - July 2004.

B.Sc. and M.Sc., Nanjing University.

Research interests: machine learning, data mining, information retrieval, computer vision.

Publications: CVPR (4), ECCV (4+), ICML (8+), NIPS (10+), …

http://www.dbs.informatik.uni-muenchen.de/~yu_k/

Page 6:

Thomas Huang

Beckman Institute, Image Formation and Processing and Artificial Intelligence groups.

William L. Everitt Distinguished Professor in the U of I Department of Electrical and Computer Engineering and the Coordinated Science Lab (CSL).

Sc.D. from MIT in 1963.

Research interests: computer vision, image compression and enhancement, pattern recognition, and multimodal signal processing.

http://www.beckman.illinois.edu/directory/t-huang1

Page 7:

Outline

• Author information
• Paper information
• Problem addressed
• The proposed method
• Experiments
• Conclusion

Page 8:

Paper information

Venue: CVPR'10 (oral)

Related paper: Yang et al. Linear spatial pyramid matching using sparse coding for image classification. CVPR'09.

Page 9:

Abstract

In this paper, we propose a novel supervised hierarchical sparse coding model based on local image descriptors for classification tasks. The supervised dictionary training is performed via back-projection, by minimizing the training error of classifying the image-level features, which are extracted by max pooling over the sparse codes within a spatial pyramid. Such a max pooling procedure across multiple spatial scales offers the model translation-invariant properties, similar to the Convolutional Neural Network (CNN). Experiments show that our supervised dictionary improves the performance of the proposed model significantly over the unsupervised dictionary, leading to state-of-the-art performance on diverse image databases. Furthermore, our supervised model targets learning linear features, implying its great potential in handling large-scale datasets in real applications.

Page 10:

Abstract (translation)

For classification tasks, the paper proposes a novel supervised hierarchical sparse coding model based on local image descriptors.

The supervised dictionary is trained via back-projection, by minimizing the classification error on the image-level features, where the image-level features are obtained by max pooling the sparse codes over a spatial pyramid. Max pooling across multiple spatial scales gives the model translation-invariant properties, as in a CNN (Convolutional Neural Network).

Experiments show that, compared with the unsupervised dictionary, the supervised dictionary clearly improves the model's performance, and it achieves the best results on several image databases.

In addition, the supervised model aims to learn linear features, which implies great potential for processing large-scale databases in real time.

Page 11:

Outline

• Author information
• Paper information
• Problem addressed
• The proposed method
• Experiments
• Conclusion

Page 12:

Problem addressed

• Image classification
• Finding a generic feature representation
• Interest in linear prediction models

Page 13:

Sparse Coding for Image Classification

Sparse coding with unsupervised vs. supervised dictionaries:

Sparse coding on the holistic image
- Linear model assumption
- Sensitive to image misalignment
- Limited applications
- Unsupervised: D. Bradley et al. '08; J. Wright et al. '09; A. Wagner et al. '09; etc.
- Supervised: D. Bradley et al. '08; J. Mairal et al. '08; Q. Zhang, CVPR'10; etc.

Sparse coding on local descriptors
- Breaks the linear model assumption for the image space
- Robust to image misalignment
- Applicable to generic image classification
- R. Raina et al. '07; J. Yang et al. '09; J. Yang et al. '10; etc.

Page 14:

Outline

• Author information
• Paper information
• Problem addressed
• The proposed method
• Experiments
• Conclusion

Page 15:

The proposed method

• Framework
• Background
• The model
• Optimization

Page 16:

Framework

[Pipeline figure: descriptor extraction → nonlinear coding → feature pooling → classification. A bag of coordinated local descriptors is turned into high-dimensional sparse codes and then into an image representation, which the classifier labels ("It must be a cool cat!").]

J. Yang et al. Linear spatial pyramid matching using sparse coding for image classification. CVPR'09.

Page 17:

Existing methods

Histogram-based SPM feature
• Step 1: local descriptor extraction
• Step 2: vector quantization (e.g. k-means)
• Step 3: hierarchical average pooling
• Step 4: nonlinear SVM

The ScSPM framework (CVPR'09; a code sketch follows this list)
• Step 1: local descriptor extraction
• Step 2: sparse coding (unsupervised dictionary)
• Step 3: hierarchical max pooling
• Step 4: linear SVM
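Since the deck only lists the ScSPM steps, here is a minimal NumPy sketch of steps 1-3 (patch extraction, sparse coding, spatial-pyramid max pooling). All function names and parameter values are illustrative, and ISTA stands in for whatever sparse solver the authors actually use; the resulting vector is the image-level feature fed to the linear SVM of step 4.

```python
import numpy as np

def extract_patches(image, patch_size=8, stride=4):
    """Step 1 (illustrative): densely extract raw patches as local descriptors."""
    H, W = image.shape
    patches, centers = [], []
    for y in range(0, H - patch_size + 1, stride):
        for x in range(0, W - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size].ravel())
            centers.append((y + patch_size / 2.0, x + patch_size / 2.0))
    return np.array(patches), np.array(centers)   # (m, n) descriptors, (m, 2) centers

def sparse_code(X, B, lam=0.1, iters=100):
    """Step 2 (sketch): L1 sparse coding of each descriptor by ISTA; B is n x k."""
    L = np.linalg.norm(B, 2) ** 2                  # Lipschitz constant of the data term
    Z = np.zeros((B.shape[1], X.shape[0]))         # k x m codes
    for _ in range(iters):
        V = Z - B.T @ (B @ Z - X.T) / L                          # gradient step
        Z = np.sign(V) * np.maximum(np.abs(V) - lam / L, 0.0)    # soft threshold
    return Z

def spm_max_pool(Z, centers, image_shape, levels=(1, 2, 4)):
    """Step 3: hierarchical max pooling of |codes| over a spatial pyramid."""
    H, W = image_shape
    feats = []
    for s in levels:                               # s x s grid at each pyramid level
        for i in range(s):
            for j in range(s):
                in_cell = ((centers[:, 0] // (H / s) == i) &
                           (centers[:, 1] // (W / s) == j))
                cell = np.abs(Z[:, in_cell])
                feats.append(cell.max(axis=1) if cell.size else np.zeros(Z.shape[0]))
    return np.concatenate(feats)                   # image-level feature for a linear SVM

# Usage on a toy image with a random unit-norm dictionary.
img = np.random.rand(64, 64)
B = np.random.randn(64, 256)
B /= np.linalg.norm(B, axis=0)                     # unit-norm dictionary atoms
X, centers = extract_patches(img)
feat = spm_max_pool(sparse_code(X, B), centers, img.shape)
print(feat.shape)                                  # (k * (1 + 4 + 16),) = (5376,)
```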

Page 18:

Background (1)

Sparse coding

Max pooling

X_{n×m} = (x_1, x_2, …, x_m): the local descriptors

B_{n×k}: the dictionary

Z_{k×m}: the sparse coefficients
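The formulas on this slide are images in the original deck; a standard reconstruction consistent with the symbols above (and with the ScSPM formulation the talk builds on) is:

```latex
% Sparse coding of the descriptors X = (x_1, ..., x_m) over dictionary B:
\min_{B, Z} \; \| X - B Z \|_F^2 + \lambda \sum_{i=1}^{m} \| z_i \|_1
\quad \text{s.t.} \quad \| b_j \|_2 \le 1, \; j = 1, \dots, k

% Max pooling of the resulting codes into a single k-dimensional vector:
\beta_j = \max_{i = 1, \dots, m} \, | Z_{j i} |, \qquad j = 1, \dots, k
```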

Page 19:

Background (2)

Hierarchical pooling

S: number of scales (levels)

U: concatenation
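The slide's formula is again an image; assuming, as in SPM, that level s partitions the image into 2^{s-1} × 2^{s-1} cells, the hierarchical feature is the concatenation (the slide's U) of the per-cell max-pooled codes over all S scales:

```latex
% beta^{s}_{c}: max pooling restricted to the descriptors in cell c of level s
\beta^{s}_{c,j} = \max_{i \in \mathcal{R}^{s}_{c}} \, | Z_{j i} |,
\qquad
\beta(X; B) = \Big[ \beta^{1}_{1} ;\; \beta^{2}_{1}; \dots; \beta^{2}_{4} ;\; \dots ;\; \beta^{S}_{4^{S-1}} \Big]
```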

Page 20:

Model (1)

Multi-level max pooling + SVM

Objective function

X^k: the k-th image
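The objective on this slide is an image in the original; under the one-vs-all linear SVM setup described later, a reconstruction consistent with the slides is the following, where ℓ is a classification loss, γ the regularization weight, y_k^c ∈ {−1, +1} the label of image X^k for class c, and β(X^k; B) its pyramid-pooled feature:

```latex
\min_{B, \{w_c\}} \;
\sum_{c} \Big[ \sum_{k} \ell\big( y_k^{c}, \; w_c^{\top} \beta(X^{k}; B) \big)
\; + \; \gamma \, \| w_c \|_2^2 \Big]
```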

Page 21:

Model (2): the supervised objective function

Optimization over B: back-propagation!
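Back-propagation here means pushing the classification loss back through the pooling and coding stages to the dictionary; schematically, for each image,

```latex
\frac{\partial \ell}{\partial B}
= \frac{\partial \ell}{\partial \beta} \cdot
  \frac{\partial \beta}{\partial Z} \cdot
  \frac{\partial Z}{\partial B}
```

where ∂β/∂Z is nonzero only at the coefficients that won the max pooling, and ∂Z/∂B has no closed form because Z is itself the solution of an L1-regularized problem, which is exactly what the next two slides address.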

Page 22:

Optimization (1)

• Squared hinge loss function (see the formula below)
• Linear prediction model
• Only the pooled maximum values matter for the loss
• No analytical link between the pooled feature and the dictionary B
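The loss is only named on the slide; the standard squared hinge loss used with the linear prediction model is

```latex
\ell(y, f) = \max(0, \; 1 - y f)^2,
\qquad
f = w^{\top} \beta(X; B)
```

Unlike the plain hinge, it is differentiable, which matters because the dictionary is trained by gradient descent.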

Page 23:

Optimization (2)

Solution: use implicit differentiation.

D. M. Bradley et al. Differentiable sparse coding. NIPS 2008.

Setting the gradients at the zero coefficients to zero saves a lot of computation.
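A sketch of the implicit-differentiation step (following the Bradley et al. reference above): let Λ be the active set, i.e. the nonzero coefficients of the code z for a descriptor x. Stationarity of the L1 problem on Λ gives

```latex
B_{\Lambda}^{\top} \big( B_{\Lambda} z_{\Lambda} - x \big) + \lambda \, \operatorname{sign}(z_{\Lambda}) = 0
\quad \Longrightarrow \quad
z_{\Lambda} = \big( B_{\Lambda}^{\top} B_{\Lambda} \big)^{-1}
              \big( B_{\Lambda}^{\top} x - \lambda \, \operatorname{sign}(z_{\Lambda}) \big)
```

Since sign(z_Λ) is locally constant, this expression can be differentiated with respect to the entries of B to obtain ∂z_Λ/∂B; the gradients with respect to the zero coefficients are simply set to zero, which is where the computational savings mentioned above come from.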

Page 24:

Training convergence

Initialization is important: B is first trained in an unsupervised manner.

[Convergence curve]

Page 25:

Example dictionary

[Figure: example dictionaries learned on CMU PIE, unsupervised vs. supervised]

Page 26:

Outline

• Author information
• Paper information
• Problem addressed
• The proposed method
• Experiments
• Conclusion

Page 27:

Experiment

Classification tasks
• Face recognition: CMU PIE and CMU Multi-PIE
• Handwritten digit recognition: MNIST
• Gender recognition: FRGC 2.0

Image local descriptors: raw image patches.
Prediction model: one-vs-all linear SVM with squared hinge loss (a sketch follows below).
Optimization: stochastic gradient descent; typically converges in 10 iterations.
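A minimal way to reproduce the prediction-model part with scikit-learn (the authors train theirs jointly with the dictionary by stochastic gradient descent; the feature matrices below are random placeholders standing in for the pyramid-pooled features):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Placeholder features standing in for the pyramid-pooled sparse codes.
train_feats = rng.standard_normal((200, 5376))
train_labels = rng.integers(0, 10, size=200)
test_feats = rng.standard_normal((50, 5376))
test_labels = rng.integers(0, 10, size=50)

# One-vs-all (LinearSVC's default multiclass scheme) linear SVM with the
# squared hinge loss, matching the prediction model described on this slide.
clf = LinearSVC(loss="squared_hinge", C=1.0)
clf.fit(train_feats, train_labels)
err = np.mean(clf.predict(test_feats) != test_labels)
print(f"classification error: {100 * err:.2f}%")
```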

Page 28:

Experiment

Parameter settings

Learning rate:

Page 29:

Experiment – Face Recognition (1)

CMU PIE: 41,368 images of 68 people, each under 13 poses and 43 different illumination conditions, with 4 different expressions.

A subset of five near-frontal views is used, including all expressions and illuminations.

Page 30:

Experiment – Face Recognition (1)

USC: unsupervised sparse coding model.
SSC: supervised sparse coding model.
Improvements: the improvement of SSC over USC.

Classification error (%) on CMU PIE

Page 31:

Experiment – Face Recognition (2)

CMU Multi-PIE: contains 337 subjects across simultaneous variations in pose, expression, and illumination. A subset containing near-frontal-view face images is used for training and testing.

Page 32:

Experiment – Face Recognition (2)

[SR] A. Wagner et al. Towards a practical face recognition system: robust registration and illumination by sparse representation. CVPR'09.

Face recognition error (%) on Multi-PIE

Page 33:

Experiment – Handwritten Digit Recognition

MNIST: consists of 70,000 handwritten digits, aligned to the center; 60,000 of them are used for training and the remaining 10,000 for testing.

Page 34:

Experiment – Gender Recognition

FRGC 2.0 contains 568 individuals and 14,714 face images in total, under various lighting conditions and backgrounds. 11,700 face images of 451 individuals are used for training, and the remaining 3,014 images of 114 persons are used for testing.

Page 35:

Experiment – Gender Recognition

Page 36:

Outline

• Author information
• Paper information
• Problem addressed
• The proposed method
• Experiments
• Conclusion

Page 37:

Conclusion

• A supervised translation-invariant sparse coding model for image classification.
• A generic image representation: the max pooling feature is translation-invariant.
• Sparse coding on local descriptors is promising compared to sparse coding on the holistic image.
• Supervised sparse coding improves the performance significantly.

Next steps:
• Connections with hierarchical models in deep belief networks should be investigated.
• More theoretical analysis of pooling functions is needed.
• Deep hierarchical models based on sparse coding should be studied.

Page 38:

Page 39:

References

Jianchao Yang, Kai Yu, Thomas Huang. Supervised Translation-Invariant Sparse Coding. CVPR'10.

J. Yang et al. Translation-Invariant Sparse Coding. CVPR'10 (talk).

J. Yang et al. Linear spatial pyramid matching using sparse coding for image classification. CVPR'09.