National Lab of Radar Signal Processing

Bo Chen
National Lab. of Radar Signal Processing

Xidian University, China

Deep Learning and Its Applications to Radar ATR

Joint work with Bo Feng, Jun Ding, Gungor Polatkan, Guillermo Sapiro, David Blei, David B. Dunson and Lawrence Carin

MLA 2014


Outline

• Deep/Multilayered Models
• Deep Models with Bayesian Nonparametrics
• Convolutional Factor Analysis with (Hierarchical) Beta Process
• Summary and Future Work

2014-11-20


Success Stories
• Computer Vision:
− Image inpainting/denoising, segmentation
− Object recognition/detection, scene understanding
− Video analysis
• Information Retrieval/NLP:
− Text, audio, and image retrieval
− Parsing, machine translation, text analysis
• Speech processing
• Robotics
• Computational Biology
• Cognitive Science
• Radar Automatic Target Detection and Recognition?


Single Layer Models
• Autoencoder (most deep learning methods)
− Denoising autoencoders
− Restricted Boltzmann Machine
− Predictive sparse decomposition
• Decoder-only
− Sparse coding/factor analysis
− Deconvolutional nets
• Encoder-only
− (Convolutional) Neural nets (supervised)


Autoencoder

(Binary or Real-valued) Input x
(Binary) Features z
Encoder: σ(Wx), with encoder filters W and sigmoid function σ(.)
Decoder: g(W^T z), with decoder filters W^T and a linear or nonlinear function g(.)
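The encoder/decoder pair on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, assuming tied weights (the decoder reuses W^T, as the slide indicates) and no bias terms; the layer sizes, the random W, and the helper names `encode` and `decode` are made up for the example. Note that σ(Wx) gives activation probabilities; binary features would be obtained by sampling or thresholding them.

```python
import numpy as np

def sigmoid(a):
    """Logistic function sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def encode(W, x):
    """Features z = sigma(W x); W has shape (n_features, n_inputs)."""
    return sigmoid(W @ x)

def decode(W, z, g=sigmoid):
    """Reconstruction g(W^T z), reusing the encoder filters (tied weights)."""
    return g(W.T @ z)

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((3, 5))          # 3 features, 5 inputs (illustrative sizes)
x = rng.integers(0, 2, size=5).astype(float)   # a binary input vector
z = encode(W, x)                               # feature activations in (0, 1)
x_hat = decode(W, z)                           # reconstruction with g = sigmoid
x_lin = decode(W, z, g=lambda a: a)            # reconstruction with a linear g
```

Passing a different `g` switches between the nonlinear and linear decoders mentioned on the slide.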


Restricted Boltzmann Machines [Hinton ’02]


Predictive Sparse Decomposition [Kavukcuoglu et al., '09]

Input patch x
Sparse features z
Encoder: σ(Wx), with encoder filters W and sigmoid function σ(.)
Decoder: Dz, with decoder filters D
L1 sparsity
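For predictive sparse decomposition, the three ingredients named here (decoder Dz, L1 sparsity, encoder prediction σ(Wx)) are commonly combined into one objective. The sketch below assumes the usual form, reconstruction error plus an L1 penalty on z plus a penalty tying z to the encoder output; the weights `lam` and `alpha` and the function name `psd_loss` are assumptions for illustration, not values from the talk.

```python
import numpy as np

def sigmoid(a):
    """Logistic function, used here as the encoder nonlinearity."""
    return 1.0 / (1.0 + np.exp(-a))

def psd_loss(x, z, W, D, lam=0.1, alpha=1.0):
    """Assumed PSD objective: ||x - Dz||^2 + lam*||z||_1 + alpha*||z - sigma(Wx)||^2."""
    reconstruction = np.sum((x - D @ z) ** 2)            # decoder term
    sparsity = lam * np.sum(np.abs(z))                    # L1 sparsity on the features
    prediction = alpha * np.sum((z - sigmoid(W @ x)) ** 2)  # encoder should predict z
    return reconstruction + sparsity + prediction

# Tiny check: with W = 0 the encoder outputs 0.5 everywhere, so choosing
# z = 0.5 and x = D z leaves only the sparsity term.
D = np.eye(2)
W = np.zeros((2, 2))
z = np.array([0.5, 0.5])
x = D @ z
loss = psd_loss(x, z, W, D)
```

During training, z is typically optimized per input while W and D are updated across inputs; at test time only the fast encoder σ(Wx) is needed.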


Denoising Autoencoder [Vincent et al., 2008]

Figure credit: Vincent


Deep Belief Networks (Greedy)

• Construct an RBM with an input layer v and a hidden layer h

• Stack another hidden layer on top of the RBM to form a new RBM

• And so on.

[Hinton et al., 2006]
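The greedy procedure above can be sketched as follows. This is a toy illustration, not the training recipe used in the talk: it assumes binary units, one step of contrastive divergence (CD-1) on full batches, and made-up sizes, learning rate, and epoch count. The helper names `train_rbm` and `train_dbn` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_rbm(data, n_hidden, rng, epochs=20, lr=0.1):
    """Train a binary RBM with one step of contrastive divergence (CD-1)."""
    n, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities and samples given the data.
        p_h = sigmoid(data @ W + b_h)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Negative phase: one Gibbs step down and up (probabilities, not samples).
        p_v = sigmoid(h @ W.T + b_v)
        p_h_neg = sigmoid(p_v @ W + b_h)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += lr * (data.T @ p_h - p_v.T @ p_h_neg) / n
        b_v += lr * np.mean(data - p_v, axis=0)
        b_h += lr * np.mean(p_h - p_h_neg, axis=0)
    return W, b_v, b_h

def train_dbn(data, layer_sizes, rng):
    """Greedy stacking: each RBM is trained on the hidden activations below it."""
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        W, b_v, b_h = train_rbm(inputs, n_hidden, rng)
        layers.append((W, b_v, b_h))
        inputs = sigmoid(inputs @ W + b_h)  # becomes the "visible" data of the next RBM
    return layers

rng = np.random.default_rng(0)
toy = (rng.random((50, 12)) < 0.3).astype(float)  # toy binary data, illustrative only
dbn = train_dbn(toy, layer_sizes=[8, 4], rng=rng)
```

Each call to `train_rbm` sees only the layer below it, which is what makes the procedure greedy: lower layers are frozen once trained.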


Deep Belief Networks (Greedy) [Hinton et al., 2006]

Generating samples


Deep Boltzmann Machine

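The energy of the state $\{\mathbf{v}, \mathbf{h}^1, \mathbf{h}^2, \mathbf{h}^3\}$ is defined as:

$$E(\mathbf{v}, \mathbf{h}^1, \mathbf{h}^2, \mathbf{h}^3; \theta) = -\mathbf{v}^T W^1 \mathbf{h}^1 - (\mathbf{h}^1)^T W^2 \mathbf{h}^2 - (\mathbf{h}^2)^T W^3 \mathbf{h}^3$$

Inference:

$$p(h^1_k = 1 \mid \mathbf{v}, \mathbf{h}^2) = g\Big(\sum_i W^1_{ik} v_i + \sum_m W^2_{km} h^2_m\Big)$$

$$p(h^2_k = 1 \mid \mathbf{h}^1, \mathbf{h}^3) = g\Big(\sum_i W^2_{ik} h^1_i + \sum_m W^3_{km} h^3_m\Big)$$

$$p(h^3_k = 1 \mid \mathbf{h}^2) = g\Big(\sum_j W^3_{jk} h^2_j\Big)$$

$$p(v_k = 1 \mid \mathbf{h}^1) = g\Big(\sum_j W^1_{kj} h^1_j\Big)$$

Figure credit: Ruslan Salakhutdinov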
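The energy and the four conditionals of the three-hidden-layer Boltzmann machine can be written directly in NumPy. The sketch assumes that g(.) is the logistic function, that there are no bias terms, and that W1, W2, W3 have shapes (visible × h1), (h1 × h2), (h2 × h3); the layer sizes and random weights are illustrative only.

```python
import numpy as np

def g(a):
    """Logistic function, assumed here for g(.)."""
    return 1.0 / (1.0 + np.exp(-a))

def energy(v, h1, h2, h3, W1, W2, W3):
    """E = -v^T W1 h1 - h1^T W2 h2 - h2^T W3 h3 (no bias terms)."""
    return -(v @ W1 @ h1) - (h1 @ W2 @ h2) - (h2 @ W3 @ h3)

def p_h1(v, h2, W1, W2):
    """p(h1_k = 1 | v, h2): input from the layer below and the layer above."""
    return g(v @ W1 + W2 @ h2)

def p_h2(h1, h3, W2, W3):
    """p(h2_k = 1 | h1, h3)."""
    return g(h1 @ W2 + W3 @ h3)

def p_h3(h2, W3):
    """p(h3_k = 1 | h2): the top layer only has input from below."""
    return g(h2 @ W3)

def p_v(h1, W1):
    """p(v_k = 1 | h1)."""
    return g(W1 @ h1)

rng = np.random.default_rng(0)
V, H1, H2, H3 = 4, 3, 3, 2                      # illustrative layer sizes
W1 = rng.standard_normal((V, H1))
W2 = rng.standard_normal((H1, H2))
W3 = rng.standard_normal((H2, H3))
v = rng.integers(0, 2, V).astype(float)
h1 = rng.integers(0, 2, H1).astype(float)
h2 = rng.integers(0, 2, H2).astype(float)
h3 = rng.integers(0, 2, H3).astype(float)
```

A useful sanity check is that each conditional equals g applied to the energy difference between switching the unit off and on, with all other units held fixed.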




Bayesian Nonparametric Modeling
• What is Bayesian nonparametric?
− It does not mean “no parameters”; it is a really large parametric model
− “Not parametric”: not restricted to objects whose dimensionality stays fixed as more data is observed, so the model can adapt flexibly to the structure of the data
− A model over infinite-dimensional function or measure spaces
− A family of distributions that is dense in some large space
• Why nonparametric models?
− A broad class of priors that allows the data to “speak for itself”
− Side-step model selection and averaging


Bayesian Nonparametric Models

Dirichlet distribution → Dirichlet process
Beta distribution → Beta process
Gaussian distribution → Gaussian process
Poisson distribution → Poisson process

Dirichlet Process / Chinese Restaurant Process
Beta Process / Indian Buffet Process
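As an illustration of the Beta Process / Indian Buffet Process pairing, the sketch below draws a binary feature matrix from the standard one-parameter Indian buffet process: customer n takes each existing dish k with probability m_k / n and then tries Poisson(α / n) new dishes. The function name `sample_ibp` and the values of α and the number of customers are assumptions for the example.

```python
import numpy as np

def sample_ibp(n_customers, alpha, rng):
    """Draw a binary customer-by-dish matrix from the Indian buffet process."""
    dish_counts = []   # m_k: how many earlier customers took dish k
    rows = []
    for n in range(1, n_customers + 1):
        # Existing dish k is taken with probability m_k / n.
        row = [int(rng.random() < m / n) for m in dish_counts]
        # Then the customer tries Poisson(alpha / n) brand-new dishes.
        n_new = int(rng.poisson(alpha / n))
        row.extend([1] * n_new)
        # Update counts for old dishes and start counts for the new ones.
        dish_counts = [m + z for m, z in zip(dish_counts, row)] + [1] * n_new
        rows.append(row)
    # Pad every row with zeros up to the final number of dishes.
    Z = np.zeros((n_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(n_customers=10, alpha=3.0, rng=np.random.default_rng(0))
```

The number of columns is not fixed in advance: it grows with the data, which is the sense in which the number of features (later, filters) is inferred rather than set.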


Autoencoder with Nonparametric Priors
• Deep Sparse Graphical Models via CIBP (R. Adams et al., 2010)
• Autoencoder with Gaussian Process (J. Snoek et al., 2012)


Autoencoder with Nonparametric Priors
• Beta Process RBMs (R. Mittelman et al., 2013)
• Hierarchical-Deep Models (R. Salakhutdinov et al., 2013)




Existing Convolutional Deep Networks

• Convolutional Restricted Boltzmann Machines (Lee et al., 2009 and Norouzi et al., 2009)
• Convolutional Sparse Coding and Encoder Networks (Kavukcuoglu et al., 2010)
  with L2-norm sparsity
• Deconvolutional Networks (Zeiler et al., 2010)

Characteristics in common:
1. Require sparse hidden nodes
2. Use point estimates to update parameters
3. Have to set the number of filters

[Figure: visible layer v, hidden layer h, weights W]


Convolutional Factor Analysis with Beta Process

Beta process prior on the usage of filters

Normal-Gamma prior (sparsity) on hidden units

Normal-Gamma prior (sparsity) on filters

Gaussian noise

Generative Model
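One plausible reading of the four components listed on this slide is a model in which each image is a sum of filters convolved with sparse feature maps, switched on or off by binary usage indicators, plus Gaussian noise. The sketch below samples from such a model. It is an assumption-laden illustration: the finite truncation K, the Beta(a/K, b(K−1)/K) usage prior, the Gamma(1, 1) precisions, the image and filter sizes, and the names `conv2d_full`, `normal_gamma`, and `sample_images` are all hypothetical choices, not taken from the talk.

```python
import numpy as np

def conv2d_full(s, d):
    """Full 2-D convolution of feature map s with filter d."""
    sh, sw = s.shape
    fh, fw = d.shape
    out = np.zeros((sh + fh - 1, sw + fw - 1))
    for i in range(sh):
        for j in range(sw):
            out[i:i + fh, j:j + fw] += s[i, j] * d
    return out

def normal_gamma(size, rng, c=1.0, d=1.0):
    """Element-wise precision ~ Gamma(c, rate d), value ~ N(0, 1 / precision)."""
    precision = rng.gamma(c, 1.0 / d, size=size)   # NumPy takes (shape, scale)
    return rng.standard_normal(size) / np.sqrt(precision)

def sample_images(n_images, K, rng, img=8, f=3, a=1.0, b=1.0, noise_std=0.01):
    """Sample images as sums of used filters convolved with sparse feature maps."""
    pi = rng.beta(a / K, b * (K - 1) / K, size=K)     # filter-usage probabilities
    D = normal_gamma((K, f, f), rng)                  # filters
    B = (rng.random((n_images, K)) < pi).astype(int)  # which filters each image uses
    m = img - f + 1                                   # feature-map side length
    X = np.zeros((n_images, img, img))
    for n in range(n_images):
        for k in np.flatnonzero(B[n]):
            S = normal_gamma((m, m), rng)             # hidden units for filter k
            X[n] += conv2d_full(S, D[k])
        X[n] += noise_std * rng.standard_normal((img, img))  # Gaussian noise
    return X, B, D, pi

X, B, D, pi = sample_images(n_images=4, K=5, rng=np.random.default_rng(0))
```

Because most usage probabilities are small under this prior, only a few of the K filters are active in any image; unused filters can be pruned, which is how the number of filters is inferred.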


Beta-Bernoulli Process (1/2)


Beta-Bernoulli Process (2/2)

Beta-Bernoulli Process (N. L. Hjort, 1990 and Thibaux & Jordan, 2007)

[Figure with axes N and K]
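A common finite approximation to the beta-Bernoulli process draws K usage probabilities π_k ~ Beta(a/K, b(K−1)/K) and then N × K binary indicators z_nk ~ Bernoulli(π_k). The sketch below samples from this approximation and computes the conjugate posterior of each π_k given the indicators. The parameterization and the names `sample_beta_bernoulli` and `posterior_params` are assumptions for illustration.

```python
import numpy as np

def sample_beta_bernoulli(N, K, a, b, rng):
    """Finite approximation: pi_k ~ Beta(a/K, b(K-1)/K), z_nk ~ Bernoulli(pi_k)."""
    pi = rng.beta(a / K, b * (K - 1) / K, size=K)
    Z = (rng.random((N, K)) < pi).astype(int)
    return pi, Z

def posterior_params(Z, a, b):
    """Beta posterior of each pi_k: add usage counts and non-usage counts to the prior."""
    N, K = Z.shape
    m = Z.sum(axis=0)                       # how many of the N rows use column k
    return a / K + m, b * (K - 1) / K + N - m

pi, Z = sample_beta_bernoulli(N=20, K=10, a=2.0, b=1.0, rng=np.random.default_rng(0))

# Worked example with a hand-written 3 x 2 indicator matrix.
Z_small = np.array([[1, 0], [1, 1], [0, 0]])
post_a, post_b = posterior_params(Z_small, a=1.0, b=1.0)
```

In the worked example the prior is Beta(0.5, 0.5) for both columns, and the counts (2, 1) out of 3 give posteriors Beta(2.5, 1.5) and Beta(1.5, 2.5).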


Online Variational Bayesian Learning

Variational Bayesian
Online Version

Maximize the lower bound:

The lower bound of the marginal log likelihood:
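In generic form, variational Bayes maximizes the bound log p(X) ≥ E_q[log p(X, Θ)] − E_q[log q(Θ)] over an approximating distribution q, and the online version replaces the full-data update of the global parameters with a weighted average of minibatch estimates. The sketch below shows that update pattern on a deliberately simple stand-in model, a Beta posterior over filter-usage probabilities with observed binary indicators. It is not the talk's convolutional model, and the step-size schedule ρ_t = (τ0 + t)^(−κ) and the names `online_beta_update`, `kappa`, and `tau0` are assumptions.

```python
import numpy as np

def online_beta_update(Z, batch_size, a0, b0, kappa=1.0, tau0=0.0):
    """Online update of Beta(lam_a, lam_b) parameters from minibatches of indicators."""
    N, K = Z.shape
    lam_a = np.full(K, a0, dtype=float)
    lam_b = np.full(K, b0, dtype=float)
    for t, start in enumerate(range(0, N, batch_size), start=1):
        batch = Z[start:start + batch_size]
        scale = N / batch.shape[0]
        m = batch.sum(axis=0)
        # Optimal parameters if the minibatch were replicated to the full data size.
        hat_a = a0 + scale * m
        hat_b = b0 + scale * (batch.shape[0] - m)
        # Decaying step size; blend the old parameters with the minibatch estimate.
        rho = (tau0 + t) ** (-kappa)
        lam_a = (1.0 - rho) * lam_a + rho * hat_a
        lam_b = (1.0 - rho) * lam_b + rho * hat_b
    return lam_a, lam_b

Z = np.array([[1, 0], [1, 1], [0, 0], [1, 0], [0, 1], [1, 1]])
lam_a, lam_b = online_beta_update(Z, batch_size=2, a0=0.5, b0=0.5)
```

With κ = 1 and τ0 = 0 the step size is 1/t, so after one pass over equal-sized minibatches the result is the average of the minibatch estimates, which here coincides with the exact full-data posterior. Smaller κ or larger τ0 trade that exactness for faster adaptation.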


Hierarchical BP


Multitask Learning via the Hierarchical BP

HBP Generative Process

Via this construction, all tasks share the same filters, but each task has its own probability of filter usage.

Global atoms

Local order
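The sharing structure described here can be sketched with a finite approximation to the hierarchical beta process: global usage probabilities π_k are drawn once, each task t draws its own π_k^(t) ~ Beta(c π_k, c(1 − π_k)) centered on the global value, and each image in the task draws binary indicators from π^(t). The truncation K, the concentration c, the clipping constant `eps` (added only to keep the Beta parameters strictly positive), and the name `sample_hbp` are assumptions for the example.

```python
import numpy as np

def sample_hbp(n_tasks, n_per_task, K, rng, a=1.0, b=1.0, c=10.0, eps=1e-6):
    """Sample global and task-level filter-usage probabilities and indicators."""
    # Global usage probabilities shared by all tasks (clipped for numerical safety).
    pi_global = np.clip(rng.beta(a / K, b * (K - 1) / K, size=K), eps, 1.0 - eps)
    # Task-specific probabilities concentrated around the global ones.
    pi_task = rng.beta(c * pi_global, c * (1.0 - pi_global), size=(n_tasks, K))
    # Binary filter-usage indicators for every image in every task.
    B = (rng.random((n_tasks, n_per_task, K)) < pi_task[:, None, :]).astype(int)
    return pi_global, pi_task, B

pi_global, pi_task, B = sample_hbp(n_tasks=3, n_per_task=5, K=8,
                                   rng=np.random.default_rng(0))
```

The filters themselves (the atoms) are indexed by k and are common to all tasks; only the rows of `pi_task` differ, so a filter can be popular in one task and nearly unused in another.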


Multilayered/Deep Models
• Stack multiple convolutional factor analysis models and train them layer by layer
• Max-pooling
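Between layers, max-pooling reduces each feature map before it is passed to the next convolutional factor analysis stage. A minimal sketch, assuming non-overlapping p × p blocks and dropping any trailing rows or columns that do not fill a block (both are assumptions; the slide only names the operation):

```python
import numpy as np

def max_pool(feature_map, p):
    """Non-overlapping p x p max-pooling of a 2-D feature map."""
    h, w = feature_map.shape
    h, w = h - h % p, w - w % p                     # drop incomplete blocks
    blocks = feature_map[:h, :w].reshape(h // p, p, w // p, p)
    return blocks.max(axis=(1, 3))                  # max within each block

x = np.arange(16).reshape(4, 4)
pooled = max_pool(x, 2)
# pooled is [[5, 7], [13, 15]]
```

With a pooling ratio of 2, a 4 × 4 map becomes 2 × 2; the pooled maps from all filters of one layer form the multi-channel input of the next layer.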


Experiments: Synthesized Data
• We generate seven binary canonical shapes, with shifted versions of these basic shapes used to constitute five classes of example images.

Image: 32x32; Classes: 5; Number of images: 30.
Layer-1: 4x4; K1=10;
Layer-2: 3x3; K2=100;
Max-pooling=2;
Burn-in: 30000;
Collections: 20000

[Figure panels: Class 1, Class 2, Class 3, Class 4, Class 5]


Experiments: MNIST (Layer-2 Filters)
For each digit, we randomly select 500 samples.

Layer-1: 7x7; K1=25;
Layer-2: 3x3; K2=1000;
Max-pooling=3;
Burn-in: 1000;
Collections: 500

[Figure panels: “0”, “1”, “4”, “6”, “7”]


Experiments: Caltech101
Layer-1: 11x11; K1=25;
Layer-2: 4x4; K2=200;
Layer-3: 6x6; K3=100;
Burn-in: 1000; Collections: 500

[Figure panels: Layer-1, Layer-2, Layer-3]


Layer-2 and Layer-3 Filters

[Figure panels: face, chair, elephant]


Sparseness Analysis
• The impact of the sparsity hyperparameters on sparseness and model performance

The sparseness of filters:


Sparseness Analysis

The sparseness of hidden nodes:

The sparseness of binary indicators:


Layered Representation

Layer 1 Layer 2 Layer 3


Online Learning

Held-out RMSE with different sizes of minibatches on Caltech101 data, as in Fig. 6. (a) Layer 1, (b) Layer 2.


HBP on Caltech101
Tasks: 102; Images: 1020; 1000 Layer-2 filters ranked by the global usage

It appears that as the range of image classes considered within an HBP analysis increases, the prominent filters tend toward simpler forms.


Classification




Summary and Future Work
• Build new convolutional deep networks based on factor analysis with BP/IBP
• Infer the number of filters at each layer of the deep model from the data by an IBP/BP construction
• Multi-task feature learning for simultaneous analysis of different families of images via HBP
• Future work:
− combine topic modeling with the model
− develop a classifier specialized for the convolutional property, with better generalization
− build a deep model in the encoder style


THANKS！

MLA 2014

http://web.xidian.edu.cn/bchen/en/index.html
