High-Level Semantic Feature for 3D Shape Based on Deep Belief Network

Zhenbao Liu 1, Shaoguang Cheng 1, Shuhui Bu 1, Ke Li 2
1 Northwest Polytechnical University, Xi’an, China.
2 Information Engineering University, Zhengzhou, China.

ICME 2014 – Chengdu, China (14-18 July, 2014)

Outline

Backgrounds (why)

Idea (what)

Method (how)

Experiments

Conclusion

Backgrounds

Feature Representation

Learning Algorithm

The Key step

Backgrounds

Q: How do we extract features in practice?

A: They are specified manually, for example SIFT, HOG, ...

Backgrounds

Backgrounds

NLP

Speech Recognition

Computer Vision

Backgrounds

Why is deep learning difficult for 3D shapes (graph data)?

Idea – 3D feature learning framework

3D shape → ... → Deep Learning → High-level feature

Idea – 3D feature learning framework

Off-line

On-line

low-level feature → middle-level feature → high-level feature

Method – Low Level Feature

View image generation

Attention:
• The rotation angle must be set carefully so that all cameras are distributed uniformly on a sphere.
• A 3D object is represented by 10 × 20 images from different views.
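
The slide does not say how the rotation angles are chosen. As a minimal sketch, assuming "10 × 20" means 10 elevation bands times 20 azimuth steps (an assumption, as are the function and parameter names), one simple way to avoid crowding cameras near the poles is to space the bands uniformly in z = sin(elevation) rather than in the elevation angle itself, so that every band covers the same area of the sphere:

```python
import math

def camera_directions(n_bands=10, n_azimuths=20):
    """Return n_bands * n_azimuths unit vectors spread over the sphere.

    Hypothetical helper: the 10 x 20 split into bands and azimuths is an
    assumption, not something stated on the slide.
    """
    directions = []
    for i in range(n_bands):
        # Band centres uniform in z give equal-area bands; uniform steps in
        # the elevation angle would place too many cameras near the poles.
        z = -1.0 + (2.0 * i + 1.0) / n_bands
        r = math.sqrt(1.0 - z * z)
        for j in range(n_azimuths):
            phi = 2.0 * math.pi * j / n_azimuths
            directions.append((r * math.cos(phi), r * math.sin(phi), z))
    return directions

views = camera_directions()  # 200 viewing directions on the unit sphere
```

Each returned vector can serve as a camera position looking at the object centre. Other schemes, such as the vertices of a subdivided polyhedron, would work equally well.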

Method – Low Level Feature

SIFT feature extraction

Robust to noise and illumination, and stable under various changes of 3D viewpoint.

20 to 40 SIFT features per image. About 5000 to 7000 SIFT features for a 3D shape.
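
As a quick consistency check on these counts, assuming "10 × 20" on the view-generation slide means 200 view images per shape (an interpretation, not stated explicitly), the per-image and per-shape figures line up as follows:

```latex
\begin{align*}
  N_{\text{views}} &= 10 \times 20 = 200 \\
  200 \times 20 = 4000 \;&\le\; N_{\text{SIFT}} \;\le\; 200 \times 40 = 8000 \\
  \frac{5000}{200} = 25, &\qquad \frac{7000}{200} = 35
\end{align*}
```

So the quoted 5000 to 7000 features per shape correspond to an average of roughly 25 to 35 features per view, inside the stated 20 to 40 range.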

Method – Middle Level Feature

Bag-of-Visual-Feature (BoVF)

SIFT features from all shapes → K-means → Visual Words

SIFT features from a single shape → NN → Encode → BoVF
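
The pipeline on this slide can be sketched as below. This is an illustrative sketch, not the authors' code: it uses toy 2-D vectors in place of 128-D SIFT descriptors, a plain Lloyd's k-means with a simplified strided initialisation, and a normalised histogram as the encoding. All function names are hypothetical, and the codebook size used in the talk is not given in the transcript.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, d):
    """Index of the visual word closest to descriptor d (the NN step)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], d))

def kmeans(descriptors, k, iters=20):
    """Lloyd's k-means; evenly strided picks are a simplified initialisation."""
    step = len(descriptors) // k
    centres = [list(descriptors[i * step]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centres, d)].append(d)
        for j, g in enumerate(groups):
            if g:  # keep the old centre if a cluster ends up empty
                centres[j] = [sum(col) / len(g) for col in zip(*g)]
    return centres

def bovf(codebook, descriptors):
    """Encode one shape as a normalised histogram over the visual words."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# Off-line: build the visual words from descriptors of all shapes (toy data).
all_descriptors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = kmeans(all_descriptors, k=2)

# Per shape: assign each descriptor to its nearest word and count.
hist = bovf(codebook, [(0.2, 0.1), (10.5, 10.2), (9.8, 10.1)])
```

In practice the codebook would be built once from the descriptors of the whole training set and reused to encode every shape, including unseen queries.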

Method – Deep Belief Network

Restricted Boltzmann Machine

Math model:

Energy function

Joint distribution
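
The equations on this slide did not survive extraction. For reference, the standard binary RBM is usually written as follows. The symbols (visible units v, hidden units h, biases a and b, weights W, logistic function σ) are the conventional ones and may differ from the notation used in the talk:

```latex
\begin{align*}
  E(\mathbf{v}, \mathbf{h}) &= -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j \\
  P(\mathbf{v}, \mathbf{h}) &= \frac{1}{Z} \exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr),
  \qquad Z = \sum_{\mathbf{v}, \mathbf{h}} \exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr) \\
  P(h_j = 1 \mid \mathbf{v}) &= \sigma\Bigl(b_j + \sum_i v_i W_{ij}\Bigr),
  \qquad
  P(v_i = 1 \mid \mathbf{h}) = \sigma\Bigl(a_i + \sum_j W_{ij} h_j\Bigr)
\end{align*}
```

Because there are no connections within a layer, the hidden units are conditionally independent given the visible units and vice versa, which is what makes layer-wise sampling cheap.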

Method – Deep Belief Network

Stacking a number of the RBMs and learning layer by layer from bottom to top gives rise to a DBN.

The bottom layer RBM is trained with the input data of BoVF.

BoVF → High-level feature → Classification
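
Greedy layer-by-layer stacking can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: each RBM is a binary RBM trained with one step of contrastive divergence (CD-1), the normalised BoVF histogram entries are treated as visible-unit probabilities, and the layer sizes, learning rate and epoch count are placeholders (the parameter slide did not survive in the transcript). The names `RBM`, `train_dbn` and `high_level_feature` are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class RBM:
    def __init__(self, n_vis, n_hid, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hid)] for _ in range(n_vis)]
        self.a = [0.0] * n_vis  # visible biases
        self.b = [0.0] * n_hid  # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        return [sigmoid(self.a[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1(self, v0, lr=0.1):
        """One CD-1 update: positive phase, one Gibbs step, negative phase."""
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b[j] += lr * (h0[j] - h1[j])

def train_dbn(data, layer_sizes, epochs=5, seed=0):
    """Train RBMs bottom-up; each layer's hidden activations feed the next."""
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hid in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hid, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1(v)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms

def high_level_feature(rbms, v):
    """Propagate a BoVF vector through all layers; the top output is the feature."""
    for rbm in rbms:
        v = rbm.hidden_probs(v)
    return v

# Toy BoVF histograms with 4 visual words (illustrative data only).
toy_bovf = [[0.7, 0.3, 0.0, 0.0], [0.6, 0.4, 0.0, 0.0],
            [0.0, 0.0, 0.5, 0.5], [0.0, 0.1, 0.4, 0.5]]
rbms = train_dbn(toy_bovf, layer_sizes=[4, 3])
feature = high_level_feature(rbms, toy_bovf[0])
```

The top-layer activations play the role of the high-level feature passed on to classification. A full DBN would usually add supervised fine-tuning on top of this unsupervised pre-training, which is omitted here.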

Experiments - parameter settings

Experiments - classification

Classification results on SHREC 2007 (left) and McGill (right)

                   SHREC 2007   McGill
BoVF               83%          78%
Proposed method    93%          89%

Experiments - retrieval

Experiment on SHREC 2007

Experiments - retrieval

Experiment on McGill

Conclusion

The experimental results show that the learned high-level features are more discriminative than the BoVF baseline and perform better on both classification and retrieval tasks.

The number of view images per shape is large, and so far only SIFT has been investigated as the low-level descriptor.