Practical Deep Neural Networks
GPU Computing Perspective
Introduction
Yuhuang Hu Chu Kiong Loo
Advanced Robotic Lab, Department of Artificial Intelligence, Faculty of Computer Science & IT
University of Malaya
Outline
1 Introduction
2 Linear Algebra
3 Dataset Preparation: Data Standardization, Principal Components Analysis, ZCA Whitening
Introduction
Objectives
- Light introduction to numerical computation.
- Fundamentals of Machine Learning.
- Softmax Regression.
- Feed-forward Neural Network.
- Convolutional Networks.
- Recurrent Neural Networks.
Prerequisites
- Basic training in Calculus
- Basic training in Linear Algebra
  - Matrix operations
  - Matrix properties: transpose, rank, norm, determinant, etc.
  - Eigendecomposition, Singular Value Decomposition
- Basic programming skills
  - If-else conditionals
  - Loops
  - Functions, classes, libraries
  - Source code control: Git (optional)
References
- Deep Learning: an MIT Press book in preparation. Main reference for this workshop; still in development, awesome structure, awesome contents.
- Machine Learning: A Probabilistic Perspective. One of the best Machine Learning books on the market.
- CS231n Convolutional Neural Networks for Visual Recognition notes. Nicely structured, well written, loads of pictures.
- CS229 Machine Learning course materials. For basic knowledge; well written and easy to read.
Software and Tools
- Ubuntu 14.04
- CUDA Toolkit 7
- Python 2.7.9 (Why not 3.*?)
- Theano
- numpy, scipy, etc.
- Eclipse + PyDev
Reading List
A reading list has been prepared for this workshop. All reading materials can be found at:
http://rt.dgyblog.com/ref/ref-learning-deep-learning.html
The list keeps expanding!
Linear Algebra
Scalar, Vector, Matrix and Tensor
Scalars: A scalar is a single number.
Vectors: A vector is an array of numbers.
Matrices: A matrix is a 2-D array of numbers.
Tensors: A tensor is an array of numbers arranged on a regular grid with a variable number of axes.
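For example, the four objects look like this in numpy (shapes chosen arbitrarily for illustration):

    import numpy as np

    s = 3.0                          # scalar: a single number
    v = np.array([1.0, 2.0, 3.0])    # vector: 1-D array, shape (3,)
    M = np.array([[1.0, 2.0],
                  [3.0, 4.0]])       # matrix: 2-D array, shape (2, 2)
    T = np.zeros((2, 3, 4))          # tensor: 3 axes, shape (2, 3, 4)

    print(v.ndim, M.ndim, T.ndim)    # 1 2 3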
Matrix Operations and Properties
Identity: $I_n \in \mathbb{R}^{n \times n}$, $IA = AI = A$.
Transpose: $(A^\top)_{ij} = A_{ji}$.
Addition: $C = A + B$ where $C_{ij} = A_{ij} + B_{ij}$.
Scalar: $D = a \cdot B + c$ where $D_{ij} = a \cdot B_{ij} + c$.
Multiply: $C = AB$ where $C_{ij} = \sum_k A_{ik} B_{kj}$.
Element-wise Product: $C = A \odot B$ where $C_{ij} = A_{ij} B_{ij}$.
Distributive: $A(B + C) = AB + AC$.
Associative: $A(BC) = (AB)C$.
Transpose of Matrix Product: $(AB)^\top = B^\top A^\top$.
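These identities map directly onto numpy; a small sketch checking a few of them on random matrices:

    import numpy as np

    A = np.random.randn(3, 3)
    B = np.random.randn(3, 3)
    C = np.random.randn(3, 3)
    I = np.eye(3)

    print(np.allclose(I.dot(A), A))                         # IA = A
    print(np.allclose(A.dot(B + C), A.dot(B) + A.dot(C)))   # distributive
    print(np.allclose(A.dot(B.dot(C)), (A.dot(B)).dot(C)))  # associative
    print(np.allclose(A.dot(B).T, B.T.dot(A.T)))            # (AB)^T = B^T A^T
    print(np.allclose(A * B, B * A))                        # element-wise product commutes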
Inverse Matrices
The matrix inverse of $A$ is denoted $A^{-1}$, and it is defined as the matrix such that

$$A^{-1} A = A A^{-1} = I_n$$

if $A$ is not a singular matrix. If $A$ is not square, or is square but singular, it is still possible to find its generalized inverse or pseudo-inverse.
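In numpy these are np.linalg.inv and np.linalg.pinv (the Moore-Penrose pseudo-inverse); a minimal sketch:

    import numpy as np

    A = np.random.randn(3, 3)                     # almost surely non-singular
    A_inv = np.linalg.inv(A)
    print(np.allclose(A.dot(A_inv), np.eye(3)))   # A A^{-1} = I

    B = np.random.randn(4, 3)                     # non-square: no true inverse
    B_pinv = np.linalg.pinv(B)                    # Moore-Penrose pseudo-inverse
    print(np.allclose(B_pinv.dot(B), np.eye(3)))  # left inverse when B has full column rank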
Norms
We usually measure the size of vectors using an $L^p$ norm:

$$\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$$

There are a few commonly used norms:

$L^1$ norm: $\|x\|_1 = \sum_i |x_i|$.

$L^\infty$ (max norm): $\|x\|_\infty = \max_i |x_i|$.

Frobenius norm: measures the size of a matrix, analogous to the $L^2$ norm:

$$\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}$$
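All of these norms are available through np.linalg.norm; a quick sketch:

    import numpy as np

    x = np.array([3.0, -4.0])
    print(np.linalg.norm(x, 1))        # L1 norm: 7.0
    print(np.linalg.norm(x, 2))        # L2 norm: 5.0
    print(np.linalg.norm(x, np.inf))   # max norm: 4.0

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    print(np.linalg.norm(A, 'fro'))    # Frobenius norm: sqrt(30)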
Unit vector, Orthogonal, Orthonormal, Orthogonal Matrix
A unit vector is a vector with unit norm:

$$\|x\|_2 = 1$$

Vectors $x$ and $y$ are orthogonal to each other if $x^\top y = 0$. In $\mathbb{R}^n$, at most $n$ vectors may be mutually orthogonal with nonzero norm. If vectors are not only orthogonal but also have unit norm, we call them orthonormal.

An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:

$$A^\top A = A A^\top = I$$

This implies that $A^{-1} = A^\top$.
Eigendecomposition
An eigenvector of a square matrix $A$ is a non-zero vector $v$ such that

$$A v = \lambda v$$

where the scalar $\lambda$ is known as the eigenvalue corresponding to the eigenvector. If $v$ is an eigenvector of $A$, so is any rescaled vector $s v$ for $s \in \mathbb{R}$, $s \neq 0$. Therefore, we usually only look for unit eigenvectors.

We can represent the matrix $A$ using an eigendecomposition:

$$A = V \,\mathrm{diag}(\lambda)\, V^{-1}$$

where $V = [v^{(1)}, v^{(2)}, \ldots, v^{(n)}]$ (one column per eigenvector) and $\lambda = [\lambda_1, \ldots, \lambda_n]$.

Every real symmetric matrix can be decomposed into

$$A = Q \Lambda Q^\top$$

where $Q$ is an orthogonal matrix composed of eigenvectors of $A$, and $\Lambda$ is a diagonal matrix with $\Lambda_{ii} = \lambda_i$.
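For a real symmetric matrix, np.linalg.eigh returns exactly this $Q$ and $\Lambda$; a minimal sketch:

    import numpy as np

    A = np.random.randn(4, 4)
    A = (A + A.T) / 2.0                  # symmetrize so the decomposition is Q Lambda Q^T

    lam, Q = np.linalg.eigh(A)           # eigenvalues (ascending) and orthonormal eigenvectors
    print(np.allclose(Q.dot(np.diag(lam)).dot(Q.T), A))   # A = Q Lambda Q^T
    print(np.allclose(Q.T.dot(Q), np.eye(4)))             # Q is orthogonal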
Singular Value Decomposition (SVD)
SVD is a general-purpose matrix factorization method that decomposes an $n \times m$ matrix $X$ into:

$$X = U \Sigma W^\top$$

where $U$ is an $n \times n$ matrix whose columns are mutually orthonormal, $\Sigma$ is a rectangular diagonal matrix, and $W$ is an $m \times m$ matrix whose columns are mutually orthonormal. Elements along the main diagonal of $\Sigma$ are referred to as the singular values. Columns of $U$ and $W$ are referred to as the left-singular vectors and right-singular vectors of $X$ respectively.

- Left-singular vectors of $X$ are the eigenvectors of $X X^\top$.
- Right-singular vectors of $X$ are the eigenvectors of $X^\top X$.
- Non-zero singular values of $X$ are the square roots of the non-zero eigenvalues of both $X X^\top$ and $X^\top X$.
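A sketch with np.linalg.svd, which returns $W^\top$ directly; the last check confirms the eigenvalue relation stated above:

    import numpy as np

    X = np.random.randn(5, 3)
    U, s, Wt = np.linalg.svd(X)          # full U (5x5), singular values, W^T (3x3)

    Sigma = np.zeros((5, 3))             # rectangular diagonal matrix
    Sigma[:3, :3] = np.diag(s)
    print(np.allclose(U.dot(Sigma).dot(Wt), X))       # X = U Sigma W^T

    eig = np.linalg.eigvalsh(X.T.dot(X))              # eigenvalues of X^T X, ascending
    print(np.allclose(np.sort(s ** 2), eig))          # squared singular values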
Trace, Determinant
The trace operator is defined as

$$\mathrm{Tr}(A) = \sum_i A_{ii}$$

The Frobenius norm can be written with the trace operator:

$$\|A\|_F = \sqrt{\mathrm{Tr}(A^\top A)}$$

The determinant of a square matrix, denoted $\det(A)$, is a function mapping matrices to real scalars. The determinant is equal to the product of all of the matrix's eigenvalues.
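Both facts are easy to check numerically (using a symmetric matrix so the eigenvalues stay real):

    import numpy as np

    A = np.random.randn(3, 3)
    A = (A + A.T) / 2.0                                   # symmetric: real eigenvalues

    print(np.isclose(np.linalg.norm(A, 'fro'),
                     np.sqrt(np.trace(A.T.dot(A)))))      # Frobenius norm via trace
    print(np.isclose(np.linalg.det(A),
                     np.prod(np.linalg.eigvalsh(A))))     # det = product of eigenvalues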
Dataset Preparation
MNIST
The MNIST dataset contains 60,000 training samples and 10,000 testing samples (LeCun, Bottou, Bengio, & Haffner, 1998). Each sample is a hand-written digit from 0-9.
CIFAR-10
The CIFAR-10 dataset consists of 60,000 images in 10 classes (Krizhevsky, 2009). There are 50,000 training images and 10,000 testing images.
Dataset: How to split the data?
- If the dataset has been split already, please keep it that way.
- If the dataset comes as a whole, a common choice is 60% as training data, 20% as cross-validation data, and 20% as testing data.
- Another alternative is 70% as training data and 30% as testing data (see the sketch below).
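A minimal sketch of the 60/20/20 split, assuming the samples and labels sit in arrays X and y (names are mine):

    import numpy as np

    def split_dataset(X, y, seed=42):
        """Shuffle, then split 60% train / 20% validation / 20% test."""
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(X))
        n_tr = int(0.6 * len(X))
        n_va = int(0.2 * len(X))
        tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
        return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

    X = np.random.randn(100, 784)                       # fake data for illustration
    y = np.random.randint(0, 10, size=100)
    (X_tr, y_tr), (X_va, y_va), (X_te, y_te) = split_dataset(X, y)
    print(X_tr.shape, X_va.shape, X_te.shape)           # (60, 784) (20, 784) (20, 784)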
Dataset: Cross validation
- Cross-validation data is used to keep track of the performance of the learning algorithm (overfitting? underfitting?).
- It is usually a small portion of the training dataset, randomly selected.
- It serves as testing data when the testing data is not visible.
- It helps to choose and adjust the parameters of the learning model.
Data Standardization
Mean subtraction, Unit Variance
Let $\mu$ be the mean image:

$$\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$$

and then subtract the mean image from every image:

$$x^{(i)} := x^{(i)} - \mu$$

Let

$$\sigma_j^2 = \frac{1}{m} \sum_i \left( x_j^{(i)} \right)^2$$

and then

$$x_j^{(i)} := \frac{x_j^{(i)}}{\sigma_j}$$

This is also called PCA whitening if it is applied to rotated data; the variances here can then be viewed as eigenvalues.
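A numpy sketch of both steps, one example per row; the small eps guarding against division by zero is my addition:

    import numpy as np

    def standardize(X, eps=1e-8):
        """Subtract the mean image, then scale each feature to unit variance."""
        X = X - X.mean(axis=0)                     # mean subtraction
        sigma = np.sqrt((X ** 2).mean(axis=0))     # per-feature sigma of centered data
        return X / (sigma + eps)                   # eps avoids division by zero

    X = np.random.randn(100, 784) * 5.0 + 2.0      # fake images, one per row
    X_std = standardize(X)
    print(np.allclose(X_std.mean(axis=0), 0.0))    # zero mean
    print(np.allclose(X_std.std(axis=0), 1.0, atol=1e-4))  # unit variance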
Principal Components Analysis
PCA
Principal Components Analysis (PCA) is a dimensionality reduction algorithm that can be used to significantly speed up unsupervised feature learning algorithms. PCA finds a lower-dimensional subspace onto which to project our data.

Given a set of data $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, where $x^{(i)} \in \mathbb{R}^n$, the covariance is computed by

$$\Sigma = \frac{X X^\top}{m}$$

Using eigendecomposition, we can obtain the eigenvectors $U$:

$$U = \begin{bmatrix} | & | & \cdots & | \\ u_1 & u_2 & \cdots & u_n \\ | & | & \cdots & | \end{bmatrix}$$

and corresponding eigenvalues $\Lambda = \{\lambda_1, \ldots, \lambda_n\}$. Note that usually $\Lambda$ is sorted in descending order.
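A sketch of this step, with examples stored as columns of X (as the formula above assumes) and the mean already removed:

    import numpy as np

    X = np.random.randn(50, 1000)               # n = 50 features, m = 1000 examples
    X = X - X.mean(axis=1, keepdims=True)       # PCA assumes zero-mean data

    Sigma = X.dot(X.T) / X.shape[1]             # n x n covariance
    lam, U = np.linalg.eigh(Sigma)              # eigh: ascending eigenvalues
    order = np.argsort(lam)[::-1]               # re-sort in descending order
    lam, U = lam[order], U[:, order]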
Columns of $U$ are the principal bases, so we can rotate our data onto the new bases:

$$X_{\mathrm{rot}} = U^\top X$$

Note that $X = U X_{\mathrm{rot}}$. For a single data point $x$, we can reduce the dimension by keeping only the first $k$ rotated components:

$$\tilde{x} = \begin{bmatrix} x_{\mathrm{rot},1} \\ \vdots \\ x_{\mathrm{rot},k} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \approx \begin{bmatrix} x_{\mathrm{rot},1} \\ \vdots \\ x_{\mathrm{rot},k} \\ x_{\mathrm{rot},k+1} \\ \vdots \\ x_{\mathrm{rot},n} \end{bmatrix} = x_{\mathrm{rot}}$$

The resulting $\tilde{X}$ can approximate the distribution of $X_{\mathrm{rot}}$, since it drops only small components, and we have reduced $n - k$ dimensions of the original data.
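A self-contained sketch of this reduction (repeating the eigendecomposition step above); in practice one stores only the first k rows of X_rot rather than the padded zeros:

    import numpy as np

    X = np.random.randn(50, 1000)               # features x examples
    X = X - X.mean(axis=1, keepdims=True)       # zero-mean data
    lam, U = np.linalg.eigh(X.dot(X.T) / X.shape[1])
    U = U[:, np.argsort(lam)[::-1]]             # bases in descending eigenvalue order

    k = 10                                      # components to retain
    X_rot = U.T.dot(X)                          # rotated data
    X_tilde = X_rot.copy()
    X_tilde[k:, :] = 0.0                        # drop the n - k smallest components

    X_red = U[:, :k].T.dot(X)                   # equivalent compact k x m form
    print(np.allclose(X_red, X_rot[:k]))        # True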
We can recover an approximation $\hat{x}$ of the original signal from $\tilde{x}$:

$$\hat{x} = U \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_k \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
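In the compact form this recovery is a single matrix product; a toy sketch with a stand-in orthogonal basis:

    import numpy as np

    U = np.linalg.qr(np.random.randn(50, 50))[0]    # stand-in orthogonal basis
    X_red = np.random.randn(10, 1000)               # k x m reduced data (k = 10)

    X_hat = U[:, :10].dot(X_red)                    # back in the original 50-dim space
    print(X_hat.shape)                              # (50, 1000)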
PCA: Number of components to retain
To determine $k$, we usually look at the percentage of variance retained:

$$p = \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{n} \lambda_j}$$

One common heuristic is to choose $k$ so as to retain 99% of the variance ($p \geq 0.99$).
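A small helper implementing this heuristic, assuming lam holds the eigenvalues sorted in descending order:

    import numpy as np

    def choose_k(lam, p=0.99):
        """Smallest k whose leading eigenvalues retain at least fraction p of the variance."""
        frac = np.cumsum(lam) / np.sum(lam)
        return int(np.searchsorted(frac, p) + 1)

    lam = np.array([8.0, 1.5, 0.3, 0.15, 0.05])    # toy eigenvalues, descending
    print(choose_k(lam))                           # 4: first 4 components retain >= 99%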
ZCA Whitening
ZCA whitening simply rotates PCA-whitened data back to the original basis:

$$x_{\mathrm{ZCA}} = U x_{\mathrm{PCA}} = U \frac{x_{\mathrm{rot}}}{\sqrt{\Lambda + \varepsilon}}$$

where $\varepsilon$ is a regularization term. When $x$ takes values around $[-1, 1]$, a value of $\varepsilon \approx 10^{-5}$ might be typical.
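A complete sketch putting the pieces together, with random data standing in for real images and eps as on the slide:

    import numpy as np

    def zca_whiten(X, eps=1e-5):
        """ZCA-whiten X (features x examples): rotate, rescale, rotate back."""
        X = X - X.mean(axis=1, keepdims=True)             # zero-mean data
        Sigma = X.dot(X.T) / X.shape[1]                   # covariance
        lam, U = np.linalg.eigh(Sigma)                    # eigendecomposition
        X_pca = U.T.dot(X) / np.sqrt(lam[:, None] + eps)  # rotate + PCA whitening
        return U.dot(X_pca)                               # rotate back: ZCA

    X = np.random.randn(20, 500)
    X_zca = zca_whiten(X)
    cov = X_zca.dot(X_zca.T) / X_zca.shape[1]
    print(np.allclose(cov, np.eye(20), atol=1e-4))        # near-identity covariance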
Q&A
References
Krizhevsky, A. (2009). Learning multiple layers of features from tinyimages (Master Thesis).
Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998, Nov).Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11), 2278-2324.