Fundamentals of Deep (Artificial) Neural Networks (DNN)
Greg Tsagkatakis
CSD - UOC
ICS - FORTH
The Big Data era
Brief history of DL
Why Today?
Lots of Data
Deep Learning
More Power
https://blogs.nvidia.com/blog/2016/01/12/accelerating-ai-artificial-intelligence-gpus/
https://www.slothparadise.com/what-is-cloud-computing/
Machine learning is everywhere
Deep Learning: inspiration from the brain
~86 billion neurons
10^14-10^15 synapses
Key components of ANN
Architecture (input/hidden/output layers)
Weights
Activations: linear, rectified linear (ReLU), logistic/sigmoid, tanh
Perceptron: an early attempt
Output: σ(w1·x1 + w2·x2 + … + b), where σ is the activation function and the bias b enters through a constant input of 1.
Need to tune the weights w and the bias b.
Multilayer perceptron
A neuron is of the form σ(w·x + b), where σ is an activation function.
We just added a neuron layer!
We just introduced non-linearity!
[Diagram: inputs x1, x2 feed hidden neurons A-D through weights w1A, w2B, …, w1D; the hidden neurons feed output E through weights wAE, …, wDE.]
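To make the formula concrete, here is a minimal NumPy sketch of a single neuron and a small two-layer MLP (all sizes and values are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One neuron: sigma(w . x + b)
x = np.array([0.5, -1.2])     # inputs x1, x2
w = np.array([0.8, 0.3])      # weights w1, w2
b = -0.1                      # bias
print(sigmoid(w @ x + b))

# One hidden layer (neurons A-D) feeding one output neuron (E)
W1 = np.random.randn(4, 2); b1 = np.zeros(4)
W2 = np.random.randn(1, 4); b2 = np.zeros(1)
hidden = sigmoid(W1 @ x + b1)        # the added neuron layer
output = sigmoid(W2 @ hidden + b2)   # non-linearity at every stage
print(output)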
[Image: Sasen Cain (@spectralradius)]
Training & Testing
Training: determine weights
◦ Supervised: labeled training examples
◦ Unsupervised: no labels available
◦ Reinforcement: examples associated with rewards
Testing (Inference): apply weights to new examples
Training DNN
1. Get batch of data
2. Forward through the network -> estimate loss
3. Backpropagate error
4. Update weights based on gradient
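These four steps map directly onto a modern TensorFlow training step; a minimal sketch, assuming a built Keras model and a batch (x_batch, y_batch) of labeled data:

import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

def train_step(model, x_batch, y_batch):                # 1. get a batch of data
    with tf.GradientTape() as tape:
        predictions = model(x_batch, training=True)     # 2. forward pass ...
        loss = loss_fn(y_batch, predictions)            #    ... estimate loss
    grads = tape.gradient(loss, model.trainable_variables)            # 3. backpropagate
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # 4. update weights
    return loss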
Backpropagation
Chain rule in gradient descent: invented in 1969 by Bryson and Ho.
Defining a loss/cost function: assume a function f(x; θ) and associated predictions ŷ = f(x; θ).
Examples of loss functions (for a label y ∈ {-1, +1} and prediction ŷ, with margin m = y·ŷ):
• Hinge: max(0, 1 - m)
• Exponential: exp(-m)
• Logistic: log(1 + exp(-m))
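A quick numerical check of the three losses as functions of the margin m = y·ŷ (NumPy):

import numpy as np

def hinge(m):       return np.maximum(0.0, 1.0 - m)
def exponential(m): return np.exp(-m)
def logistic(m):    return np.log1p(np.exp(-m))

m = np.array([-1.0, 0.0, 1.0, 2.0])   # margins y * y_hat
print(hinge(m))        # [2. 1. 0. 0.] -- zero once the margin exceeds 1
print(exponential(m))  # decays smoothly, heavily penalizes wrong signs
print(logistic(m))     # log(2) ~ 0.693 at m = 0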
Gradient Descent
Minimize function J w.r.t. parameters θ.
Gradient: ∇_θ J(θ)
Stochastic Gradient Descent (SGD): estimate the gradient on a mini-batch.
Update rule: θ_new = θ_old - η·∇_θ J(θ)   (new weights = old weights - learning rate × gradient)
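A worked instance of the update rule, minimizing J(θ) = (θ - 3)² with plain gradient descent:

theta = 0.0                        # old weights
eta = 0.1                          # learning rate
for _ in range(100):
    grad = 2.0 * (theta - 3.0)     # gradient dJ/dtheta
    theta = theta - eta * grad     # new = old - learning rate * gradient
print(theta)                       # converges to the minimizer, 3.0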
Backpropagation
Chain rule:
◦ Single variable: given y = g(x) and z = f(y) = f(g(x)), dz/dx = (dz/dy)·(dy/dx)
◦ Multiple variables: for z = f(y1, …, yn) with each yi = gi(x), dz/dx = Σ_i (∂z/∂yi)·(dyi/dx)
[Diagram: x → g → y → f → z]
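The chain rule in code: differentiate z = f(g(x)) by hand and verify (NumPy; the functions are illustrative):

import numpy as np

x = 1.5
y = np.sin(x)              # y = g(x)
z = y ** 2                 # z = f(y) = f(g(x))

dz_dy = 2.0 * y            # f'(y)
dy_dx = np.cos(x)          # g'(x)
dz_dx = dz_dy * dy_dx      # chain rule: dz/dx = (dz/dy)(dy/dx)

# Check against the closed form d/dx sin(x)^2 = sin(2x)
assert np.isclose(dz_dx, np.sin(2.0 * x))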
Online examples
https://google-developers.appspot.com/machine-learning/crash-course/backprop-scroll/
Optimization algorithms
Visualization
Training Characteristics
◦ Under-fitting
◦ Over-fitting
Supervised Learning
Exploiting prior knowledge:
◦ Expert users
◦ Crowdsourcing
◦ Other instruments
[Diagram: Data + Labels -> Model -> Prediction; e.g., labeling galaxies as Spiral or Elliptical, with unknowns marked "?".]
State-of-the-art (before Deep Learning)
Support Vector Machines <-> binary classification
Kernels <-> non-linearities
Random Forests <-> multi-class classification
Markov Chains/Fields <-> temporal data
State-of-the-art (since 2015): Deep Learning (DL)
Convolutional Neural Networks (CNN) <-> Images
Recurrent Neural Networks (RNN) <-> Audio
Convolutional Neural Networks
(Convolution + Subsampling) + (Convolution + Subsampling) + … + Fully Connected
Convolution operator
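A minimal "valid" 2D convolution in NumPy (strictly speaking cross-correlation, as implemented in CNN libraries); the sizes anticipate the 32x32 -> 28x28 example on the next slide:

import numpy as np

def conv2d_valid(image, kernel):
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product of the kernel with the patch under it
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.random.rand(32, 32)
kernel = np.random.rand(5, 5)
print(conv2d_valid(image, kernel).shape)   # (28, 28)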
Convolutional Layers
[Diagram: a 32x32x1 image (height × width × channels) convolved with K filters of size 5x5x1 produces a 28x28xK activation map.]
Convolutional Layers
Characteristics:
◦ Hierarchical features
◦ Location invariance
Parameters:
◦ Number of filters (32, 64, …)
◦ Filter size (3x3, 5x5)
◦ Stride (1)
◦ Padding (2, 4)
"Machine Learning and AI for Brain Simulations", Andrew Ng talk, UCLA, 2012
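In Keras these parameters map directly onto the Conv2D layer; a sketch using the values named above (note Keras expresses padding as 'valid'/'same' rather than a number; explicit zero padding uses ZeroPadding2D):

from keras.layers import Conv2D

conv = Conv2D(filters=32,           # number of filters
              kernel_size=(5, 5),   # filter size
              strides=(1, 1),       # stride
              padding='valid',      # 'same' preserves spatial size
              activation='relu',
              input_shape=(32, 32, 1))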
Activation Layer
Introduction of non-linearity
◦ Brain: thresholding -> spike trains
Activation Layer
ReLU: f(x) = max(0, x)
◦ Simplifies backprop
◦ Makes learning faster
◦ Avoids saturation issues: no saturated gradients
◦ ~ non-negativity constraint
(Note: reminiscent of the brain's thresholding into spike trains)
Subsampling (pooling) Layers
<-> downsampling
Scale invariance
Parameters:
• Type
• Filter size
• Stride
Fully Connected Layers
Full connections to all activations in the previous layer
Typically at the end
Can be replaced by convolutions
[Diagram: features -> fully connected layer -> classes]
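Putting the pattern together, (Convolution + Subsampling) repeated and then Fully Connected, as a minimal Keras Sequential sketch (layer sizes illustrative):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (5, 5), activation='relu',
                 input_shape=(32, 32, 1)))        # -> 28x28x32
model.add(MaxPooling2D(pool_size=(2, 2)))         # -> 14x14x32
model.add(Conv2D(64, (3, 3), activation='relu'))  # -> 12x12x64
model.add(MaxPooling2D(pool_size=(2, 2)))         # -> 6x6x64
model.add(Flatten())                              # features
model.add(Dense(10, activation='softmax'))        # classes
model.summary()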
LeNet [1998]
AlexNet [2012]
Alex Krizhevsky, Ilya Sutskever and Geoff Hinton, ImageNet ILSVRC challenge, 2012
http://vision03.csail.mit.edu/cnn_art/data/single_layer.png
VGGnet [2014]
K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv technical report, 2014
VGGnet
D: VGG16, E: VGG19
All filters are 3x3
More layers, smaller filters
Inception (GoogLeNet, 2014)
Inception module; Inception module with dimensionality reduction
Residuals
ResNet [2015]
He, Kaiming, et al. "Deep residual learning for image recognition." IEEE CVPR, 2016.
DenseNet
Densely Connected Convolutional Networks, 2016
Training protocols
Fully supervised:
• Random initialization of weights
• Train in supervised mode (example + label)
Unsupervised pre-training + standard classifier:
• Train each layer unsupervised
• Train a supervised classifier (e.g., SVM) on top
Unsupervised pre-training + supervised fine-tuning:
• Train each layer unsupervised
• Add a supervised layer
Dropout
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
Batch Normalization
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [Ioffe and Szegedy, 2015]
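Both techniques are ordinary layers in Keras; a sketch of where they typically sit in a stack (rate and sizes illustrative):

from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation, Dropout

model = Sequential()
model.add(Dense(256, input_dim=784))
model.add(BatchNormalization())    # normalize activations over the mini-batch
model.add(Activation('relu'))
model.add(Dropout(0.5))            # randomly drop 50% of units during training
model.add(Dense(10, activation='softmax'))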
Transfer Learning
[Diagram: pixels x1 … xN pass through Layer 1, Layer 2, …, Layer L to produce the label "elephant".]
Transfer Learning
[Diagram: two networks side by side. The first maps pixels x1 … xN through Layer 1 … Layer L to the label "elephant"; the second reuses the same early layers on medical images to predict Healthy vs. Malignancy.]
Layer Transfer - Image
Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, "How transferable are features in deep neural networks?", NIPS, 2014
Option 1: copy the first layers, then train only the remaining layers
Option 2: fine-tune the whole network
Source: 500 classes from ImageNet
Target: another 500 classes from ImageNet
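A minimal Keras transfer-learning sketch along these lines, assuming an ImageNet-pretrained VGG16 base and a hypothetical two-class medical target task:

from keras.applications import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False        # freeze the copied layers; train only the rest

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(2, activation='softmax')(x)   # hypothetical Healthy vs. Malignancy head

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# To fine-tune the whole network instead: unfreeze base layers and
# recompile with a much smaller learning rate.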
ImageNet
• ~14 million labeled images, 20k classes
• Images gathered from the Internet
• Human labels via Amazon MTurk
• ImageNet Large-Scale Visual Recognition Challenge (ILSVRC): 1.2 million training images, 1000 classes
www.image-net.org/challenges/LSVRC/
Summary: ILSVRC 2012-2015

Team | Year | Place | Error (top-5) | External data
SuperVision (AlexNet, 7 layers) | 2012 | - | 16.4% | no
SuperVision | 2012 | 1st | 15.3% | ImageNet 22k
Clarifai - NYU (7 layers) | 2013 | - | 11.7% | no
Clarifai | 2013 | 1st | 11.2% | ImageNet 22k
VGG - Oxford (16 layers) | 2014 | 2nd | 7.32% | no
GoogLeNet (19 layers) | 2014 | 1st | 6.67% | no
ResNet (152 layers) | 2015 | 1st | 3.57% |
Human expert* | | | 5.1% |

* http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
Skin cancer detection
Esteva, Andre, et al. "Dermatologist-level classification of skin cancer with deep neural networks." Nature 542.7639 (2017): 115-118.
CNN & fMRI
Structured deep models
[Image: a biological neural network viewed as a graph]
Material from Thomas Kipf, CompBio Seminar, University of Cambridge, 5/2018
Graph CNNs
Different types of mapping
◦ Image classification
◦ Image captioning
◦ Sentiment analysis
◦ Machine translation
◦ Synced sequence (video classification)
Recurrent Neural Networks
Motivation:
◦ Feed-forward networks accept a fixed-size vector as input and produce a fixed-size vector as output, in a fixed number of computational steps.
◦ Recurrent nets allow us to operate over sequences of vectors.
Use cases:
◦ Video: sequence understanding
◦ Audio: speech transcription
◦ Text: natural language processing
Recurrent neuron
● xt: input at time t
● ht-1: state at time t-1
● The updated state ht is passed to the next time step; a common form is ht = tanh(W·xt + U·ht-1 + b).
Unfolding RNNs
Each node represents a layer of network units at a single time step.
The same weights are reused at every time step.
Long Short-Term Memory networks (LSTMs)
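A hedged Keras sketch of an LSTM sequence classifier (sequence length, feature count, and sizes are illustrative):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, input_shape=(100, 16)))   # 100 time steps, 16 features each
model.add(Dense(1, activation='sigmoid'))    # e.g., a binary label per sequence
model.compile(optimizer='rmsprop', loss='binary_crossentropy')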
Multitask Learning
The multi-layer structure makes NNs suitable for multitask learning.
[Diagram: a shared trunk on the input features branches into separate heads for Task A and Task B; alternatively, task-specific input features merge into shared layers.]
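With the Keras functional API (introduced at the end of these notes), a shared trunk with two task heads takes a few lines; a sketch with illustrative sizes and hypothetical task names:

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(64,))
shared = Dense(128, activation='relu')(inputs)   # layers shared by both tasks
out_a = Dense(10, activation='softmax', name='task_a')(shared)
out_b = Dense(1, activation='sigmoid', name='task_b')(shared)

model = Model(inputs=inputs, outputs=[out_a, out_b])
model.compile(optimizer='sgd',
              loss={'task_a': 'categorical_crossentropy',
                    'task_b': 'binary_crossentropy'})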
App-level mobile data analysis
Unsupervised Learning
Why unsupervised?
Challenges:
• Massive volumes of observations
• No user-introduced annotations
Applications: clustering, dimensionality reduction
K-means
Topics
Sparse coding
Autoencoders
Generative Adversarial Networks
Sparse Coding
Represent a signal as a sparse combination of dictionary atoms, e.g. x ≈ 0.8·d1 + 0.3·d2 + 0.5·d3.
min_s ||x - D·s||_2   s.t.   ||s||_1 <= K
[Figure: an image decomposed as a weighted sum (0.8, 0.3, 0.5) of three dictionary atoms.]
Deep Models
Applications of sparse coding (SC)
Shortcomings:
◦ Shallow model <-> single level of representation
◦ Complexity <-> dictionary size
◦ Task specific
AutoEncoders (AE)
[Diagram: input (28 x 28 = 784 values) -> NN Encoder -> code -> NN Decoder -> reconstruction.]
◦ The code is a compact representation of the input object (usually < 784 values)
◦ The decoder can reconstruct the original object from the code
◦ Encoder and decoder are learned together
AutoEncoders (AE)
Unsupervised feature learning: the network is trained to output its input (learn the identity function).
[Diagram: input layer x1…x6 -> hidden layer h1…h4 (encoder) -> output layer x1…x6 (decoder); +1 denotes the bias unit.]
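A minimal Keras autoencoder matching the 784 -> code -> 784 picture (the 32-dimensional code size is illustrative):

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))                              # 28 x 28 input
code = Dense(32, activation='relu')(inputs)               # compact representation
reconstruction = Dense(784, activation='sigmoid')(code)   # rebuild the input

autoencoder = Model(inputs=inputs, outputs=reconstruction)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# Trained to reproduce its own input:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)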
Regularized Autoencoders
◦ Sparse neuron activation
◦ Denoising auto-encoders
◦ Convolutional AE
Stacked AutoEncoders (SAE)
Extended AE with multiple layers of hidden units
Challenges of backpropagation
Efficient training:
◦ Normalization of input
Unsupervised pre-training:
◦ Greedy layer-wise training
◦ Fine-tune w.r.t. the criterion
Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning, 2009
SAE
[Diagram sequence: greedy layer-wise SAE training. A first AE maps inputs x1…x6 to features a1…a4 and reconstructs the inputs; a second AE then maps a1…a4 to features b1…b3 and reconstructs the a's; finally the stacked encoder is unrolled into a decoder and fine-tuned to reconstruct the original inputs.]
Deep Auto-encoder
[Figure: original image vs. 30-d PCA vs. deep auto-encoder reconstructions; the auto-encoder architecture is 784 -> 1000 -> 500 -> 250 -> 30 -> 250 -> 500 -> 1000 -> 784.]
AWGN channel modeling
Application in modulation
Training a learned physical layer: a channel autoencoder learns how to optimize BER for two bits of information over a simple Additive White Gaussian Noise (AWGN) channel.
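A hedged sketch of such a channel autoencoder in Keras; two bits give M = 4 one-hot messages, and all layer sizes and the noise level are illustrative rather than the published configuration:

from keras.layers import Input, Dense, GaussianNoise
from keras.models import Model

M = 4                                           # 2 bits -> 4 one-hot messages
inputs = Input(shape=(M,))
x = Dense(M, activation='relu')(inputs)         # transmitter
symbols = Dense(2, activation='linear')(x)      # 2 real channel uses (I/Q)
received = GaussianNoise(stddev=0.1)(symbols)   # AWGN channel (active in training)
x = Dense(M, activation='relu')(received)       # receiver
outputs = Dense(M, activation='softmax')(x)     # guess which message was sent

autoencoder = Model(inputs=inputs, outputs=outputs)
autoencoder.compile(optimizer='adam', loss='categorical_crossentropy')
# A real design would also constrain transmit power before the channel.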
Generators?
[Diagram: code -> NN Decoder -> generated output]
Generative Adversarial Networks
Goodfellow, Ian, et al. "Generative adversarial nets." NIPS 2014.
Yann LeCun: "adversarial training is the coolest thing since sliced bread."
[Diagram: code -> Generator -> sample; the Discriminator labels samples True or Fake.]
GANs
Training Procedure: Basic Idea
G tries to fool D; D tries not to be fooled.
Models are trained simultaneously:
◦ As G gets better, D has a more challenging task
◦ As D gets better, G has a more challenging task
Ultimately, we don't care about D:
◦ Its role is to force G to work harder
[Diagram: noise (z) -> Generative Model -> fake samples; Real world -> real samples; Discriminative Model decides: real or fake?]
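A compressed sketch of one alternating training step (Keras; the generator G, discriminator D, and the stacked model gan with D frozen inside it are assumed to be built elsewhere):

import numpy as np
# Assumes: G maps 100-d noise to a sample, D maps a sample to P(real),
# and gan stacks D on top of G with D's weights frozen.

def train_step(G, D, gan, real_batch, batch_size=32, noise_dim=100):
    noise = np.random.normal(size=(batch_size, noise_dim))
    fake_batch = G.predict(noise)

    # D tries not to be fooled: real -> 1, fake -> 0
    D.train_on_batch(real_batch, np.ones((batch_size, 1)))
    D.train_on_batch(fake_batch, np.zeros((batch_size, 1)))

    # G tries to fool D: push D's output on fakes toward "real"
    gan.train_on_batch(noise, np.ones((batch_size, 1)))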
DCGANs: Deep Convolutional Generative Adversarial Networks
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR 2016.
Image arithmetic
GAN-based Style Transfer
GANs for Super Resolution
Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." arXiv preprint arXiv:1609.04802 (2016).
Fake news
https://youtu.be/8siezzLXbNo
https://www.youtube.com/watch?v=DIZf7eRlD4w&t=108s
Reinforcement Learning
Reinforcement learning: the system interacts with an environment and must achieve a certain goal without being explicitly told whether it has come close to that goal or not.
Types of Reinforcement Learning
Search-based: evolution directly on a policy
◦ E.g. genetic algorithms
Model-based: build a model of the environment
◦ Then you can use dynamic programming
◦ Memory-intensive learning method
Model-free: learn a policy without any model
◦ Temporal difference (TD) methods
◦ Requires limited episodic memory (though more helps)
Q-learning:
◦ The TD version of Value Iteration
◦ The most widely used RL algorithm
Q-Learning
Define Q*(s,a): "Total reward if an agent in state s takes action a, then acts optimally at all subsequent time steps"
Optimal policy: π*(s) = argmax_a Q*(s,a)
Q(s,a) is an estimate of Q*(s,a)
Q-learning motion policy: π(s) = argmax_a Q(s,a)
Update Q recursively:
Q(s,a) <- r + γ · max_a' Q(s',a'),   with discount factor 0 <= γ <= 1
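A tabular sketch of the recursive update (NumPy; the environment supplying s, a, r, s' is hypothetical):

import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
gamma, alpha, epsilon = 0.9, 0.1, 0.1    # discount, step size, exploration rate

def update(s, a, r, s_next):
    # Move Q(s,a) toward r + gamma * max_a' Q(s',a')
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def policy(s):
    if np.random.rand() < epsilon:       # occasional exploration
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))          # pi(s) = argmax_a Q(s,a)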
Deep Q-Learning
Mnih et al., Human-level control through deep reinforcement learning, Nature 2015
Deep Q-learning in Atari
End-to-end learning of Q(s,a) from pixels s
Output is Q(s,a) for 18 joystick/button configurations: Q(s,a1), Q(s,a2), …, Q(s,a18)
Reward is the change in score for that step
Breakout demo
Multi-agent case
TensorFlow
Deep learning library, open-sourced by Google (11/2015)
TensorFlow provides primitives for:
◦ defining functions on tensors
◦ automatically computing their derivatives
What is a tensor? What is a computational graph?
Material from lecture by Bharath Ramsundar, March 2018, Stanford
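A small illustration of both ideas using the TensorFlow 2.x eager API (values illustrative):

import tensorflow as tf

# A tensor: an n-dimensional array with a shape and dtype
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # shape (2, 2)

# A function on tensors, traced into a computational graph
@tf.function
def f(t):
    return tf.reduce_sum(t * t)

print(f(x))                                 # 30.0

# Automatic differentiation
v = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
with tf.GradientTape() as tape:
    y = tf.reduce_sum(v * v)
print(tape.gradient(y, v))                  # dy/dv = 2v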
Introduction to Keras
Official high-level API of TensorFlow
◦ Python
◦ 250K developers
Same front-end <-> different back-ends:
◦ TensorFlow (Google)
◦ CNTK (Microsoft)
◦ MXNet (Apache)
◦ Theano (RIP)
Hardware:
◦ GPU (Nvidia)
◦ CPU (Intel/AMD)
◦ TPU (Google)
Companies: Netflix, Uber, Google, Nvidia…
Material from lecture by Francois Chollet, 2018, Stanford
Keras models
Installation:
◦ Anaconda -> TensorFlow -> Keras
Built-in layers:
◦ Conv1D, Conv2D, Conv3D…
◦ MaxPooling1D, MaxPooling2D, MaxPooling3D…
◦ Dense, Activation, RNN…
The Sequential model:
◦ Very simple
◦ Single-input, single-output, sequential layer stacks
The functional API:
◦ Mix & match
◦ Multi-input, multi-output, arbitrary static graph topologies
Sequential
>> from keras.models import Sequential
>> model = Sequential()
>> from keras.layers import Dense
>> model.add(Dense(units=64, activation='relu', input_dim=100))
>> model.add(Dense(units=10, activation='softmax'))
>> model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
>> model.fit(x_train, y_train, epochs=5, batch_size=32)
>> loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)
>> classes = model.predict(x_test)
Functional
>> from keras.layers import Input, Dense
>> from keras.models import Model
>> inputs = Input(shape=(784,))
>> x = Dense(64, activation='relu')(inputs)
>> x = Dense(64, activation='relu')(x)
>> predictions = Dense(10, activation='softmax')(x)
>> model = Model(inputs=inputs, outputs=predictions)
>> model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
>> model.fit(data, labels)
References
Stephens, Zachary D., et al. "Big data: astronomical or genomical?" PLoS Biology 13.7 (2015): e1002195.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
Kietzmann, Tim Christian, Patrick McClure, and Nikolaus Kriegeskorte. "Deep Neural Networks in Computational Neuroscience." bioRxiv (2017): 133504.
https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=3,2&seed=0.57693&showTestData=false&discretize=true&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=true&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false
Model vs Data parallelism
Parameter server approach