
DS 5220

Alina Oprea, Associate Professor, CCIS, Northeastern University

November 13, 2019

Supervised Machine Learning and Learning Theory

Logistics


• Project milestone
  – Due on Nov. 18 in Gradescope

• Template
  – Problem description
  – Dataset
  – Approach and methodology
  – Remaining work
  – Team member contribution

• Last homework will be released this week

Outline

• Feed-Forward architectures
  – Multi-class classification (softmax unit)
  – Lab in Keras

• Convolutional Neural Networks
  – Convolution layer
  – Max pooling layer
  – Examples of famous architectures


References

• Deep Learning books
  – https://www.deeplearningbook.org/
  – http://d2l.ai/

• Stanford notes on deep learning
  – http://cs229.stanford.edu/notes/cs229-notes-deep_learning.pdf

• History of Deep Learning
  – https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html


Neural Network Architectures


Feed-Forward Networks
• Neurons from each layer connect to neurons from the next layer

Convolutional Networks
• Include a convolution layer for feature reduction
• Learn hierarchical representations

Recurrent Networks
• Keep hidden state
• Have cycles in the computational graph

Feed-Forward Networks


Figure: a fully connected feed-forward network with Layer 0 (input), Layer 1, Layer 2, and Layer 3.

Feed-Forward Neural Network


Figure: a two-layer feed-forward neural network (no cycles).
• Layer 0 (input): training example $x = (x_1, x_2, x_3)$
• Layer 1 (hidden): activations $a_1^{[1]}, a_2^{[1]}, a_3^{[1]}, a_4^{[1]}$, computed from $x$ using parameters $W^{[1]}, b^{[1]}$
• Layer 2 (output): activation $a_1^{[2]}$, computed from $a^{[1]}$ using parameters $W^{[2]}, b^{[2]}$

Parameters: $\theta = (b^{[1]}, W^{[1]}, b^{[2]}, W^{[2]})$

Vectorization


$z^{[1]} = W^{[1]} x + b^{[1]}$ (Linear)    $a^{[1]} = g(z^{[1]})$ (Non-linear)

Vectorization


Output layer

$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$ (Linear)    $a^{[2]} = g(z^{[2]})$ (Non-linear)
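As a concrete illustration, here is a minimal NumPy sketch of the two vectorized steps above, assuming a 3-dimensional input, 4 hidden units, one output unit, and sigmoid activations (the dimensions and values are illustrative, not taken from the slides):

```python
import numpy as np

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])                 # training example x = (x1, x2, x3)
W1, b1 = np.random.randn(4, 3), np.zeros(4)    # layer 1: 4 hidden units
W2, b2 = np.random.randn(1, 4), np.zeros(1)    # layer 2: single output unit

z1 = W1 @ x + b1    # z[1] = W[1] x + b[1]   (linear)
a1 = g(z1)          # a[1] = g(z[1])         (non-linear)
z2 = W2 @ a1 + b2   # z[2] = W[2] a[1] + b[2]
a2 = g(z2)          # a[2] = g(z[2]), the network output
```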

Hidden Units
• Layer 1
  – First hidden unit:
    • Linear: $z_1^{[1]} = W_1^{[1]\top} x + b_1^{[1]}$
    • Non-linear: $a_1^{[1]} = g(z_1^{[1]})$
  – …
  – Fourth hidden unit:
    • Linear: $z_4^{[1]} = W_4^{[1]\top} x + b_4^{[1]}$
    • Non-linear: $a_4^{[1]} = g(z_4^{[1]})$
• Terminology
  – $a_i^{[j]}$ – activation of unit $i$ in layer $j$
  – $g$ – activation function
  – $W^{[j]}$ – weight matrix controlling the mapping from layer $j-1$ to layer $j$
  – $b^{[j]}$ – bias vector from layer $j-1$ to layer $j$


Forward Propagation


Figure: the input x is propagated through the layers to produce the prediction.

Activation Functions


• Binary classification: sigmoid activation in the output layer
• Intermediary (hidden) layers: ReLU activation
• Regression: linear output
• Sigmoid vs. softmax: softmax generalizes the sigmoid beyond two classes

Softmax classifier


• Predict the class with highest probability
• Generalization of sigmoid/logistic regression to multi-class
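A small NumPy sketch of this idea (the scores and number of classes below are made up for illustration):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # output-layer scores for 3 classes
probs = softmax(scores)              # probabilities that sum to 1
pred = int(np.argmax(probs))         # predict the class with highest probability
```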

Multi-class classification


Review: Feed-Forward Neural Networks
• Simplest architecture of neural networks
• Neurons from one layer are connected to neurons from the next layer
  – Input layer has the dimension of the feature space
  – Output layer has one unit per class
  – Hidden layers apply a linear operation followed by a non-linear activation function
  – Multi-Layer Perceptron (MLP): fully connected layers
• Activation functions
  – Sigmoid for binary classification in the last layer
  – Softmax for multi-class classification in the last layer
  – ReLU for hidden layers
• Forward propagation is the computation of the network output given an input
• Back propagation is the training of the network
  – Determine the weights and biases at every layer


FFNN Architectures


• Input and Output Layers are completely specified by the problem domain

• In the hidden layers, the number of neurons in Layer i+1 is typically smaller than the number of neurons in Layer i

MNIST: Handwritten digit recognition


Predict the digit (a multi-class classifier)

Image Representation


• Image is 3D “tensor”: height, width, color channel (RGB)

• Black-and-white images are 2D matrices: height, width
  – Each value is a pixel intensity

Lab – Feed Forward NN


Import modules

Load MNIST data and preprocessing

Vector representation
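The lab notebook itself is not reproduced in this transcript; the following is a hedged sketch of these steps using tf.keras (the exact imports and variable names in the original lab may differ):

```python
import numpy as np
from tensorflow import keras

# Load MNIST: 60,000 training and 10,000 test images of handwritten digits.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Vector representation: flatten each 28x28 grayscale image into a
# 784-dimensional vector and rescale pixel intensities from [0, 255] to [0, 1].
x_train = x_train.reshape(-1, 28 * 28).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28 * 28).astype("float32") / 255.0
```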

Neural Network Architecture


10 hidden units, ReLU activation
Output layer: softmax activation

Feed-Forward Neural Network Architecture
• 1 Hidden Layer (“Dense”, i.e., fully connected)
• 10 neurons
• Output layer uses softmax activation

Loss function and optimizer

Train and evaluate
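A hedged sketch of the model, loss/optimizer choice, and training loop described above, continuing from the preprocessed data in the earlier sketch (the lab's exact optimizer, loss, epochs, and batch size may differ):

```python
from tensorflow import keras
from tensorflow.keras import layers

# One fully connected ("Dense") hidden layer with 10 ReLU units,
# and a softmax output layer with one unit per digit class.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(10, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Loss function and optimizer; sparse categorical cross-entropy works
# directly with the integer MNIST labels.
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

# Train and evaluate.
history = model.fit(x_train, y_train, epochs=10, batch_size=128,
                    validation_split=0.1)
test_loss, test_acc = model.evaluate(x_test, y_test)
```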


Epoch Output

Metrics
• Loss
• Accuracy
Reported on both training and validation data

Training/testing results

Changing Number of Neurons


500 hidden units
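Continuing the sketch above, widening the hidden layer only changes the first Dense call (500 is the value shown on the slide):

```python
# Same architecture as before, but with 500 hidden units instead of 10.
model_wide = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(500, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
```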

Two Layers


Figure: Layer 1 and Layer 2 (hidden layers), followed by the output softmax layer.
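A sketch of the two-hidden-layer variant; the unit counts here are illustrative, since the slide does not list them:

```python
# Two hidden layers (Layer 1 and Layer 2) followed by the output softmax layer.
model_two_layer = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(500, activation="relu"),    # Layer 1
    layers.Dense(100, activation="relu"),    # Layer 2
    layers.Dense(10, activation="softmax"),  # output softmax layer
])
```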

Monitor Loss


Model Parameters
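These two slides show screenshots; a hedged sketch of how the same information can be reproduced with the model and history objects from the earlier sketch (matplotlib assumed):

```python
import matplotlib.pyplot as plt

# Monitor loss: fit() returns a History object with per-epoch metrics.
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()

# Model parameters: summary() lists each layer and its trainable parameter
# count (e.g., the 784 -> 10 Dense layer has 784*10 + 10 = 7,850 parameters).
model.summary()
```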


Neural Network Architectures


Feed-Forward Networks
• Neurons from each layer connect to neurons from the next layer

Convolutional Networks
• Include a convolution layer for feature reduction
• Learn hierarchical representations

Recurrent Networks
• Keep hidden state
• Have cycles in the computational graph

Convolutional Neural Networks


Convolutional Nets


– Object recognition
– Steering angle prediction (assist drivers in making decisions)
– Image captioning

Convolutional Nets

• Particular type of Feed-Forward Neural Nets
  – Invented by [LeCun 89]
• Applicable to data with natural grid topology
  – Time series
  – Images
• Use convolutions on at least one layer
  – Convolution is a linear operation that uses local information
  – Also use a pooling operation
  – Used for dimensionality reduction and learning hierarchical feature representations (a Keras sketch follows below)
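To make the layer types concrete, here is a minimal Keras sketch of such a network for 28x28 grayscale images; the filter counts and kernel sizes are illustrative, not an architecture taken from the slides:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Convolution layers extract local features, max pooling reduces
# dimensionality, and a softmax layer classifies the result.
cnn = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),  # convolution layer
    layers.MaxPooling2D(pool_size=(2, 2)),                     # max pooling layer
    layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                    # classifier head
])
```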



Convolutions


Example


Figure: a filter slides over the input to produce the output feature map (Input, Filter, Output).
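The figure's numbers are not reproduced in this transcript; as a stand-in, this is a small NumPy sketch of how the output is computed by sliding a filter over the input (the input and filter values below are made up):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D convolution, implemented as cross-correlation
    (the convention used in most deep learning libraries)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1., 2., 3., 0.],
                  [0., 1., 2., 3.],
                  [3., 0., 1., 2.],
                  [2., 3., 0., 1.]])      # 4x4 input
kernel = np.array([[1., 0.],
                   [0., 1.]])             # 2x2 filter
print(conv2d_valid(image, kernel))        # 3x3 output feature map
```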

Acknowledgements

• Slides made using resources from:
  – Andrew Ng
  – Eric Eaton
  – David Sontag
  – Andrew Moore
  – Yann LeCun

• Thanks!

