angular and deep learning

Deep Learning and Angular

Angular Meetup (06/14/2017)

Google (Mountain View)

Oswald Campesato

[email protected]

The Data/AI Landscape

Gartner Hype Curve: Where is Deep Learning?

The Impact of AI

“Robot trucks will kill far fewer people (if any).

Machines don’t get distracted or look at phones instead of the road.

Machines don’t drink alcohol, do drugs, or things that contribute to accidents.”

Robot trucks don’t need salaries, vacations, health insurance, rest periods, or sick time.

The only costs will be upkeep of the machinery.

AI/ML/DL: How They Differ

Traditional AI (20th century):

based on collections of rules

Led to expert systems in the 1980s

The era of LISP and Prolog


Machine Learning:

Started in the 1950s (approximate)

Alan Turing and “learning machines”

Data-driven (not rule-based)

Many types of algorithms

Involves optimization


Deep Learning:

Started in the 1950s (approximate)

The “perceptron” (basis of NNs)

Data-driven (not rule-based)

large (even massive) data sets

Involves neural networks (CNNs: ~1970s)

Lots of heuristics

Heavily based on empirical results

The Rise of Deep Learning

Massive and inexpensive computing power

Huge volumes of data/Powerful algorithms

The “big bang” in 2009:

“deep-learning neural networks and NVidia GPUs”

Google Brain used NVidia GPUs (2009)

AI/ML/DL: Commonality

All of them involve a model

A model represents a system

Goal: a good predictive model

The model is based on:

Many rules (for AI)

data and algorithms (for ML)

large sets of data (for DL)

A Basic Model in Machine Learning

Let’s perform the following steps:

1) Start with a simple model (2 variables)

2) Generalize that model (n variables)

3) See how it might apply to a NN

Linear Regression

One of the simplest models in ML

Fits a line (y = m*x + b) to data in 2D

Finds best line by minimizing MSE (closed-form solution):

m = cov(x, y) / var(x)

b = mean(y) - m * mean(x)
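
A minimal sketch of the closed-form least-squares fit, assuming the standard formulas m = cov(x, y) / var(x) and b = mean(y) - m * mean(x). The function name fit_line and the sample data are illustrative and not from the talk.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (m, b) minimizing the mean squared error of y = m*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = y_bar - m * x_bar
    return m, b

# Illustrative data lying exactly on y = 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b = fit_line(xs, ys)
print(m, b)
```

Because the sample points lie exactly on a line, the fit recovers that line's slope and intercept.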

Linear Regression in 2D: example

Linear Regression: alternatives

Fitting a polynomial (degree 2, 3, …)

Can lead to overfitting

Polynomials diverge faster than lines

Can reduce predictive accuracy

NB: Linear Regression != Curve Fitting
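
The overfitting risk can be sketched with NumPy's polyfit. The data, the fixed "noise" values, and the helper name train_mse are assumptions made for this illustration only.

```python
import numpy as np

# Illustrative data: points near y = 2*x + 1 with small fixed "noise".
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
noise = np.array([0.3, -0.2, 0.4, -0.3, 0.2, -0.4])
y = 2.0 * x + 1.0 + noise

def train_mse(degree):
    """Fit a polynomial of the given degree; return (coeffs, training MSE)."""
    coeffs = np.polyfit(x, y, degree)
    residuals = np.polyval(coeffs, x) - y
    return coeffs, float(np.mean(residuals ** 2))

line, mse_line = train_mse(1)
poly, mse_poly = train_mse(5)   # 6 coefficients for 6 points: fits the noise too

# Compare predictions outside the training range (x = 10).
x_new = 10.0
true_value = 2.0 * x_new + 1.0
err_line = abs(np.polyval(line, x_new) - true_value)
err_poly = abs(np.polyval(poly, x_new) - true_value)
print(mse_line, mse_poly, err_line, err_poly)
```

The degree-5 polynomial has the lower training error but the larger error at the new point, which is the sense in which a higher-degree fit can reduce predictive accuracy.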

Linear Regression: example #1

One feature (independent variable):

X = number of square feet

Predicted value (dependent variable):

Y = cost of a house

A very “coarse grained” model

We can devise a much better model

Linear Regression: example #2

Multiple features:

X1 = # of square feet

X2 = # of bedrooms

X3 = # of bathrooms (dependency?)

X4 = age of house

X5 = cost of nearby houses

X6 = corner lot (or not): Boolean

a much better model (6 features)

Linear Multivariate Analysis

General form of multivariate equation:

Y = w1*x1 + w2*x2 + . . . + wn*xn + b

w1, w2, . . . , wn are numeric values

x1, x2, . . . , xn are variables (features)

Properties of variables:

Can be independent (Naïve Bayes)

weak/strong dependencies can exist
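
A hedged sketch of fitting Y = w1*x1 + ... + wn*xn + b with NumPy's least-squares solver. The feature values, true_w, and true_b are made up for this example; real housing data would be noisy.

```python
import numpy as np

# Illustrative feature matrix: one row per house, columns x1..x3
# (square feet, bedrooms, age). The values are invented for this sketch.
X = np.array([
    [1000.0, 2.0, 30.0],
    [1500.0, 3.0, 20.0],
    [2000.0, 3.0, 10.0],
    [2500.0, 4.0, 5.0],
    [1200.0, 2.0, 40.0],
])
true_w = np.array([150.0, 10000.0, -500.0])
true_b = 20000.0
y = X @ true_w + true_b          # noise-free targets, so the fit is exact

# Append a column of ones so the bias b is learned as an extra weight.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
solution, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w, b = solution[:-1], solution[-1]
print(w, b)
```

Appending a column of ones is a common way to fold the bias into the weight vector so that one solver call returns both.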

Neural Network with 3 Hidden Layers

Neural Networks: equations

Node “values” in first hidden layer:

N1 = w11*x1+w21*x2+…+wn1*xn

N2 = w12*x1+w22*x2+…+wn2*xn

N3 = w13*x1+w23*x2+…+wn3*xn

. . .

Nn = w1n*x1+w2n*x2+…+wnn*xn

Similar equations for other pairs of layers

Neural Networks: Matrices

From inputs to first hidden layer:

Y1 = W1*X + B1 (X/Y1/B1: vectors; W1: matrix)

From first to second hidden layers:

Y2 = W2*Y1 + B2

From second to third hidden layers:

Y3 = W3*Y2 + B3

Apply an “activation function” to y values
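
The layer-by-layer computation can be sketched as a loop over (W, B) pairs. The layer sizes, the tanh activation, the weight scale, and the helper name dense are assumptions for this sketch, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b, activation=np.tanh):
    """One layer: affine map W @ x + b followed by an activation."""
    return activation(W @ x + b)

# Illustrative sizes: 4 inputs, then three hidden layers of 5, 5, and 3 nodes.
sizes = [4, 5, 5, 3]
# Small random initial weights and zero biases (an assumption of this sketch).
params = [(0.1 * rng.standard_normal((n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = np.array([1.0, 0.5, -0.5, 2.0])
y = x
for W, b in params:
    y = dense(y, W, b)   # Y_k = activation(W_k * Y_(k-1) + B_k)
print(y)
```

Each W has shape (nodes in this layer, nodes in the previous layer), so the matrix product maps one layer's vector to the next.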

Neural Networks (general)

Multiple hidden layers:

Layer composition is your decision

Activation functions: sigmoid, tanh, RELU

https://en.wikipedia.org/wiki/Activation_function

Back propagation (1980s)

https://en.wikipedia.org/wiki/Backpropagation

=> Initial weights: small random numbers

Activation Functions (Examples)

import numpy as np

...

# Python sigmoid example:

z = 1/(1 + np.exp(-np.dot(W, x)))

...

# Python tanh example:

z = np.tanh(np.dot(W, x))

# Python ReLU example:

z = np.maximum(0, np.dot(W, x))

What’s the “Best” Activation Function?

Initially sigmoid was popular

then tanh became popular

Now RELU is preferred (better results)

NB: sigmoid + tanh are used in LSTMs

Sample Cost Function #1

Sample Cost Function #2

How to Select a Cost Function

1) Depends on the learning type:

=> supervised/unsupervised/RL

2) Depends on the activation function

3) Other factors

Example:

cross-entropy cost function for supervised

learning on multiclass classification
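
A minimal sketch of the cross-entropy cost for one multiclass example, using softmax to turn scores into probabilities. The scores and function names are illustrative assumptions.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_class])

scores = [2.0, 1.0, 0.1]       # illustrative network outputs for 3 classes
probs = softmax(scores)
loss_correct = cross_entropy(probs, 0)   # true class has the highest score
loss_wrong = cross_entropy(probs, 2)     # true class has the lowest score
print(probs, loss_correct, loss_wrong)
```

The loss is small when the network puts high probability on the true class and grows as that probability shrinks.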

GD versus SGD

SGD (Stochastic Gradient Descent):

+ each update uses a SUBSET of the dataset (a single sample in the strict definition)

+ with larger subsets: aka Minibatch Stochastic Gradient Descent

GD (Gradient Descent):

+ involves the ENTIRE dataset

More details:

http://cs229.stanford.edu/notes/cs229-notes1.pdf
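
The difference can be sketched on a one-variable linear regression, where the only change between the two runs is the batch size. The data, learning rate, epoch count, and function names are assumptions for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * X + 0.5                      # illustrative noise-free line

def grad(m, b, xb, yb):
    """Gradient of the MSE for y = m*x + b on the batch (xb, yb)."""
    err = m * xb + b - yb
    return 2.0 * np.mean(err * xb), 2.0 * np.mean(err)

def train(batch_size, epochs=200, lr=0.1):
    m, b = 0.0, 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))          # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            gm, gb = grad(m, b, X[idx], y[idx])
            m, b = m - lr * gm, b - lr * gb
    return m, b

m_gd, b_gd = train(batch_size=len(X))   # GD: one update per epoch, full dataset
m_sgd, b_sgd = train(batch_size=20)     # minibatch SGD: ten updates per epoch
print(m_gd, b_gd, m_sgd, b_sgd)
```

Both runs approach m = 3 and b = 0.5. The minibatch run makes more, cheaper updates per pass over the data.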

What are Hyper Parameters?

higher level concepts about the model such as

complexity, or capacity to learn

Cannot be learned directly from the data in the

standard model training process

must be predefined

Hyper Parameters (examples)

# of hidden layers in a neural network

the learning rate (in many models)

# of leaves or depth of a tree

# of latent factors in a matrix factorization

# of clusters in a k-means clustering
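
As a small illustration of a hyperparameter versus a learned parameter, the sketch below fixes a learning rate before training and learns only w. The toy objective (w - 3)^2 and the candidate values are assumptions chosen for this example.

```python
def train(learning_rate, steps=50):
    """Minimize (w - 3)^2 by gradient descent; w is the learned parameter."""
    w = 0.0
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient
    return w

# The learning rate is fixed before training; only w is learned.
candidates = [0.001, 0.1, 1.1]          # too low, reasonable, too high
results = {lr: train(lr) for lr in candidates}
best_lr = min(results, key=lambda lr: abs(results[lr] - 3.0))
print(results, best_lr)
```

With the smallest rate, w barely moves in 50 steps. With the largest, the updates overshoot and diverge. Trying several predefined values and comparing the results is the simplest form of hyperparameter search.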

How Many Layers in a DNN?

Algorithm #1 (from Geoffrey Hinton):

1) add layers until you start overfitting your

training set

2) now add dropout or some other regularization method

Algorithm #2 (Yoshua Bengio):

“Add layers until the test error does not improve anymore.”

How Many Hidden Nodes in a DNN?

Based on a relationship between:

# of input and # of output nodes

Amount of training data available

Complexity of the cost function

The training algorithm

Use Cases for Neural Networks

CNNs (Convolutional NNs):

Good for image processing

2000: CNNs processed 10-20% of all checks

=> Approximately 60% of all NNs

RNNs (Recurrent NNs):

Good for NLP and audio

CNN: Sample Filters

CNN Filters (examples)

Types of RNNs

LSTMs (Long Short Term Memory)

GRUs

ResNets (Residual NNs): not recurrent; deep feed-forward networks with skip connections

Features of LSTMs

Used in Google speech recognition

input/output/forget gates

Designed to mitigate the vanishing gradient problem

Can track 1000s of discrete time steps

Used by international competition winners

Often combined with CTC
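
The role of the input/output/forget gates can be sketched as one time step of a standard LSTM cell in NumPy. The weight layout, sizes, and random inputs are assumptions of this sketch, not the internals of any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One time step of a standard LSTM cell.

    W has shape (4 * hidden, hidden + inputs); its four row blocks hold the
    forget, input, output, and candidate weights (a layout chosen for this sketch).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])   # forget gate
    i = sigmoid(z[1 * hidden:2 * hidden])   # input gate
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate
    g = np.tanh(z[3 * hidden:4 * hidden])   # candidate cell values
    c = f * c_prev + i * g                  # new cell state
    h = o * np.tanh(c)                      # new hidden state
    return h, c

rng = np.random.default_rng(0)
inputs, hidden = 3, 4
W = 0.1 * rng.standard_normal((4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.standard_normal((5, inputs)):   # a sequence of 5 time steps
    h, c = lstm_step(x, h, c, W, b)
print(h)
```

The cell state c is updated additively (f * c_prev + i * g), which is what lets gradients survive across many time steps.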

Inside an LSTM

Keras/LSTM Code Snippet

import numpy

from keras.datasets import imdb

from keras.models import Sequential

from keras.layers import Dense

from keras.layers import LSTM

from keras.layers import Embedding

from keras.preprocessing import sequence

...

GANs: Generative Adversarial Networks

Adversarial examples make imperceptible changes to images

Such examples can fool many NNs

The fooled NNs can have an extremely high error rate

Some images create optical illusions

https://www.quora.com/What-are-the-pros-and-cons-of-using-generative-adversarial-networks-a-type-of-neural-network

ML/DL Frameworks

Caffe (templates instead of code)

Theano (influenced TensorFlow)

TensorFlow

TensorFlow Lite (release date?)

Keras (“layer” over Theano+TF)

Tefla (mini framework over TF)

Torch (Lua) + PyTorch (Facebook)

MXNet (Amazon)

CNTK (Microsoft)

Languages for ML/DL

Popular languages for ML:

R (popular among statisticians)

Python (sklearn/pandas/etc)

Popular languages for DL:

Python (Keras/Theano/TF modules)

some Java/C++/Go

“Challenges” in Deep Learning

overfitting/underfitting of a model

vanishing/exploding gradient

learning rate (too high or too low)

Debugging NNs (good luck)
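
The vanishing gradient can be illustrated with plain arithmetic: backpropagation multiplies one activation derivative per layer. The depth, the pre-activation value, and the simplification of setting all weights to 1 are assumptions of this sketch.

```python
import math

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_derivative(z):
    return 1.0 if z > 0 else 0.0

# Backpropagation multiplies one activation derivative per layer (chain rule).
# Weights are taken as 1 here to isolate the effect of the activation.
depth = 20
z = 0.5                                   # illustrative pre-activation value
sigmoid_factor = sigmoid_derivative(z) ** depth
relu_factor = relu_derivative(z) ** depth
print(sigmoid_factor, relu_factor)
```

The sigmoid derivative is at most 0.25, so the product shrinks geometrically with depth, while the ReLU derivative stays at 1 for positive inputs.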

Miscellaneous Topics

* Data versus algorithms:

Option A: good data + average algorithm

Option B: average data + good algorithm

=> Option A is preferred over Option B

* “Cleaning” a dataset:

De-duplicate and fix invalid/missing data (how?)

* Dimensionality reduction:

eliminate “unimportant” features (columns)
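
One simple way to carry out these steps is sketched below. Mean imputation and a variance threshold are assumptions chosen for brevity; they are only one option among many and are not a method prescribed in the talk.

```python
import numpy as np

# Illustrative dataset: rows are samples, columns are features.
data = np.array([
    [1.0, 5.0, 0.10],
    [2.0, 5.0, 0.20],
    [2.0, 5.0, 0.20],     # exact duplicate of the previous row
    [3.0, 5.0, np.nan],   # missing value
    [4.0, 5.0, 0.40],
])

# 1) Fix missing data: replace each NaN with its column mean.
col_means = np.nanmean(data, axis=0)
filled = np.where(np.isnan(data), col_means, data)

# 2) De-duplicate rows (np.unique also sorts the rows).
rows = np.unique(filled, axis=0)

# 3) Dimensionality reduction: drop columns that never vary.
keep = rows.var(axis=0) > 1e-12
reduced = rows[:, keep]
print(reduced)
```

The constant middle column carries no information for a model, so removing it loses nothing. Real datasets usually need a less blunt criterion.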

Miscellaneous Topics

* XOR requires at least one hidden layer to solve (why?)

* A dataset whose columns are interchangeable is a poor fit for a CNN (why?)

* Second generation TPUs

* TensorFlow Lite (open source later in 2017)

http://www.tensorflow.org/tutorials
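
On the XOR question, the sketch below uses hand-picked weights (an assumption of this example, not learned values) to show a network with a single hidden layer of two nodes computing XOR. A network with no hidden layer cannot do this, because XOR is not linearly separable.

```python
def step(z):
    """Threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hidden layer (two nodes) with hand-picked weights and biases.
    h_or = step(x1 + x2 - 0.5)     # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)    # fires only if both inputs are 1
    # Output node: "OR but not AND" is exactly XOR.
    return step(h_or - h_and - 0.5)

table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```

The hidden layer re-maps the four inputs into a space where a single line separates the two classes.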

D3 Fun Samples

D3 Animation effects:

MouseMoveFadeAnim1Back1.html

SVG tiger:

svg-tiger-d3.svg

D3 and SVG tiger:

svg-tiger-d3.html

Deep Learning Playground

TF playground home page:

http://playground.tensorflow.org

Demo #1:

https://github.com/tadashi-aikawa/typescript-playground

Converts playground to TypeScript

D3/TypeScript/Deep Learning

Download playground_master.zip

npm install

npm start

Demo converts playground to TypeScript

D3/TypeScript/Deep Learning

TypeScript files in ‘src’ directory:

state.ts

seedrandom.d.ts

playground.ts

linechart.ts

heatmap.ts

dataset.ts

nn.ts (<= activations/nodes in a neural net)

Activations in TypeScript (nn.ts)

export class Activations {
  public static TANH: ActivationFunction = {
    output: x => (Math as any).tanh(x),
    der: x => {
      let output = Activations.TANH.output(x);
      return 1 - output * output;
    }
  };
  public static RELU: ActivationFunction = {
    output: x => Math.max(0, x),
    der: x => x <= 0 ? 0 : 1
  };

Activations in TypeScript (nn.ts)

  public static SIGMOID: ActivationFunction = {
    output: x => 1 / (1 + Math.exp(-x)),
    der: x => {
      let output = Activations.SIGMOID.output(x);
      return output * (1 - output);
    }
  };
  public static LINEAR: ActivationFunction = {
    output: x => x,
    der: x => 1
  };
}
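
As a sanity check on the derivative formulas above, the following Python sketch mirrors the four activations and compares each der against a central-difference estimate. The dictionary layout, test points, and step size are assumptions of this sketch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Each entry pairs an activation with its derivative, mirroring the formulas above.
ACTIVATIONS = {
    "tanh": (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
    "relu": (lambda x: max(0.0, x), lambda x: 0.0 if x <= 0 else 1.0),
    "sigmoid": (sigmoid, lambda x: sigmoid(x) * (1.0 - sigmoid(x))),
    "linear": (lambda x: x, lambda x: 1.0),
}

def numeric_derivative(f, x, eps=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

max_error = 0.0
for name, (output, der) in ACTIVATIONS.items():
    for x in (-2.0, -0.5, 0.7, 3.0):      # avoid x = 0, where ReLU has a kink
        max_error = max(max_error, abs(der(x) - numeric_derivative(output, x)))
print(max_error)
```

A numerical check like this is a cheap way to catch a wrong sign or a missing factor before the derivatives are used in backpropagation.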

Angular/Deep Learning App (Demo #2)

Create NGDeepLearning via the Angular CLI: ng new NGDeepLearning

Copy ./src/*.ts files from playground_master into NGDeepLearning/src subdirectory

Merge the two package.json files

Merge the two index.html files

install d3: npm install d3 --save

Angular/Deep Learning

Add import * as d3 from 'd3'; to the files:

dataset.ts

heatmap.ts

linechart.ts

playground.ts

Launch the app: ng serve

Deep Learning and Art/“Stuff”

“Convolutional Blending” images:

=> 19-layer Convolutional Neural Network

www.deepart.io

Bots created their own language:

https://www.recode.net/2017/3/23/14962182/ai-learning-language-open-ai-research

https://www.fastcodesign.com/90124942/this-google-engineer-taught-an-algorithm-to-make-train-footage-and-its-hypnotic

About Me

I provide training for the following:

=> Deep Learning/TensorFlow/Keras

=> Android

=> Angular 4

Recent/Upcoming Books

1) HTML5 Canvas and CSS3 Graphics (2013)

2) jQuery, CSS3, and HTML5 for Mobile (2013)

3) HTML5 Pocket Primer (2013)

4) jQuery Pocket Primer (2013)

5) HTML5 Mobile Pocket Primer (2014)

6) D3 Pocket Primer (2015)

7) Python Pocket Primer (2015)

8) SVG Pocket Primer (2016)

9) CSS3 Pocket Primer (2016)

10) Android Pocket Primer (2017)

11) Angular Pocket Primer (2017)
