introduction to machine learningyifengt/courses/machine-learning/slides/... · 2020-04-17 ·...

40
Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon University Carnegie Mellon University 1 Yifeng Tao Introduction to Machine Learning

Upload: others

Post on 27-Jun-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Introduction to Machine LearningYifeng Tao

School of Computer ScienceCarnegie Mellon University

Carnegie Mellon University 1Yifeng Tao

Introduction to Machine Learning

Page 2: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Logistics

oCourse website: http://www.cs.cmu.edu/~yifengt/courses/machine-learning

Slides uploaded after lectureoTime: Mon-Fri 9:50-11:30am lecture, 11:30-12:00pm discussionoContact: [email protected]

Yifeng Tao Carnegie Mellon University 2

Page 3: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

What is machine learning?

oWhat are we talking when we talk about AI and ML?

Carnegie Mellon University 3Yifeng Tao

Deeplearning

Artificialintelligence

Machinelearning

Page 4: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

What is machine learning

Yifeng Tao Carnegie Mellon University 4

Probability Statistics Calculus Linear algebra

Machine Learning

Computer vision Natural language processing

Computational Biology

Page 5: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Where are we?

oSupervised learning: linear modelsoKernel machines: SVMs and dualityoUnsupervised learning: latent space analysis and clusteringoSupervised learning: decision tree, kNN and model selectionoLearning theory: generalization and VC dimensionoNeural network (basics)oDeep learning in CV and NLPoProbabilistic graphical models

oReinforcement learning and its application in clinical text miningoAttention mechanism and transfer learning in precision medicine

Yifeng Tao Carnegie Mellon University 5

Page 6: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

What’s more after introduction?

Carnegie Mellon University 6Yifeng Tao

Machinelearning

ProbabilisticgraphicalmodelsDeeplearning

Optimization Learningtheory

Page 7: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

What’s more after introduction?

oSupervised learning: linear modelsoKernel machines: SVMs and duality oà OptimizationoUnsupervised learning: latent space analysis and clusteringoSupervised learning: decision tree, kNN and model selectionoLearning theory: generalization and VC dimensionoà Statistical machine learningoNeural network (basics)oDeep learning in CV and NLPoà Deep learningoProbabilistic graphical models

Yifeng Tao Carnegie Mellon University 7

Page 8: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Curriculum for an ML Master/Ph.D. student in CMU

o10701 Introduction to Machine Learning: ohttp://www.cs.cmu.edu/~epxing/Class/10701/

o35705 Intermediate Statistics:ohttp://www.stat.cmu.edu/~larry/=stat705/

o36708 Statistical Machine Learning: ohttp://www.stat.cmu.edu/~larry/=sml/

o10725 Convex Optimization: ohttp://www.stat.cmu.edu/~ryantibs/convexopt/

o10708 Probabilistic Graphical Models: ohttp://www.cs.cmu.edu/~epxing/Class/10708-17/

o10707 Deep Learning: ohttps://deeplearning-cmu-10707.github.io/

oBooks:oBishop. Pattern Recognition and Machine LearningoGoodfellow et al. Deep learning

Yifeng Tao Carnegie Mellon University 8

Page 9: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Neural network (basics)Yifeng Tao

School of Computer ScienceCarnegie Mellon University

Slides adapted from Eric Xing, Maria-Florina Balcan, Russ Salakhutdinov, Matt Gormley

Carnegie Mellon University 9Yifeng Tao

Introduction to Machine Learning

Page 10: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

A Recipe for Supervised Learning

o1. Given training data:

o2. Choose each of these: oDecision function

oLoss function

o3. Define goal and train with SGD:o (take small steps opposite the gradient)

Yifeng Tao Carnegie Mellon University 10

[Slide from MattGormley etal.]

Page 11: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Logistic RegressionoThe prediction rule:

oIn this case, learning P(y|x) amounts to learning conditional probability over two Gaussian distribution.

oLimitation: only simple data distribution.

Yifeng Tao Carnegie Mellon University 11

[Slide from EricXingetal.]

Page 12: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Learning highly non-linear functionsof: X à yof might be non-linear functionoX continuous or discrete varsoy continuous or discrete vars

Yifeng Tao Carnegie Mellon University 12

[Slide from EricXingetal.]

Page 13: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

From biological neuron networks to artificial neural networks

oSignals propagate through neurons in brain.oSignals propagate through perceptrons in artificial neural network.

Yifeng Tao Carnegie Mellon University 13

[Slide from EricXingetal.]

Page 14: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Perceptron Algorithm and SVM

oPerceptron: simple learning algorithm for supervised classification analyzed via geometric margins in the 50’s [Rosenblatt’57].

oSimilar to SVM, a linear classifier based on analysis of margins.oOriginally introduced in the online learning scenario.

oOnline learning modelo Its guarantees under large margins

Yifeng Tao Carnegie Mellon University 14

[Slide from Maria-FlorinaBalcan etal.]

Page 15: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

The Online Learning Algorithm

oExample arrive sequentially.oWe need to make a prediction.oAfterwards observe the outcome.

oFor i=1, 2, ..., :

oApplication:oEmail classificationoRecommendation systemsoAdd placement in a new market

Yifeng Tao Carnegie Mellon University 15

[Slide from Maria-FlorinaBalcan etal.]

Page 16: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Linear Separators: Perceptron Algorithm

oh(x) = wTx + w0, if h(x) ≥ 0, then label x as +, otherwise label it as –

oSet t=1, start with the all zero vector w1.oGiven example x, predict positive iff wt

Tx ≥ 0oOn a mistake, update as follows:

oMistake on positive, then update wt+1 ß wt + xoMistake on negative, then update wt+1 ß wt – x

oNatural greedy procedure:o If true label of x is +1 and wt incorrect on x we

have wtTx < 0, wt+1Tx ß wtTx + xTx = wtTx + ||x||2, so more chance wt+1 classifies x correctly.

oSimilarly for mistakes on negative examples.

Yifeng Tao Carnegie Mellon University 16

[Slide from Maria-FlorinaBalcan etal.]

Page 17: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Perceptron: Example and Guarantee

oExample:

oGuarantee: If data has margin 𝛾 and all points inside a ball of radius 𝑅, then Perceptron makes ≤ (𝑅/𝛾)2 mistakes.oNormalized margin: multiplying all points

by 100, or dividing all points by 100, doesn’t change the number of mistakes; algo is invariant to scaling.

Yifeng Tao Carnegie Mellon University 17

[Slide from Maria-FlorinaBalcan etal.]

Page 18: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Perceptron: Proof of Mistake Bound

oGuarantee: If data has margin 𝛾 and all points inside a ball of radius 𝑅, then Perceptron makes ≤ (𝑅/𝛾)2 mistakes.

oProof:o Idea: analyze 𝑤𝑡T𝑤∗ and ǁ𝑤𝑡ǁ, where 𝑤∗ is the max-margin sep, ǁ𝑤∗ǁ=1.oClaim 1: 𝑤𝑡+1T𝑤∗ ≥ 𝑤𝑡T𝑤∗ + 𝛾. (because 𝑥T𝑤∗ ≥ 𝛾)oClaim 2: 𝑤𝑡+12 ≤ 𝑤𝑡2 + 𝑅2. (by Pythagorean Theorem)

oAfter 𝑀 mistakes:o𝑤𝑀+1T𝑤∗ ≥ 𝛾𝑀 (by Claim 1)o ||𝑤𝑀+1|| ≤ 𝑅 𝑀� (by Claim 2)o𝑤𝑀+1T𝑤∗ ≤ ǁ𝑤𝑀+1ǁ (since 𝑤∗ is unit length)

oSo, 𝛾𝑀 ≤ 𝑅𝑀, so 𝑀 ≤ (R/ 𝛾)2.

Yifeng Tao Carnegie Mellon University 18

[Slide from Maria-FlorinaBalcan etal.]

Page 19: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Multilayer perceptron (MLP)

oA simple and basic type of feedforward neural networks

oContains many perceptrons that are organized into layers

oMLP “perceptrons” are not perceptrons in the strict sense

Yifeng Tao Carnegie Mellon University 19

[Slide from RussSalakhutdinov etal.]

Page 20: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Artificial Neuron (Perceptron)

Yifeng Tao Carnegie Mellon University 20

[Slide from RussSalakhutdinov etal.]

Page 21: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Artificial Neuron (Perceptron)

Yifeng Tao Carnegie Mellon University 21

[Slide from RussSalakhutdinov etal.]

Page 22: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Activation Function

osigmoid activation function:o Squashes the neuron’s output between 0 and 1 o Always positiveo Bounded o Strictly increasingo Used in classification output layer

otanh activation function:o Squashes the neuron’s output between -1 and 1o Boundedo Strictly increasingo A linear transformation of sigmoid function

Yifeng Tao Carnegie Mellon University 22

[Slide from RussSalakhutdinov etal.]

Page 23: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Activation Function

oRectified linear (ReLU) activation: oBounded below by 0 (always non-negative) oTends to produce units with sparse

activitiesoNot upper bounded oStrictly increasing

oMost widely used activation functionoAdvantages:

oBiological plausibilityoSparse activationoBetter gradient propagation: vanishing

gradient in sigmoidal activation

Yifeng Tao Carnegie Mellon University 23

[Slide from RussSalakhutdinov etal.]

Page 24: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Activation Function in Alexnet

oA four-layer convolutional neural networkoReLU: solid lineoTanh: dashed line

Yifeng Tao Carnegie Mellon University 24

[Slide from https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf]

Page 25: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Single Hidden Layer MLP

Yifeng Tao Carnegie Mellon University 25

[Slide from RussSalakhutdinov etal.]

Page 26: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Capacity of MLP

oConsider a single layer neural network

Yifeng Tao Carnegie Mellon University 26

[Slide from RussSalakhutdinov etal.]

Page 27: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Capacity of Neural Nets

oConsider a single layer neural network

Yifeng Tao Carnegie Mellon University 27

[Slide from RussSalakhutdinov etal.]

Page 28: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

MLP with Multiple Hidden Layers

Yifeng Tao Carnegie Mellon University 28

[Slide from RussSalakhutdinov etal.]

Page 29: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Capacity of Neural Nets

oDeep learning playground

Yifeng Tao Carnegie Mellon University 29

[Slide from https://playground.tensorflow.org]

Page 30: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Training a Neural Network

Yifeng Tao Carnegie Mellon University 30

[Slide from RussSalakhutdinov etal.]

Page 31: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Stochastic Gradient Descent

Yifeng Tao Carnegie Mellon University 31

[Slide from RussSalakhutdinov etal.]

Page 32: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Mini-batch SGD

oMake updates based on a mini-batch of examples (instead of a single example)oThe gradient is the average regularized loss for that mini-batchoCan give a more accurate estimate of the gradientoCan leverage matrix/matrix operations, which are more efficient

Yifeng Tao Carnegie Mellon University 32

[Slide from RussSalakhutdinov etal.]

Page 33: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Backpropagation

oMethod used to train neural networks following gradient descentoEssentially implementation of chain rule and dynamic programming

oThe derivative of last two terms:

Yifeng Tao Carnegie Mellon University 33

[Slide from https://en.wikipedia.org/wiki/Backpropagation]

Page 34: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Backpropagation

oIf oj is output à straightforwardoElse:

Yifeng Tao Carnegie Mellon University 34

[Slide from https://en.wikipedia.org/wiki/Backpropagation]

Page 35: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Weight Decay

Yifeng Tao Carnegie Mellon University 35

[Slide from RussSalakhutdinov etal.]

Page 36: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Optimization: Momentum

oMomentum: Can use an exponential average of previous gradients:

oCan get pass plateous more quickly, by “gaining momentum”oWorks well in positions with bad Hessian matrix

oSGD w/o momentum, SGD w/ momentum

Yifeng Tao Carnegie Mellon University 36

[Slide from http://ruder.io/optimizing-gradient-descent/]

Page 37: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Momentum-based OptimizationoNesterov accelerated gradient (NAG):

oAdagrad:o smaller updates for params associated with frequently occurring featureso larger updates for params associated with infrequent features

oRMSprop and Adadelta:oReduce the aggressive, monotonically decreasing learning rate in Adagrad

oAdam

Yifeng Tao Carnegie Mellon University 37

[Slide from http://ruder.io/optimizing-gradient-descent/]

Page 38: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Demo of Optimization Methods

Yifeng Tao Carnegie Mellon University 38

[Slide from http://ruder.io/optimizing-gradient-descent/]

Page 39: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

Take home message

oPerceptron is an online linear classifieroMultilayer perceptron consists perceptrons with various activationsoBackpropagation is an implementation of calculating gradients of

neural network in a backward and dynamic programming wayoMomentum-based mini-batch gradient descent methods are used in

optimizing neural networks

oWhat’s next?oRegularization in neural networksoWidely used NN architecture in practice

Carnegie Mellon University 39Yifeng Tao

Page 40: Introduction to Machine Learningyifengt/courses/machine-learning/slides/... · 2020-04-17 · Introduction to Machine Learning Yifeng Tao School of Computer Science Carnegie Mellon

References

oEric Xing, Tom Mitchell. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701-06f/

oBarnabás Póczos, Maria-Florina Balcan, Russ Salakhutdinov. 10715 Advanced Introduction to Machine Learning: https://sites.google.com/site/10715advancedmlintro2017f/lectures

oMatt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html

oWikipedia

Carnegie Mellon University 40Yifeng Tao