seyed abbas hosseini sharif university of technology most slides … · 2021. 3. 13. · most...

Seyed Abbas Hosseini Sharif University of Technology Regression Most slides are adopted from PRML book

Post on 06-Jul-2021

Seyed Abbas Hosseini Sharif University of Technology

Regression

Most slides are adopted from PRML book

Outline

• Linear Basis Function Models
• Maximum Likelihood and Least Squares
• Regularized Least Squares
• Gradient Descent and Sequential Learning
• Multiple Outputs
• Bias-Variance Tradeoff

Linear Basis Function Models

Linear Basis Function Models (1)

• Example: polynomial curve fitting

$$y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$

Linear Basis Function Models (2)

• Generally,

$$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$$

• where the $\phi_j(\mathbf{x})$ are known as basis functions.

• Typically $\phi_0(\mathbf{x}) = 1$, so that $w_0$ acts as a bias.

• In the simplest case, we use linear basis functions: $\phi_j(\mathbf{x}) = x_j$.

Linear Basis Function Models (3)

• Polynomial basis functions:

$$\phi_j(x) = x^j$$

• These are global; a small change in x affects all basis functions.

Linear Basis Function Models (4)

• Gaussian basis functions:

$$\phi_j(x) = \exp\left\{ -\frac{(x - \mu_j)^2}{2 s^2} \right\}$$

• These are local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (width).

Linear Basis Function Models (5)

• Sigmoidal basis functions:

$$\phi_j(x) = \sigma\left( \frac{x - \mu_j}{s} \right)$$

• where

$$\sigma(a) = \frac{1}{1 + \exp(-a)}$$

• These are also local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (slope).
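The three families of basis functions above can be sketched in NumPy as follows. This is only an illustrative sketch: the function names, the choice of centers, and the width `s = 0.1` are assumptions made for the example and do not come from the slides. Each function returns a design matrix with one row per input and one column per basis function, with a leading column of ones for the bias.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Columns x**0, x**1, ..., x**degree (the x**0 column is the bias)."""
    return np.vander(x, degree + 1, increasing=True)

def gaussian_basis(x, centers, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), preceded by a bias column."""
    phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * s ** 2))
    return np.hstack([np.ones((x.size, 1)), phi])

def sigmoid_basis(x, centers, s):
    """phi_j(x) = sigma((x - mu_j) / s), preceded by a bias column."""
    a = (x[:, None] - centers[None, :]) / s
    return np.hstack([np.ones((x.size, 1)), 1.0 / (1.0 + np.exp(-a))])

# Hypothetical inputs and centers, chosen only for illustration.
x = np.linspace(0.0, 1.0, 5)
centers = np.array([0.25, 0.75])
Phi_poly = polynomial_basis(x, degree=3)       # shape (5, 4)
Phi_gauss = gaussian_basis(x, centers, s=0.1)  # shape (5, 3)
Phi_sig = sigmoid_basis(x, centers, s=0.1)     # shape (5, 3)
```

A Gaussian basis function equals 1 at its own center and a sigmoidal one equals 0.5 there, which is an easy sanity check on the matrices.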

Maximum Likelihood and Least Squares

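Maximum Likelihood and Least Squares (1)

• Assume observations from a deterministic function with added Gaussian noise:

$$t = y(\mathbf{x}, \mathbf{w}) + \epsilon \quad \text{where} \quad p(\epsilon \mid \beta) = \mathcal{N}(\epsilon \mid 0, \beta^{-1})$$

• which is the same as saying

$$p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1})$$

• Given observed inputs, $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$, and targets, $\mathbf{t} = [t_1, \dots, t_N]^T$, we obtain the likelihood function

$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1})$$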

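Maximum Likelihood and Least Squares (2)

• Taking the logarithm, we get

$$\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}(t_n \mid \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1}) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E_D(\mathbf{w})$$

• where

$$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n)\}^2$$

• is the sum-of-squares error.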

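Maximum Likelihood and Least Squares (3)

• Computing the gradient and setting it to zero yields

$$\nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta \sum_{n=1}^{N} \{t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n)\} \boldsymbol{\phi}(\mathbf{x}_n)^T = \mathbf{0}$$

• Solving for w, we get

$$\mathbf{w}_{ML} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{t}$$

• where $\boldsymbol{\Phi}$ is the $N \times M$ design matrix with elements $\Phi_{nj} = \phi_j(\mathbf{x}_n)$.

The Moore-Penrose pseudo-inverse is $\boldsymbol{\Phi}^{\dagger} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T$.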

Maximum Likelihood and Least Squares (4)

• Maximizing with respect to the bias, $w_0$, alone, we see that

$$w_0 = \bar{t} - \sum_{j=1}^{M-1} w_j \bar{\phi}_j, \qquad \bar{t} = \frac{1}{N} \sum_{n=1}^{N} t_n, \qquad \bar{\phi}_j = \frac{1}{N} \sum_{n=1}^{N} \phi_j(\mathbf{x}_n)$$

• We can also maximize with respect to $\beta$, giving

$$\frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^{N} \{t_n - \mathbf{w}_{ML}^T \boldsymbol{\phi}(\mathbf{x}_n)\}^2$$
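As a rough illustration of the maximum likelihood solution, the sketch below fits a straight line to synthetic data. The data-generating line, the noise level, and the seed are assumptions chosen for the example. `np.linalg.lstsq` is used instead of forming the pseudo-inverse explicitly, since it solves the same least-squares problem in a numerically safer way.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
x = rng.uniform(0.0, 1.0, N)
# Synthetic targets (assumed for illustration): noise std 0.1, so true beta = 100.
t = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, N)

# Design matrix with Phi[n, j] = phi_j(x_n); here phi_0 = 1 and phi_1 = x.
Phi = np.column_stack([np.ones(N), x])

# w_ML equals pinv(Phi) @ t; lstsq solves the same problem.
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# 1 / beta_ML is the mean squared residual at w_ML.
residuals = t - Phi @ w_ml
beta_ml = 1.0 / np.mean(residuals ** 2)
```

With these settings `w_ml` should land close to the generating coefficients, and the fitted bias satisfies `w_ml[0] = mean(t) - w_ml[1] * mean(x)`, matching the expression for $w_0$.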

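Geometry of Least Squares

• Consider

$$\mathbf{y} = \boldsymbol{\Phi} \mathbf{w}_{ML} = [\boldsymbol{\varphi}_1, \dots, \boldsymbol{\varphi}_M] \, \mathbf{w}_{ML}, \qquad \mathbf{y} \in S \subseteq T, \quad \mathbf{t} \in T$$

where $T$ is N-dimensional and $S$ is M-dimensional.

• S is spanned by $\boldsymbol{\varphi}_1, \dots, \boldsymbol{\varphi}_M$, the columns of $\boldsymbol{\Phi}$.

• $\mathbf{w}_{ML}$ minimizes the distance between $\mathbf{t}$ and its orthogonal projection on S, i.e. $\mathbf{y}$.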
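The projection picture can be checked numerically: if y is the orthogonal projection of t onto the column space of the design matrix, the residual t - y must be orthogonal to every column. The sketch below uses random, purely hypothetical data with N = 6 and M = 2.

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(6, 2))   # N = 6 observations, M = 2 basis functions
t = rng.normal(size=6)          # arbitrary target vector in the N-dimensional space

w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y = Phi @ w_ml                  # projection of t onto S, the span of Phi's columns

# The residual t - y should be orthogonal to every column of Phi.
orthogonality = Phi.T @ (t - y)
```

The entries of `orthogonality` are zero up to floating-point error, which is the normal-equations condition in geometric form.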

Regularized Least Squares

Regularized Least Squares (1)

• Consider the error function:

$$E_D(\mathbf{w}) + \lambda E_W(\mathbf{w})$$

This is the data term plus the regularization term; $\lambda$ is called the regularization coefficient.

• With the sum-of-squares error function and a quadratic regularizer, we get

$$\frac{1}{2} \sum_{n=1}^{N} \{t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n)\}^2 + \frac{\lambda}{2} \mathbf{w}^T \mathbf{w}$$

• which is minimized by

$$\mathbf{w} = (\lambda \mathbf{I} + \boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{t}$$
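A minimal sketch of the regularized solution, assuming a degree-7 polynomial basis and noisy sinusoidal data (both are choices made for this example only). The helper name `ridge_fit` is hypothetical. It solves the linear system rather than inverting the matrix.

```python
import numpy as np

def ridge_fit(Phi, t, lam):
    """Minimizer of 0.5 * ||t - Phi w||^2 + 0.5 * lam * ||w||^2."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 30)
t = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.2, x.size)
Phi = np.vander(x, 8, increasing=True)   # degree-7 polynomial basis

w_small = ridge_fit(Phi, t, lam=1e-6)    # almost unregularized
w_large = ridge_fit(Phi, t, lam=10.0)    # heavily regularized
```

Increasing the regularization coefficient shrinks the weight vector, so `w_large` has a smaller norm than `w_small`.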

Regularized Least Squares (2)

• With a more general regularizer, we have

$$\frac{1}{2} \sum_{n=1}^{N} \{t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n)\}^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|^q$$

[Figure: contours of the regularization term for different values of q; q = 1 is the lasso and q = 2 is the quadratic regularizer.]

Regularized Least Squares (3)

• Lasso tends to generate sparser solutions than a quadratic regularizer.
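The sparsity effect is easiest to see in a simplified setting. Assume, purely for illustration, that the design matrix has orthonormal columns, so each weight can be optimized on its own. Writing `a` for the unregularized least-squares weights, a penalty of the form (λ/2)|w|^q then gives `a / (1 + λ)` for q = 2 and a soft-thresholding rule with threshold λ/2 for q = 1. The numbers below are hypothetical.

```python
import numpy as np

def quadratic_shrink(a, lam):
    """Per-coordinate minimizer of 0.5 * (w - a)^2 + 0.5 * lam * w^2."""
    return a / (1.0 + lam)

def lasso_shrink(a, lam):
    """Per-coordinate minimizer of 0.5 * (w - a)^2 + 0.5 * lam * |w| (soft threshold)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam / 2.0, 0.0)

a = np.array([3.0, -0.4, 0.2, -2.0])    # hypothetical unregularized coefficients
lam = 1.0
w_quadratic = quadratic_shrink(a, lam)  # every entry shrinks but stays nonzero
w_lasso = lasso_shrink(a, lam)          # entries with |a_j| <= lam / 2 become exactly zero
```

The quadratic penalty only rescales the weights, while the lasso sets the two small coefficients exactly to zero.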

Gradient Descent & Sequential Learning

Gradient Descent in 1D

Suppose we want to minimize a function $f(x) = x^4 - 15x^3 + 80x^2 - 180x + 144$.
• There are many approaches for doing this.
• We'll discuss one approach today called "gradient descent".

[Figure: plot of f(x) against x]

Gradient Descent Intuition

The intuition behind 1D gradient descent:
• To the left of a minimum, the derivative is negative (going down).
• To the right of a minimum, the derivative is positive (going up).
• The derivative tells you where and how far to go.

Let's work from here and try to invent gradient descent.

[Figure: two plots of f(x) against x]

Gradient Descent Algorithm

The gradient descent algorithm is shown below:

$$x^{(t+1)} = x^{(t)} - \alpha \, f'(x^{(t)})$$

• $\alpha$ is known as the "learning rate".
  • Too large and the algorithm fails to converge.
  • Too small and it takes too long to converge.

[Figure: plot of f(x) against x]
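A minimal sketch of the 1D update applied to the polynomial from the earlier slide. The learning rate `alpha = 0.01`, the step count, and the two starting points are assumptions chosen for illustration; the derivative is worked out by hand.

```python
def f(x):
    return x**4 - 15*x**3 + 80*x**2 - 180*x + 144

def df(x):
    # Derivative of f, computed by hand.
    return 4*x**3 - 45*x**2 + 160*x - 180

def gradient_descent_1d(df, x0, alpha, n_steps):
    x = x0
    for _ in range(n_steps):
        x = x - alpha * df(x)   # step against the sign of the derivative
    return x

x_from_left = gradient_descent_1d(df, x0=0.0, alpha=0.01, n_steps=1000)
x_from_right = gradient_descent_1d(df, x0=6.0, alpha=0.01, n_steps=1000)
```

Starting at 0 the iterates settle near x ≈ 2.39, while starting at 6 they settle near x ≈ 5.33, where f is lower. The same algorithm therefore ends in different minima depending on where it starts.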

GD Only Finds Local Minima

• If the loss function has multiple local minima, GD is not guaranteed to find the global minimum.

• Suppose we have this loss curve:

[Figure: loss curve with more than one local minimum]

GD Only Finds Local Minima

• Here's how GD runs:

[Figure: GD iterates on the loss curve]

• GD can converge at -15 when the global minimum is -18.

Convexity

• For a convex function f, any local minimum is also a global minimum.
• If the loss function is convex, gradient descent with a suitable learning rate will find a globally optimal minimizer.
• Informally, f is convex if, when I draw a line between two points on the curve, all values on the curve between them are on or below the line. More formally:

$$t f(a) + (1 - t) f(b) \ge f(t a + (1 - t) b) \quad \text{for all } a, b \text{ in the domain of } f \text{ and } t \in [0, 1]$$

Multi Dimensional Gradient Descent

On a 2D surface, the best way to go down is described by a 2D vector.


Batch Gradient Descent

• Gradient descent algorithm: nudge θ in the negative gradient direction until θ converges.

• Batch gradient descent update rule:

$$\theta^{(t+1)} = \theta^{(t)} - \alpha \, \nabla_\theta L(\theta^{(t)}, y)$$

Here $\theta^{(t+1)}$ is the next value for θ, $\alpha$ is the learning rate, and $\nabla_\theta L$ is the gradient of the loss with respect to θ.

• θ: model weights
• L: loss function
• α: learning rate, typically either constant or 1/(t+1)
• y: true values from the training data

Gradient Descent Algorithm

• Initialize the model weights to all zero.
  • Also common: initialize using small random numbers.
• Update the model weights using the update rule:

$$\theta^{(t+1)} = \theta^{(t)} - \alpha \, \nabla_\theta L(\theta^{(t)}, y)$$

• Repeat until the model weights don't change (convergence).
• At this point, we have $\hat{\theta}$, our minimizing model weights.
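The steps above can be sketched for a mean-squared-error loss on a linear model. The loss choice, the constant learning rate `alpha = 0.5`, the tolerance, and the noise-free synthetic data are all assumptions made for this example.

```python
import numpy as np

def mse_gradient(theta, X, y):
    """Gradient of L(theta) = mean((X @ theta - y)**2) with respect to theta."""
    return (2.0 / len(y)) * (X.T @ (X @ theta - y))

def batch_gradient_descent(X, y, alpha=0.5, tol=1e-10, max_steps=10_000):
    theta = np.zeros(X.shape[1])                      # initialize weights to all zero
    for _ in range(max_steps):
        theta_next = theta - alpha * mse_gradient(theta, X, y)
        if np.linalg.norm(theta_next - theta) < tol:  # weights stopped changing
            return theta_next
        theta = theta_next
    return theta

x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x                                     # noise-free synthetic targets
theta_hat = batch_gradient_descent(X, y)
```

Because this loss is convex and the step size is small enough for this data, `theta_hat` approaches the generating weights (2, 3). A larger `alpha` can make the iteration diverge.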

Stochastic Gradient Descent

$\theta^{(0)}$ ← initial vector (random, zeros …)

For τ from 0 to convergence:

1. Draw a simple random sample of data indices.
  • Often called a batch or mini-batch.
  • The choice of batch size trades off gradient quality and speed.

2. Compute the gradient estimate $\hat{g}$ on the sampled batch and use it as the gradient:

$$\theta^{(\tau+1)} \leftarrow \theta^{(\tau)} - \alpha \, \hat{g}(\theta^{(\tau)})$$

Decomposable loss: the loss can be written as a sum of the loss on each record, which is what allows the gradient to be estimated from a sample of records.
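A minimal mini-batch sketch of the loop above, again for a squared-error loss on a linear model. The batch size, learning rate, fixed step count (used here instead of a convergence test), and noise-free data are assumptions for illustration only.

```python
import numpy as np

def minibatch_sgd(X, y, alpha=0.1, batch_size=10, n_steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])                # initial vector (zeros)
    n = len(y)
    for _ in range(n_steps):
        # Step 1: simple random sample of data indices (the mini-batch).
        batch = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[batch], y[batch]
        # Step 2: gradient of the mean squared error on the batch only.
        grad_estimate = (2.0 / batch_size) * (Xb.T @ (Xb @ theta - yb))
        theta = theta - alpha * grad_estimate
    return theta

x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x                               # noise-free synthetic targets
theta_sgd = minibatch_sgd(X, y)
```

Each step touches only 10 of the 50 records, so it is cheaper than a full-batch step. With noisy targets the iterates would keep fluctuating around the minimizer unless the learning rate is decayed.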

Online Linear Regression

• Data items are considered one at a time (a.k.a. online learning); use stochastic (sequential) gradient descent:

$$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n = \mathbf{w}^{(\tau)} + \eta \, (t_n - \mathbf{w}^{(\tau)T} \boldsymbol{\phi}(\mathbf{x}_n)) \, \boldsymbol{\phi}(\mathbf{x}_n)$$

• This is known as the least-mean-squares (LMS) algorithm. Issue: how to choose $\eta$?
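The LMS update can be sketched as a loop over single data items. The step size `eta = 0.5`, the number of passes, the random presentation order, and the noise-free data are assumptions for this example. The step size must be small relative to the squared norm of the feature vectors, otherwise the updates overshoot.

```python
import numpy as np

def lms(Phi, t, eta=0.5, n_epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_epochs):
        for n in rng.permutation(len(t)):   # present data items one at a time
            error = t[n] - w @ Phi[n]       # prediction error on item n
            w = w + eta * error * Phi[n]    # LMS update
    return w

x = np.linspace(0.0, 1.0, 50)
Phi = np.column_stack([np.ones_like(x), x])
t = 2.0 + 3.0 * x                           # noise-free synthetic targets
w_lms = lms(Phi, t)
```

On this toy problem `w_lms` recovers the generating weights (2, 3).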

Gradient Descent

[Figure: gradient descent]

Stochastic Gradient Descent

[Figure: stochastic gradient descent]

Multiple Outputs

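Multiple Outputs (1)

• Analogously to the single output case we have:

$$p(\mathbf{t} \mid \mathbf{x}, \mathbf{W}, \beta) = \mathcal{N}(\mathbf{t} \mid \mathbf{y}(\mathbf{W}, \mathbf{x}), \beta^{-1} \mathbf{I}) = \mathcal{N}(\mathbf{t} \mid \mathbf{W}^T \boldsymbol{\phi}(\mathbf{x}), \beta^{-1} \mathbf{I})$$

• Given observed inputs, $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$, and targets, $\mathbf{T} = [\mathbf{t}_1, \dots, \mathbf{t}_N]^T$, we obtain the log likelihood function

$$\ln p(\mathbf{T} \mid \mathbf{X}, \mathbf{W}, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}(\mathbf{t}_n \mid \mathbf{W}^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1} \mathbf{I}) = \frac{NK}{2} \ln\left(\frac{\beta}{2\pi}\right) - \frac{\beta}{2} \sum_{n=1}^{N} \| \mathbf{t}_n - \mathbf{W}^T \boldsymbol{\phi}(\mathbf{x}_n) \|^2$$

where K is the number of outputs.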

Multiple Outputs (2)

• Maximizing with respect to W, we obtain

$$\mathbf{W}_{ML} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{T}$$

• If we consider a single target variable, $t_k$, we see that

$$\mathbf{w}_k = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{t}_k = \boldsymbol{\Phi}^{\dagger} \mathbf{t}_k$$

• where $\mathbf{t}_k = [t_{1k}, \dots, t_{Nk}]^T$, which is identical with the single output case.
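The decoupling across outputs can be checked numerically. In the sketch below the sizes N, M, K and the random data are hypothetical; the point is only that solving for all K target columns at once gives the same weights as solving for each column separately.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 40, 3, 2
Phi = rng.normal(size=(N, M))      # shared design matrix
T = rng.normal(size=(N, K))        # row n is the K-dimensional target t_n

# Joint solution: W_ml has one column of weights per output.
W_ml, *_ = np.linalg.lstsq(Phi, T, rcond=None)

# Single-output solutions, one per target column.
w_cols = [np.linalg.lstsq(Phi, T[:, k], rcond=None)[0] for k in range(K)]
```

Column k of `W_ml` matches `w_cols[k]`, so the pseudo-inverse only needs to be computed once and can be shared across outputs.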

Bias-Variance Tradeoff

The Bias-Variance Decomposition (1)

• Recall the expected squared loss,

$$E[L] = \int \{y(\mathbf{x}) - h(\mathbf{x})\}^2 p(\mathbf{x}) \, d\mathbf{x} + \iint \{h(\mathbf{x}) - t\}^2 p(\mathbf{x}, t) \, d\mathbf{x} \, dt$$

• where

$$h(\mathbf{x}) = E[t \mid \mathbf{x}] = \int t \, p(t \mid \mathbf{x}) \, dt$$

• The second term of E[L] corresponds to the noise inherent in the random variable t.

• What about the first term?

The Bias-Variance Decomposition (2)

• Suppose we were given multiple data sets, each of size N. Any particular data set, D, will give a particular function y(x; D). We then have

$$\{y(\mathbf{x}; D) - h(\mathbf{x})\}^2 = \{y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)] + E_D[y(\mathbf{x}; D)] - h(\mathbf{x})\}^2$$

$$= \{y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)]\}^2 + \{E_D[y(\mathbf{x}; D)] - h(\mathbf{x})\}^2 + 2 \{y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)]\} \{E_D[y(\mathbf{x}; D)] - h(\mathbf{x})\}$$

The Bias-Variance Decomposition (3)

• Taking the expectation over D yields

$$E_D[\{y(\mathbf{x}; D) - h(\mathbf{x})\}^2] = \underbrace{\{E_D[y(\mathbf{x}; D)] - h(\mathbf{x})\}^2}_{(\text{bias})^2} + \underbrace{E_D[\{y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)]\}^2]}_{\text{variance}}$$

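The Bias-Variance Decomposition (4)

• Thus we can write

$$\text{expected loss} = (\text{bias})^2 + \text{variance} + \text{noise}$$

• where

$$(\text{bias})^2 = \int \{E_D[y(\mathbf{x}; D)] - h(\mathbf{x})\}^2 p(\mathbf{x}) \, d\mathbf{x}$$

$$\text{variance} = \int E_D[\{y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)]\}^2] \, p(\mathbf{x}) \, d\mathbf{x}$$

$$\text{noise} = \iint \{h(\mathbf{x}) - t\}^2 p(\mathbf{x}, t) \, d\mathbf{x} \, dt$$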

The Bias-Variance Decomposition (5)-(7)

• Example: 25 data sets from the sinusoidal, varying the degree of regularization, λ.

[Figures: three slides showing the fits for different values of λ]

The Bias-Variance Trade-off

• From these plots, we note that an over-regularized model (large λ) will have a high bias, while an under-regularized model (small λ) will have a high variance.
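The trade-off can be reproduced with a small simulation. The sketch below is not the slides' experiment: the number of data sets (100), the points per set (25), the noise level, the 9 Gaussian basis functions with width 0.1, the regularized bias column, and the two λ values are all assumptions chosen for illustration. It fits a regularized model to each data set and estimates the squared bias and the variance on a grid of inputs.

```python
import numpy as np

def gaussian_design(x, centers, s):
    phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * s ** 2))
    return np.hstack([np.ones((x.size, 1)), phi])   # bias column plus Gaussians

def ridge_fit(Phi, t, lam):
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

def bias_variance(lam, n_sets=100, n_points=25, noise_sd=0.3, seed=0):
    rng = np.random.default_rng(seed)
    centers, s = np.linspace(0.0, 1.0, 9), 0.1
    x_grid = np.linspace(0.0, 1.0, 200)
    h = np.sin(2.0 * np.pi * x_grid)                # true regression function
    Phi_grid = gaussian_design(x_grid, centers, s)
    preds = np.empty((n_sets, x_grid.size))
    for i in range(n_sets):                         # one fit per data set D
        x = rng.uniform(0.0, 1.0, n_points)
        t = np.sin(2.0 * np.pi * x) + rng.normal(0.0, noise_sd, n_points)
        w = ridge_fit(gaussian_design(x, centers, s), t, lam)
        preds[i] = Phi_grid @ w
    mean_pred = preds.mean(axis=0)                  # estimate of E_D[y(x; D)]
    bias2 = np.mean((mean_pred - h) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias2, variance

bias2_large, var_large = bias_variance(lam=np.exp(2.6))    # over-regularized
bias2_small, var_small = bias_variance(lam=np.exp(-2.4))   # under-regularized
```

With the large λ the averaged fit is pulled away from the sinusoid (higher squared bias) but the individual fits barely differ (lower variance); the small λ shows the opposite pattern.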

Bayesian Linear Regression

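Bayesian Linear Regression (1)

• Define a conjugate prior over w

$$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$$

• Combining this with the likelihood function and using results for marginal and conditional Gaussian distributions gives the posterior

$$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$$

• where

$$\mathbf{m}_N = \mathbf{S}_N (\mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^T \mathbf{t}), \qquad \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi}$$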

Bayesian Linear Regression (2)

• A common choice for the prior is

$$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I})$$

• for which

$$\mathbf{m}_N = \beta \mathbf{S}_N \boldsymbol{\Phi}^T \mathbf{t}, \qquad \mathbf{S}_N^{-1} = \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi}$$

• Next we consider an example…
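A minimal sketch of the posterior for the zero-mean isotropic prior. The straight-line data, the noise level, and the values `alpha = 2.0` and `beta = 25.0` are assumptions chosen for illustration, and the function name `posterior` is hypothetical.

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Return m_N, S_N for the prior N(w | 0, alpha^{-1} I) and noise precision beta."""
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * (Phi.T @ Phi)
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * (S_N @ (Phi.T @ t))
    return m_N, S_N

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 20)
t = -0.3 + 0.5 * x + rng.normal(0.0, 0.2, x.size)   # assumed synthetic line plus noise
Phi = np.column_stack([np.ones_like(x), x])
alpha, beta = 2.0, 25.0
m_N, S_N = posterior(Phi, t, alpha, beta)
```

The posterior mean coincides with the regularized least-squares solution with regularization coefficient `alpha / beta`, which links this slide to the earlier quadratic regularizer.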

Bayesian Linear Regression (3)

0 data points observed

[Figure panels: Prior, Data Space]

Bayesian Linear Regression (4)

1 data point observed

[Figure panels: Likelihood, Posterior, Data Space]

Bayesian Linear Regression (5)

2 data points observed

[Figure panels: Likelihood, Posterior, Data Space]

Bayesian Linear Regression (6)

20 data points observed

[Figure panels: Likelihood, Posterior, Data Space]

Predictive Distribution

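Predictive Distribution (1)

• Predict t for new values of x by integrating over w:

$$p(t \mid \mathbf{t}, \alpha, \beta) = \int p(t \mid \mathbf{w}, \beta) \, p(\mathbf{w} \mid \mathbf{t}, \alpha, \beta) \, d\mathbf{w} = \mathcal{N}(t \mid \mathbf{m}_N^T \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x}))$$

• where

$$\sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^T \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x})$$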
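The predictive mean and variance can be sketched as follows, using 9 Gaussian basis functions on noisy sinusoidal data. The basis width, `alpha`, `beta`, the noise level, and the seed are assumptions for illustration, and the helper names are hypothetical.

```python
import numpy as np

def gaussian_design(x, centers, s):
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * s ** 2))

def predictive(x_new, x_train, t_train, centers, s, alpha, beta):
    Phi = gaussian_design(x_train, centers, s)
    S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * (Phi.T @ Phi))
    m_N = beta * (S_N @ (Phi.T @ t_train))
    Phi_new = gaussian_design(x_new, centers, s)
    mean = Phi_new @ m_N
    # sigma_N^2(x) = 1 / beta + phi(x)^T S_N phi(x), evaluated row by row.
    var = 1.0 / beta + np.sum((Phi_new @ S_N) * Phi_new, axis=1)
    return mean, var

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 25)
t_train = np.sin(2.0 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)
centers = np.linspace(0.0, 1.0, 9)
x_new = np.linspace(0.0, 1.0, 50)

_, var_1 = predictive(x_new, x_train[:1], t_train[:1], centers, 0.1, alpha=2.0, beta=25.0)
mean_25, var_25 = predictive(x_new, x_train, t_train, centers, 0.1, alpha=2.0, beta=25.0)
```

The predictive variance never drops below the noise level 1/β, and it can only shrink or stay the same as more data points are observed, so `var_25` is no larger than `var_1` at every input.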

Predictive Distribution (2)

• Example: sinusoidal data, 9 Gaussian basis functions, 1 data point

[Figure: predictive distribution]

Predictive Distribution (3)

• Example: sinusoidal data, 9 Gaussian basis functions, 2 data points

[Figure: predictive distribution]

Predictive Distribution (4)

• Example: sinusoidal data, 9 Gaussian basis functions, 4 data points

[Figure: predictive distribution]

Predictive Distribution (5)

• Example: sinusoidal data, 9 Gaussian basis functions, 25 data points

[Figure: predictive distribution]

Any Questions?!