COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS18
Lecture 2: Linear Regression · Gradient Descent · Non-linear basis functions


Page 1:

COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS18

Lecture 2:
• Linear Regression
• Gradient Descent
• Non-linear basis functions

Page 2:

LINEAR REGRESSION
MOTIVATION

Page 3:

Why Linear Regression?

• Simplest machine learning algorithm for regression
• Widely used in biological, behavioural and social sciences to describe and to extract relationships between variables from data
• Prediction of real-valued outputs
• Easy to implement, fast to execute
• Benchmark algorithm for comparison with more complex algorithms
• Introduction to notation and concepts that we will need again later in the course:
  • Data format, vector & matrix notation
  • Learning from data by minimizing a cost function
  • Gradient descent
  • Non-linear features and basis functions (preparation for neural networks)

Page 4:

Applications of (linear) regression

• Brain computer interfaces

• https://www.youtube.com/watch?v=Ae6En8-eaww

• Neuroprosthetic control

• https://www.youtube.com/watch?v=X_AI4MiY6L4

Page 5:

LINEAR REGRESSION

WITH ONE INPUT

Page 6:

A regression problem

• We want to learn to predict a person's height based on his/her knee height and/or arm span
• This is useful for patients who are bed-bound or in a wheelchair and cannot stand to take an accurate measurement of their height

Knee height [cm] | Arm span [cm] | Height [cm]
50               | 166           | 171
56               | 172           | 175
52               | 174           | 168
…                | …             | …

Page 7:

Linear regression with one input

[Diagram: the training set is fed into the learning algorithm, which outputs a hypothesis h(x) with its parameters; a test input passed through the hypothesis yields the prediction. Plot: body height [cm] vs. knee height [cm].]

Page 8:

Example Data

Knee height [cm] | Arm span [cm] | Height [cm]
50               | 166           | 171
56               | 172           | 175
52               | 174           | 168
…                | …             | …

m = 30 data points

[Plots: body height vs. knee height, and body height vs. arm span.]

Page 9:

Example Data

[3-D plot: body height vs. knee height and arm span.]

Knee height [cm] | Arm span [cm] | Height [cm]
50               | 166           | 171
56               | 172           | 175
52               | 174           | 168
…                | …             | …

Page 10:

Linear regression with one input

Knee height [cm] | Height [cm]
50               | 171
56               | 175
52               | 168
…                | …

Hypothesis / parameters: ?

[Plot: body height vs. knee height with candidate regression lines.]

Which hypothesis is better? In what sense is it better?

Page 11:

Formalization of problem

• Given m training examples (x^(i), y^(i))
• Goal: learn parameters θ₀, θ₁ such that h(x^(i)) ≈ y^(i) for all training examples i = 1…m (here m = 30 data points).

Knee height [cm] | Height [cm]
50               | 171
56               | 175
52               | 168
…                | …

[Plot: body height vs. knee height.]

Page 12:

Least Squares Objective

• Minimize Error

[Plot: body height vs. knee height with a candidate line, θ₁ = 0.6, θ₀ = 150.]

Page 13:

Least Squares Objective

• Minimize Error: the cost function is the mean squared error

[Plot: fit with θ₁ = 0.6, θ₀ = 150; cost = 10.77.]
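The mean squared error cost from this slide can be sketched in code as follows (a minimal illustration; the three table rows stand in for the full m = 30 data set, and the function name is ours):

```python
# Mean squared error cost for the linear hypothesis h(x) = theta0 + theta1 * x.

def mse_cost(theta0, theta1, xs, ys):
    """Return (1/m) * sum_i (h(x_i) - y_i)^2 over the training examples."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / m

knee = [50, 56, 52]       # knee height [cm], first table rows only
height = [171, 175, 168]  # body height [cm]

# Cost of the candidate line from the slide (theta0 = 150, theta1 = 0.6):
print(mse_cost(150, 0.6, knee, height))
```

Comparing this value for several candidate lines is exactly the "which hypothesis is better?" question from the earlier slide, made quantitative.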

Page 14:

Least Squares Objective

• Minimize Error: the cost function is the mean squared error

[Plot: fit with θ₁ = 0.75, θ₀ = 140; cost = 5.94.]

Page 15:

Cost function illustrated

Properties of the cost function:
• Quadratic function
• Convex function
→ Unique local and global minimum (under "regular" conditions)

[Plots: the two fits from the previous slides, with costs 10.77 and 5.94.]

Page 16:

Minimizing the cost

• Two ways to find the parameters minimizing the cost:
  • Gradient descent
  • Direct analytical solution (setting derivatives = 0)

Page 17:

Recall: Functions of multiple variables

• Partial derivatives
• The gradient vector is formed from the partial derivatives (fundamental in lecture 2)
• Chain rule (fundamental for neural networks in lecture 4)
• For functions of multiple variables with high-dimensional values, the Jacobian matrix is formed from the partial derivatives

Page 18:

GRADIENT DESCENT

Page 19:

Descending in the steepest direction

Gradient descent on some arbitrary cost function…

Page 20:

Gradient descent algorithm

• Repeat until convergence (simultaneously updating θ₀ and θ₁):

  θⱼ ← θⱼ − η · ∂J/∂θⱼ

  η … learning rate ("eta")
  ∂J/∂θⱼ … partial derivative of the cost J with respect to θⱼ
  The negative gradient points in the direction of descent.
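The update rule above can be sketched for an arbitrary differentiable cost. This is a minimal illustration, assuming a simple one-dimensional convex function f(w) = (w − 3)² whose gradient we supply by hand:

```python
# Gradient descent on f(w) = (w - 3)^2, illustrating the update
# w <- w - eta * f'(w). The minimum is at w = 3.

def gradient_descent(grad, w0, eta=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

w_star = gradient_descent(grad=lambda w: 2 * (w - 3), w0=0.0)
print(w_star)  # converges towards the minimum at w = 3
```

Because the function is convex, any sufficiently small learning rate converges to the unique minimum; this will not hold for arbitrary cost surfaces (see the "potential issues" slide below).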

Page 21:

Gradient is orthogonal to contour lines

A contour line is a line along which J = const.

[Plots: a 3-D surface plot of the cost and the corresponding contour plot; the gradient at each point is perpendicular to the contour line through it.]

Page 22:

Potential issues with gradient descent

• May get stuck in local minima
• Learning rate too small: slow convergence
• Learning rate too large: oscillations, divergence

[Plots: trajectories for η too small and η too large.]

Page 23:

LINEAR REGRESSION WITH GRADIENT DESCENT (ONE INPUT)

Page 24:

Application of gradient descent

• Linear regression cost, minimized by gradient descent with simultaneous updates:

  θ₀ ← θ₀ − η · (1/m) Σᵢ (h(x^(i)) − y^(i))
  θ₁ ← θ₁ − η · (1/m) Σᵢ (h(x^(i)) − y^(i)) · x^(i)

  (h(x^(i)) − y^(i)) … "error", x^(i) … "input", η … "learning rate"
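A minimal sketch of these updates for one-input linear regression (toy data generated from y = 2x + 1, so the true parameters are known; all names are ours):

```python
# Gradient descent for h(x) = theta0 + theta1 * x with simultaneous updates.

def fit_linear_gd(xs, ys, eta=0.01, steps=5000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        g0 = sum(errors) / m                            # gradient w.r.t. theta0
        g1 = sum(e * x for e, x in zip(errors, xs)) / m  # gradient w.r.t. theta1
        # Simultaneous update: both gradients use the OLD parameter values.
        theta0, theta1 = theta0 - eta * g0, theta1 - eta * g1
    return theta0, theta1

# Toy data on the line y = 2x + 1 (no noise).
t0, t1 = fit_linear_gd([0, 1, 2, 3], [1, 3, 5, 7])
print(t0, t1)  # approaches theta0 = 1, theta1 = 2
```

The tuple assignment in the last line of the loop is what makes the update simultaneous; updating θ₀ first and then using the new value for θ₁ would be a subtly different algorithm.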

Page 25:

Predicting height from knee height

• Optimal fit to training data: θ₁ = 0.8, θ₀ = 137.4

[Plot: body height vs. knee height with the fitted line.]

Page 26:

LINEAR REGRESSION
MORE GENERAL FORMULATION: MULTIPLE FEATURES

Page 27:

Multiple inputs (features)

• Notation:
  m … number of training examples
  n … number of features
  x^(i) … input features of i'th training example (vector-valued)
  x_j^(i) … value of feature j in i'th training example

Knee height x1 | Arm span x2 | Age x3 | Height y
50             | 166         | 32     | 171
56             | 172         | 17     | 175
52             | 174         | 62     | 168
…              | …           | …      | …

Example: n = 3; x^(2) = (56, 172, 17)ᵀ; x_3^(2) = 17.

Page 28:

Linear hypothesis

• Hypothesis (one input): h(x) = θ₀ + θ₁x
• Hypothesis (multiple input features): h(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ
  Example: h(x) = 50 + 0.5·kneeheight + 0.3·armspan + 0.1·age
• More compact notation: introduce x₀ = 1, so that h(x) = θᵀx. Why? Notational convenience!

Page 29:

Multiple inputs (features) revisited

• Notation (as before): m … number of training examples, n … number of features, x^(i) … input features of i'th training example (vector-valued), x_j^(i) … value of feature j in i'th training example

x0 | Knee height x1 | Arm span x2 | Age x3 | Height y
1  | 50             | 166         | 32     | 171
1  | 56             | 172         | 17     | 175
1  | 52             | 174         | 62     | 168
1  | …              | …           | …      | …

Example: n = 3; x^(2) = (1, 56, 172, 17)ᵀ; x_3^(2) = 17; x_0^(i) = 1.

Page 30:

Matrix and vector notation

x0 | Knee height x1 | Arm span x2 | Age x3 | Height y
1  | 50             | 166         | 32     | 171
1  | 56             | 172         | 17     | 175
1  | 52             | 174         | 62     | 168

θ … parameter vector, (n+1) ˟ 1
X … design matrix, m ˟ (n+1); row i contains the features of the i'th training example
y … output/target vector, m ˟ 1

Page 31:

Matrix and vector notation

x0 | Knee height x1 | Arm span x2 | Age x3 | Height y
1  | 50             | 166         | 32     | 171
1  | 56             | 172         | 17     | 175
1  | 52             | 174         | 62     | 168

H(θ) = Xθ
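The design matrix and the prediction vector H(θ) = Xθ can be sketched with the table rows above (the parameter values are the illustrative ones from the earlier example hypothesis, not a fitted model):

```python
# Build the m x (n+1) design matrix X from the table, with the constant
# feature x0 = 1 prepended, then compute the prediction vector H(theta) = X @ theta.
import numpy as np

raw = np.array([[50, 166, 32],    # knee height, arm span, age
                [56, 172, 17],
                [52, 174, 62]], dtype=float)

X = np.hstack([np.ones((raw.shape[0], 1)), raw])  # m x (n+1) design matrix
theta = np.array([50, 0.5, 0.3, 0.1])             # example parameters from the slide
h = X @ theta                                      # one prediction per training example
print(X.shape, h)
```

One matrix-vector product replaces the per-example sum Σⱼ θⱼ x_j^(i), which is why the x₀ = 1 convention is worth the bookkeeping.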

Page 32:

LINEAR REGRESSION WITH GRADIENT DESCENT (GENERAL FORMULATION)

Page 33:

Linear regression problem statement

• Hypothesis: h(x) = θᵀx
• Cost function: the mean squared error over the training examples

The goal is to find the parameters θ which minimize the cost, a high-dimensional quadratic ("bowl"-shaped) function.

Page 34:

Gradient descent (multiple features)

θⱼ ← θⱼ − η · (1/m) Σᵢ (h(x^(i)) − y^(i)) · x_j^(i)   (simultaneous update for j = 0…n)

For j = 0: define x_0^(i) = 1 for convenience.

The update has the same form with one input feature and with n input features:
(h(x^(i)) − y^(i)) … "error", x_j^(i) … "input", η … "learning rate".
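In matrix form, the simultaneous update for all θⱼ can be sketched as follows (a minimal illustration on synthetic, noise-free data, so gradient descent should recover the generating parameters; names are ours):

```python
# Vectorized gradient descent for linear regression:
# theta <- theta - (eta/m) * X^T (X theta - y) updates all theta_j at once.
import numpy as np

def fit_gd(X, y, eta=0.1, steps=2000):
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / m  # all partial derivatives in one product
        theta -= eta * grad
    return theta

# Toy data generated from y = 1 + 2*x1 - x2 (no noise).
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 2))
X = np.hstack([np.ones((50, 1)), A])  # design matrix with x0 = 1 column
y = 1 + 2 * A[:, 0] - A[:, 1]
print(fit_gd(X, y))
```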

Page 35:

LINEAR REGRESSION

ANALYTICAL SOLUTION

Page 36:

Analytical solution

X … design matrix
y … output/target vector

• Set all partial derivatives of the cost function = 0
• Solving the resulting system of linear equations yields:

  θ = (XᵀX)⁻¹ Xᵀ y

  where (XᵀX)⁻¹Xᵀ is the Moore-Penrose pseudoinverse of X.
• Note: this analytical solution requires that the columns of X are linearly independent ("regular" conditions).
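A minimal sketch of the analytical solution. We solve the normal equations with np.linalg.solve rather than forming the pseudoinverse explicitly, which is the numerically preferable route; the data rows are illustrative:

```python
# Normal-equation solution theta = (X^T X)^{-1} X^T y for linear regression.
import numpy as np

def fit_analytical(X, y):
    # Solving X^T X theta = X^T y directly is more stable than inverting;
    # it requires the columns of X to be linearly independent.
    return np.linalg.solve(X.T @ X, X.T @ y)

x = np.array([50.0, 56.0, 52.0, 48.0])   # knee height [cm], illustrative rows
y = np.array([171.0, 175.0, 168.0, 166.0])  # body height [cm]
X = np.column_stack([np.ones_like(x), x])   # 4 x 2 design matrix
print(fit_analytical(X, y))  # [theta0, theta1]
```

At the solution the residual Xθ − y is orthogonal to the columns of X, which is exactly the "all partial derivatives = 0" condition.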

Page 37:

Example: analytical solution applied to problem with one input

Knee height [cm] | Height [cm]
50               | 171
56               | 175
52               | 168
…                | …

[Plot: body height vs. knee height.]

Page 38:

Example: analytical solution applied to problem with one input

Knee height [cm] | Height [cm]
50               | 171
56               | 175
52               | 168
…                | …

Dimensions: X is 30 ˟ 2, y is 30 ˟ 1, XᵀX and its inverse are 2 ˟ 2, θ is 2 ˟ 1.

Page 39:

Predicting height from knee height

[Plot: body height vs. knee height with the fitted line, θ₁ = 0.8, θ₀ = 137.4.]

Page 40:

Gradient descent vs. analytical solution

Gradient descent:
• Need to choose the learning rate η
• Iterative algorithm (needs many iterations to converge)
• Works well even when the number of input features n is large

Analytical solution:
• No need to choose η
• Direct solution (no iteration)
• Slow if n is too large (inverting an n x n matrix)

Page 41:

NON-LINEAR FEATURES (NON-LINEAR BASIS FUNCTIONS)

Page 42:

Non-linear trends in data

x     | y
0.01  | -0.27
-1.22 | 2.63
0.17  | -0.13
…     | …

[Plots: scatter of the data, with several candidate non-linear curves marked "?".]

• How can we learn non-linear hypotheses?

Page 43:

Linear fit to this "non-linear" data

x     | y
0.01  | -0.27
-1.22 | 2.63
0.17  | -0.13
…     | …

Standard design matrix (columns 1 and x).
Hypothesis: h(x) = θ₀ + θ₁x
Optimal parameters: …

Page 44:

Linear fit to this "non-linear" data

[Plot: the data with the best straight-line fit.]

Page 45:

Non-linear (quadratic) fit

x     | y
0.01  | -0.27
-1.22 | 2.63
0.17  | -0.13
…     | …

Design matrix with non-linear features (columns 1, x and x²).
Hypothesis: h(x) = θ₀ + θ₁x + θ₂x²
Optimal parameters: …
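A sketch of the quadratic fit via non-linear features (toy data lying exactly on a parabola, so the least-squares solution should recover the generating coefficients; the data are ours, not the slide's):

```python
# Quadratic fit with ordinary linear regression: the model stays linear in
# the parameters, only the design matrix gets the non-linear columns [1, x, x^2].
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x**2 + 1.0  # toy data lying exactly on the parabola y = 1 + x^2

X = np.column_stack([np.ones_like(x), x, x**2])  # non-linear feature map
theta, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares solution
print(theta)  # recovers [1, 0, 1] for this noise-free parabola
```

Everything after building X is unchanged from the linear case: the same analytical solution (or gradient descent) applies.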

Page 46:

Non-linear (quadratic) fit

[Plot: the data with the fitted quadratic curve.]

Page 47:

Non-linear (sinusoid) fit

x     | y
0.01  | -0.27
-1.22 | 2.63
0.17  | -0.13
…     | …

Design matrix with non-linear (sinusoidal) features.
Hypothesis: …
Optimal parameters: …

Page 48:

Non-linear (sinusoidal) fit

[Plot: the data with the fitted sinusoidal curve.]

Page 49:

Non-linear input features (in general)

• Feature j of each training example i is computed by applying a non-linear basis function to the input: x_j^(i) = φ_j(x^(i))
• This allows learning a variety of non-linear functions with the same techniques: the analytical solution or gradient descent.

In the design matrix, row i holds all features of the i'th training example, and column j holds feature j of all training examples.

Page 50:

Polynomial regression

• Features are powers of x; n = degree of the polynomial to be learned.

[Plots: fits for n = 0, n = 1, n = 3 and n = 9.]

What happened here (for n = 9)? Next lecture…

Page 51:

Radial basis functions

• "Gaussian"-shaped RBFs (localized representation):
  • Each basis function j has a center in the input space
  • The width of the basis functions is determined by a common width parameter

[Plot: several Gaussian bumps over x.]
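Gaussian RBF features can be sketched as follows (a minimal illustration, assuming evenly spaced centers c_j and a common width σ; the sine target is our toy choice, not from the slides):

```python
# Gaussian radial basis function features:
# phi_j(x) = exp(-(x - c_j)^2 / (2 * sigma^2)), one localized bump per center c_j.
import numpy as np

def rbf_features(x, centers, sigma):
    """Design matrix with a constant column plus one Gaussian bump per center."""
    phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma**2))
    return np.hstack([np.ones((len(x), 1)), phi])

x = np.linspace(-4, 6, 40)
y = np.sin(x)  # toy target to fit

X = rbf_features(x, centers=np.linspace(-4, 6, 12), sigma=1.0)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # same least-squares machinery
pred = X @ theta
print(np.max(np.abs(pred - y)))  # small fit error with enough centers
```

Narrower σ makes each basis function more local (better detail, more centers needed); wider σ smooths the fit, which is the trade-off the next two slides illustrate.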

Page 52:

Radial basis functions

• "Gaussian"-shaped RBFs: each basis function j has a center in the input space
• The width of the basis functions is determined by a common width parameter

[Plot: Gaussian RBFs over x.]

Page 53:

Radial basis functions

• "Gaussian"-shaped RBFs: each basis function j has a center in the input space
• The width of the basis functions is determined by a common width parameter

[Plot: Gaussian RBFs over x.]

Page 54:

Fitting a single RBF to data

[Plots: the data, and the fit obtained with a single RBF.]

Page 55:

Fitting RBFs to data

[Plots: the individual basis functions, their weighted contributions, and the resulting fit to the data.]

Page 56:

Image example: JPEG = cosine basis

Each block of 8x8 pixels is represented in a Fourier basis of cosine filters. This gives a better representation of edges and corners and compresses the data.

Page 57:

SUMMARY (QUESTIONS)

Page 58:

Some questions…

• Hypothesis for linear regression = ?
• Cost function for linear regression = ?
• How many local minima may the cost function for lin. reg. have (under regular conditions)?
• Name two ways to minimize the cost function.
• General gradient descent formula?
• How is linear regression with gradient descent solved?
• What issues can arise during gradient descent?
• What is the design matrix? What are its dimensions?
• Analytical solution for linear regression = ?
• What are the components of the solution?
• Pros and cons of gradient descent vs. analytical solution?
• How can one learn non-linear hypotheses with linear regression?
• What is polynomial regression?
• What are radial basis functions?

Page 59:

What is next?

• Classification with Logistic Regression

• Gradient descent tricks & more advanced optimization techniques

• Underfitting & Overfitting

• Model selection (Training, Validation and test set)