
CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences
Villanova University

Course website: www.csc.villanova.edu/~map/4510/

5: Multivariate Regression

1 CSC 4510 ‐ M.A. Papalaskari ‐ Villanova University

The slides in this presentation are adapted from:
•  Andrew Ng's ML course http://www.ml-class.org/

Regression topics so far
•  Introduction to linear regression
•  Intuition – least squares approximation
•  Intuition – gradient descent algorithm
•  Hands on: Simple example using Excel
•  How to apply gradient descent to minimize the cost function for regression
•  Linear algebra refresher


What's next?
•  Multivariate regression
•  Gradient descent revisited
   –  Feature scaling and normalization
   –  Selecting a good value for $\alpha$
•  Non-linear regression
•  Solving for $\theta$ analytically (Normal Equation)
•  Using Octave to solve regression problems


What's next? We are not in univariate regression anymore:

x0   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1    2104           5                    1                  45                    460
1    1416           3                    2                  40                    232
1    1534           3                    2                  30                    315
1    852            2                    1                  36                    178


Andrew Ng

Multiple features (variables).

Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
852            2                    1                  36                    178
…              …                    …                  …                     …


Notation:
•  $n$ = number of features
•  $x^{(i)}$ = input (features) of the $i$-th training example
•  $x_j^{(i)}$ = value of feature $j$ in the $i$-th training example


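Multivariate linear regression

Hypothesis:

Previously (one feature):

$$h_\theta(x) = \theta_0 + \theta_1 x$$

Size (feet²)   Price ($1000)
2104           460
1416           232
1534           315
852            178
…              …

Now (multiple features):

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

For convenience of notation, define $x_0 = 1$. With $x = (x_0, x_1, \ldots, x_n)^T$ and $\theta = (\theta_0, \theta_1, \ldots, \theta_n)^T$, the hypothesis becomes $h_\theta(x) = \theta^T x$.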


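Hypothesis: $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$

Parameters: $\theta_0, \theta_1, \ldots, \theta_n$

Cost function (where $m$ is the number of training examples):

$$J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Gradient descent:

Repeat {
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1, \ldots, \theta_n)$$
} (simultaneously update for every $j = 0, \ldots, n$)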
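As a minimal sketch, the hypothesis $h_\theta(x) = \theta^T x$ and the squared-error cost $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$ can be evaluated for the whole training set with one matrix product. The function name compute_cost is hypothetical (it does not come from the slides), and X is assumed to carry a leading column of ones for $x_0 = 1$.

```octave
1;  % marks this file as a script so a function can be defined inline

% Hypothetical helper: squared-error cost for linear regression.
% X is m x (n+1) with a leading column of ones (x0 = 1),
% y is m x 1, theta is (n+1) x 1.
function J = compute_cost(X, y, theta)
  m = length(y);
  errors = X * theta - y;            % h_theta(x^(i)) - y^(i) for every i
  J = (errors' * errors) / (2 * m);  % sum of squared errors over 2m
end

% Training set from the housing table (price in $1000s).
X = [1 2104 5 1 45;
     1 1416 3 2 40;
     1 1534 3 2 30;
     1  852 2 1 36];
y = [460; 232; 315; 178];

% Cost when every theta_j is zero, i.e. the hypothesis predicts 0 everywhere.
J0 = compute_cost(X, y, zeros(5, 1))
```

Writing the sum as errors' * errors avoids an explicit loop over the training examples.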


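Gradient Descent

Previously ($n = 1$):

Repeat {
$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$
$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$
} (simultaneously update $\theta_0, \theta_1$)

New algorithm ($n \ge 1$):

Repeat {
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
} (simultaneously update $\theta_j$ for $j = 0, \ldots, n$)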
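The update $\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}$ for all $j$ at once can be sketched in vectorized form as below. This is one possible implementation, not the course's reference code: the function name gradient_descent, the learning rate 0.1, and the iteration count 2000 are all assumptions chosen for a toy data set on the line y = 2x.

```octave
1;  % marks this file as a script so a function can be defined inline

% Hypothetical helper: batch gradient descent for linear regression.
% Returns the final theta and the cost after each iteration.
function [theta, J_history] = gradient_descent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    errors = X * theta - y;          % m x 1 vector of prediction errors
    grad = (X' * errors) / m;        % j-th entry: (1/m) * sum_i errors(i) * X(i, j)
    theta = theta - alpha * grad;    % all theta_j are updated simultaneously
    J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);
  end
end

% Toy data lying exactly on the line y = 2x.
x = [0; 1; 2; 3];
y = [0; 2; 4; 6];
X = [ones(4, 1), x];                 % prepend the x0 = 1 column

[theta, J_history] = gradient_descent(X, y, zeros(2, 1), 0.1, 2000);
disp(theta');                        % should be close to [0 2]
```

Computing the whole gradient before touching theta is what makes the update simultaneous: no $\theta_j$ is changed until every partial derivative has been evaluated with the old values.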


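Feature Scaling

Idea: Make sure features are on a similar scale.

E.g. $x_1$ = size (0 to 2000 feet²), $x_2$ = number of bedrooms (1 to 5). Divide each feature by its range:

$$x_1 = \frac{\text{size (feet}^2\text{)}}{2000}, \qquad x_2 = \frac{\text{number of bedrooms}}{5}$$

Get every feature into approximately a $-1 \le x_i \le 1$ range.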


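Feature Scaling: Mean normalization

Idea: Make sure features are on a similar scale.

Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$).

E.g. with $x_1$ = size (0 to 2000 feet²) and $x_2$ = number of bedrooms (1 to 5), where $\mu_1$ and $\mu_2$ are the averages of those features over the training set:

$$x_1 = \frac{\text{size} - \mu_1}{2000}, \qquad x_2 = \frac{\text{number of bedrooms} - \mu_2}{5}$$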
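Mean normalization followed by division by the range can be sketched as follows for the four housing examples. The variable names (mu, s, X_norm, X_scaled) are illustrative, and dividing by max minus min is only one common choice of scale; dividing by the standard deviation is another.

```octave
% Feature columns only: size, bedrooms, floors, age (no x0 column yet).
X = [2104 5 1 45;
     1416 3 2 40;
     1534 3 2 30;
      852 2 1 36];

mu = mean(X);               % 1 x n row vector of per-feature means
s  = max(X) - min(X);       % 1 x n row vector of per-feature ranges

% Subtract the mean and divide by the range, column by column
% (relies on Octave broadcasting a row vector across the rows of X).
X_norm = (X - mu) ./ s;

% Add x0 = 1 only after scaling, so that x0 itself is never normalized.
X_scaled = [ones(size(X, 1), 1), X_norm];
disp(X_scaled);
```

The same mu and s would need to be stored and applied to any new example before making a prediction with parameters learned on the scaled data.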


Gradient descent

-  “Debugging”: How to make sure gradient descent is working correctly.

-  How to choose learning rate $\alpha$.


Making sure gradient descent is working correctly.

[Figure: cost $J(\theta)$ plotted against No. of iterations (0 to 400).]

-  For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
-  But if $\alpha$ is too small, gradient descent can be slow to converge.

Declare convergence if $J(\theta)$ decreases by less than some small threshold $\varepsilon$ (e.g. $10^{-3}$) in one iteration?


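Summary: Choosing $\alpha$
-  If $\alpha$ is too small: slow convergence.
-  If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; may not converge.

To choose $\alpha$, try a range of values, e.g. …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …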
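One way to compare learning rates is to run a fixed number of gradient descent iterations for each candidate and look at how the cost evolves. The sketch below does this on a toy data set (the line y = 2x). The candidate values, the 50-iteration budget, and the $10^{-3}$ convergence threshold are assumptions made for illustration.

```octave
% Toy data lying exactly on the line y = 2x.
x = [0; 1; 2; 3];
y = [0; 2; 4; 6];
X = [ones(4, 1), x];
m = length(y);

alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1];   % assumed candidate learning rates
num_iters = 50;
J = zeros(num_iters, numel(alphas));          % J(iter, k): cost for alphas(k)

for k = 1:numel(alphas)
  theta = zeros(2, 1);
  for iter = 1:num_iters
    theta = theta - alphas(k) * (X' * (X * theta - y)) / m;
    J(iter, k) = sum((X * theta - y) .^ 2) / (2 * m);
  end
  fprintf('alpha = %5.3f   final J = %g\n', alphas(k), J(end, k));
end

% Simple convergence test for alpha = 0.3: first step at which the cost
% drops by less than 1e-3. Empty if that never happens within num_iters.
converged_at = find(-diff(J(:, 6)) < 1e-3, 1);
```

On this data the smallest rates reduce the cost only slowly, while the largest one makes the cost grow from one iteration to the next, which is the behavior the summary describes.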

Housing prices prediction


Polynomial regression

[Figure: Price (y) plotted against Size (x).]


Choice of features

[Figure: Price (y) plotted against Size (x).]
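Polynomial regression can be treated as multivariate linear regression on derived features. The sketch below builds two candidate feature sets from the size column. The cubic model $\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$ and the square-root model $\theta_0 + \theta_1 x + \theta_2 \sqrt{x}$ are assumptions chosen for illustration; the formulas on the original slides did not survive extraction.

```octave
% Size of each house in square feet (from the housing table).
size_ft2 = [2104; 1416; 1534; 852];
m = length(size_ft2);

% Candidate 1: cubic polynomial in size.
P = [size_ft2, size_ft2 .^ 2, size_ft2 .^ 3];

% The powers differ by orders of magnitude, so scale each column
% (mean normalization, then division by the range).
mu = mean(P);
s  = max(P) - min(P);
X_cubic = [ones(m, 1), (P - mu) ./ s];

% Candidate 2: size and its square root.
X_sqrt = [ones(m, 1), size_ft2, sqrt(size_ft2)];

disp(X_cubic);
```

Either matrix can then be used in place of X in gradient descent or in the normal equation, since the model is still linear in the parameters.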


Gradient Descent

Normal equation: Method to solve for $\theta$ analytically.


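Intuition: If 1D ($\theta \in \mathbb{R}$):

$$J(\theta) = a\theta^2 + b\theta + c$$

Solve for $\theta$ by setting $\frac{d}{d\theta} J(\theta) = 0$.

For $\theta \in \mathbb{R}^{n+1}$:

$$J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

$$\frac{\partial}{\partial \theta_j} J(\theta) = 0 \quad \text{(for every } j\text{)}$$

Solve for $\theta_0, \theta_1, \ldots, \theta_n$.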


Examples: $m = 4$

x0   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1    2104           5                    1                  45                    460
1    1416           3                    2                  40                    232
1    1534           3                    2                  30                    315
1    852            2                    1                  36                    178


Examples: $m = 5$

x0   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1    2104           5                    1                  45                    460
1    1416           3                    2                  40                    232
1    1534           3                    2                  30                    315
1    852            2                    1                  36                    178
1    3000           4                    1                  38                    540


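$m$ examples $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$; $n$ features.

$$x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix} \in \mathbb{R}^{n+1}, \qquad X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$$

E.g. If $x^{(i)} = \begin{bmatrix} 1 \\ x_1^{(i)} \end{bmatrix}$, then

$$X = \begin{bmatrix} 1 & x_1^{(1)} \\ 1 & x_1^{(2)} \\ \vdots & \vdots \\ 1 & x_1^{(m)} \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$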


$$\theta = (X^T X)^{-1} X^T y$$

$(X^T X)^{-1}$ is the inverse of matrix $X^T X$.

Octave: pinv(X'*X)*X'*y
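As a small worked example of the normal equation $\theta = (X^T X)^{-1} X^T y$, the sketch below fits a line to four points that lie exactly on y = 2x. The data and the query point x = 4 are chosen for illustration only.

```octave
% Four training examples on the line y = 2x.
x = [0; 1; 2; 3];
y = [0; 2; 4; 6];

% Design matrix: one row per example, with the x0 = 1 column first.
X = [ones(length(x), 1), x];

% Normal equation. pinv (pseudo-inverse) also copes with a singular X'*X.
theta = pinv(X' * X) * X' * y

% Predict y for a new input x = 4 (remember the leading 1 for x0).
prediction = [1, 4] * theta
```

No learning rate and no iteration are involved; the cost of the method is dominated by inverting the (n+1) x (n+1) matrix X'*X.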


$m$ training examples, $n$ features.

Gradient Descent
•  Need to choose $\alpha$.
•  Needs many iterations.
•  Works well even when $n$ is large.

Normal Equation
•  No need to choose $\alpha$.
•  Don't need to iterate.
•  Need to compute $(X^T X)^{-1}$.
•  Slow if $n$ is very large.



Notes on Supervised learning and Regression
http://see.stanford.edu/materials/aimlcs229/cs229-notes1.pdf

Octave
http://www.gnu.org/software/octave/
Wiki: http://www.octave.org/wiki/index.php?title=Main_Page
Documentation: http://www.gnu.org/software/octave/doc/interpreter/


Exercise

For next class:

1.  Download and install Octave. (Alternative: if you have MATLAB, you can use it instead.)

2.  Verify that it is working by typing in an Octave command window:

    x = [0 1 2 3]
    y = [0 2 4 6]
    plot(x,y)

    This example defines two vectors, x and y, and should display a plot showing a straight line (the line y=2x). If you get an error at this point, it may be that gnuplot is not installed or cannot access your display. If you are unable to get this to work, you can still do the rest of this exercise, because it does not involve any plotting (just restart Octave). You might refer to the Octave wiki for installation help, but if you are stuck, you can get some help troubleshooting this on Friday afternoon 3-4pm in the software engineering lab (Mendel 159).

3.  Create a few matrices and vectors, e.g.:

    A = [1 2; 3 4; 5 6]
    V = [3 5 -1 0 7]

4.  Try some of the elementary matrix and vector operations from our linear algebra slides (adding and multiplying matrices, vectors and scalars).

5.  Print out a log of your session.
