CSCI 447/547 MACHINE LEARNING Linear Regression

Post on 08-Jul-2020

Page 1: CSCI 447/547 Machine Learning … CSCI 447/547 MACHINE LEARNING Linear Regression. Outline Linear Models 1D Ordinary Least Squares (OLS) Solution of OLS Interpretation Anscombe’s

CSCI 447/547 MACHINE LEARNING

Linear Regression

Outline

Linear Models

1D Ordinary Least Squares (OLS)

Solution of OLS

Interpretation

Anscombe’s Quartet

Multivariate OLS

OLS Pros and Cons

Optional Reading

Terminology

Features (Covariates or predictors)

Labels (Variates or targets)

Regression

Classification

Types of Machine Learning

Unsupervised

Finding structure in data

Supervised

Predict from given data

Classification: categorical output data (e.g., Logistic Regression)

Regression (prediction): continuous output data (e.g., OLS)

[Figures: scatter plots with Height and Weight axes; one plot separates Women and Men]

What is a Linear Model?

Predict Housing Prices

Depends on:

Area

# of bedrooms

# of bathrooms

Hypothesis is that relationship is linear

$\text{Price} = k_1(\text{Area}) + k_2(\#\text{bed}) + k_3(\#\text{bath})$

$y_i = a_0 + a_1 x_{i1} + a_2 x_{i2} + \dots$
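
As a minimal sketch of the linear hypothesis above, the price can be computed as an offset plus a weighted sum of the features. The coefficient values and the `predict_price` helper are made up for illustration and do not come from the slides.

```python
import numpy as np

# Hypothetical coefficients: intercept a0, then one weight per feature
# (area in square feet, number of bedrooms, number of bathrooms).
a0 = 50_000.0
weights = np.array([150.0, 10_000.0, 5_000.0])

def predict_price(features):
    """Linear model: a0 + a1*x1 + a2*x2 + a3*x3."""
    return a0 + np.dot(weights, features)

house = np.array([1_200.0, 3.0, 2.0])  # area, bedrooms, bathrooms
price = predict_price(house)
print(price)
```

Each weight can be read directly as the change in predicted price per unit change of its feature, which is the sense in which linear models are called interpretable.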

Why Use Linear Models?

Interpretable

Relationships are easy to see

Low Complexity

Prevents overfitting

Scalable

Scale up to more data, larger problems

Baseline

Can benchmark other methods against them

Examples of Use

Example of Use: MNIST dataset (handwritten digits)

Best performance: neural networks and regularization

99.79% accurate

Takes about a day to train

More difficult to build

Logistic Regression

92.5% accurate

Takes seconds to train

Can be built with less expertise

Building Blocks of Later Techniques

Optional Reading

Definition of 1-Dimension OLS

The Problem Statement

i is an observation, we have N of them

i = 1…N

x is the independent variable (feature)

y is the dependent variable (output variable)

$y = ax + b$, where a and b are constants

$\hat{y}_i = a x_i + b$ OR $y_i = a x_i + b + \varepsilon$

Two unknowns: want to solve for a and b
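
To make the problem statement concrete, here is a small sketch that generates N noisy observations from a line. The "true" slope and intercept, the noise level, and the range of x are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

N = 50                      # number of observations
a_true, b_true = 2.0, 1.0   # assumed "true" slope and intercept

x = rng.uniform(0.0, 10.0, size=N)   # independent variable (feature)
eps = rng.normal(0.0, 1.0, size=N)   # noise term
y = a_true * x + b_true + eps        # observed outputs
y_hat = a_true * x + b_true          # noise-free predictions of the line
```

In practice a and b are unknown and only the pairs (x, y) are observed; OLS estimates the two constants from those pairs.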

The Loss Function

$L = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$

Goal is to minimize this function

Using $\hat{y}_i = a x_i + b$, the equation becomes:

$L = \sum_{i=1}^{N} (y_i - a x_i - b)^2$

So this is the equation we want to minimize
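
The loss can be sketched directly in code. The function name `sse_loss` and the four data points are illustrative assumptions; the points were chosen to lie exactly on y = 2x + 1 so the loss of that line is zero.

```python
import numpy as np

def sse_loss(a, b, x, y):
    """Sum of squared errors L = sum_i (y_i - a*x_i - b)^2."""
    residuals = y - (a * x + b)
    return float(np.sum(residuals ** 2))

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])   # lies exactly on y = 2x + 1

loss_exact = sse_loss(2.0, 1.0, x, y)   # the line that generated the data
loss_off = sse_loss(1.0, 0.0, x, y)     # a worse line, so a larger loss
```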

Solution of OLS

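Derivation

$L = \sum_{i=1}^{N} (y_i - a x_i - b)^2$

Want to minimize L

Take the derivative of the loss function with respect to each variable and set it to zero

$\frac{dL}{da} = 0, \quad \frac{dL}{db} = 0$

$\frac{dL}{da} = \sum_{i=1}^{N} 2 (y_i - a x_i - b)(-x_i) = 0$

Dividing by $-2$ and expanding:

$\sum_{i=1}^{N} x_i y_i - a \sum_{i=1}^{N} x_i^2 - b \sum_{i=1}^{N} x_i = 0$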

Solution of OLS

Derivation

$\frac{dL}{db} = \sum_{i=1}^{N} 2 (y_i - a x_i - b)(-1) = 0$

Dividing by $-2$ and expanding:

$\sum_{i=1}^{N} y_i - a \sum_{i=1}^{N} x_i - bN = 0$

$b = \frac{1}{N} \sum_{i=1}^{N} y_i - \frac{a}{N} \sum_{i=1}^{N} x_i$

This is the closed form solution for b
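
One way to sanity-check derivatives like these is to compare them against central finite differences. The sketch below does that for dL/da and dL/db; the data points, the evaluation point (a, b), and the step size h are arbitrary assumptions.

```python
import numpy as np

def loss(a, b, x, y):
    return np.sum((y - a * x - b) ** 2)

def grad(a, b, x, y):
    """Analytic derivatives of L with respect to a and b."""
    r = y - a * x - b
    dL_da = np.sum(2 * r * (-x))
    dL_db = np.sum(2 * r * (-1))
    return dL_da, dL_db

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 4.5, 5.0])
a, b, h = 0.3, 0.7, 1e-6

# Central finite differences as an independent check.
num_da = (loss(a + h, b, x, y) - loss(a - h, b, x, y)) / (2 * h)
num_db = (loss(a, b + h, x, y) - loss(a, b - h, x, y)) / (2 * h)
ana_da, ana_db = grad(a, b, x, y)
```

The numerical and analytic values should agree closely, which is a quick guard against sign errors in a hand derivation.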

Solution of OLS

Derivation

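From first set,

$\sum_{i=1}^{N} x_i y_i - a \sum_{i=1}^{N} x_i^2 - b \sum_{i=1}^{N} x_i = 0$

Substituting the closed form for b:

$\sum_{i=1}^{N} x_i y_i = a \sum_{i=1}^{N} x_i^2 + \left( \sum_{i=1}^{N} x_i \right) \left( \frac{1}{N} \sum_{i=1}^{N} y_i - \frac{a}{N} \sum_{i=1}^{N} x_i \right)$

$a = \dfrac{\sum_{i=1}^{N} x_i y_i - \frac{1}{N} \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} x_i^2 - \frac{1}{N} \left( \sum_{i=1}^{N} x_i \right)^2}$

This is the closed form solution for a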
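
The closed-form expressions for a and b translate almost line for line into NumPy. This is a minimal sketch: the helper name `ols_1d` and the five data points are assumptions, and `np.polyfit` is used only as an independent reference.

```python
import numpy as np

def ols_1d(x, y):
    """Closed-form 1D OLS: returns slope a and intercept b."""
    n = len(x)
    sx, sy = np.sum(x), np.sum(y)
    sxy, sxx = np.sum(x * y), np.sum(x ** 2)
    a = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
    b = sy / n - a * sx / n
    return a, b

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 6.2, 7.9, 10.1])

a, b = ols_1d(x, y)
a_ref, b_ref = np.polyfit(x, y, deg=1)  # NumPy's least-squares line, slope first
```

For these points the closed form gives a slope of 1.96 and an intercept of 0.22, matching the reference fit up to floating-point rounding.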

Solution of OLS

Optimal Choices

Interpretation

Interpretation of a and b

a is the slope of the line

the tangent of the angle θ between the line and the x-axis

the effect of the independent variable on the dependent variable

b is the intercept of the line

x: independent variable

y: dependent variable

Interpretation

Interpretation of L

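$L = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$

Expresses how well the solution captures the variation in the data

$R^2 = 1 - \mathrm{MSE} / \mathrm{Var}(y)$, where $\mathrm{MSE} = L / N$

$R^2 \in [0, 1]$ for an OLS fit with an intercept, evaluated on the data it was fit to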
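
A short sketch of the R2 computation, assuming Var(y) is the population variance (NumPy's default, `ddof=0`). The helper name `r_squared` and the data are illustrative.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - MSE / Var(y), using the population variance (ddof=0)."""
    mse = np.mean((y - y_hat) ** 2)
    return 1.0 - mse / np.var(y)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 6.2, 7.9, 10.1])

a, b = np.polyfit(x, y, deg=1)   # OLS line, slope first
r2_fit = r_squared(y, a * x + b)

# Always predicting the mean of y explains none of the variation.
r2_mean = r_squared(y, np.full_like(y, y.mean()))
```

A value near 1 means the line captures almost all of the variation in y; predicting the mean everywhere gives 0.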

Interpretation

Anscombe’s Quartet

Anscombe’s Quartet

Same values for mean, variance and best fit line

R2 values are the same for each example

But … linear regression may not be the best for the last three examples
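
The claim can be checked numerically. The sketch below fits a line to each of the four datasets, using the values published by Anscombe (1973) as commonly reproduced; the variable names are assumptions.

```python
import numpy as np

# Anscombe's quartet; the first three datasets share the same x values.
x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
ys = [
    np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
]
xs = [x123, x123, x123, x4]

results = []
for x, y in zip(xs, ys):
    slope, intercept = np.polyfit(x, y, deg=1)
    y_hat = slope * x + intercept
    r2 = 1.0 - np.mean((y - y_hat) ** 2) / np.var(y)
    results.append((slope, intercept, r2))
    print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
```

All four fits come out with a slope close to 0.5, an intercept close to 3, and an R2 close to 0.67, even though a plot shows very different shapes. This is why the summary numbers alone should not be trusted without looking at the data.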

Multivariable OLS

Definition of Model

Data Matrix

The Loss Function

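Multivariable OLS

i = an observation

N = number of observations

i = 1…N

M = number of features

$x_i = [x_{i1}, x_{i2}, \dots, x_{iM}]$

$y_i$: dependent variable

Data matrix: $X = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \dots & x_{NM} \end{bmatrix}$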

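Multivariable OLS

Data matrix: $X = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \dots & x_{NM} \end{bmatrix}$

$y = ax + b(1)$

Add a column of all 1’s to the left of the data matrix so that the bias term is included

$\hat{y}_i = B_0 + B_1 x_{i1} + B_2 x_{i2} + \dots + B_M x_{iM}$

$\hat{y}_i = x_i \cdot B$ (where $x_i$ now starts with the added 1), $B = \begin{bmatrix} B_0 \\ \vdots \\ B_M \end{bmatrix}$, $\hat{y} = XB$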
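
The column-of-ones trick can be sketched as follows. The raw data and the coefficient vector B are made-up values for illustration; nothing is fitted here.

```python
import numpy as np

# Hypothetical data: N = 3 observations, M = 2 features.
X_raw = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

# Prepend a column of ones so that B[0] acts as the bias term.
X = np.column_stack([np.ones(X_raw.shape[0]), X_raw])

B = np.array([0.5, 2.0, -1.0])   # assumed coefficients [B0, B1, B2]
y_hat = X @ B                    # all N predictions in one matrix product
```

With the ones column in place, the bias no longer needs special handling: every prediction is a single dot product between a row of X and B.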

Multivariable OLS

Loss Function

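$L = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$

Still want to minimize L

$L = \sum_{i=1}^{N} (y_i - (B_0 + B_1 x_{i1} + \dots + B_M x_{iM}))^2$

$L = \sum_{i=1}^{N} (y_i - x_i B)^2$

In norm form, L is the squared L2 norm of the residual vector:

$L = \lVert y - XB \rVert_2^2$

$L = (y - XB)^T (y - XB)$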
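
The three ways of writing the loss are the same quantity, which a quick numerical check can confirm. The random data and the arbitrary (unfitted) coefficient vector below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, M))])  # with bias column
y = rng.normal(size=N)
B = rng.normal(size=M + 1)       # arbitrary coefficients, not fitted

r = y - X @ B                    # residual vector

loss_sum = sum((y[i] - X[i] @ B) ** 2 for i in range(N))  # per-observation sum
loss_norm = np.linalg.norm(r, ord=2) ** 2                 # squared L2 norm
loss_quad = r.T @ r                                       # (y - XB)^T (y - XB)
```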

Optimization

A Few Facts from Matrix Calculus

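Scalar case:

$\frac{d(ax)}{dx} = a$

$\frac{d(ax^2)}{dx} = 2ax$

Vector analogues, for a vector $a$ and a symmetric matrix $A$:

$\frac{d(a^T x)}{dx} = a$

$\frac{d(x^T A x)}{dx} = 2Ax$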

Optimization

Minimizing the Loss

$L = (y - XB)^T (y - XB)$

$\frac{dL}{dB} = 0$

$\frac{d\,[(y - XB)^T (y - XB)]}{dB} = 0$

$\frac{d\,(y^T y - y^T X B - B^T X^T y + B^T X^T X B)}{dB} = 0$, using $(XY)^T = Y^T X^T$

$-X^T y - X^T y + 2 X^T X B = 0$

$X^T y = (X^T X) B$

$B = (X^T X)^{-1} X^T y$

(assuming $X^T X$ is invertible, which is true if X has full column rank, that is, its columns are linearly independent)
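
A minimal sketch of the normal-equation solution on synthetic data. The sizes, the "true" coefficients, and the noise level are assumptions; `np.linalg.lstsq` serves only as a reference solver.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, M))])  # bias + features
B_true = np.array([1.0, 2.0, -3.0, 0.5])                    # assumed values
y = X @ B_true + 0.1 * rng.normal(size=N)

# Normal equations: (X^T X) B = X^T y. Solving the linear system avoids
# forming the inverse explicitly.
B_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference: NumPy's least-squares routine.
B_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimum the gradient -2 X^T (y - X B) is numerically zero.
gradient = -2 * X.T @ (y - X @ B_normal)
```

Solving the system with `np.linalg.solve` rather than calling `np.linalg.inv` is a common choice because it is generally cheaper and numerically better behaved, while giving the same B when the inverse exists.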

OLS Pros and Cons

OLS

Pros

Efficient to compute

Unique minimum

Stable under perturbation of data

Easy to interpret

Cons

Influenced by outliers

$(X^T X)^{-1}$ may not exist

Features may not be linearly independent
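
The last two cons can be illustrated with a deliberately collinear design matrix. In this sketch the second feature is exactly twice the first, so X does not have full column rank; the data are made up for illustration.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2.0 * x1                       # linearly dependent on x1
X = np.column_stack([np.ones(4), x1, x2])
y = np.array([3.0, 5.0, 7.0, 9.0])  # y = 1 + 2*x1

rank = np.linalg.matrix_rank(X)     # fewer than the 3 columns of X

# lstsq still returns a minimizer (the minimum-norm one), but it is
# no longer the unique solution of the normal equations.
B, residuals, rank_lstsq, _ = np.linalg.lstsq(X, y, rcond=None)
fit = X @ B
```

The fitted values still match y, but many different coefficient vectors produce the same fit, so individual coefficients lose their interpretation.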

Summary

Linear Models

1D Ordinary Least Squares (OLS)

Solution of OLS

Interpretation

Anscombe’s Quartet

Multivariate OLS

OLS Pros and Cons