
Page 1: Machine Learning Math Essentials Part 2

Machine Learning Math Essentials Part 2

Jeff Howbert, Introduction to Machine Learning, Winter 2012

Page 2: Machine Learning Math Essentials Part 2

Gaussian distribution

Most commonly used continuous probability distribution

Also known as the normal distribution

Two parameters define a Gaussian:
– Mean μ: location of the center
– Variance σ²: width of the curve

Page 3: Machine Learning Math Essentials Part 2

Gaussian distribution

In one dimension:

$$N(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$

Page 4: Machine Learning Math Essentials Part 2

Gaussian distribution

In one dimension:

$$N(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$

– Normalizing constant $(2\pi\sigma^2)^{-1/2}$: ensures that the distribution integrates to 1

– Variance σ² controls the width of the curve

– The exponential of $-(x-\mu)^2$ causes the pdf to decrease as distance from the center increases
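As a quick numerical check of this formula, here is a minimal MATLAB sketch (not from the original slides; the grid and parameter values are assumed) that evaluates the one-dimensional pdf and verifies that it integrates to approximately 1:

mu = 0;  sigma2 = 1;                      % assumed mean and variance
x  = -4 : 0.01 : 4;                       % grid of points to evaluate
p  = 1 ./ sqrt( 2 * pi * sigma2 ) .* exp( -( x - mu ).^2 / ( 2 * sigma2 ) );
plot( x, p );                             % bell curve centered at mu
trapz( x, p )                             % numerical integral over the grid, close to 1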

Page 5: Machine Learning Math Essentials Part 2

Gaussian distribution

[Figure: plots of the one-dimensional Gaussian pdf for four parameter settings: μ = 0, σ² = 1; μ = 2, σ² = 1; μ = 0, σ² = 5; μ = −2, σ² = 0.3. The μ = 0, σ² = 1 curve is the standard normal, with pdf $\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$.]

Page 6: Machine Learning Math Essentials Part 2

Multivariate Gaussian distribution

In d dimensions:

$$N(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$

– x and μ are now d-dimensional vectors
– μ gives the center of the distribution in d-dimensional space
– σ² is replaced by Σ, the d × d covariance matrix
– Σ contains the pairwise covariances of every pair of features
– Diagonal elements of Σ are the variances σ² of the individual features
– Σ describes the distribution's shape and spread
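For concreteness, a minimal MATLAB sketch (not from the slides) that evaluates this density at one point, reusing the two-dimensional μ and Σ from page 8 as assumed example values:

mu    = [ 0; 0 ];                         % mean vector (d = 2 example)
Sigma = [ 0.25 0.3; 0.3 1 ];              % covariance matrix
x     = [ 0.5; 0.2 ];                     % assumed point at which to evaluate the pdf
d     = length( mu );
dx    = x - mu;
p     = exp( -0.5 * dx' * ( Sigma \ dx ) ) / ( ( 2 * pi )^( d / 2 ) * sqrt( det( Sigma ) ) )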

Page 7: Machine Learning Math Essentials Part 2

Multivariate Gaussian distribution

Covariance
– Measures the tendency for two variables to deviate from their means in the same (or opposite) directions at the same time

[Figure: scatter plots contrasting no covariance with high (positive) covariance.]
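A small MATLAB sketch of that tendency on synthetic data (assumed, not from the slides); the first pair of variables deviates together, the second does not:

n = 10000;
a = randn( n, 1 );
b = 0.8 * a + 0.2 * randn( n, 1 );        % b tends to deviate together with a
c = randn( n, 1 );                        % c is generated independently of a
cov_ab = mean( ( a - mean( a ) ) .* ( b - mean( b ) ) )   % clearly positive
cov_ac = mean( ( a - mean( a ) ) .* ( c - mean( c ) ) )   % near zero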

Page 8: Machine Learning Math Essentials Part 2

Multivariate Gaussian distribution

In two dimensions:

$$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 0.25 & 0.3 \\ 0.3 & 1 \end{bmatrix}$$

Page 9: Machine Learning Math Essentials Part 2

Multivariate Gaussian distribution

In two dimensions, for three different covariance matrices:

$$\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 2 & 0.6 \\ 0.6 & 2 \end{bmatrix}$$

Page 10: Machine Learning Math Essentials Part 2

Multivariate Gaussian distribution

In three dimensions:

rng( 1 );                                           % fix the random seed
mu = [ 2; 1; 1 ];                                   % mean vector
sigma = [ 0.25 0.30 0.10; 0.30 1.00 0.70; 0.10 0.70 2.00 ];   % covariance matrix
x = randn( 1000, 3 );                               % 1000 standard normal samples in 3-D
x = x * sigma;                                      % introduce correlation between dimensions
x = x + repmat( mu', 1000, 1 );                     % shift the samples to the mean
scatter3( x( :, 1 ), x( :, 2 ), x( :, 3 ), '.' );   % 3-D scatter plot of the samples
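One caveat, not stated on the slide: multiplying by sigma gives the samples covariance sigma' * sigma rather than sigma itself. A common alternative, sketched here under that assumption, multiplies by a Cholesky factor instead:

R = chol( sigma );                        % upper-triangular factor with sigma = R' * R
x = randn( 1000, 3 ) * R;                 % rows now have covariance sigma
x = x + repmat( mu', 1000, 1 );           % shift the samples to the mean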

Page 11: Machine Learning Math Essentials Part 2

Vector projection

Orthogonal projection of y onto x
– Can take place in any space of dimensionality ≥ 2
– Unit vector in the direction of x is x / || x ||
– Length of the projection of y in the direction of x is || y || cos( θ )
– Orthogonal projection of y onto x is the vector
  projx( y ) = x || y || cos( θ ) / || x || = [ ( x · y ) / || x ||² ] x   (using the dot product alternate form)

[Figure: vectors x and y with angle θ between them, and projx( y ) along x.]
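A MATLAB sketch of the dot-product form of the projection (the two vectors are assumed examples, not from the slide):

x = [ 3; 1 ];
y = [ 2; 4 ];
proj = ( dot( x, y ) / norm( x )^2 ) * x  % orthogonal projection of y onto x
dot( y - proj, x )                        % residual is perpendicular to x, so this is 0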

Page 12: Machine Learning Math Essentials Part 2

Linear models

There are many types of linear models in machine learning.
– Common in both classification and regression.
– A linear model consists of a vector β in d-dimensional feature space.
– The vector β attempts to capture the strongest gradient (rate of change) in the output variable, as seen across all training samples.
– Different linear models optimize β in different ways.
– A point x in feature space is mapped from d dimensions to a scalar (1-dimensional) output z by projection onto β:

$$z = \boldsymbol{\beta} \cdot \mathbf{x} = \beta_1 x_1 + \cdots + \beta_d x_d$$

Page 13: Machine Learning Math Essentials Part 2

Linear models

There are many types of linear models in machine learning.
– The projection output z is typically transformed to a final predicted output y by some function f:

$$y = f(z) = f(\boldsymbol{\beta} \cdot \mathbf{x}) = f(\beta_1 x_1 + \cdots + \beta_d x_d)$$

  example: for logistic regression, f is the logistic function
  example: for linear regression, f( z ) = z

– Models are called linear because they are a linear function of the model vector components β₁, …, β_d.
– Key feature of all linear models: no matter what f is, a constant value of z is transformed to a constant value of y, so decision boundaries remain linear even after the transform.
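For concreteness, a minimal MATLAB sketch of the two example transforms above (coefficient and feature values are assumed, not from the slides):

beta = [ 0.8; -0.5 ];                     % assumed model vector
x    = [ 1.2; 2.0 ];                      % assumed point in feature space
z    = beta' * x;                         % projection onto beta
y_linear   = z;                           % linear regression: f( z ) = z
y_logistic = 1 / ( 1 + exp( -z ) );       % logistic regression: f is the logistic function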

Page 14: Machine Learning Math Essentials Part 2


Geometry of projections

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

[Figure: projection geometry, labeled with w and w0.]

Page 15: Machine Learning Math Essentials Part 2


Geometry of projections

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 16: Machine Learning Math Essentials Part 2


Geometry of projections

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 17: Machine Learning Math Essentials Part 2


Geometry of projections

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 18: Machine Learning Math Essentials Part 2


Geometry of projections

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

[Figure: the margin.]

Page 19: Machine Learning Math Essentials Part 2


From projection to prediction

– positive margin → class 1
– negative margin → class 0
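In code this is just a sign test on the projection; a sketch using the fitted coefficients from the next page and an assumed example point:

alpha = 13.0460;  beta = [ -1.9024; -0.4047 ];   % coefficients B from the next page
x = [ 5; 8 ];                                    % assumed example point
z = alpha + beta' * x;                           % signed quantity: which side of the boundary
y = 1 / ( 1 + exp( -z ) );                       % logistic output; y > 0.5 exactly when z > 0
predicted_class = ( z > 0 )                      % positive margin -> class 1, negative -> class 0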

Page 20: Machine Learning Math Essentials Part 2

Logistic regression in two dimensions

Interpreting the model vector of coefficients
– From MATLAB: B = [ 13.0460  -1.9024  -0.4047 ]
– α = B( 1 ),  β = [ β₁ β₂ ] = B( 2 : 3 )
– α and β define the location and orientation of the decision boundary
– −α / || β || is the distance of the decision boundary from the origin
– The decision boundary is perpendicular to β
– The magnitude of β defines the gradient of probabilities between 0 and 1
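A short MATLAB sketch that reads these quantities off B, assuming the relationships stated above:

B     = [ 13.0460  -1.9024  -0.4047 ];
alpha = B( 1 );
beta  = B( 2 : 3 );
dist_from_origin = -alpha / norm( beta )  % signed distance of the boundary from the origin
beta_direction   = beta / norm( beta )    % the boundary is perpendicular to this direction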

Page 21: Machine Learning Math Essentials Part 2


Logistic function in d dimensions

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 22: Machine Learning Math Essentials Part 2


Decision boundary for logistic regression

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)