Lecture 2.1: Vector Calculus
CSC 84020 - Machine Learning
Andrew Rosenberg
February 5, 2009


TRANSCRIPT

Page 1

Lecture 2.1: Vector Calculus

CSC 84020 - Machine Learning

Andrew Rosenberg

February 5, 2009

Page 2

Today

Last Time
Probability Review

Today
Vector Calculus

Page 3

Background

Let’s talk.

Linear Algebra
Vectors
Matrices
Basis Spaces
Eigenvectors/values
Inversion and transposition

Calculus
Derivation
Integration

Vector Calculus
Gradients
Derivation w.r.t. a vector

Page 4

Linear Algebra Basics

What is a vector?

What is a matrix?

Transposition

Adding matrices and vectors

Multiplying matrices.

Page 5

Definitions

A vector is a one-dimensional array. We denote vectors as x or \mathbf{x}. If we don’t specify otherwise, assume x is a column vector.

\mathbf{x} = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix}

Page 6

Definitions

A matrix is a higher-dimensional array. We typically denote matrices as capital letters, e.g., A. If A is an n-by-m matrix, it has the following structure:

A = \begin{pmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,m-1} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,m-1} \\ \vdots & & \ddots & \vdots \\ a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,m-1} \end{pmatrix}

Page 7

Matrix transposition

Transposing a matrix or vector swaps rows and columns.

A column-vector becomes a row-vector

\mathbf{x} = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix}, \qquad \mathbf{x}^T = \begin{pmatrix} x_0 & x_1 & \cdots & x_{n-1} \end{pmatrix}

Page 8

Matrix transposition

Transposing a matrix or vector swaps rows and columns.

A column-vector becomes a row-vector

A = \begin{pmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,m-1} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,m-1} \\ \vdots & & \ddots & \vdots \\ a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,m-1} \end{pmatrix}

A^T = \begin{pmatrix} a_{0,0} & a_{1,0} & \cdots & a_{n-1,0} \\ a_{0,1} & a_{1,1} & \cdots & a_{n-1,1} \\ \vdots & & \ddots & \vdots \\ a_{0,m-1} & a_{1,m-1} & \cdots & a_{n-1,m-1} \end{pmatrix}

If A is n-by-m, then A^T is m-by-n.
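The deck gives no code; as a minimal NumPy sketch (an editorial addition, with the array values chosen arbitrarily), transposition swaps rows and columns, and transposing twice returns the original matrix:

```python
import numpy as np

# A 2-by-3 matrix; its transpose is 3-by-2.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
At = A.T

print(At.shape)            # (3, 2)
assert (At.T == A).all()   # (A^T)^T = A
```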

Page 9

Adding Matrices

Matrices can only be added if they have the same dimension.

A + B = \begin{pmatrix} a_{0,0}+b_{0,0} & a_{0,1}+b_{0,1} & \cdots & a_{0,m-1}+b_{0,m-1} \\ a_{1,0}+b_{1,0} & a_{1,1}+b_{1,1} & \cdots & a_{1,m-1}+b_{1,m-1} \\ \vdots & & \ddots & \vdots \\ a_{n-1,0}+b_{n-1,0} & a_{n-1,1}+b_{n-1,1} & \cdots & a_{n-1,m-1}+b_{n-1,m-1} \end{pmatrix}

Page 10

Multiplying matrices

To multiply two matrices, the inner dimensions  must match.

An n-by-m matrix can be multiplied by an n′-by-m′ matrix iff m = n′.

AB = C

c_{ij} = \sum_{k=0}^{m-1} a_{ik} b_{kj}

That is, multiply the i -th row by the j -th column.

Image from wikipedia.
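The summation c_{ij} = Σ_k a_{ik} b_{kj} can be checked directly against NumPy’s built-in matrix product (an editorial sketch with arbitrary example matrices, not from the slides):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])    # 2-by-3
B = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])        # 3-by-2: inner dimensions match (m = 3)

n, m = A.shape
p = B.shape[1]

# c_ij = sum_k a_ik * b_kj: the i-th row of A times the j-th column of B.
C = np.zeros((n, p))
for i in range(n):
    for j in range(p):
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(m))

assert np.allclose(C, A @ B)    # matches NumPy's matrix multiplication
```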

Page 11

Useful matrix operations

Inversion

Norm
Eigenvector decomposition

Page 12

Matrix Inversion

The inverse of a matrix A is denoted A^{-1}, and has the following property:

AA^{-1} = I

where I is the identity matrix: the n-by-n matrix with I_{ij} = 1 if i = j and 0 otherwise. If A is a square matrix (n = m), then also

A^{-1}A = I

Page 13

Matrix Inversion

The inverse of a matrix A is denoted A^{-1}, and has the following property:

AA^{-1} = I

where I is the identity matrix: the n-by-n matrix with I_{ij} = 1 if i = j and 0 otherwise. If A is a square matrix (n = m), then also

A^{-1}A = I

What is the inverse of a vector? x^{-1} = ?

Page 14

Some useful Matrix Inversion Properties

(A^{-1})^{-1} = A

(kA)^{-1} = k^{-1}A^{-1}

(A^T)^{-1} = (A^{-1})^T

(AB)^{-1} = B^{-1}A^{-1}
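Each of the inversion identities above can be verified numerically (an editorial sketch using random matrices, which are invertible with probability one; the seed and scalar are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
k = 2.5
inv = np.linalg.inv

assert np.allclose(inv(inv(A)), A)               # (A^-1)^-1 = A
assert np.allclose(inv(k * A), inv(A) / k)       # (kA)^-1 = k^-1 A^-1
assert np.allclose(inv(A.T), inv(A).T)           # (A^T)^-1 = (A^-1)^T
assert np.allclose(inv(A @ B), inv(B) @ inv(A))  # (AB)^-1 = B^-1 A^-1
```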

Page 15

The norm of a vector

The norm of a vector x is written ||x||. The norm represents the Euclidean length of a vector.

||x|| = \sqrt{\sum_{i=0}^{n-1} x_i^2} = \sqrt{x_0^2 + x_1^2 + \ldots + x_{n-1}^2}
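A one-line check of the formula (an editorial sketch; the classic 3-4-5 vector makes the result easy to see):

```python
import numpy as np

x = np.array([3.0, 4.0])
norm = np.sqrt(np.sum(x ** 2))   # sqrt(x_0^2 + ... + x_{n-1}^2)

print(norm)                      # 5.0
assert np.isclose(norm, np.linalg.norm(x))
```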

Page 16

Positive Definite/Semi-Definite

A positive definite matrix M has the property that

x^T M x > 0 \quad \text{for all } x \neq 0

A positive semi-definite matrix M has the property that

x^T M x \geq 0 \quad \text{for all } x

Why might we care about these matrices?
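One practical answer, sketched in NumPy (an editorial illustration with an arbitrary example matrix): for a symmetric matrix, positive definiteness can be checked through its eigenvalues, which is equivalent to the defining property above.

```python
import numpy as np

M = np.array([[2.0, -1.0],
              [-1.0, 2.0]])      # symmetric

# For symmetric M: positive definite <=> all eigenvalues are positive.
eigvals = np.linalg.eigvalsh(M)
assert (eigvals > 0).all()

# Spot-check the defining property x^T M x > 0 on random nonzero x.
rng = np.random.default_rng(1)
for _ in range(100):
    x = rng.standard_normal(2)
    assert x @ M @ x > 0
```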

Page 17

Eigenvectors

For a square matrix A, an eigenvector is defined by

A u_i = \lambda_i u_i

where u_i is an eigenvector and \lambda_i is its corresponding eigenvalue.

In general, eigenvalues are complex numbers, but if A is symmetric, they are real.

Eigenvectors describe how a matrix transforms a vector, and can be used to define a basis space, namely the eigenspace.

Who cares? The eigenvectors of a covariance matrix have some very interesting properties.
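The defining equation A u_i = λ_i u_i can be confirmed numerically (an editorial sketch; the symmetric matrix is an arbitrary example, so its eigenvalues are real):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric => real eigenvalues

lam, U = np.linalg.eigh(A)   # columns of U are the eigenvectors

# Check A u_i = lambda_i u_i for every eigenpair.
for i in range(len(lam)):
    assert np.allclose(A @ U[:, i], lam[i] * U[:, i])
```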

Page 18

Basis Spaces

Basis spaces allow vectors to be represented in different spaces. Our normal 2-dimensional basis space is generated by the vectors [1, 0] and [0, 1].

Any 2-d vector can be expressed as a linear combination of these two basis vectors.

However, any two non-colinear vectors can generate a 2-d basis space. In the coordinates of that basis space, the generating vectors play the role of the perpendicular axes.
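Finding a vector’s coordinates in a new basis is just solving a linear system (an editorial sketch; the basis vectors here are arbitrary non-colinear examples):

```python
import numpy as np

# Two non-colinear basis vectors as the columns of B.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
v = np.array([3.0, 2.0])

# Coefficients c such that B @ c = v: v expressed in the new basis.
c = np.linalg.solve(B, v)

assert np.allclose(B @ c, v)   # v is recovered as a linear combination
```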

Page 19

Basis Spaces

Page 20

Basis Spaces

Page 21

Basis Spaces

Why do we care?

Dimensionality reduction.

Page 22

Calculus Basics

What is a derivative?

What is an integral?

Page 23

Derivatives

A derivative, \frac{d}{dx} f(x), can be thought of as defining the slope of a function f(x). This is sometimes also written f'(x).

Page 24

Derivative Example

Page 25

Integrals

Integrals are an inverse operation of the derivative (plus a constant).

\int f(x) \, dx = F(x) + c, \qquad F'(x) = f(x)

An integral can be thought of as a calculation of the area under the curve defined by f(x).

A definite integral evaluates the area over a specified interval. An indefinite integral leaves the bounds unspecified, yielding the antiderivative F(x) + c.

Page 26

Integration Example

Page 27

Useful calculus operations

Product, quotient, summation rules for derivatives.

Useful integration and derivative identities.

Chain rule
Integration by parts

Variable substitution (don’t forget the Jacobian!)

Page 28

Calculus Identities

Summation rule
g(x) = f_0(x) + f_1(x)
g'(x) = f_0'(x) + f_1'(x)

Product rule
g(x) = f_0(x) f_1(x)
g'(x) = f_0(x) f_1'(x) + f_0'(x) f_1(x)

Quotient rule
g(x) = \frac{f_0(x)}{f_1(x)}
g'(x) = \frac{f_0'(x) f_1(x) - f_0(x) f_1'(x)}{f_1^2(x)}

Page 29

Calculus Identities

Constant multipliers
g(x) = c f(x)
g'(x) = c f'(x)

Exponent rule
g(x) = f(x)^k
g'(x) = k f(x)^{k-1} f'(x)

Chain rule
g(x) = f_0(f_1(x))
g'(x) = f_0'(f_1(x)) f_1'(x)
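These identities are easy to sanity-check with a central finite difference (an editorial sketch; f_0 = sin and f_1(x) = x^2 are arbitrary choices):

```python
import math

# g(x) = f0(f1(x)) with f0 = sin and f1(x) = x^2.
# Chain rule: g'(x) = cos(x^2) * 2x.
def g(x):
    return math.sin(x ** 2)

def g_prime(x):
    return math.cos(x ** 2) * 2 * x

x, h = 0.7, 1e-6
numeric = (g(x + h) - g(x - h)) / (2 * h)   # central difference
assert abs(numeric - g_prime(x)) < 1e-6
```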

Page 30

Calculus Identities

Exponential rule
g(x) = e^x
g'(x) = e^x

g(x) = k^x
g'(x) = \ln(k) k^x

Logarithm rule
g(x) = \ln(x)
g'(x) = \frac{1}{x}

g(x) = \log_b(x)
g'(x) = \frac{1}{x \ln b}

Page 31

Calculus Operations

Integration by parts

\int f(x) \frac{dg(x)}{dx} \, dx = f(x) g(x) - \int g(x) \frac{df(x)}{dx} \, dx

Variable substitution

\int_a^b f(g(x)) g'(x) \, dx = \int_{g(a)}^{g(b)} f(x) \, dx
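Both sides of the substitution rule can be compared numerically (an editorial sketch using the trapezoid rule; f = cos and g(x) = x^2 are arbitrary choices):

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoid rule for samples y over grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

a, b = 0.0, 1.0

# Left side: integral of f(g(x)) g'(x) over [a, b], with g(x) = x^2.
x = np.linspace(a, b, 100_001)
lhs = trapezoid(np.cos(x ** 2) * 2 * x, x)

# Right side: integral of f(u) over [g(a), g(b)].
u = np.linspace(a ** 2, b ** 2, 100_001)
rhs = trapezoid(np.cos(u), u)

assert abs(lhs - rhs) < 1e-8
assert abs(rhs - np.sin(1.0)) < 1e-8   # both sides equal sin(1)
```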

Page 32

Vector Calculus

Derivation with respect to a vector or matrix.

Gradient of a vector.
Change of variables with a vector.

Page 33

Derivation with respect to a vector

Given a vector \mathbf{x} = (x_0, x_1, \ldots, x_{n-1})^T and a function f(\mathbf{x}) : \mathbb{R}^n \to \mathbb{R}, how can we find \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}?

Page 34

Derivation with respect to a vector

Given a vector \mathbf{x} = (x_0, x_1, \ldots, x_{n-1})^T and a function f(\mathbf{x}) : \mathbb{R}^n \to \mathbb{R}, how can we find \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}?

\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial f(\mathbf{x})}{\partial x_0} \\ \frac{\partial f(\mathbf{x})}{\partial x_1} \\ \vdots \\ \frac{\partial f(\mathbf{x})}{\partial x_{n-1}} \end{pmatrix}

This is also called the gradient of the function, and is often written \nabla f(\mathbf{x}) or \nabla f.

Page 35

Derivation with respect to a vector

Given a vector \mathbf{x} = (x_0, x_1, \ldots, x_{n-1})^T and a function f(\mathbf{x}) : \mathbb{R}^n \to \mathbb{R}, how can we find \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}?

\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial f(\mathbf{x})}{\partial x_0} \\ \frac{\partial f(\mathbf{x})}{\partial x_1} \\ \vdots \\ \frac{\partial f(\mathbf{x})}{\partial x_{n-1}} \end{pmatrix}

This is also called the gradient of the function, and is often written \nabla f(\mathbf{x}) or \nabla f.

Why might this be useful?
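One use: gradients can be approximated numerically, which is handy for checking analytic derivations (an editorial sketch; f(x) = x^T x, whose gradient is 2x, is an arbitrary test function):

```python
import numpy as np

def f(x):
    return float(x @ x)   # f(x) = x^T x, so grad f(x) = 2x

def numerical_gradient(f, x, h=1e-6):
    """Central differences, one coordinate at a time."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

x = np.array([1.0, -2.0, 3.0])
assert np.allclose(numerical_gradient(f, x), 2 * x)
```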

Page 36

Useful Vector Calculus identities

Given a vector \mathbf{x} with |\mathbf{x}| = n and a scalar variable y:

\frac{\partial \mathbf{x}}{\partial y} = \begin{pmatrix} \frac{\partial x_0}{\partial y} \\ \frac{\partial x_1}{\partial y} \\ \vdots \\ \frac{\partial x_{n-1}}{\partial y} \end{pmatrix}

Page 37

Useful Vector Calculus identities

Given a vector \mathbf{x} with |\mathbf{x}| = n and a vector \mathbf{y} with |\mathbf{y}| = m:

\frac{\partial \mathbf{x}}{\partial \mathbf{y}} = \begin{pmatrix} \frac{\partial x_0}{\partial y_0} & \frac{\partial x_0}{\partial y_1} & \cdots & \frac{\partial x_0}{\partial y_{m-1}} \\ \frac{\partial x_1}{\partial y_0} & \frac{\partial x_1}{\partial y_1} & \cdots & \frac{\partial x_1}{\partial y_{m-1}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial x_{n-1}}{\partial y_0} & \frac{\partial x_{n-1}}{\partial y_1} & \cdots & \frac{\partial x_{n-1}}{\partial y_{m-1}} \end{pmatrix}

Page 38

Vector Calculus Identities

Similar to the scalar multiplication rule:

\frac{\partial}{\partial \mathbf{x}} (\mathbf{x}^T \mathbf{a}) = \frac{\partial}{\partial \mathbf{x}} (\mathbf{a}^T \mathbf{x}) = \mathbf{a}

Similar to the product rule:

\frac{\partial}{\partial x} (AB) = \frac{\partial A}{\partial x} B + A \frac{\partial B}{\partial x}

Derivative of a matrix inverse:

\frac{\partial}{\partial x} (A^{-1}) = -A^{-1} \frac{\partial A}{\partial x} A^{-1}

Change of variable in an integral, where \left|\frac{\partial \mathbf{x}}{\partial \mathbf{u}}\right| is the absolute value of the Jacobian determinant:

\int f(\mathbf{x}) \, d\mathbf{x} = \int f(\mathbf{u}) \left|\frac{\partial \mathbf{x}}{\partial \mathbf{u}}\right| d\mathbf{u}
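The matrix-inverse identity can be verified with a finite difference on a matrix-valued function A(t) (an editorial sketch; A0 and D are arbitrary example matrices):

```python
import numpy as np

A0 = np.array([[2.0, 1.0],
               [0.0, 3.0]])
D = np.array([[1.0, 0.0],
              [1.0, 1.0]])

def A(t):
    return A0 + t * D   # so dA/dt = D

t, h = 0.5, 1e-6
inv = np.linalg.inv

# Finite-difference derivative of A^{-1} at t ...
numeric = (inv(A(t + h)) - inv(A(t - h))) / (2 * h)

# ... versus the identity d/dt A^{-1} = -A^{-1} (dA/dt) A^{-1}.
analytic = -inv(A(t)) @ D @ inv(A(t))
assert np.allclose(numeric, analytic, atol=1e-4)
```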

Page 39

Calculating the Expectation of a Gaussian

Now we have enough tools to calculate the expectation of a variable given a Gaussian distribution.

Recall:

E[x|\mu, \sigma^2] = \int p(x|\mu, \sigma^2) \, x \, dx

= \int \mathcal{N}(x|\mu, \sigma^2) \, x \, dx

= \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\} x \, dx

Page 40

Calculating the Expectation of a Gaussian

E[x|\mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\} x \, dx

Substitute u = x - \mu, \; du = dx:

E[x|\mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}u^2\right\} (u + \mu) \, du

= \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}u^2\right\} u \, du + \mu \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}u^2\right\} du

Page 41

Calculating the Expectation of a Gaussian

E[x|\mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}u^2\right\} u \, du + \mu \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}u^2\right\} du

The second integral is over a normalized density:

\int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}u^2\right\} du = 1

E[x|\mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}u^2\right\} u \, du + \mu

Aside: A function is odd iff f(x) = -f(-x). Odd functions have the property \int_{-\infty}^{\infty} f(x) \, dx = 0.

A function is even iff f(x) = f(-x).

The product of an odd function and an even function is an odd function.

Page 42

Calculating the Expectation of a Gaussian

E[x|\mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}u^2\right\} u \, du + \mu

\exp\left\{-\frac{1}{2\sigma^2}u^2\right\} is even.

u is odd.

\exp\left\{-\frac{1}{2\sigma^2}u^2\right\} u is odd.

\int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}u^2\right\} u \, du = 0

E[x|\mu, \sigma^2] = \mu
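The result E[x|µ, σ²] = µ can be confirmed by numerical quadrature over a wide finite range where the Gaussian tails are negligible (an editorial sketch; the values of µ and σ are arbitrary):

```python
import numpy as np

mu, sigma = 1.5, 2.0

def trapezoid(y, x):
    """Composite trapezoid rule for samples y over grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

# Integrate over mu +/- 10 sigma; the probability mass outside is negligible.
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200_001)
density = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

assert abs(trapezoid(density, x) - 1.0) < 1e-6      # density integrates to 1
assert abs(trapezoid(density * x, x) - mu) < 1e-6   # E[x] = mu
```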

Page 43

Why does Machine Learning need these tools?

Calculus

We need to find maximum likelihoods or minimum risks. This optimization is accomplished with derivatives.

Integration allows us to marginalize continuous probability density functions.

Linear Algebra

We will be working in high-dimensional spaces.

Vectors and matrices allow us to refer to high-dimensional points (groups of features) as vectors.

Matrices allow us to describe the feature space.

Page 44

Why does machine learning need these tools?

Vector Calculus

We need to do all of the calculus operations in high-dimensional feature spaces.

We will want to optimize multiple values simultaneously: gradient descent.

We will need to take a marginal over high-dimensional distributions: Gaussians.

Page 45

Broader Context

What we have so far:

Entities in the world are represented as feature vectors and maybe a label.

We want to construct statistical models of the feature vectors. Finding the most likely model is an optimization problem.

Since the feature vectors may have more than one dimension, linear algebra can help us work with them.

Page 46

Bye

Next

Linear Regression