TRANSCRIPT
Machine Learning for Signal Processing
Fundamentals of Linear Algebra
Class 2. 6 Sep 2016
Instructor: Bhiksha Raj
11-755/18-797 1
Overview
• Vectors and matrices
• Basic vector/matrix operations
• Various matrix types
• Projections
Book
• Fundamentals of Linear Algebra, Gilbert Strang
• Important to be very comfortable with linear algebra
– Appears repeatedly in the form of eigen analysis, SVD, factor analysis
– Appears through various properties of matrices that are used in machine learning
– Often used in the processing of data of various kinds
– Will use sound and images as examples
• Today’s lecture: Definitions
– Very small subset of all that’s used
– Important subset, intended to help you recollect
Incentive to use linear algebra
• Simplified notation!
• Easier intuition
– Really convenient geometric interpretations
• Easy code translation!
C = 0
for i=1:n
  for j=1:m
    C = C + x(i)*a(i,j)*y(j)
  end
end

C = x*A*y

$C = \sum_i \sum_j x_i\, a_{ij}\, y_j = \mathbf{x}A\mathbf{y}$ (x a row vector, y a column vector)
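To sanity-check the loop-to-one-liner translation, here is the same computation in pure Python (Python stands in for the slides' MATLAB; the values of x, A, and y are made up for illustration):

```python
# Check that the nested-loop bilinear form equals the matrix expression x*A*y.
n, m = 2, 3
x = [1.0, 2.0]                     # row vector, length n
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]              # n x m matrix
y = [7.0, 8.0, 9.0]                # column vector, length m

# Loop form: C = sum_i sum_j x_i * a_ij * y_j
C_loop = 0.0
for i in range(n):
    for j in range(m):
        C_loop += x[i] * A[i][j] * y[j]

# Matrix form: C = x * A * y  (row vector times matrix times column vector)
Ay = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
C_mat = sum(x[i] * Ay[i] for i in range(n))

print(C_loop, C_mat)   # both 294.0
```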
And other things you can do
• Manipulate Data
• Extract information from data
• Represent data..
• Etc.
[Figure: example matrix transformations (rotation + projection + scaling + perspective); a time–frequency spectrogram from Bach’s Fugue in Gm and its decomposition (NMF)]
Scalars, vectors, matrices, …
• A scalar a is a number – a = 2, a = 3.14, a = -1000, etc.
• A vector a is a linear arrangement of a collection of scalars
• A matrix A is a rectangular arrangement of a collection of scalars
$\mathbf{a} = [1\ \ 2\ \ 3]$, $\mathbf{a} = [3.14\ \ 32]$; $A = \begin{bmatrix}1 & 2.2 & 6\\ 3.11 & 5 & \ldots\end{bmatrix}$
Vectors in the abstract
• Ordered collection of numbers
– Examples: [3 4 5], [a b c d], ..
– [3 4 5] ≠ [4 3 5]: order is important
• Typically viewed as identifying (the path from origin to) a location in an N-dimensional space
[Figure: the points (3,4,5) and (4,3,5) plotted on the x, y, z axes]
Vectors in reality
• Vectors usually hold sets of numerical attributes
– X, Y, Z coordinates
• [1, 2, 0]
– [height(cm) weight(kg)]
– [175 72]
– A location in Manhattan
• [3av 33st]
• A series of daily temperatures
• Samples in an audio signal
• Etc.
[Figure: locations in Manhattan as [avenue street] coordinates, e.g. [-2.5av 6st], [2av 4st], [1av 8st]]
Vector norm
• Measure of how long a vector is:
– Represented as $\|\mathbf{x}\|$
• Geometrically the shortest distance to travel from the origin to the destination
– As the crow flies
– Assuming Euclidean Geometry
$\|[a\ b\ \ldots]\| = \sqrt{a^2 + b^2 + \cdots}$

[Figure: the vector (3,4,5) has length $\sqrt{3^2 + 4^2 + 5^2}$; Manhattan examples [-2av 17st], [-6av 10st]]
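The norm computation can be sketched in a few lines of Python (illustrative, not part of the lecture code):

```python
import math

# Euclidean norm of a vector: the straight-line distance from the origin.
def norm(v):
    return math.sqrt(sum(x * x for x in v))

print(norm([3, 4, 5]))   # sqrt(9 + 16 + 25) = sqrt(50), roughly 7.071
```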
Vector Operations: Multiplication by scalar
• Vector multiplication by scalar: each component multiplied by scalar
– 2.5 x (3,4,5) = (7.5, 10, 12.5)
• Note: as a result, vector norm is also multiplied by the scalar
– ||2.5 x (3,4,5)|| = 2.5x|| (3, 4, 5)||
[Figure: multiplication by a scalar “stretches” the vector: (3,4,5) is stretched to (7.5, 10, 12.5)]
Vector Operations: Addition
• Vector addition: individual components add
– (3,4,5) + (3,-2,-3) = (6,2,2)
– Only applicable if both vectors are the same size
[Figure: vector addition by the parallelogram rule: (3,4,5) + (3,-2,-3) = (6,2,2)]
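Both element-wise operations can be sketched in Python (hypothetical helper names):

```python
# Element-wise vector operations: scaling by a scalar and addition.
def scale(c, v):
    return [c * x for x in v]

def add(u, v):
    assert len(u) == len(v), "vectors must be the same size"
    return [a + b for a, b in zip(u, v)]

print(scale(2.5, [3, 4, 5]))        # [7.5, 10.0, 12.5]
print(add([3, 4, 5], [3, -2, -3]))  # [6, 2, 2]
```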
An introduction to spaces
• Conventional notion of “space”: a geometric construct of a certain number of “dimensions”
– E.g. the 3-D space that this room and every object in it lives in
A vector space
• A vector space is an infinitely large set of vectors with the following properties
– The set includes the zero vector (of all zeros)
– The set is “closed” under addition
• If X and Y are in the set, aX + bY is also in the set for any two scalars a and b
– For every X in the set, the set also includes the additive inverse Y = -X, such that X + Y = 0
Additional Properties
• Additional requirements:
– Scalar multiplicative identity element exists: 1X = X
– Addition is commutative: X + Y = Y + X
– Addition is associative: (X+Y)+Z = X+(Y+Z)
– Scalar multiplication is associative: a(bX) = (ab)X
– Scalar multiplication is distributive: (a+b)X = aX + bX and a(X+Y) = aX + aY
Example of vector space
• Set of all three-component column vectors
– Note we used the term three-component, rather than three-dimensional
• The set includes the zero vector
• For every X in the set and every scalar $a \in \mathbb{R}$, aX is in the set
• For every X, Y in the set, aX + bY is in the set
• -X is in the set
• Etc.
Example: a function space
Dimension of a space
• Every element in the space can be composed of linear combinations of some other elements in the space
– For any X in S we can write X = aY1 + bY2 + cY3 … for some other Y1, Y2, Y3 … in S
• Trivial to prove..
Dimension of a space
• What is the smallest subset of elements that can compose the entire set?
– There may be multiple such sets
• The elements in this set are called “bases”
– The set is a “basis” set
• The number of elements in the set is the “dimensionality” of the space
Dimensions: Example
• What is the dimensionality of this vector space?
Dimensions: Example
• What is the dimensionality of this vector space?
– First confirm this is a proper vector space
• Note: all elements in Z are also in S (slide 19)
– Z is a subspace of S
Dimensions: Example
• What is the dimensionality of this space?
Moving on….
Interpreting Matrices
• Two interpretations of a matrix
• As a transform that modifies vectors and vector spaces
• As a container for data (vectors)
• In the next two classes we’ll focus on the first view
• But we will mostly consider the second view in our discussions of signal representations for ML algorithms
$\begin{bmatrix}a & b\\ c & d\end{bmatrix},\qquad \begin{bmatrix}a & b & c & d & e\\ f & g & h & i & j\\ k & l & m & n & o\end{bmatrix}$
Interpreting Matrices as collections of vectors
• A matrix can be viewed as a vertical stacking of row vectors
– The space of all vectors that can be composed from the rows of the matrix is the row space of the matrix
• Or a horizontal arrangement of column vectors
– The space of all vectors that can be composed from the columns of the matrix is the column space of the matrix
$R = \begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix}$, viewed as two stacked rows $[a\ b\ c]$, $[d\ e\ f]$, or as three columns $\begin{bmatrix}a\\d\end{bmatrix}, \begin{bmatrix}b\\e\end{bmatrix}, \begin{bmatrix}c\\f\end{bmatrix}$
Dimensions of a matrix
• The matrix size is specified by the number of rows and columns
– c = 3x1 matrix: 3 rows and 1 column
– r = 1x3 matrix: 1 row and 3 columns
– S = 2 x 2 matrix
– R = 2 x 3 matrix
– Pacman = 321 x 399 matrix
$\mathbf{c} = \begin{bmatrix}a\\b\\c\end{bmatrix},\quad \mathbf{r} = [a\ b\ c],\quad S = \begin{bmatrix}a & b\\ c & d\end{bmatrix},\quad R = \begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix}$
Representing an image as a matrix
• 3 pacmen
• A 321 x 399 matrix
– Row and Column = position
• A 3 x 128079 matrix
– Triples of x,y and value
• A 1 x 128079 vector
– “Unraveling” the matrix
• Note: All of these can be recast as the matrix that forms the image
– Representations 2 and 4 are equivalent
• The position is not represented
[Figure: the same image stored three ways: a matrix of pixel values indexed by row and column; a 3 × N matrix of (X, Y, value) triples; and an unraveled 1 × N vector of values only, with X and Y implicit]
Basic arithmetic operations
• Addition and subtraction (vectors and matrices)
– Element-wise operations
$\mathbf{a}+\mathbf{b} = \begin{bmatrix}a_1\\a_2\\a_3\end{bmatrix}+\begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix} = \begin{bmatrix}a_1+b_1\\a_2+b_2\\a_3+b_3\end{bmatrix},\qquad \mathbf{a}-\mathbf{b} = \begin{bmatrix}a_1-b_1\\a_2-b_2\\a_3-b_3\end{bmatrix}$

$A+B = \begin{bmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix}+\begin{bmatrix}b_{11} & b_{12}\\ b_{21} & b_{22}\end{bmatrix} = \begin{bmatrix}a_{11}+b_{11} & a_{12}+b_{12}\\ a_{21}+b_{21} & a_{22}+b_{22}\end{bmatrix}$
Vector/Matrix Transposition
• A transposed row vector becomes a column (and vice versa)
• A transposed matrix gets all its row (or column) vectors transposed in order
$\mathbf{x} = \begin{bmatrix}a\\b\\c\end{bmatrix},\ \mathbf{x}^T = [a\ b\ c];\qquad X = \begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix},\ X^T = \begin{bmatrix}a & d\\ b & e\\ c & f\end{bmatrix};\qquad \mathbf{y} = [a\ b\ c],\ \mathbf{y}^T = \begin{bmatrix}a\\b\\c\end{bmatrix}$
Vector multiplication
• Multiplication by scalar
• Dot product, or inner product
– Vectors must have the same number of elements
– Row vector times column vector = scalar
• Outer product or vector direct product
– Column vector times row vector = matrix
Outer product: $\begin{bmatrix}a\\b\\c\end{bmatrix}[d\ e\ f] = \begin{bmatrix}ad & ae & af\\ bd & be & bf\\ cd & ce & cf\end{bmatrix}$

Inner product: $[a\ b\ c]\begin{bmatrix}d\\e\\f\end{bmatrix} = ad + be + cf$

Scalar: $d\cdot\begin{bmatrix}a\\b\\c\end{bmatrix} = \begin{bmatrix}ad\\bd\\cd\end{bmatrix}$
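The inner and outer products above can be sketched in Python (illustrative helpers, not the lecture's code):

```python
# Inner (dot) and outer products of plain-list vectors.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def outer(u, v):
    return [[a * b for b in v] for a in u]

print(dot([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32
print(outer([1, 2], [3, 4, 5]))    # [[3, 4, 5], [6, 8, 10]]
```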
Vector dot product
• Example:
– Coordinates are yards, not ave/st
– a = [200 1600],
b = [770 300]
• The dot product of the two vectors relates to the length of a projection
– How much of the first vector have we covered by following the second one?
– Must normalize by the length of the “target” vector
[Figure: a = [200yd 1600yd], norm ≈ 1612; b = [770yd 300yd], norm ≈ 826]

$\frac{\mathbf{a}\,\mathbf{b}^T}{\|\mathbf{a}\|} = \frac{[200\ 1600]\begin{bmatrix}770\\300\end{bmatrix}}{\|[200\ 1600]\|} \approx 393\,\text{yd}$
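The yard-coordinate example can be verified numerically; the sketch below computes the projection length dot(a, b)/‖a‖ described above:

```python
import math

# Length of the projection of b onto a: dot(a, b) / ||a||.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

a = [200, 1600]   # yards
b = [770, 300]    # yards
proj_len = dot(a, b) / norm(a)
print(round(proj_len))   # 393: how many yards of a are "covered" by following b
```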
Vector dot product
• Vectors are spectra
– Energy at a discrete set of frequencies
– Actually 1 x 4096
– X axis is the index of the number in the vector
• Represents frequency
– Y axis is the value of the number in the vector
• Represents magnitude
[Figure: spectra of the notes C, E, and C2; x axis is frequency, y axis is sqrt(energy)]
Vector dot product
• How much of C is also in E?
– How much can you fake a C by playing an E?
– C·E / (|C||E|) = 0.1
– Not very much
• How much of C is in C2?
– C·C2 / (|C||C2|) = 0.5
– Not bad, you can fake it
• To do this, C, E, and C2 must be the same size
Vector outer product
• The column vector is the spectrum
• The row vector is an amplitude modulation
• The outer product is a spectrogram
– Shows how the energy in each frequency varies with time
– The pattern in each column is a scaled version of the spectrum
– Each row is a scaled version of the modulation
Multiplying a matrix by a scalar
• Multiplying a matrix by a scalar multiplies every element of the matrix
• Note: bA = Ab
$bA = b\begin{bmatrix}a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24}\\ a_{31} & a_{32} & a_{33} & a_{34}\end{bmatrix} = \begin{bmatrix}ba_{11} & ba_{12} & ba_{13} & ba_{14}\\ ba_{21} & ba_{22} & ba_{23} & ba_{24}\\ ba_{31} & ba_{32} & ba_{33} & ba_{34}\end{bmatrix}$
Multiplying a vector by a matrix
• Multiplying a vector by a matrix transforms the vector
– Dimensions must match!!
• No. of columns of matrix = size of vector
• Result inherits the number of rows from the matrix
$A\mathbf{b} = \begin{bmatrix}a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24}\\ a_{31} & a_{32} & a_{33} & a_{34}\end{bmatrix}\begin{bmatrix}b_1\\b_2\\b_3\\b_4\end{bmatrix} = \begin{bmatrix}a_{11}b_1 + a_{12}b_2 + a_{13}b_3 + a_{14}b_4\\ a_{21}b_1 + a_{22}b_2 + a_{23}b_3 + a_{24}b_4\\ a_{31}b_1 + a_{32}b_2 + a_{33}b_3 + a_{34}b_4\end{bmatrix}$
Multiplying a vector by a matrix
• Generalization of vector scaling
– Left multiplication: dot product of each vector pair
– Dimensions must match!!
• No. of columns of matrix = size of vector
• Result inherits the number of rows from the matrix
$A\mathbf{b} = \begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\end{bmatrix}\mathbf{b} = \begin{bmatrix}\mathbf{a}_1\cdot\mathbf{b}\\ \mathbf{a}_2\cdot\mathbf{b}\end{bmatrix}$ (each row of the matrix dotted with the vector), generalizing $d\cdot\begin{bmatrix}a\\b\\c\end{bmatrix} = \begin{bmatrix}ad\\bd\\cd\end{bmatrix}$
Multiplying a vector by a matrix
• Generalization of vector multiplication
– Right multiplication: Dot product of each vector pair
– Dimensions must match!!
• No. of columns of matrix = size of vector
• Result inherits the number of rows from the matrix
$\mathbf{a}B = \mathbf{a}\,[\mathbf{b}_1\ \mathbf{b}_2] = [\mathbf{a}\cdot\mathbf{b}_1\ \ \mathbf{a}\cdot\mathbf{b}_2]$ (the vector dotted with each column of the matrix)
Matrix Multiplication: Column space
• Right multiplication by a matrix transforms a row-space vector into a column-space vector
• It mixes the column vectors of the matrix using the numbers in the vector
• The column space of the matrix is the complete set of all vectors that can be formed by mixing its columns
$\begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = x\begin{bmatrix}a\\d\end{bmatrix} + y\begin{bmatrix}b\\e\end{bmatrix} + z\begin{bmatrix}c\\f\end{bmatrix}$
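The column-mixing view can be checked in Python: computing A·x row by row gives the same answer as summing scaled columns (the values are made up):

```python
# Matrix-vector product as a mixture of the matrix's columns:
# A @ [x, y, z] = x*col1 + y*col2 + z*col3.
A = [[1, 2, 3],
     [4, 5, 6]]
x = [2, -1, 3]

# Direct row-by-row computation
direct = [sum(A[i][j] * x[j] for j in range(3)) for i in range(2)]

# Column-mixing computation: accumulate x_j times the j-th column
mix = [0, 0]
for j in range(3):
    for i in range(2):
        mix[i] += x[j] * A[i][j]

print(direct, mix)   # [9, 21] [9, 21]
```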
Matrix Multiplication: Row space
• Left multiplication mixes the row vectors of the matrix
– Converts a vector in the column space to one in the row space
• The row space of the Matrix is the complete set of all vectors that can be formed by mixing its rows
$[x\ y]\begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix} = x\,[a\ b\ c] + y\,[d\ e\ f]$
Multiplication of vector space by matrix
• The matrix rotates and scales the space
– Including its own vectors
$Y = \begin{bmatrix}0.3 & 0.7\\ 1.3 & 1.6\end{bmatrix}$

[Figure: the row space and column space under the transform Y]
Multiplication of vector space by matrix
• The normals to the row vectors in the matrix become the new axes
– X axis = normal to the second row vector
• Scaled by the inverse of the length of the first row vector
Matrix Multiplication
• The k-th axis corresponds to the normal to the hyperplane represented by the 1..k-1,k+1..N-th row vectors in the matrix
– Any set of K-1 vectors represent a hyperplane of dimension K-1 or less
• The distance along the new axis equals the length of the projection on the k-th row vector
– Expressed in inverse-lengths of the vector
$\begin{bmatrix}a & d & g\\ b & e & h\\ c & f & i\end{bmatrix}$
Matrix multiplication: Mixing vectors
• A physical example
– The three column vectors of the matrix X are the spectra of three notes
– The multiplying column vector Y is just a mixing vector
– The result is a sound that is the mixture of the three notes
[Figure: the three columns of $X$ are note spectra; $Y = \begin{bmatrix}1\\2\\1\end{bmatrix}$ mixes them; $XY$ is the spectrum of the resulting three-note mixture]
Matrix multiplication: Mixing vectors
• Mixing two images
– The images are arranged as columns
• position value not included
– The result of the multiplication is rearranged as an image

[Figure: two 200 × 200 images unraveled into a 40000 × 2 matrix, multiplied by a 2 × 1 weight vector (e.g. [0.25; 0.75]) to give a 40000 × 1 column that is rearranged back into a 200 × 200 image]
Multiplying matrices
• Simple vector multiplication: Vector outer product
$\mathbf{a}\mathbf{b} = \begin{bmatrix}a_1\\ a_2\end{bmatrix}[b_1\ b_2] = \begin{bmatrix}a_1b_1 & a_1b_2\\ a_2b_1 & a_2b_2\end{bmatrix}$
Multiplying matrices
• Generalization of vector multiplication
– Outer product of dot products!!
– Dimensions must match!!
• Columns of first matrix = rows of second
• Result inherits the number of rows from the first matrix and the number of columns from the second matrix
$AB = \begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\end{bmatrix}[\mathbf{b}_1\ \mathbf{b}_2] = \begin{bmatrix}\mathbf{a}_1\cdot\mathbf{b}_1 & \mathbf{a}_1\cdot\mathbf{b}_2\\ \mathbf{a}_2\cdot\mathbf{b}_1 & \mathbf{a}_2\cdot\mathbf{b}_2\end{bmatrix}$ (rows of $A$ dotted with columns of $B$)
Multiplying matrices: Another view
• Simple vector multiplication: Vector inner product
$\mathbf{a}\mathbf{b} = [a_1\ a_2]\begin{bmatrix}b_1\\ b_2\end{bmatrix} = a_1b_1 + a_2b_2$
Matrix multiplication: another view
For $A$ of size $M \times N$ and $B$ of size $N \times K$:

$AB = \begin{bmatrix}\mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_N\end{bmatrix}\begin{bmatrix}\mathbf{b}_1\\ \mathbf{b}_2\\ \vdots\\ \mathbf{b}_N\end{bmatrix} = \sum_{n=1}^{N}\mathbf{a}_n\mathbf{b}_n$

where $\mathbf{a}_n$ is the $n$-th column of $A$ ($M \times 1$) and $\mathbf{b}_n$ is the $n$-th row of $B$ ($1 \times K$).

The outer product of the first column of A and the first row of B + the outer product of the second column of A and the second row of B + …: matrix multiplication is a sum of outer products.
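A quick Python check of the sum-of-outer-products identity (small made-up matrices):

```python
# Check: A @ B equals the sum of outer products of A's columns with B's rows.
A = [[1, 2],
     [3, 4],
     [5, 6]]          # 3 x 2
B = [[7, 8, 9],
     [10, 11, 12]]    # 2 x 3

M, N, K = 3, 2, 3

# Standard row-times-column product
AB = [[sum(A[i][n] * B[n][k] for n in range(N)) for k in range(K)]
      for i in range(M)]

# Sum over n of (n-th column of A) outer (n-th row of B)
S = [[0] * K for _ in range(M)]
for n in range(N):
    for i in range(M):
        for k in range(K):
            S[i][k] += A[i][n] * B[n][k]

print(AB == S)   # True
```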
Why is that useful? Mixing modulated spectra
• Sounds: Three notes modulated independently

[Figure: the columns of $X$ are three note spectra; the three rows of $Y$ are time-varying gains (the first row, e.g., 0.5 0.6 0.7 0.8 0.9 0.95 1 …); the product $XY$ is a spectrogram in which each note rises and falls with its own modulation]
Matrix multiplication: Image transition
• Image1 fades out linearly
• Image 2 fades in linearly
$[\mathbf{i}\ \ \mathbf{j}]\begin{bmatrix}1 & .9 & .8 & .7 & .6 & .5 & .4 & .3 & .2 & .1 & 0\\ 0 & .1 & .2 & .3 & .4 & .5 & .6 & .7 & .8 & .9 & 1\end{bmatrix}$ where the columns $\mathbf{i} = [i_1\ i_2\ \ldots]^T$ and $\mathbf{j} = [j_1\ j_2\ \ldots]^T$ are the two unraveled images
Matrix multiplication: Image transition
• Each column is one image
– The columns represent a sequence of images of decreasing intensity
• Image1 fades out linearly

$\mathbf{i}\,[1\ \ .9\ \ .8\ \cdots\ 0] = \begin{bmatrix}i_1 & .9 i_1 & .8 i_1 & \cdots & 0\\ i_2 & .9 i_2 & .8 i_2 & \cdots & 0\\ \vdots & & & & \vdots\\ i_N & .9 i_N & .8 i_N & \cdots & 0\end{bmatrix}$
Matrix multiplication: Image transition
• Image 2 fades in linearly

$\mathbf{j}\,[0\ \ .1\ \ .2\ \cdots\ 1]$ gives the columns of the fading-in image
Matrix Operations: Properties
• A+B = B+A
– Actual interpretation: for any vector x
• (A+B)x = (B+A)x (column vector x of the right size)
• x(A+B) = x(B+A) (row vector x of the appropriate size)
• A + (B + C) = (A + B) + C
• AB ≠ BA in general
The Space of Matrices
• The set of all matrices of a given size (e.g. all 3x4 matrices) is a space!
– Addition is closed
– Scalar multiplication is closed
– Zero matrix exists
– Matrices have additive inverses
– Associativity and commutativity rules apply!
The Identity Matrix
• An identity matrix is a square matrix where
– All diagonal elements are 1.0
– All off-diagonal elements are 0.0
• Multiplication by an identity matrix does not change vectors
$Y = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}$
Diagonal Matrix
• All off-diagonal elements are zero
• Diagonal elements are non-zero
• Scales the axes
– May flip axes

$Y = \begin{bmatrix}2 & 0\\ 0 & 1\end{bmatrix}$
Diagonal matrix to transform images
• How?
Stretching

$\begin{bmatrix}2 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}x_1 & x_2 & \cdots\\ y_1 & y_2 & \cdots\\ v_1 & v_2 & \cdots\end{bmatrix}$

• Location-based representation
• Scaling matrix: only scales the X axis
– The Y axis and pixel value are scaled by identity
• Not a good way of scaling.
Stretching
• Better way
• Interpolate

$\text{Newpic} = A\,D_{(N \times 2N)}$, where $D = \begin{bmatrix}1 & .5 & 0 & 0 & \cdots\\ 0 & .5 & 1 & .5 & \cdots\\ 0 & 0 & 0 & .5 & \cdots\\ \vdots & & & & \ddots\end{bmatrix}$ and N is the width of the original image
Modifying color
• Scale only Green

$\text{Newpic} = \begin{bmatrix}1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 1\end{bmatrix}P$, where the rows of $P$ are the R, G, B planes
Permutation Matrix
• A permutation matrix simply rearranges the axes
– The row entries are axis vectors in a different order
– The result is a combination of rotations and reflections
• The permutation matrix effectively permutes the arrangement of the elements in a vector
$\begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0\end{bmatrix}\begin{bmatrix}x\\ y\\ z\end{bmatrix} = \begin{bmatrix}y\\ z\\ x\end{bmatrix}$

[Figure: (3,4,5) is mapped to (4,5,3): the new X is the old Y, the new Y is the old Z, and the new Z is the old X]
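A Python sketch of the permutation above (P is the matrix from the figure):

```python
# Applying a permutation matrix: the rows are the axis vectors in a new order,
# so the product simply rearranges the elements of the vector.
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
v = [3, 4, 5]

Pv = [sum(P[i][j] * v[j] for j in range(3)) for i in range(3)]
print(Pv)   # [4, 5, 3]: the elements of v, rearranged
```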
Permutation Matrix
• Reflections and 90 degree rotations of images and objects
[Figure: two permutation matrices applied to the image's pixel-value matrix produce reflections and 90-degree rotations of pacman]
Permutation Matrix
• Reflections and 90 degree rotations of images and objects
– Object represented as a matrix of 3-Dimensional “position” vectors
– Positions identify each point on the surface
[Figure: permutation matrices applied to the object's position matrix $\begin{bmatrix}x_1 & x_2 & \cdots & x_N\\ y_1 & y_2 & \cdots & y_N\\ z_1 & z_2 & \cdots & z_N\end{bmatrix}$ reflect and rotate the object]
Rotation Matrix
• A rotation matrix rotates the vector by some angle θ
• Alternately viewed, it rotates the axes
– The new axes are at an angle θ to the old ones
$R_\theta = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix},\qquad X = \begin{bmatrix}x\\ y\end{bmatrix},\quad X_{new} = \begin{bmatrix}x'\\ y'\end{bmatrix},\quad X_{new} = R_\theta X$

$x' = x\cos\theta - y\sin\theta$
$y' = x\sin\theta + y\cos\theta$

[Figure: the point (x, y) rotated by θ to (x′, y′)]
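The rotation equations can be sketched directly in Python (a 90-degree rotation of the point (1, 0) as an illustrative case):

```python
import math

# Rotate a 2-D point by angle theta using the rotation matrix
# [[cos t, -sin t], [sin t, cos t]].
def rotate(point, theta):
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

x2, y2 = rotate((1.0, 0.0), math.pi / 2)   # 90-degree rotation
print(round(x2, 6), round(y2, 6))          # 0.0 1.0
```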
Rotating a picture
• Note the representation: 3-row matrix
– Rotation only applies on the “coordinate” rows
– The value does not change
– Why is pacman grainy?

$R = \begin{bmatrix}\cos 45^\circ & -\sin 45^\circ & 0\\ \sin 45^\circ & \cos 45^\circ & 0\\ 0 & 0 & 1\end{bmatrix}$ applied to the $\begin{bmatrix}x\\ y\\ v\end{bmatrix}$ rows of the image matrix
3-D Rotation
• 2 degrees of freedom
– 2 separate angles
• What will the rotation matrix be?
[Figure: the X, Y, Z axes rotated by angles θ and α to Xnew, Ynew, Znew]
Projections
• What would we see if the cone to the left were transparent and we looked at it from above the plane shown by the grid?
– Looking normal to the plane
– Answer: the figure to the right
• How do we get this? Projection
Projections
• Actual problem: for each vector
– What is the corresponding vector on the plane that is “closest approximation” to it?
– What is the transform that converts the vector to its approximation on the plane?
[Figure: each vector is dropped onto the plane at 90 degrees: its projection]
Projections
• Arithmetically: Find the matrix P such that
– For every vector X, PX lies on the plane
• The plane is the column space of P
– ||X – PX||2 is the smallest possible
Projection Matrix
• Consider any set of independent vectors W1, W2 … on the plane
– Arranged as a matrix [W1 W2 ..]
• The plane is the column space of the matrix
– Any vector can be projected onto this plane
– The matrix P that rotates and scales the vector so that it becomes its projection is a projection matrix
Projection Matrix
• Given a set of vectors W1, W2 … which form a matrix W = [W1 W2.. ]
• The projection matrix to transform a vector X to its projection on the plane is
– $P = W(W^TW)^{-1}W^T$
• We will visit matrix inversion shortly
• Magic – any set of independent vectors from the same plane that are expressed as a matrix will give you the same projection matrix
– $P = V(V^TV)^{-1}V^T$
Projections
• HOW?
Projections
• Draw any two vectors W1 and W2 that lie on the plane
– ANY two, so long as they have different angles
• Compose a matrix W = [W1 W2.. ]
• Compose the projection matrix $P = W(W^TW)^{-1}W^T$
• Multiply every point on the cone by P to get its projection
Projections
• The projection actually projects it onto the plane, but you’re still seeing the plane in 3D
– The result of the projection is a 3-D vector
– $P = W(W^TW)^{-1}W^T$ is 3×3, $PX$ is 3×1
– The image must be rotated till the plane is in the plane of the paper
• The Z axis in this case will always be zero and can be ignored
• How will you rotate it? (remember you know W1 and W2 )
Projection matrix properties
• The projection of any vector that is already on the plane is the vector itself
– PX = X if X is on the plane
– If the object is already on the plane, there is no further projection to be performed
• The projection of a projection is the projection
– P(PX) = PX
• Projection matrices are idempotent
– $P^2 = P$
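A minimal pure-Python sketch of these properties, using the construction $P = W(W^TW)^{-1}W^T$ from the earlier slide (the plane and vectors are made-up examples, and the 2 × 2 inverse is hand-coded):

```python
# Build P = W (W^T W)^(-1) W^T for a plane spanned by two 3-D vectors,
# then check that PX = X for X on the plane and that P is idempotent.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv2(M):                      # inverse of a 2 x 2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

W = [[1, 0],                      # columns W1 = (1,0,0), W2 = (0,1,1)
     [0, 1],
     [0, 1]]
Wt = transpose(W)
P = matmul(matmul(W, inv2(matmul(Wt, W))), Wt)

X = [[2], [3], [3]]               # 2*W1 + 3*W2: already on the plane
PX = matmul(P, X)                 # equals X: no further projection happens
PP = matmul(P, P)                 # equals P: projection matrices are idempotent
print(PX)
```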
Projections: A more physical meaning
• Let W1, W2 .. Wk be “bases”
• We want to explain our data in terms of these “bases”
– We often cannot do so
– But we can explain a significant portion of it
• The portion of the data that can be expressed in terms of our vectors W1, W2, .. Wk, is the projection of the data on the W1 .. Wk (hyper) plane
– In our previous example, the “data” were all the points on a cone, and the bases were vectors on the plane
Projection : an example with sounds
• The spectrogram (matrix) of a piece of music
How much of the above music was composed of the above notes? I.e., how much of it can be explained by the notes?
Projection: one note
• The spectrogram (matrix) of a piece of music
M = spectrogram; W = note
$P = W(W^TW)^{-1}W^T$
Projected Spectrogram = P * M
M =
W =
Projection: one note – cleaned up
• The spectrogram (matrix) of a piece of music
Floored all matrix values below a threshold to zero
M =
W =
Projection: multiple notes
• The spectrogram (matrix) of a piece of music
$P = W(W^TW)^{-1}W^T$
Projected Spectrogram = P * M
M =
W =
Projection: multiple notes, cleaned up
• The spectrogram (matrix) of a piece of music
$P = W(W^TW)^{-1}W^T$
Projected Spectrogram = P * M
M =
W =
Projection and Least Squares
• Projection actually computes a least squared error estimate
• For each vector V in the music spectrogram matrix – Approximation: Vapprox = a*note1 + b*note2 + c*note3..
– Error vector E = V – Vapprox
– Squared error energy for V: $e(V) = \|E\|^2$
– Total error = sum over all V of e(V) = $\sum_V e(V)$
• Projection computes Vapprox for all vectors such that the total error is minimized
– It does not give you “a”, “b”, “c”.. though
• That needs a different operation: the inverse / pseudo-inverse
[Figure: $V_{approx} = [\text{note}_1\ \ \text{note}_2\ \ \text{note}_3]\begin{bmatrix}a\\ b\\ c\end{bmatrix}$]