TRANSCRIPT
Machine Learning for Signal Processing
Fundamentals of Linear Algebra
Class 2. 6 Sep 2016
Instructor: Bhiksha Raj
11-755/18-797 1
Overview
• Vectors and matrices
• Basic vector/matrix operations
• Various matrix types
• Projections
Book
• Fundamentals of Linear Algebra, Gilbert Strang
• Important to be very comfortable with linear algebra
– Appears repeatedly in the form of eigen analysis, SVD, factor analysis
– Appears through various properties of matrices that are used in machine learning
– Often used in the processing of data of various kinds
– Will use sound and images as examples
• Today’s lecture: Definitions
– Very small subset of all that’s used
– Important subset, intended to help you recollect
Incentive to use linear algebra
• Simplified notation!
• Easier intuition
– Really convenient geometric interpretations
• Easy code translation!
C = 0
for i=1:n
  for j=1:m
    C = C + x(i)*a(i,j)*y(j)
  end
end

C = x*A*y

$C = \sum_i \sum_j x_i\, a_{ij}\, y_j = \mathbf{x}A\mathbf{y}$ (x a row vector, y a column vector)
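To sanity-check the loop-to-one-liner translation, here is the same computation in pure Python (Python stands in for the slides' MATLAB; the values of x, A, and y are made up for illustration):

```python
# Check that the nested-loop bilinear form equals the matrix expression x*A*y.
n, m = 2, 3
x = [1.0, 2.0]                     # row vector, length n
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]              # n x m matrix
y = [7.0, 8.0, 9.0]                # column vector, length m

# Loop form: C = sum_i sum_j x_i * a_ij * y_j
C_loop = 0.0
for i in range(n):
    for j in range(m):
        C_loop += x[i] * A[i][j] * y[j]

# Matrix form: C = x * A * y  (row vector times matrix times column vector)
Ay = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
C_mat = sum(x[i] * Ay[i] for i in range(n))

print(C_loop, C_mat)   # both 294.0
```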
And other things you can do
• Manipulate Data
• Extract information from data
• Represent data..
• Etc.
[Figure: example matrix transformations (rotation + projection + scaling + perspective); a time–frequency spectrogram from Bach’s Fugue in Gm and its decomposition (NMF)]
Scalars, vectors, matrices, …
• A scalar a is a number – a = 2, a = 3.14, a = -1000, etc.
• A vector a is a linear arrangement of a collection of scalars
• A matrix A is a rectangular arrangement of a collection of scalars
$\mathbf{a} = [1\ \ 2\ \ 3]$, $\mathbf{a} = [3.14\ \ 32]$; $A = \begin{bmatrix}1 & 2.2 & 6\\ 3.11 & 5 & \ldots\end{bmatrix}$
Vectors in the abstract
• Ordered collection of numbers
– Examples: [3 4 5], [a b c d], ..
– [3 4 5] ≠ [4 3 5]: order is important
• Typically viewed as identifying (the path from origin to) a location in an N-dimensional space
[Figure: the points (3,4,5) and (4,3,5) plotted on the x, y, z axes]
Vectors in reality
• Vectors usually hold sets of numerical attributes
– X, Y, Z coordinates
• [1, 2, 0]
– [height(cm) weight(kg)]
– [175 72]
– A location in Manhattan
• [3av 33st]
• A series of daily temperatures
• Samples in an audio signal
• Etc.
[Figure: locations in Manhattan as [avenue street] coordinates, e.g. [-2.5av 6st], [2av 4st], [1av 8st]]
Vector norm
• Measure of how long a vector is:
– Represented as $\|\mathbf{x}\|$
• Geometrically the shortest distance to travel from the origin to the destination
– As the crow flies
– Assuming Euclidean Geometry
$\|[a\ b\ \ldots]\| = \sqrt{a^2 + b^2 + \cdots}$

[Figure: the vector (3,4,5) has length $\sqrt{3^2 + 4^2 + 5^2}$; Manhattan examples [-2av 17st], [-6av 10st]]
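The norm computation can be sketched in a few lines of Python (illustrative, not part of the lecture code):

```python
import math

# Euclidean norm of a vector: the straight-line distance from the origin.
def norm(v):
    return math.sqrt(sum(x * x for x in v))

print(norm([3, 4, 5]))   # sqrt(9 + 16 + 25) = sqrt(50), roughly 7.071
```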
Vector Operations: Multiplication by scalar
• Vector multiplication by scalar: each component multiplied by scalar
– 2.5 x (3,4,5) = (7.5, 10, 12.5)
• Note: as a result, vector norm is also multiplied by the scalar
– ||2.5 x (3,4,5)|| = 2.5x|| (3, 4, 5)||
[Figure: multiplication by a scalar “stretches” the vector: (3,4,5) is stretched to (7.5, 10, 12.5)]
Vector Operations: Addition
• Vector addition: individual components add
– (3,4,5) + (3,-2,-3) = (6,2,2)
– Only applicable if both vectors are the same size
[Figure: vector addition by the parallelogram rule: (3,4,5) + (3,-2,-3) = (6,2,2)]
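Both element-wise operations can be sketched in Python (hypothetical helper names):

```python
# Element-wise vector operations: scaling by a scalar and addition.
def scale(c, v):
    return [c * x for x in v]

def add(u, v):
    assert len(u) == len(v), "vectors must be the same size"
    return [a + b for a, b in zip(u, v)]

print(scale(2.5, [3, 4, 5]))        # [7.5, 10.0, 12.5]
print(add([3, 4, 5], [3, -2, -3]))  # [6, 2, 2]
```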
An introduction to spaces
• Conventional notion of “space”: a geometric construct of a certain number of “dimensions”
– E.g. the 3-D space that this room and every object in it lives in
A vector space
• A vector space is an infinitely large set of vectors with the following properties
– The set includes the zero vector (of all zeros)
– The set is “closed” under addition
• If X and Y are in the set, aX + bY is also in the set for any two scalars a and b
– For every X in the set, the set also includes the additive inverse Y = -X, such that X + Y = 0
Additional Properties
• Additional requirements:
– Scalar multiplicative identity element exists: 1X = X
– Addition is commutative: X + Y = Y + X
– Addition is associative: (X+Y)+Z = X+(Y+Z)
– Scalar multiplication is associative: a(bX) = (ab)X
– Scalar multiplication is distributive: (a+b)X = aX + bX and a(X+Y) = aX + aY
Example of vector space
• Set of all three-component column vectors
– Note we used the term three-component, rather than three-dimensional
• The set includes the zero vector
• For every X in the set and every scalar $a \in \mathbb{R}$, aX is in the set
• For every X, Y in the set, aX + bY is in the set
• -X is in the set
• Etc.
Example: a function space
Dimension of a space
• Every element in the space can be composed of linear combinations of some other elements in the space
– For any X in S we can write X = aY1 + bY2 + cY3 … for some other Y1, Y2, Y3 … in S
• Trivial to prove..
Dimension of a space
• What is the smallest subset of elements that can compose the entire set?
– There may be multiple such sets
• The elements in this set are called “bases”
– The set is a “basis” set
• The number of elements in the set is the “dimensionality” of the space
Dimensions: Example
• What is the dimensionality of this vector space?
Dimensions: Example
• What is the dimensionality of this vector space?
– First confirm this is a proper vector space
• Note: all elements in Z are also in S (slide 19)
– Z is a subspace of S
Dimensions: Example
• What is the dimensionality of this space?
Moving on….
Interpreting Matrices
• Two interpretations of a matrix
• As a transform that modifies vectors and vector spaces
• As a container for data (vectors)
• In the next two classes we’ll focus on the first view
• But we will mostly consider the second view in our discussions of signal representations for ML algorithms
$\begin{bmatrix}a & b\\ c & d\end{bmatrix},\qquad \begin{bmatrix}a & b & c & d & e\\ f & g & h & i & j\\ k & l & m & n & o\end{bmatrix}$
Interpreting Matrices as collections of vectors
• A matrix can be viewed as a vertical stacking of row vectors
– The space of all vectors that can be composed from the rows of the matrix is the row space of the matrix
• Or a horizontal arrangement of column vectors
– The space of all vectors that can be composed from the columns of the matrix is the column space of the matrix
$R = \begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix}$, viewed as two stacked rows $[a\ b\ c]$, $[d\ e\ f]$, or as three columns $\begin{bmatrix}a\\d\end{bmatrix}, \begin{bmatrix}b\\e\end{bmatrix}, \begin{bmatrix}c\\f\end{bmatrix}$
Dimensions of a matrix
• The matrix size is specified by the number of rows and columns
– c = 3x1 matrix: 3 rows and 1 column
– r = 1x3 matrix: 1 row and 3 columns
– S = 2 x 2 matrix
– R = 2 x 3 matrix
– Pacman = 321 x 399 matrix
$\mathbf{c} = \begin{bmatrix}a\\b\\c\end{bmatrix},\quad \mathbf{r} = [a\ b\ c],\quad S = \begin{bmatrix}a & b\\ c & d\end{bmatrix},\quad R = \begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix}$
Representing an image as a matrix
• 3 pacmen
• A 321 x 399 matrix
– Row and Column = position
• A 3 x 128079 matrix
– Triples of x,y and value
• A 1 x 128079 vector
– “Unraveling” the matrix
• Note: All of these can be recast as the matrix that forms the image
– Representations 2 and 4 are equivalent
• The position is not represented
[Figure: the same image stored three ways: a matrix of pixel values indexed by row and column; a 3 × N matrix of (X, Y, value) triples; and an unraveled 1 × N vector of values only, with X and Y implicit]
Basic arithmetic operations
• Addition and subtraction (vectors and matrices)
– Element-wise operations
$\mathbf{a}+\mathbf{b} = \begin{bmatrix}a_1\\a_2\\a_3\end{bmatrix}+\begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix} = \begin{bmatrix}a_1+b_1\\a_2+b_2\\a_3+b_3\end{bmatrix},\qquad \mathbf{a}-\mathbf{b} = \begin{bmatrix}a_1-b_1\\a_2-b_2\\a_3-b_3\end{bmatrix}$

$A+B = \begin{bmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix}+\begin{bmatrix}b_{11} & b_{12}\\ b_{21} & b_{22}\end{bmatrix} = \begin{bmatrix}a_{11}+b_{11} & a_{12}+b_{12}\\ a_{21}+b_{21} & a_{22}+b_{22}\end{bmatrix}$
Vector/Matrix Transposition
• A transposed row vector becomes a column (and vice versa)
• A transposed matrix gets all its row (or column) vectors transposed in order
$\mathbf{x} = \begin{bmatrix}a\\b\\c\end{bmatrix},\ \mathbf{x}^T = [a\ b\ c];\qquad X = \begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix},\ X^T = \begin{bmatrix}a & d\\ b & e\\ c & f\end{bmatrix};\qquad \mathbf{y} = [a\ b\ c],\ \mathbf{y}^T = \begin{bmatrix}a\\b\\c\end{bmatrix}$
Vector multiplication
• Multiplication by scalar
• Dot product, or inner product
– Vectors must have the same number of elements
– Row vector times column vector = scalar
• Outer product or vector direct product
– Column vector times row vector = matrix
Outer product: $\begin{bmatrix}a\\b\\c\end{bmatrix}[d\ e\ f] = \begin{bmatrix}ad & ae & af\\ bd & be & bf\\ cd & ce & cf\end{bmatrix}$

Inner product: $[a\ b\ c]\begin{bmatrix}d\\e\\f\end{bmatrix} = ad + be + cf$

Scalar: $d\cdot\begin{bmatrix}a\\b\\c\end{bmatrix} = \begin{bmatrix}ad\\bd\\cd\end{bmatrix}$
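The inner and outer products above can be sketched in Python (illustrative helpers, not the lecture's code):

```python
# Inner (dot) and outer products of plain-list vectors.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def outer(u, v):
    return [[a * b for b in v] for a in u]

print(dot([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32
print(outer([1, 2], [3, 4, 5]))    # [[3, 4, 5], [6, 8, 10]]
```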
Vector dot product
• Example:
– Coordinates are yards, not ave/st
– a = [200 1600],
b = [770 300]
• The dot product of the two vectors relates to the length of a projection
– How much of the first vector have we covered by following the second one?
– Must normalize by the length of the “target” vector
[Figure: a = [200yd 1600yd], norm ≈ 1612; b = [770yd 300yd], norm ≈ 826]

$\frac{\mathbf{a}\,\mathbf{b}^T}{\|\mathbf{a}\|} = \frac{[200\ 1600]\begin{bmatrix}770\\300\end{bmatrix}}{\|[200\ 1600]\|} \approx 393\,\text{yd}$
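The yard-coordinate example can be verified numerically; the sketch below computes the projection length dot(a, b)/‖a‖ described above:

```python
import math

# Length of the projection of b onto a: dot(a, b) / ||a||.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

a = [200, 1600]   # yards
b = [770, 300]    # yards
proj_len = dot(a, b) / norm(a)
print(round(proj_len))   # 393: how many yards of a are "covered" by following b
```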
Vector dot product
• Vectors are spectra
– Energy at a discrete set of frequencies
– Actually 1 x 4096
– X axis is the index of the number in the vector
• Represents frequency
– Y axis is the value of the number in the vector
• Represents magnitude
[Figure: spectra of the notes C, E, and C2; x axis is frequency, y axis is sqrt(energy)]
Vector dot product
• How much of C is also in E?
– How much can you fake a C by playing an E?
– C·E / (|C||E|) = 0.1
– Not very much
• How much of C is in C2?
– C·C2 / (|C||C2|) = 0.5
– Not bad, you can fake it
• To do this, C, E, and C2 must be the same size
Vector outer product
• The column vector is the spectrum
• The row vector is an amplitude modulation
• The outer product is a spectrogram
– Shows how the energy in each frequency varies with time
– The pattern in each column is a scaled version of the spectrum
– Each row is a scaled version of the modulation
Multiplying a matrix by a scalar
• Multiplying a matrix by a scalar multiplies every element of the matrix
• Note: bA = Ab
$bA = b\begin{bmatrix}a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24}\\ a_{31} & a_{32} & a_{33} & a_{34}\end{bmatrix} = \begin{bmatrix}ba_{11} & ba_{12} & ba_{13} & ba_{14}\\ ba_{21} & ba_{22} & ba_{23} & ba_{24}\\ ba_{31} & ba_{32} & ba_{33} & ba_{34}\end{bmatrix}$
Multiplying a vector by a matrix
• Multiplying a vector by a matrix transforms the vector
– Dimensions must match!!
• No. of columns of matrix = size of vector
• Result inherits the number of rows from the matrix
$A\mathbf{b} = \begin{bmatrix}a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24}\\ a_{31} & a_{32} & a_{33} & a_{34}\end{bmatrix}\begin{bmatrix}b_1\\b_2\\b_3\\b_4\end{bmatrix} = \begin{bmatrix}a_{11}b_1 + a_{12}b_2 + a_{13}b_3 + a_{14}b_4\\ a_{21}b_1 + a_{22}b_2 + a_{23}b_3 + a_{24}b_4\\ a_{31}b_1 + a_{32}b_2 + a_{33}b_3 + a_{34}b_4\end{bmatrix}$
Multiplying a vector by a matrix
• Generalization of vector scaling
– Left multiplication: dot product of each vector pair
– Dimensions must match!!
• No. of columns of matrix = size of vector
• Result inherits the number of rows from the matrix
$A\mathbf{b} = \begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\end{bmatrix}\mathbf{b} = \begin{bmatrix}\mathbf{a}_1\cdot\mathbf{b}\\ \mathbf{a}_2\cdot\mathbf{b}\end{bmatrix}$ (each row of the matrix dotted with the vector), generalizing $d\cdot\begin{bmatrix}a\\b\\c\end{bmatrix} = \begin{bmatrix}ad\\bd\\cd\end{bmatrix}$
Multiplying a vector by a matrix
• Generalization of vector multiplication
– Right multiplication: Dot product of each vector pair
– Dimensions must match!!
• No. of columns of matrix = size of vector
• Result inherits the number of rows from the matrix
$\mathbf{a}B = \mathbf{a}\,[\mathbf{b}_1\ \mathbf{b}_2] = [\mathbf{a}\cdot\mathbf{b}_1\ \ \mathbf{a}\cdot\mathbf{b}_2]$ (the vector dotted with each column of the matrix)
Matrix Multiplication: Column space
• Right multiplication by a matrix transforms a row-space vector into a column-space vector
• It mixes the column vectors of the matrix using the numbers in the vector
• The column space of the matrix is the complete set of all vectors that can be formed by mixing its columns
$\begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = x\begin{bmatrix}a\\d\end{bmatrix} + y\begin{bmatrix}b\\e\end{bmatrix} + z\begin{bmatrix}c\\f\end{bmatrix}$
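The column-mixing view can be checked in Python: computing A·x row by row gives the same answer as summing scaled columns (the values are made up):

```python
# Matrix-vector product as a mixture of the matrix's columns:
# A @ [x, y, z] = x*col1 + y*col2 + z*col3.
A = [[1, 2, 3],
     [4, 5, 6]]
x = [2, -1, 3]

# Direct row-by-row computation
direct = [sum(A[i][j] * x[j] for j in range(3)) for i in range(2)]

# Column-mixing computation: accumulate x_j times the j-th column
mix = [0, 0]
for j in range(3):
    for i in range(2):
        mix[i] += x[j] * A[i][j]

print(direct, mix)   # [9, 21] [9, 21]
```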
Matrix Multiplication: Row space
• Left multiplication mixes the row vectors of the matrix
– Converts a vector in the column space to one in the row space
• The row space of the Matrix is the complete set of all vectors that can be formed by mixing its rows
$[x\ y]\begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix} = x\,[a\ b\ c] + y\,[d\ e\ f]$
Multiplication of vector space by matrix
• The matrix rotates and scales the space
– Including its own vectors
$Y = \begin{bmatrix}0.3 & 0.7\\ 1.3 & 1.6\end{bmatrix}$

[Figure: the row space and column space under the transform Y]
Multiplication of vector space by matrix
• The normals to the row vectors in the matrix become the new axes
– X axis = normal to the second row vector
• Scaled by the inverse of the length of the first row vector
Matrix Multiplication
• The k-th axis corresponds to the normal to the hyperplane represented by the 1..k-1,k+1..N-th row vectors in the matrix
– Any set of K-1 vectors represent a hyperplane of dimension K-1 or less
• The distance along the new axis equals the length of the projection on the k-th row vector
– Expressed in inverse-lengths of the vector
$\begin{bmatrix}a & d & g\\ b & e & h\\ c & f & i\end{bmatrix}$
Matrix multiplication: Mixing vectors
• A physical example
– The three column vectors of the matrix X are the spectra of three notes
– The multiplying column vector Y is just a mixing vector
– The result is a sound that is the mixture of the three notes
[Figure: the three columns of $X$ are note spectra; $Y = \begin{bmatrix}1\\2\\1\end{bmatrix}$ mixes them; $XY$ is the spectrum of the resulting three-note mixture]
Matrix multiplication: Mixing vectors
• Mixing two images
– The images are arranged as columns
• position value not included
– The result of the multiplication is rearranged as an image

[Figure: two 200 × 200 images unraveled into a 40000 × 2 matrix, multiplied by a 2 × 1 weight vector (e.g. [0.25; 0.75]) to give a 40000 × 1 column that is rearranged back into a 200 × 200 image]
Multiplying matrices
• Simple vector multiplication: Vector outer product
$\mathbf{a}\mathbf{b} = \begin{bmatrix}a_1\\ a_2\end{bmatrix}[b_1\ b_2] = \begin{bmatrix}a_1b_1 & a_1b_2\\ a_2b_1 & a_2b_2\end{bmatrix}$
Multiplying matrices
• Generalization of vector multiplication
– Outer product of dot products!!
– Dimensions must match!!
• Columns of first matrix = rows of second
• Result inherits the number of rows from the first matrix and the number of columns from the second matrix
$AB = \begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\end{bmatrix}[\mathbf{b}_1\ \mathbf{b}_2] = \begin{bmatrix}\mathbf{a}_1\cdot\mathbf{b}_1 & \mathbf{a}_1\cdot\mathbf{b}_2\\ \mathbf{a}_2\cdot\mathbf{b}_1 & \mathbf{a}_2\cdot\mathbf{b}_2\end{bmatrix}$ (rows of $A$ dotted with columns of $B$)
Multiplying matrices: Another view
• Simple vector multiplication: Vector inner product
$\mathbf{a}\mathbf{b} = [a_1\ a_2]\begin{bmatrix}b_1\\ b_2\end{bmatrix} = a_1b_1 + a_2b_2$
Matrix multiplication: another view
For $A$ of size $M \times N$ and $B$ of size $N \times K$:

$AB = \begin{bmatrix}\mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_N\end{bmatrix}\begin{bmatrix}\mathbf{b}_1\\ \mathbf{b}_2\\ \vdots\\ \mathbf{b}_N\end{bmatrix} = \sum_{n=1}^{N}\mathbf{a}_n\mathbf{b}_n$

where $\mathbf{a}_n$ is the $n$-th column of $A$ ($M \times 1$) and $\mathbf{b}_n$ is the $n$-th row of $B$ ($1 \times K$).

The outer product of the first column of A and the first row of B + the outer product of the second column of A and the second row of B + …: matrix multiplication is a sum of outer products.
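A quick Python check of the sum-of-outer-products identity (small made-up matrices):

```python
# Check: A @ B equals the sum of outer products of A's columns with B's rows.
A = [[1, 2],
     [3, 4],
     [5, 6]]          # 3 x 2
B = [[7, 8, 9],
     [10, 11, 12]]    # 2 x 3

M, N, K = 3, 2, 3

# Standard row-times-column product
AB = [[sum(A[i][n] * B[n][k] for n in range(N)) for k in range(K)]
      for i in range(M)]

# Sum over n of (n-th column of A) outer (n-th row of B)
S = [[0] * K for _ in range(M)]
for n in range(N):
    for i in range(M):
        for k in range(K):
            S[i][k] += A[i][n] * B[n][k]

print(AB == S)   # True
```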
Why is that useful? Mixing modulated spectra
• Sounds: Three notes modulated independently

[Figure: the columns of $X$ are three note spectra; the three rows of $Y$ are time-varying gains (the first row, e.g., 0.5 0.6 0.7 0.8 0.9 0.95 1 …); the product $XY$ is a spectrogram in which each note rises and falls with its own modulation]
Matrix multiplication: Image transition
• Image1 fades out linearly
• Image 2 fades in linearly
$[\mathbf{i}\ \ \mathbf{j}]\begin{bmatrix}1 & .9 & .8 & .7 & .6 & .5 & .4 & .3 & .2 & .1 & 0\\ 0 & .1 & .2 & .3 & .4 & .5 & .6 & .7 & .8 & .9 & 1\end{bmatrix}$ where the columns $\mathbf{i} = [i_1\ i_2\ \ldots]^T$ and $\mathbf{j} = [j_1\ j_2\ \ldots]^T$ are the two unraveled images
Matrix multiplication: Image transition
• Each column is one image
– The columns represent a sequence of images of decreasing intensity
• Image1 fades out linearly

$\mathbf{i}\,[1\ \ .9\ \ .8\ \cdots\ 0] = \begin{bmatrix}i_1 & .9 i_1 & .8 i_1 & \cdots & 0\\ i_2 & .9 i_2 & .8 i_2 & \cdots & 0\\ \vdots & & & & \vdots\\ i_N & .9 i_N & .8 i_N & \cdots & 0\end{bmatrix}$
Matrix multiplication: Image transition
• Image 2 fades in linearly

$\mathbf{j}\,[0\ \ .1\ \ .2\ \cdots\ 1]$ gives the columns of the fading-in image
Matrix Operations: Properties
• A+B = B+A
– Actual interpretation: for any vector x
• (A+B)x = (B+A)x (column vector x of the right size)
• x(A+B) = x(B+A) (row vector x of the appropriate size)
• A + (B + C) = (A + B) + C
• AB ≠ BA in general
The Space of Matrices
• The set of all matrices of a given size (e.g. all 3x4 matrices) is a space!
– Addition is closed
– Scalar multiplication is closed
– Zero matrix exists
– Matrices have additive inverses
– Associativity and commutativity rules apply!
The Identity Matrix
• An identity matrix is a square matrix where
– All diagonal elements are 1.0
– All off-diagonal elements are 0.0
• Multiplication by an identity matrix does not change vectors
$Y = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}$
Diagonal Matrix
• All off-diagonal elements are zero
• Diagonal elements are non-zero
• Scales the axes
– May flip axes

$Y = \begin{bmatrix}2 & 0\\ 0 & 1\end{bmatrix}$
Diagonal matrix to transform images
• How?
Stretching

$\begin{bmatrix}2 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}x_1 & x_2 & \cdots\\ y_1 & y_2 & \cdots\\ v_1 & v_2 & \cdots\end{bmatrix}$

• Location-based representation
• Scaling matrix: only scales the X axis
– The Y axis and pixel value are scaled by identity
• Not a good way of scaling.
Stretching
• Better way
• Interpolate

$\text{Newpic} = A\,D_{(N \times 2N)}$, where $D = \begin{bmatrix}1 & .5 & 0 & 0 & \cdots\\ 0 & .5 & 1 & .5 & \cdots\\ 0 & 0 & 0 & .5 & \cdots\\ \vdots & & & & \ddots\end{bmatrix}$ and N is the width of the original image
Modifying color
• Scale only Green

$\text{Newpic} = \begin{bmatrix}1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 1\end{bmatrix}P$, where the rows of $P$ are the R, G, B planes
Permutation Matrix
• A permutation matrix simply rearranges the axes
– The row entries are axis vectors in a different order
– The result is a combination of rotations and reflections
• The permutation matrix effectively permutes the arrangement of the elements in a vector
$\begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0\end{bmatrix}\begin{bmatrix}x\\ y\\ z\end{bmatrix} = \begin{bmatrix}y\\ z\\ x\end{bmatrix}$

[Figure: (3,4,5) is mapped to (4,5,3): the new X is the old Y, the new Y is the old Z, and the new Z is the old X]
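A Python sketch of the permutation above (P is the matrix from the figure):

```python
# Applying a permutation matrix: the rows are the axis vectors in a new order,
# so the product simply rearranges the elements of the vector.
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
v = [3, 4, 5]

Pv = [sum(P[i][j] * v[j] for j in range(3)) for i in range(3)]
print(Pv)   # [4, 5, 3]: the elements of v, rearranged
```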
Permutation Matrix
• Reflections and 90 degree rotations of images and objects
[Figure: two permutation matrices applied to the image's pixel-value matrix produce reflections and 90-degree rotations of pacman]
Permutation Matrix
• Reflections and 90 degree rotations of images and objects
– Object represented as a matrix of 3-Dimensional “position” vectors
– Positions identify each point on the surface
[Figure: permutation matrices applied to the object's position matrix $\begin{bmatrix}x_1 & x_2 & \cdots & x_N\\ y_1 & y_2 & \cdots & y_N\\ z_1 & z_2 & \cdots & z_N\end{bmatrix}$ reflect and rotate the object]
Rotation Matrix
• A rotation matrix rotates the vector by some angle θ
• Alternately viewed, it rotates the axes
– The new axes are at an angle θ to the old ones
$R_\theta = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix},\qquad X = \begin{bmatrix}x\\ y\end{bmatrix},\quad X_{new} = \begin{bmatrix}x'\\ y'\end{bmatrix},\quad X_{new} = R_\theta X$

$x' = x\cos\theta - y\sin\theta$
$y' = x\sin\theta + y\cos\theta$

[Figure: the point (x, y) rotated by θ to (x′, y′)]
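The rotation equations can be sketched directly in Python (a 90-degree rotation of the point (1, 0) as an illustrative case):

```python
import math

# Rotate a 2-D point by angle theta using the rotation matrix
# [[cos t, -sin t], [sin t, cos t]].
def rotate(point, theta):
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

x2, y2 = rotate((1.0, 0.0), math.pi / 2)   # 90-degree rotation
print(round(x2, 6), round(y2, 6))          # 0.0 1.0
```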
Rotating a picture
• Note the representation: 3-row matrix
– Rotation only applies on the “coordinate” rows
– The value does not change
– Why is pacman grainy?

$R = \begin{bmatrix}\cos 45^\circ & -\sin 45^\circ & 0\\ \sin 45^\circ & \cos 45^\circ & 0\\ 0 & 0 & 1\end{bmatrix}$ applied to the $\begin{bmatrix}x\\ y\\ v\end{bmatrix}$ rows of the image matrix
3-D Rotation
• 2 degrees of freedom
– 2 separate angles
• What will the rotation matrix be?
[Figure: the X, Y, Z axes rotated by angles θ and α to Xnew, Ynew, Znew]
Projections
• What would we see if the cone to the left were transparent and we looked at it from above the plane shown by the grid?
– Looking normal to the plane
– Answer: the figure to the right
• How do we get this? Projection
Projections
• Actual problem: for each vector
– What is the corresponding vector on the plane that is “closest approximation” to it?
– What is the transform that converts the vector to its approximation on the plane?
[Figure: each vector is dropped onto the plane at 90 degrees: its projection]
Projections
• Arithmetically: Find the matrix P such that
– For every vector X, PX lies on the plane
• The plane is the column space of P
– ||X – PX||2 is the smallest possible
Projection Matrix
• Consider any set of independent vectors W1, W2 … on the plane
– Arranged as a matrix [W1 W2 ..]
• The plane is the column space of the matrix
– Any vector can be projected onto this plane
– The matrix P that rotates and scales the vector so that it becomes its projection is a projection matrix
Projection Matrix
• Given a set of vectors W1, W2 … which form a matrix W = [W1 W2.. ]
• The projection matrix to transform a vector X to its projection on the plane is
– $P = W(W^TW)^{-1}W^T$
• We will visit matrix inversion shortly
• Magic – any set of independent vectors from the same plane that are expressed as a matrix will give you the same projection matrix
– $P = V(V^TV)^{-1}V^T$
Projections
• HOW?
Projections
• Draw any two vectors W1 and W2 that lie on the plane
– ANY two, so long as they have different angles
• Compose a matrix W = [W1 W2.. ]
• Compose the projection matrix $P = W(W^TW)^{-1}W^T$
• Multiply every point on the cone by P to get its projection
Projections
• The projection actually projects it onto the plane, but you’re still seeing the plane in 3D
– The result of the projection is a 3-D vector
– $P = W(W^TW)^{-1}W^T$ is 3×3, $PX$ is 3×1
– The image must be rotated till the plane is in the plane of the paper
• The Z axis in this case will always be zero and can be ignored
• How will you rotate it? (remember you know W1 and W2 )
Projection matrix properties
• The projection of any vector that is already on the plane is the vector itself
– PX = X if X is on the plane
– If the object is already on the plane, there is no further projection to be performed
• The projection of a projection is the projection
– P(PX) = PX
• Projection matrices are idempotent
– $P^2 = P$
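A minimal pure-Python sketch of these properties, using the construction $P = W(W^TW)^{-1}W^T$ from the earlier slide (the plane and vectors are made-up examples, and the 2 × 2 inverse is hand-coded):

```python
# Build P = W (W^T W)^(-1) W^T for a plane spanned by two 3-D vectors,
# then check that PX = X for X on the plane and that P is idempotent.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv2(M):                      # inverse of a 2 x 2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

W = [[1, 0],                      # columns W1 = (1,0,0), W2 = (0,1,1)
     [0, 1],
     [0, 1]]
Wt = transpose(W)
P = matmul(matmul(W, inv2(matmul(Wt, W))), Wt)

X = [[2], [3], [3]]               # 2*W1 + 3*W2: already on the plane
PX = matmul(P, X)                 # equals X: no further projection happens
PP = matmul(P, P)                 # equals P: projection matrices are idempotent
print(PX)
```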
Projections: A more physical meaning
• Let W1, W2 .. Wk be “bases”
• We want to explain our data in terms of these “bases”
– We often cannot do so
– But we can explain a significant portion of it
• The portion of the data that can be expressed in terms of our vectors W1, W2, .. Wk, is the projection of the data on the W1 .. Wk (hyper) plane
– In our previous example, the “data” were all the points on a cone, and the bases were vectors on the plane
Projection : an example with sounds
• The spectrogram (matrix) of a piece of music
How much of the above music was composed of the above notes? I.e., how much of it can be explained by the notes?
Projection: one note
• The spectrogram (matrix) of a piece of music
M = spectrogram; W = note
$P = W(W^TW)^{-1}W^T$
Projected Spectrogram = P * M
M =
W =
Projection: one note – cleaned up
• The spectrogram (matrix) of a piece of music
Floored all matrix values below a threshold to zero
M =
W =
Projection: multiple notes
• The spectrogram (matrix) of a piece of music
$P = W(W^TW)^{-1}W^T$
Projected Spectrogram = P * M
M =
W =
Projection: multiple notes, cleaned up
• The spectrogram (matrix) of a piece of music
$P = W(W^TW)^{-1}W^T$
Projected Spectrogram = P * M
M =
W =
Projection and Least Squares
• Projection actually computes a least squared error estimate
• For each vector V in the music spectrogram matrix – Approximation: Vapprox = a*note1 + b*note2 + c*note3..
– Error vector E = V – Vapprox
– Squared error energy for V: $e(V) = \|E\|^2$
– Total error = sum over all V of e(V) = $\sum_V e(V)$
• Projection computes Vapprox for all vectors such that the total error is minimized
– It does not give you “a”, “b”, “c”.. though
• That needs a different operation: the inverse / pseudo-inverse
[Figure: $V_{approx} = [\text{note}_1\ \ \text{note}_2\ \ \text{note}_3]\begin{bmatrix}a\\ b\\ c\end{bmatrix}$]