
Lecture Notes on Linear Algebra

Ornela Mulita
Master Degree in DSSC - Data Science & Scientific Computing

University of Trieste

September 2019


Preface

This text contains the lecture notes for an introductory course on linear algebra given in Fall 2018 and 2019 to the students of the Master Degree in Data Science & Scientific Computing (DSSC) at the University of Trieste.

They correspond to a first course on linear algebra, which does not rely on any prerequisites, and can be considered as preparation for the more advanced numerical linear algebra topics in the succeeding courses of the DSSC programme. All necessary notions are introduced from scratch, and several examples and exercises are provided with detailed explanations.

The main aim of the course is to emphasize the basic concepts of linear algebra (vector spaces, matrices, linear transformations, scalar products, norms, eigenpairs, etc.) as mathematical structures that are used to model the world around us. Once “persuaded” of this truth, students can perform computations in their future studies while at the same time understanding both the abstract and concrete ideas behind these techniques.

The material in these notes is the result of careful elaboration and adaptation from several classical books on linear algebra. The main sources of inspiration have been the books of S. Lang [1], G.H. Golub and C.F. Van Loan [2], D.C. Lay, S.R. Lay and J.J. McDonald [3], C.D. Meyer [4], G. Strang [5], and A. Quarteroni, R. Sacco and F. Saleri [6].

Understandably, it is impossible to provide in-depth coverage of all topics and examples. Students who wish for more detailed coverage should consult the reference books, which provide broad coverage of the field.

The notes correspond fairly closely to what I said in class in both academic years. However, this version should not be considered definitive, since I aim to complete it with further examples and exercises that we solved in class. Lastly, I am thankful to the student A. Mecchina for partial typing in LaTeX.


Contents

1 Introduction

2 Vector spaces
2.1 Vector spaces and subspaces
2.2 Systems of generators, linear dependence and independence of vectors
2.3 Bases and dimension of a vector space
2.4 Exercises

3 Matrices
3.1 The space of matrices
3.2 Matrix operations
3.3 Matrix multiplication
3.4 Matrix inversion
3.5 Matrix transposition
3.6 Determinants

4 Linear mappings
4.1 The space of linear mappings
4.2 Image and Kernel of linear mappings
4.3 Linear mappings and matrices

5 Scalar products, norms and orthogonality
5.1 Scalar products on vector spaces
5.2 Norms on vector spaces
5.3 Matrix norms
5.4 Exercises

6 Eigenvalues and eigenvectors
6.1 Eigenvalues and eigenvectors
6.2 Computation of eigenpairs
6.3 Diagonalization of a matrix


Chapter 1

Introduction

Linear algebra is central to many areas of mathematics. Its influence ranges from algebra, through geometry, to applied mathematics. Indeed, some of the oldest and most widespread real-world applications of mathematics derive from linear algebra. Nowadays, the range of applications that use the language and tools developed in linear algebra has widened steadily and encompasses a vast diversity of problems in multidimensional calculus, differential geometry, functional analysis, multivariate statistics, control theory, dynamical systems, optimization, linear programming, computer graphics, genetics, graph theory, cryptography, etc. More generally, linear algebra is used in most sciences and engineering areas, because it allows modeling many real-world phenomena, and computing efficiently with such models. In particular, linear algebra is the mathematics of data science: matrices and vectors are the language of data.

The first part of this course aims to give an introduction to both the theory (vector spaces and linear transformations between them) and the practice (matrices), and most importantly, the connection between these. Indeed, at the heart of the subject is the realization that, on finite-dimensional spaces, the theory of linear transformations is essentially the same as the theory of matrices. This is due primarily to the fundamental fact that the action of a linear transformation on a vector is exactly multiplication of the matrix representing the transformation by the coordinates of the vector. Therefore linear transformations on finite-dimensional spaces always have matrix representations, and vice versa.

Very often, in order to quantify errors we need to compute the magnitude of a vector or a matrix. For that purpose we will study the concepts of vector and matrix norms, followed by the concepts of inner products and orthogonality. In fact, many geometric notions (lengths, angles, etc.) can all be derived from inner products. These concepts provide powerful geometric tools for solving many problems in engineering and the applied sciences. In particular, they play a crucial role


in the analysis of matrix algorithms. For instance, they are useful for assessing the accuracy of computations or for measuring progress during an iteration. Lastly, we will study the fundamental notions of eigenvalues and eigenvectors, with countless applications in real-world problems.


Chapter 2

Vector spaces

2.1 Vector spaces and subspaces.

Definition 2.1 (Field). A field K is an algebraic structure consisting of a non-empty set of objects, called scalars (or numbers), on which two operations are defined:

• addition “+”: ∀a, b ∈ K =⇒ a + b ∈ K;

• multiplication “·”: ∀a, b ∈ K =⇒ a · b ∈ K (often written as ab);

and the following properties are satisfied:

(F1) “+” is commutative: ∀a, b ∈ K =⇒ a + b = b + a;

(F2) “+” is associative: ∀a, b, c ∈ K =⇒ (a + b) + c = a + (b + c);

(F3) there exists an element 0K (called zero or the null element) such that ∀a ∈ K, a + 0K = 0K + a = a (existence of the additive identity);

(F4) ∀a ∈ K there exists its opposite −a ∈ K such that a + (−a) = (−a) + a = 0K (existence of the additive inverse);

(F5) “·” is commutative: ∀a, b ∈ K =⇒ ab = ba;

(F6) “·” is associative: ∀a, b, c ∈ K =⇒ (a · b)c = a(b · c);

(F7) there exists an element 1K ≠ 0K such that ∀a ∈ K, 1K · a = a · 1K = a (existence of the multiplicative identity);

(F8) ∀a ∈ K with a ≠ 0K, there exists its inverse a−1 ∈ K such that a−1a = aa−1 = 1K (existence of the multiplicative inverse);

(F9) the distributive property: ∀a, b, c ∈ K =⇒ (a + b)c = ac + bc and a(b + c) = ab + ac.

Remark 2.1. Properties (F1)-(F9) are also called the axioms of a field. Using only these axioms, one can easily deduce that the additive and multiplicative identities in K are unique, that the additive inverse (opposite) of an element of K is unique, and that the multiplicative inverse (or, simply, inverse) of a nonzero element of K is unique.

Remark 2.2. We shall equivalently say that the field K is closed under the operations of addition and multiplication. The essential thing about a field is that it is a set of numbers which can be added and multiplied, where the ordinary rules of arithmetic hold true and where one can divide by nonzero elements.

Example 2.1 (Examples of fields). Important examples of fields are the field of rational numbers Q, the field of real numbers R, the field of complex numbers C, and the prime field with p elements Zp (sometimes also written as Fp), where p is any prime number. The set Z of integers is not a field. Indeed, in Z axioms (F1)-(F7) and axiom (F9) all hold, but axiom (F8) does not: the only nonzero integers that have multiplicative inverses are 1 and −1. For instance, the element 7 ≠ 0Z of Z does not have a multiplicative inverse 7−1 in Z such that 7 · 7−1 = 1Z, since 7−1 = 1/7 ∉ Z. Similarly, the set N of natural numbers is not a field either: not only do the nonzero naturals other than 1 lack multiplicative inverses in N, but no nonzero natural number has its opposite in N.
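
As a quick computational illustration of axiom (F8) in Zp, here is a minimal Python sketch (assuming Python 3.8 or later, where pow accepts a negative exponent together with a modulus); the choice p = 7 is an arbitrary example.

# Multiplicative inverses in the prime field Z_7.
# pow(a, -1, p) returns the inverse of a modulo p (Python 3.8+).
p = 7
for a in range(1, p):
    inv = pow(a, -1, p)
    assert (a * inv) % p == 1   # axiom (F8) holds for every nonzero a
    print(f"{a}^(-1) = {inv} in Z_{p}")
# In Z, by contrast, 7 has no multiplicative inverse: 1/7 is not an integer.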

Definition 2.2 (Vector space). A vector space V over the field K is an algebraic structure consisting of a non-empty set of elements, called vectors, on which two operations are defined:

• addition “+”: ∀v1,v2 ∈ V =⇒ v1 + v2 ∈ V ;

• multiplication by scalars “·”: ∀a ∈ K,∀v ∈ V =⇒ a · v ∈ V (often writtenas av);

and the following properties are satisfied:

(V1) “+” is commutative: ∀v1,v2 ∈ V =⇒ v1 + v2 = v2 + v1;

(V2) “+” is associative: ∀v1,v2,v3 ∈ V =⇒ (v1 + v2) + v3 = v1 + (v2 + v3);

(V3) there exists an element 0V ∈ V (called zero or the null vector) such that∀v ∈ V =⇒ v + 0V = v;

(V4) ∀v ∈ V there exists its opposite (−v) ∈ V such that v + (−v) = 0V ;

(V5) ∀v ∈ V =⇒ 1K · v = v;


(V6) the distributive property: ∀a ∈ K,∀v1,v2 ∈ V =⇒ a(v1 + v2) = av1 + av2;

(V7) the distributive property: ∀a, b ∈ K,∀v ∈ V =⇒ (a+ b)v = av + bv;

(V8) the associative property: ∀a, b ∈ K,∀v ∈ V =⇒ (ab)v = a(bv).

Remark 2.3. Properties (V1)-(V8) are also called the axioms of a vector space. Using only these axioms, one can easily show that the zero vector in Axiom (V3) is unique, and that the vector (−v) in Axiom (V4) is unique for each v ∈ V .

Remark 2.4. We shall equivalently say that V is closed under vector addition and multiplication by scalars.

Remark 2.5. If K = R, V is called a real vector space, and if K = C, V is called a complex vector space. Henceforth, we shall restrict our attention to vector spaces defined over the fields of real and complex numbers.

Example 2.2 (Examples of vector spaces). Let us consider some remarkable examples of vector spaces.

1. V = {0V }.

2. Let K be a field. Then K is a vector space over itself. Addition “+” is addition between scalars a + b and multiplication by scalars “·” is multiplication between scalars ab, with a, b ∈ K.

3. The spaces Kn, for n ≥ 1, of n-tuples of real (or complex) numbers are premier examples of vector spaces. Let

v = (x1, . . . , xn)T and w = (y1, . . . , yn)T (2.1)

be elements of Kn. The scalars x1, . . . , xn are called the components, or coordinates, of v. Addition and multiplication by scalars are then defined coordinatewise, i.e., for v,w ∈ Kn and a ∈ K we have that

v + w = (x1 + y1, . . . , xn + yn)T and av = (ax1, . . . , axn)T . (2.2)

The zero element is the n-tuple with all its coordinates equal to 0. One can easily verify that all the properties (V1)-(V8) are satisfied. The most familiar examples are R2 and R3, which we can think of geometrically as the points of the ordinary two- and three-dimensional space, equipped with a coordinate system. (A numerical illustration of these operations is given at the end of this example.)


4. An example of a vector space that arises in many applications is the vector space of functions. Let X(D) be the set of all real-valued functions defined on a set D ⊆ R in the variable t:

X(D) = {f : D → R, t ↦ f(t)}. (2.3)

Typically, D is the set of real numbers R or some interval [a, b] of the real line. Given f, g ∈ X(D), their sum f + g ∈ X(D) is defined by

(f + g)(t) := f(t) + g(t). (2.4)

The scalar multiple af , for a ∈ R, is the function defined by

(af)(t) := af(t). (2.5)

For instance, if D = R, f(t) = 2 + sin 3t, and g(t) = −1 + 1.5t, then

(f + g)(t) = 1 + sin 3t + 1.5t and (2g)(t) = −2 + 3t. (2.6)

We shall say that two functions in X(D) are equal if and only if their values are equal for every t in D. The zero function is the function that is identically zero, that is, 0(t) = 0 for all t ∈ D. Clearly, it acts as the zero vector in Axiom (V3). Finally, (−1)f acts as the opposite of f . Axioms (V3) and (V4) are thus seen to hold, and the other axioms follow from properties of the real numbers, so X(D) is a vector space (the pointwise operations are illustrated in the sketch at the end of this example).

Other examples of vector spaces of functions are given by

• C([a, b]), the set of all continuous functions on [a, b];

• Cp([a, b]), the set of all functions that are continuous together with their derivatives up to order p on [a, b];

• C∞(R), the set of all infinitely differentiable functions on R.

5. Special attention is paid to the vector space Pn, n ≥ 0, of all polynomials of degree at most n, i.e., of the form

pn(t) = a0 + a1t + · · · + antn, (2.7)

where the coefficients a0, . . . , an and the variable t are real numbers. The degree of pn is the highest power of t in (2.7) whose coefficient is not zero. The degree of a constant polynomial p(t) = a0 ≠ 0 is zero. The zero polynomial has all its coefficients equal to zero and is included in Pn even though its degree, for technical reasons, is not defined. The basic operations of addition and multiplication by scalars are defined pointwise, as for functions.
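
The coordinatewise operations (2.2) on Kn and the pointwise operations (2.4)-(2.5) on X(D) can both be mirrored in a few lines of Python; here is a minimal sketch with NumPy (the particular vectors and functions are arbitrary examples).

import math
import numpy as np

# Coordinatewise operations in R^3, as in (2.2).
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
print(v + w)                               # [5. 7. 9.]
print(2.5 * v)                             # [2.5 5.  7.5]
print(np.array_equal(v + np.zeros(3), v))  # True: the zero vector, axiom (V3)

# Pointwise operations in X(D), as in (2.4) and (2.5),
# with functions represented as Python callables.
def add(f, g):
    return lambda t: f(t) + g(t)

def scale(a, f):
    return lambda t: a * f(t)

f = lambda t: 2 + math.sin(3 * t)
g = lambda t: -1 + 1.5 * t
h = add(f, g)                    # h(t) = 1 + sin(3t) + 1.5t, cf. (2.6)
print(h(0.0))                    # 1.0
print(scale(2, g)(1.0))          # (2g)(1) = -2 + 3 = 1.0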

Definition 2.3 (Vector subspace). Let V be a vector space over a field K. A non-empty subset W ⊆ V is called a vector subspace if and only if it is itself a vector space over K. In particular, only three of the vector space axioms need to be checked; the rest are automatically satisfied:

(W1) ∀w1,w2 ∈ W =⇒ w1 + w2 ∈ W ;

(W2) ∀a ∈ K,∀w ∈ W =⇒ aw ∈ W ;

(W3) 0V ∈ W .

Example 2.3 (Examples of vector subspaces). Let us recall some remarkable examples of vector subspaces.

1. The set consisting of only the zero vector in a vector space V is a subspace of V , called the zero subspace and written as {0V }.

2. Cp([a, b]) ⊂ C([a, b]).

3. Pn ⊂ C∞(R).

4. The vector space R2 is not a subspace of R3, because R2 is not even a subset of R3 (vectors in R3 all have three components, whereas the vectors in R2 have only two). We can, however, define a subset H of R3 which “looks” and “acts” like R2:

H := {(a, b, 0)T ∈ R3 : a, b ∈ R}. (2.8)

It can be verified that H is a vector subspace of R3. We refer to Exercise 2.3 for an exhaustive description of all subspaces of R3.

5. Let V be a vector space and let U ⊆ V and W ⊆ V be subspaces. We denote by U ∩ W the intersection of U and W , i.e., the set of elements which lie in both U and W :

U ∩ W := {v ∈ V : v ∈ U and v ∈ W}. (2.9)

It can be verified that U ∩ W is a subspace. For instance, if U and W are two planes in R3 passing through the origin, then in general their intersection will be a straight line passing through the origin.


6. Let V be a vector space and let U ⊆ V and W ⊆ V be subspaces. Let U + W denote the set

U + W := {v ∈ V : v = u + w, u ∈ U, w ∈ W}. (2.10)

It can be verified that U + W is a subspace of V , said to be generated by U and W , and called the sum of U and W . If for every v in (2.10) the expression as a sum of an element of U and an element of W is unique, then (2.10) is called a direct sum and is denoted by U ⊕ W .

2.2 Systems of generators, linear dependence and independence of vectors.

Definition 2.4 (Span). Let V be an arbitrary vector space and {v1, . . . ,vn} a set of n vectors in V . We call the subspace generated (or spanned) by {v1, . . . ,vn} the set of all finite linear combinations of {v1, . . . ,vn}:

span{v1, . . . ,vn} := {a1v1 + · · · + anvn : a1, . . . , an ∈ K}. (2.11)

It can be easily verified that (2.11) defines a vector subspace of V .

Definition 2.5 (System of generators). We say that the system {v1, . . . ,vn} generates V if ∀v ∈ V there exist coefficients a1, . . . , an ∈ K such that

v = a1v1 + · · ·+ anvn. (2.12)

In other words, V = span{v1, . . . ,vn}, and {v1, . . . ,vn} is called a system of generators for V .

Definition 2.6 (Linear dependence/independence). Let V be an arbitrary vector space and {v1, . . . ,vn} a set of n vectors in V . We say that {v1, . . . ,vn} is linearly independent if the relation

a1v1 + · · · + anvn = 0V (2.13)

implies

a1 = · · · = an = 0. (2.14)

Otherwise, if there exist scalars a1, . . . , an ∈ K, not all equal to 0, such that (2.13) holds true, we say that {v1, . . . ,vn} is linearly dependent.

Example 2.4. Trivially, any set containing the vector 0V is linearly dependent.

Example 2.5. A set containing a single vector v is linearly independent if and only if v ≠ 0V .

Example 2.6. Trivially, if a set of vectors is linearly dependent, then one of them can be written as a linear combination of the others.


2.3 Bases and dimension of a vector space.

Definition 2.7 (Basis). We call a basis of V any set of linearly independent vectors {v1, . . . ,vn} that generates V .

In particular, since {v1, . . . ,vn} is a system of generators for V , it follows that every v ∈ V can be written as

v = a1v1 + · · ·+ anvn, a1, . . . , an ∈ K. (2.15)

Then the scalars (a1, . . . , an) are called the coefficients (or components, or coordinates) of v with respect to the basis {v1, . . . ,vn}, and (2.15) is called the decomposition of v with respect to the basis {v1, . . . ,vn}. Moreover, it can be proved that, since {v1, . . . ,vn} is also linearly independent, the coefficients in the decomposition (2.15) are uniquely defined; that is, if there exist coefficients b1, . . . , bn ∈ K such that

v = b1v1 + · · ·+ bnvn, (2.16)

then necessarily

a1 = b1, . . . , an = bn. (2.17)

This result is known in many textbooks as the “Unique Representation Theorem”.

Example 2.7. Trivially, the vector 0V is never part of a basis.

The next theorem gives us useful information about the number of elements of any basis of a vector space. The main result is that any two bases of a vector space have the same number of elements. Moreover, it gives us a useful criterion to determine when a set of elements of a vector space is a basis.

Theorem 2.1. Let V be a vector space over the field K. Let {v1, . . . ,vn} be a basis of V over K. Then the following statements hold true:

1. any set of linearly independent vectors in V has at most n elements;

2. any other basis has exactly n elements;

3. any set of n linearly independent vectors in V must generate V and, hence, form a basis.

Remark 2.6. Theorem 2.1 suggests that there exists a number that is uniquely associated to any vector space V , namely the number of elements of any basis of V . This number is an intrinsic property of the space V that does not depend on the particular choice of the basis, and it will define its dimension.


Definition 2.8 (Dimension of a vector space). Let V be a vector space having a basis consisting of n elements. We call n the dimension of V and we write

dim(V) = n. (2.18)

If V consists of 0V alone, then V does not have a basis, and we shall say that V has dimension 0. A vector space which has a basis consisting of a finite number of elements, or the zero vector space, is called finite dimensional. Otherwise, if for any n there exist n linearly independent vectors in V , then V is called infinite dimensional and we write

dim(V) =∞. (2.19)

Theorem 2.2 (Dimension of a subspace). Let V be a vector space having a basis of n elements. Let W ⊆ V be a vector subspace. Then

dim(W) ≤ dim(V). (2.20)

Example 2.8 (Examples of dimensions of vector spaces). Let us recall the dimensions of some remarkable vector spaces.

1. Let K be a field, which is a vector space over itself. Then its dimension is 1. Indeed, the element 1K forms a basis of K over K, since any element a ∈ K has a unique expression as a = a · 1K.

2. For every integer p ≥ 0, dim(Cp([a, b])) = ∞;

3. dim(Rn) = dim(Cn) = n.

Remark 2.7 (Standard basis for Rn). The most commonly used basis for Rn is the canonical basis, or standard basis, which is given by the set

{e1 = (1, 0, . . . , 0)T , e2 = (0, 1, . . . , 0)T , . . . , en = (0, 0, . . . , 1)T}. (2.21)

In compact form, it can be defined in terms of the Kronecker delta:

(ei)j = δij , with δij = 1 if i = j and δij = 0 if i ≠ j, (2.22)

where (ei)j denotes the j-th component of ei, ∀ i, j = 1, . . . , n.

Remark 2.8. Vectors are always understood as columns. For typographical reasons (it takes up less space), many texts write them as row vectors and use the symbol for the transpose to indicate that they are column vectors.


2.4 Exercises

Exercise 2.1. Show that the vectors (1, 1)T and (−1, 2)T form a basis of R2.

Solution 2.1. Since dim(R2) = 2, it is sufficient to prove that the two given vectors are linearly independent. Indeed, Theorem 2.1 implies that any set of two linearly independent vectors in R2 forms a basis. Consider a generic relation of type (2.13):

a1 (1, 1)T + a2 (−1, 2)T = 0R2 = (0, 0)T , (2.23)

for a1, a2 ∈ R. Writing this equation in terms of components, we find

a1 − a2 = 0,
a1 + 2a2 = 0. (2.24)

This is a system of two equations which we solve for a1 and a2. Subtracting the second equation from the first, we get −3a2 = 0, whence a2 = 0. Substituting into either equation, we find a1 = 0. Hence a1 and a2 are both 0, and our vectors are linearly independent (a numerical check follows below).
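
The computation above can also be checked numerically; here is a minimal sketch in Python with NumPy (the rank and determinant criteria it invokes are developed in Chapter 3).

import numpy as np

# Columns of M are the vectors (1, 1)^T and (-1, 2)^T.
M = np.array([[1.0, -1.0],
              [1.0,  2.0]])

# The vectors are linearly independent iff M a = 0 forces a = 0,
# i.e. iff M has full column rank.
print(np.linalg.matrix_rank(M))   # 2, so the two vectors form a basis of R^2

# Equivalently (cf. Section 3.6), the determinant is nonzero:
print(np.linalg.det(M))           # 3.0, up to floating-point rounding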

Exercise 2.2. Check whether v1 = (1, 2)T , v2 = (5, 7)T and v3 = (10, 5)T are linearly independent or not.

Solution 2.2. Since dim(R2) = 2, Theorem 2.1 implies that any set of linearly independent vectors in R2 has at most two elements; that is, we can find at most two linearly independent vectors in R2. Here we are given three vectors {v1,v2,v3}, which therefore cannot be linearly independent in R2.

Exercise 2.3. Find all possible vector subspaces of R3 with respective dimensions.

Solution 2.3. We first observe that by Definition 2.3 any subspace W ⊆ R3 must contain the vector 0R3 , that is, the origin, and it has to satisfy the linearity conditions (W1) and (W2). In addition, Theorem 2.2 implies that the dimension of any subspace has to be smaller than or equal to three.

• 0-dimensional subspaces. Only the zero subspace {0V }.

• 1-dimensional subspaces. Any subspace spanned by a single nonzero vector,

span{v}, v ≠ 0. (2.25)

Such subspaces are lines through the origin.


• 2-dimensional subspaces. Any subspace spanned by two linearly independent vectors v1 and v2,

span{v1,v2}. (2.26)

Such subspaces are planes through the origin.

• 3-dimensional subspaces. Only R3 itself. In particular, any three linearly independent vectors in R3 span all of R3 (cf. Theorem 2.1).

Remark 2.9. It is helpful to keep the geometric pictures of the subspaces in Exercise 2.3 in mind, even for an abstract vector space.


Chapter 3

Matrices

3.1 The space of matrices

Definition 3.1 (Matrices). Let K be a given field and let m, n be two positive integers. A matrix A over K having m rows and n columns, or an m × n (m-by-n) matrix, is an array of mn scalars aij ∈ K, i = 1, . . . ,m and j = 1, . . . , n, represented as

A =
[ a11 a12 . . . a1n ]
[ a21 a22 . . . a2n ]
[ ...              ]
[ am1 am2 . . . amn ]   (3.1)

The index i denotes the rows and the index j denotes the columns. For instance, the i-th row is

[ai1, ai2, . . . , ain]

and the j-th column is

[a1j, a2j, . . . , amj]T . (3.2)

We abbreviate the notation (3.1) by A = (aij), i = 1, . . . ,m and j = 1, . . . , n, and we write A ∈ Km×n.

Definition 3.2 (Basic terminology). The integers m and n are called the dimensions of the matrix A, and the couple (m, n) defines its size (or shape). The scalars aij are called its entries or components. The diagonal entries are a11, a22, a33, . . ., and they form the main diagonal of A. If m = n, then the matrix A is called a square matrix. A diagonal matrix is a square matrix whose nondiagonal entries are zero, i.e., aij = 0 for i ≠ j. An example is the n × n identity matrix (or the identity matrix of order n)

In =
[ 1 0 . . . 0 ]
[ 0 1 . . . 0 ]
[ ...         ]
[ 0 0 . . . 1 ]   ∈ Rn×n. (3.3)

An m × n matrix whose entries are all zero is a zero matrix, written as O:

O =
[ 0 0 . . . 0 ]
[ 0 0 . . . 0 ]
[ ...         ]
[ 0 0 . . . 0 ]   ∈ Km×n. (3.4)

The size of a zero matrix is usually clear from the context.

We say that two matrices A = (aij) and B = (bij) are equal if they have the same size (i.e., the same number of rows and the same number of columns) and their corresponding entries are equal, i.e., aij = bij ∀i = 1, 2, . . . ,m and j = 1, 2, . . . , n.

Example 3.1 (Examples of matrices). Let us consider some concrete examples.

1. The following is a 3 × 2 matrix with real entries:

A =
[ 5  2 ]
[ 7 −7 ]
[ 0  8 ]   ∈ R3×2. (3.5)

A is an array of 3 · 2 = 6 real numbers. The rows are [5, 2], [7, −7] and [0, 8], and the columns are

[5, 7, 0]T and [2, −7, 8]T . (3.6)

2. An example of a matrix over the field of complex numbers is

B =
[ i      0 7    2i ]
[ 2 + i  4 3 2 − i ]   ∈ C2×4. (3.7)

3. A row vector [x1, x2, . . . , xn] ∈ R1×n is a particular 1 × n matrix, while a column vector [y1, y2, . . . , ym]T ∈ Rm×1 is a particular m × 1 matrix. A single scalar [a] can be viewed as a 1 × 1 matrix.


3.2 Matrix operations

We shall now define the operations of matrix addition and matrix multiplication by scalars. The basic arithmetic for matrices is a natural extension of the basic arithmetic for scalars.

Definition 3.3 (Matrix addition). Given A = (aij) and B = (bij) two m × n matrices over K, we define the sum of A and B to be the m × n matrix A + B obtained by adding corresponding entries. That is,

(A+B)ij = aij + bij ∀i = 1, . . . ,m, and ∀j = 1, . . . , n. (3.8)

For example,

[ 2  1 3 ]   [ 1  0  3 ]   [ 2 + 1  1 + 0  3 + 3 ]   [ 3  1  6 ]
[ 0 −7 0 ] + [ 1 −1 −2 ] = [ 0 + 1 −7 − 1  0 − 2 ] = [ 1 −8 −2 ].   (3.9)

Note that addition is only defined for matrices having the same shape.

Remark 3.1. The symbol “+” is used to denote addition between scalars in some places and addition between matrices in others. Although these are two distinct algebraic operations, no ambiguity will arise: the intended operation is always clear from the context.

Remark 3.2. The matrix (−A), called the additive inverse of A, is defined to be the matrix obtained by negating each entry of A. That is, if A = (aij), then −A = (−aij). This allows matrix subtraction to be defined in the natural way: for two matrices of the same shape, the difference A − B is defined to be the matrix (A − B)ij = aij − bij , ∀i = 1, . . . ,m and ∀j = 1, . . . , n.

Since matrix addition is defined in terms of scalar addition, the familiar algebraic properties of scalar addition are inherited by matrix addition. Indeed, for m × n matrices A, B, and C, the following properties hold.

− Associative property: (A+B) + C = A+ (B + C);

− Commutative property: A+B = B + A;

− The m× n zero matrix O has the property that A+O = A.

− The m× n matrix (−A) has the property that A+ (−A) = O.

Definition 3.4 (Matrix multiplication by scalars). Given A = (aij) an m × n matrix over K and a scalar c ∈ K, we define the multiplication of A by c to be the m × n matrix cA obtained by multiplying each entry of A by c. That is,

(cA)ij = caij ∀i = 1, . . . ,m, and ∀j = 1, . . . , n. (3.10)


For example,

5 ·
[ −i 3 ]   [ −5i 15 ]
[  0 3 ] = [   0 15 ].   (3.11)

The rules for combining addition and multiplication by scalars of matrices are listed below. Indeed, for m × n matrices A and B and for scalars c and d, the following properties hold.

− Associative property: (cd)A = c(dA);

− Distributive property: c(A + B) = cA + cB (multiplication by scalar isdistributed over matrix addition);

− Distributive property: (c + d)A = cA + dA (multiplication by scalar is dis-tributed over scalar addition);

− 1KA = A.

Remark 3.3 (Vector space over K). Let Mat(m, n;K) denote the set of all m × n matrices over a field K. The properties of matrix addition and multiplication by scalars show that it forms a vector space over K. We shall equivalently denote it by Km×n or by Mm×n(K). Its dimension is mn.

Remark 3.4 (Canonical basis). For Mat(m, n;R) the canonical basis is given by the m × n matrices Eij , for i = 1, . . . ,m and j = 1, . . . , n, whose entries are all equal to zero apart from the (i, j)-entry, which is equal to one.

Example 3.2. The space of all 2 × 2 real matrices has dimension four. Its canonical basis has four elements:

E1 =
[ 1 0 ]
[ 0 0 ],

E2 =
[ 0 1 ]
[ 0 0 ],

E3 =
[ 0 0 ]
[ 1 0 ],

and E4 =
[ 0 0 ]
[ 0 1 ].

Every 2 × 2 matrix can be decomposed with respect to this basis. For example,

A =
[ 3 4 ]
[ 7 0 ]
= 3E1 + 4E2 + 7E3 + 0E4.   (3.12)
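
The decomposition (3.12) amounts to reading off the entries of A as its coordinates with respect to E1, E2, E3, E4; a minimal NumPy sketch of this observation:

import numpy as np

# Canonical basis of the 2x2 real matrices, as in Example 3.2.
E = [np.array([[1, 0], [0, 0]]),
     np.array([[0, 1], [0, 0]]),
     np.array([[0, 0], [1, 0]]),
     np.array([[0, 0], [0, 1]])]

A = np.array([[3, 4],
              [7, 0]])

# The coordinates of A in this basis are its entries, read row by row.
coords = A.flatten()                           # [3, 4, 7, 0]
B = sum(c * Eij for c, Eij in zip(coords, E))
print(np.array_equal(A, B))                    # True: A = 3E1 + 4E2 + 7E3 + 0E4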

3.3 Matrix multiplication

The following sections are devoted to the introduction and discussion of further operations between matrices that do not derive from basic operations between scalars. Such operations are matrix multiplication, transposition and inversion.


Definition 3.5 (Matrix multiplication). We first introduce a new notion.

• Two matrices A and B are said to be compatible (or conformable) for the multiplication AB whenever the number of columns of A is the same as the number of rows of B, i.e., A is m × n and B is n × p.

• For compatible matrices A = (aij) ∈ Km×n and B = (bij) ∈ Kn×p, the matrix product AB is defined to be the m × p matrix whose (i, k)-entry is the inner product of the i-th row of A with the k-th column of B. That is, for i = 1, . . . ,m and k = 1, . . . , p,

(AB)ik = ai1b1k + ai2b2k + · · · + ainbnk = ∑j=1,...,n aijbjk. (3.13)

• In case A and B fail to be conformable, no product AB is defined.

Example 3.3. If

A =
[ 2 1 ]
[ 0 3 ]

and

B =
[ 1 −1 ]
[ 4  0 ],   (3.14)

we can compute AB, and it is going to be a 2 × 2 matrix:

AB =
[ 2 · 1 + 1 · 4   2 · (−1) + 1 · 0 ]   [  6 −2 ]
[ 0 · 1 + 3 · 4   0 · (−1) + 3 · 0 ] = [ 12  0 ].   (3.15)

Example 3.4. Consider the matrices

A =
[  7 2 ]
[  1 1 ]
[  0 5 ]
[ −3 4 ],

B =
[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ]

and

C =
[ 3 4 1 ]
[ 0 0 4 ].   (3.16)

The pairs compatible for multiplication are A and C, and C and B. All the other combinations (i.e., A and B, B and A, and B and C) are not compatible for multiplication. We can compute

AC =
[  7 · 3 + 2 · 0    7 · 4 + 2 · 0    7 · 1 + 2 · 4 ]   [ 21  28 15 ]
[  1 · 3 + 1 · 0    1 · 4 + 1 · 0    1 · 1 + 1 · 4 ]   [  3   4  5 ]
[  0 · 3 + 5 · 0    0 · 4 + 5 · 0    0 · 1 + 5 · 4 ] = [  0   0 20 ]
[ −3 · 3 + 4 · 0   −3 · 4 + 4 · 0   −3 · 1 + 4 · 4 ]   [ −9 −12 13 ]   (3.17)

and

CB =
[ 3 · 1 + 4 · 0 + 1 · 0   3 · 0 + 4 · 1 + 1 · 0   3 · 0 + 4 · 0 + 1 · 1 ]   [ 3 4 1 ]
[ 0 · 1 + 0 · 0 + 4 · 0   0 · 0 + 0 · 1 + 4 · 0   0 · 0 + 0 · 0 + 4 · 1 ] = [ 0 0 4 ].   (3.18)
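
Example 3.4 can be reproduced with NumPy, whose @ operator implements exactly the product (3.13) and rejects incompatible shapes; a minimal sketch:

import numpy as np

A = np.array([[ 7, 2],
              [ 1, 1],
              [ 0, 5],
              [-3, 4]])            # 4 x 2
B = np.eye(3, dtype=int)           # the 3 x 3 identity
C = np.array([[3, 4, 1],
              [0, 0, 4]])          # 2 x 3

print(A @ C)    # the 4 x 3 matrix computed in (3.17)
print(C @ B)    # equals C, as in (3.18)

# A (4 x 2) and B (3 x 3) are not compatible: 2 != 3.
try:
    A @ B
except ValueError as err:
    print("no product defined:", err)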


Remark 3.5. Example 3.4 shows that even if the product AB exists, the product BA might not be defined. In particular, if A is m × n, then B needs to be n × m in order for both products AB and BA to exist.

Remark 3.6 (Matrix multiplication is not commutative). Moreover, even when both products AB and BA exist, they need not be equal. In general, matrix multiplication is a noncommutative operation, i.e.,

AB ≠ BA. (3.19)

For example, if

A =
[ 1  0 ]
[ 1 −1 ]

and

B =
[ 1 2 ]
[ 2 4 ],   (3.20)

then

AB =
[  1  2 ]
[ −1 −2 ]

but

BA =
[ 3 −2 ]
[ 6 −4 ]
≠ AB.   (3.21)

Example 3.5. For scalars, ab = 0 implies a = 0 or b = 0. The analogous statement for matrices does not hold: the following example shows that it is possible to have AB = O with A ≠ O and B ≠ O. Let

A =
[ 1 −1 ]
[ 1 −1 ]

and

B =
[ 1 1 ]
[ 1 1 ],   (3.22)

then

AB =
[ 0 0 ]
[ 0 0 ]   (3.23)

in spite of A ≠ O and B ≠ O.

Example 3.6. Recall the cancellation law for scalars, which states that ab = ac and a ≠ 0 imply b = c. A similar law does not hold for matrices. In fact, if

A =
[ 1 1 ]
[ 1 1 ],

B =
[ 2 2 ]
[ 2 2 ]

and

C =
[ 3 1 ]
[ 1 3 ],   (3.24)

then

AB =
[ 4 4 ]
[ 4 4 ]
= AC but B ≠ C   (3.25)

in spite of the fact that A ≠ O.


Although there are some differences between scalar and matrix algebra (most notably, matrix multiplication is not commutative and there is no cancellation law), fortunately other important properties do hold for matrix multiplication. For compatible matrices, the following properties hold true:

− Left-hand distributive property: A(B + C) = AB + AC;

− Right-hand distributive property: (D + E)F = DF + EF ;

− Associative property: A(BC) = (AB)C.

Remark 3.7 (Identity matrix). We have already defined in Equation (3.3) the identity matrix of order n, with ones on the main diagonal and zeros elsewhere. It plays the role of the identity element for matrix multiplication, because it reproduces whatever it is multiplied by. Indeed, for every m × n matrix A,

AIn = A and ImA = A. (3.26)

Definition 3.6 (Powers of matrices). Given a square n × n matrix A, we define the 0-th power of A to be the identity matrix of order n, that is,

A0 = In. (3.27)

For a positive integer s, we define

As = A A · · · A (s times). (3.28)

The associative law guarantees that it makes no difference how the matrices are grouped when powering, and, due to the lack of compatibility, powers of nonsquare matrices are never defined. Also, the usual laws of exponents hold: for nonnegative integers r and s,

ArAs = Ar+s and (Ar)s = Ars. (3.29)
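
A minimal NumPy check of Definition 3.6 and of the laws of exponents (3.29); the matrix A below is an arbitrary example.

import numpy as np

A = np.array([[2, 1],
              [0, 3]])

print(np.linalg.matrix_power(A, 0))    # A^0 = I_2, as in (3.27)
print(np.linalg.matrix_power(A, 3))    # A A A, as in (3.28)

# A^r A^s = A^(r+s):
lhs = np.linalg.matrix_power(A, 2) @ np.linalg.matrix_power(A, 3)
rhs = np.linalg.matrix_power(A, 5)
print(np.array_equal(lhs, rhs))        # True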

3.4 Matrix inversion

Definition 3.7 (Matrix inverse). Given a square matrix A of order n, we call the inverse of A, denoted by A−1, the square matrix of order n that satisfies the conditions

AA−1 = In and A−1A = In. (3.30)

Notice that condition (3.30) rules out inverses of nonsquare matrices.


Remark 3.8. Not all square matrices are invertible; i.e., given a square matrix A of order n, we cannot always find a square matrix A−1 of order n satisfying conditions (3.30). A trivial example is the zero matrix of order n, but there are also many nonzero matrices that are not invertible.

Remark 3.9. Although not all matrices are invertible, when an inverse exists, it is unique.

Definition 3.8. An invertible matrix is said to be nonsingular, and a square matrix with no inverse is called a singular matrix.

For nonsingular matrices A and B, the following properties hold.

− (A−1)−1 = A;

− The product AB is also nonsingular;

− (AB)−1 = B−1A−1 (the reverse order property for inversion).

More generally, if A1, A2, . . . , Ak are n × n nonsingular matrices, then the product A1A2 · · ·Ak is also nonsingular, and its inverse is given by the reverse order property

(A1A2 · · ·Ak)−1 = (Ak)−1 · · · (A2)−1(A1)−1. (3.31)

Example 3.7 (Inverse of a 2 × 2 matrix). Given a 2 × 2 matrix

A =
[ a b ]
[ c d ],
where δ := ad − bc ≠ 0,   (3.32)

then

A−1 = (1/δ) ·
[  d −b ]
[ −c  a ]   (3.33)

because it can be easily verified that A−1 satisfies condition (3.30). We shall see in Section 3.6 that the number δ is the determinant of A. More generally, a matrix is invertible if and only if its determinant is different from zero.
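
Formula (3.33) is easy to implement and to verify against conditions (3.30); here is a minimal sketch in Python with NumPy (the helper name inverse_2x2 is ours, chosen for illustration).

import numpy as np

def inverse_2x2(A):
    # Inverse of a 2x2 matrix via formula (3.33);
    # fails when delta = ad - bc = 0, i.e. when A is singular.
    (a, b), (c, d) = A
    delta = a * d - b * c
    if delta == 0:
        raise ValueError("singular matrix: ad - bc = 0")
    return (1.0 / delta) * np.array([[ d, -b],
                                     [-c,  a]])

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Ainv = inverse_2x2(A)
print(np.allclose(A @ Ainv, np.eye(2)))     # True: conditions (3.30)
print(np.allclose(Ainv, np.linalg.inv(A)))  # True: agrees with NumPy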

3.5 Matrix transposition

Another matrix operation that is not derived from scalar arithmetic is matrix transposition.


Definition 3.9 (Matrix transpose). Given an m × n matrix A over R, the transpose of A is defined to be the n × m matrix AT obtained by interchanging the rows and columns of A. More precisely, if A = (aij), then

(AT )ij = aji ∀i = 1, . . . , n and ∀j = 1, . . . ,m. (3.34)

For example,

A =
[  7 2 ]
[  1 0 ]
[  0 5 ]
[ −3 4 ]
has transpose
AT =
[ 7 1 0 −3 ]
[ 2 0 5  4 ].   (3.35)

Trivially, for all matrices, (AT )T = A.

Definition 3.10 (Matrix adjoint). Given an m × n matrix A over C, the conjugate transpose (or adjoint) of A is defined to be the n × m matrix AH (sometimes also denoted by A*) obtained by interchanging the rows and columns of A and, in addition, taking the complex conjugate of each entry. Recall that the complex conjugate of the complex number z = a + ib is defined to be the complex number z̄ = a − ib. In other words, for A = (aij),

(AH)ij = āji ∀i = 1, . . . , n and ∀j = 1, . . . ,m. (3.36)

For example,

A =
[ 2 − 3i       2 ]
[ −i           1 ]
[ 0            5 ]
[ −3     −8 − 4i ]
has conjugate transpose
AH =
[ 2 + 3i  i 0      −3 ]
[ 2       1 5 −8 + 4i ].   (3.37)

The transpose (and conjugate transpose) operation combines easily with matrix addition and multiplication by scalars. Indeed, given two matrices A and B of the same shape over K (K = R or K = C) and a scalar c, the following properties hold true.

− (A + B)T = AT + BT and (A + B)H = AH + BH ;

− (cA)T = cAT and (cA)H = c̄ AH .

Definition 3.11 (Symmetries). Let A = (aij) be a square matrix over R.

• A is said to be a symmetric matrix whenever A = AT , i.e., whenever aij = aji.


• A is said to be a skew-symmetric matrix whenever A = −AT , i.e., whenever aij = −aji.

Now let A = (aij) be a square matrix over C.

• A is said to be a hermitian matrix whenever A = AH , i.e., whenever aij = āji.

• A is said to be a skew-hermitian matrix whenever A = −AH , i.e., whenever aij = −āji.
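
In NumPy the transpose is A.T and the conjugate transpose is A.conj().T, which makes the symmetry checks above one-liners; a minimal sketch (the matrices S and H are arbitrary examples):

import numpy as np

A = np.array([[2 - 3j, 2],
              [-1j,    1],
              [0,      5],
              [-3, -8 - 4j]])     # the 4 x 2 matrix of example (3.37)

print(A.T)          # transpose: rows and columns interchanged, cf. (3.34)
print(A.conj().T)   # conjugate transpose A^H, cf. (3.36)

S = np.array([[1, 7],
              [7, 4]])
H = np.array([[2, 1 + 1j],
              [1 - 1j, 3]])
print(np.array_equal(S, S.T))          # True: S is symmetric
print(np.array_equal(H, H.conj().T))   # True: H is hermitian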

3.6 Determinants

We start with a gentle introduction to the topic by first carrying out separately the computation of determinants of order 2 and order 3. After that, we introduce the general theory for determinants of order n.

Definition 3.12 (Determinants of order 2). Given a 2 × 2 matrix over a field K (R or C for us)

A =
[ a11 a12 ]
[ a21 a22 ],   (3.38)

the determinant of A is defined to be the number

det(A) := a11a22 − a12a21. (3.39)

For example, if

A =
[  1 0 ]
[ −1 3 ],   (3.40)

then

det(A) = 1 · 3 − 0 · (−1) = 3. (3.41)

Remark 3.10. We shall equivalently use the notation |A|, i.e., the entries of A written between vertical bars, for det(A).

Exercise 3.1. Show that if the columns (or rows) of A are interchanged, the determinant (3.39) changes by a sign.

Solution 3.1. By direct application of (3.39), we find that if we interchange the columns,

| a12 a11 |
| a22 a21 |  = a12a21 − a11a22 = −(a11a22 − a12a21) = −det(A). (3.42)

Equivalently, when the rows are interchanged,

| a21 a22 |
| a11 a12 |  = a21a12 − a11a22 = −(a11a22 − a12a21) = −det(A). (3.43)


Exercise 3.2. Show that the columns of A are linearly dependent if and only if the determinant (3.39) is zero (equivalently, the columns of A are linearly independent if and only if det(A) ≠ 0).

Solution 3.2. By definition, the columns of A are linearly dependent if there exist scalars (c, d) ≠ (0, 0) such that

c (a11, a21)T + d (a12, a22)T = (ca11 + da12, ca21 + da22)T = (0, 0)T . (3.44)

Writing this equation in terms of components and assuming c ≠ 0, we find a system of two equations which we solve for a11 and a21:

a11 = −(d/c)a12,
a21 = −(d/c)a22. (3.45)

By substituting these into the entries of A, we compute

det(A) =
| −(d/c)a12  a12 |
| −(d/c)a22  a22 |  = −(d/c)a12a22 − a12 (−(d/c)a22) = 0. (3.46)

Exercise 3.3. Show that if two columns (or rows) of A are equal, the determinant (3.39) is zero.

Solution 3.3. By direct application of (3.39), we find

| a a |
| b b |  = ab − ab = 0

and

| a b |
| a b |  = ab − ab = 0.

Exercise 3.4. Show that the determinant of the identity matrix of order 2 is 1.

Solution 3.4. Trivially,

det(I2) =
| 1 0 |
| 0 1 |  = 1 · 1 − 0 · 0 = 1.

Remark 3.11. Exercises 3.1, 3.2, 3.3 and 3.4 can be generalised to any dimension n ≥ 1 and hold true also in higher dimensions. They represent some of the main properties of determinants.

Definition 3.13 (Determinants of order 3). Given a 3 × 3 matrix over K

A =
[ a11 a12 a13 ]
[ a21 a22 a23 ]
[ a31 a32 a33 ],   (3.47)


its determinant can be computed using the expansion by the first row rule:

det(A) = a11 det(A11) − a12 det(A12) + a13 det(A13)
       = a11 (a22a33 − a23a32) − a12 (a21a33 − a23a31) + a13 (a21a32 − a22a31)
       = a11a22a33 − a11a23a32 − a12a21a33 + a12a23a31 + a13a21a32 − a13a22a31.   (3.48)

In (3.48), Aij denotes the matrix obtained by deleting from A the i-th row and the j-th column.

Remark 3.12. We can expand along any row or any column, following the pattern of signs

[ + − + ]
[ − + − ]
[ + − + ]   (3.49)

and the result (3.48) will not change.

Remark 3.13. In general, when an entry is equal to zero, it is convenient to expand along the corresponding row or column. For example, by expanding along the second column in the following example, we find

| 3 0 1 |
| 1 2 5 |   = −0 · (1 · 2 − 5 · (−1)) + 2 · (3 · 2 − 1 · (−1)) − 4 · (3 · 5 − 1 · 1)
| −1 4 2 |  = 2 · 7 − 4 · 14 = −42.   (3.50)

The expansion rules by a row or by a column are particular cases of the general Laplace rule for the expansion of the determinant, which allows one to reduce the computation of an n × n determinant to that of n determinants of size (n − 1) × (n − 1). Before introducing it, let us give the general definition of determinants of order n.

Recall that a permutation π = (π1, . . . , πn) of the numbers (1, 2, . . . , n) is simply any rearrangement of them. In general, there exist n! = n(n − 1)(n − 2) · · · 1 different permutations of the sequence (1, 2, . . . , n).

The sign of a permutation π is defined to be the number

sgn(π) = +1 if an even number of exchanges is needed to restore π to the natural order,
sgn(π) = −1 if an odd number of exchanges is needed to restore π to the natural order.


For example, if π = (1, 4, 3, 2) is a permutation of (1, 2, 3, 4), then sgn(π) = −1, and if π = (4, 3, 2, 1), then sgn(π) = +1. The sign of the natural order π = (1, 2, 3, 4) is naturally sgn(π) = +1. The general definition of the determinant can now be given.

Definition 3.14 (Determinants of order n). Let A = (aij) be a square matrix of order n over K,

A =
[ a11 a12 . . . a1n ]
[ a21 a22 . . . a2n ]
[ ...              ]
[ an1 an2 . . . ann ].   (3.51)

The determinant of A is defined to be the scalar

det(A) = ∑π∈P sgn(π) a1π1 a2π2 · · · anπn ∈ K, (3.52)

where P = {π = (π1, . . . , πn)} is the set of the n! permutations of (1, . . . , n). Observe that each term a1π1 a2π2 · · · anπn in (3.52) contains exactly one entry from each row and each column of A. Note that the determinant is only defined for square matrices.

Exercise 3.5. Compute the determinant of a generic 2 × 2 matrix

A =
[ a11 a12 ]
[ a21 a22 ]

by applying the definition.

Solution 3.5. There are 2! = 2 permutations of the sequence (1, 2), namely P = {(1, 2), (2, 1)}, so det(A) contains the two terms

sgn(1, 2) a11a22 and sgn(2, 1) a12a21. (3.53)

In particular, sgn(1, 2) = +1 and sgn(2, 1) = −1. By substituting, we obtain the already familiar formula (3.39):

det(A) = a11a22 − a12a21. (3.54)
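
Definition 3.14 can be implemented verbatim, summing over all n! permutations; this is hopelessly slow for large n but instructive for small matrices. A minimal Python sketch (the function name det_by_permutations is ours; the sign is computed by counting inversions):

import numpy as np
from itertools import permutations

def det_by_permutations(A):
    # Determinant via (3.52): sum of sgn(pi) * a_{1,pi(1)} ... a_{n,pi(n)}.
    n = A.shape[0]
    total = 0.0
    for pi in permutations(range(n)):
        # sgn(pi) = (-1)^(number of inversions of pi).
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if pi[i] > pi[j])
        term = -1.0 if inversions % 2 else 1.0
        for i in range(n):
            term *= A[i, pi[i]]
        total += term
    return total

A = np.array([[ 3.0, 0.0, 1.0],
              [ 1.0, 2.0, 5.0],
              [-1.0, 4.0, 2.0]])
print(det_by_permutations(A))   # -42.0, as computed in (3.50)
print(np.linalg.det(A))         # -42.0, up to rounding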

Definition 3.15 (Laplace’s expansion formula for the determinant). Laplace’s expansion rule for the computation of an n × n determinant along the i-th row is

det(A) = a11 if n = 1, and det(A) = ∑j=1,...,n (−1)i+j det(Aij) aij if n ≥ 2, (3.55)

and along the j-th column

det(A) = a11 if n = 1, and det(A) = ∑i=1,...,n (−1)i+j det(Aij) aij if n ≥ 2. (3.56)

The scalar det(Aij) is called the complementary minor associated with aij , and the term Cij := (−1)i+j det(Aij) is called the cofactor of the entry aij .
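
Laplace’s rule (3.55) translates directly into a recursive procedure; a minimal sketch (expansion along the first row, skipping zero entries as suggested in Remark 3.13):

import numpy as np

def det_laplace(A):
    # Determinant by Laplace expansion (3.55) along the first row.
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        if A[0, j] == 0:
            continue              # zero entries contribute nothing
        # A_{1j}: delete row 1 and column j (the complementary minor).
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_laplace(minor)
    return total

A = np.array([[ 3.0, 0.0, 1.0],
              [ 1.0, 2.0, 5.0],
              [-1.0, 4.0, 2.0]])
print(det_laplace(A))   # -42.0, matching (3.50)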

Let us enumerate the main properties that are satisfied by determinants of any order:

− det(AT ) = det(A), ∀n;

− det(In) = 1, ∀n;

− If two columns or rows are interchanged, the determinant changes by a sign;

− If the matrix B is obtained by multiplying row i of A by the scalar c, then det(B) = c det(A);

− If the matrix B is obtained by adding c times row i of A to row j of A, then det(B) = det(A);

− An n × n matrix A is nonsingular if and only if det(A) ≠ 0 (equivalently, an n × n matrix A is singular if and only if det(A) = 0);

− For any n × n matrix A and scalar c, det(cA) = cn det(A);

− For square matrices A and B of the same order, det(AB) = det(A) det(B) (Binet’s formula).


Chapter 4

Linear mappings

4.1 The space of linear mappings

Definition 4.1. Given two vector spaces V and W over K, a mapping L : V → W is linear if it preserves the operations of vector addition and multiplication by scalars, that is,

i) L(v1 + v2) = L(v1) + L(v2) for all v1,v2 ∈ V ;

ii) L(cv) = cL(v) for all c ∈ K and for all v ∈ V ;

or, equivalently, both conditions can be combined into one requirement

iii) L(cv1 + dv2) = cL(v1) + dL(v2) for all v1,v2 ∈ V and for all c, d ∈ K.

In particular, condition ii) implies that

L(0V ) = 0W , (4.1)

since L(0V ) = L(0K v) = 0K L(v) = 0W for all v ∈ V .

Repeated application of condition iii) produces the generalization

L(c1v1 + c2v2 + · · ·+ ckvk) = c1L(v1) + c2L(v2) + · · ·+ ckL(vk) (4.2)

for all v1,v2, . . . ,vk ∈ V and for all c1, c2, . . . , ck ∈ K.

Suppose that we are given a basis {v1,v2, . . . ,vn} of V and we want to know the value of the linear mapping at a generic vector v ∈ V . The crucial consequence of linearity is that if we know the value of the mapping at each vector of the basis, then we know its value at each vector of the entire space. Indeed, since {v1,v2, . . . ,vn} is a basis, each v in V can be uniquely written as

v = c1v1 + c2v2 + · · ·+ cnvn, (4.3)


for unique coefficients c1, c2, . . . , cn ∈ K. The generalized linearity condition (4.2) implies that

L(v) = L(c1v1 + c2v2 + · · · + cnvn) = c1L(v1) + c2L(v2) + · · · + cnL(vn). (4.4)

The only freedom for the mapping is its value on the basis vectors. Once the values on the basis vectors are settled, the mapping of every vector of the space is settled.

Remark 4.1. The addition symbol “+” in conditions i) and iii) denotes in some places addition between vectors in V , “+V ” (e.g., in L(v1 +V v2)), and in other places addition between vectors in W , “+W ” (e.g., in L(v1) +W L(v2)). We shall write simply “+” and assume that the space is clear from the context.

Definition 4.2 (Basic terminology). Linear mappings are often called linear transformations or linear applications. A mapping will also be called a map, for the sake of brevity. A linear map from V into itself is called a linear operator on V . As in the classic terminology for functions, the vector space V is called the domain of L and W is called the codomain of L. For v ∈ V , the vector L(v) is called the image of v under the action of L.

Definition 4.3 (The space of linear maps). Let V and W be two vector spaces over K. We denote the space of all linear maps by

L(V,W ) := {L : V → W linear mapping}. (4.5)

We shall define the operations of addition and multiplication by scalars in L(V,W ) in such a way as to make it into a vector space.

Given L, F ∈ L(V,W ), their sum L + F is the map L + F : V → W such that

(L+ F )(v) = L(v) + F (v) for each v ∈ V. (4.6)

It can be easily verified that the map L + F satisfies the two conditions i) and ii) which define a linear map.

Furthermore, if a ∈ K and L ∈ L(V,W ), we define the map aL : V → W such that

(aL)(v) = aL(v) for each v ∈ V. (4.7)

Again, it can be easily verified that aL is a linear map. Given L ∈ L(V,W ), we can define the opposite map −L to be the map (−1)L. Finally, we recall that we have already defined the zero map. We can easily verify that the properties (V1) through (V8) are satisfied, and conclude that the set of linear maps between two vector spaces is itself a vector space. Its dimension is

dim(L(V,W )) = dim(V ) · dim(W ). (4.8)


4.2 Image and Kernel of linear mappings

Definition 4.4 (Kernel of a linear map). Let V and W be vector spaces over K, and let F : V → W be a linear map. We define the kernel of F to be the set of all elements of V whose image in W is the zero vector:

Ker(F ) := {v ∈ V : F (v) = 0W} ⊆ V. (4.9)

It is always a non-empty set, since 0V is always an element of it. It can be proved that the kernel of a linear map F is a vector subspace of V .

The kernel of a linear map is useful to determine when the map is injective.

Definition 4.5 (Injective map). A linear mapping F : V → W is injective if different vectors in V are mapped into different vectors in W ; that is, if v1,v2 are elements of V such that F (v1) = F (v2), then v1 = v2.

Theorem 4.1. Let F : V → W be a linear map. The following two conditions are equivalent:

1. Ker(F ) = {0V }.

2. F is injective.

Theorem 4.2. Let F : V → W be an injective linear map. If v1, . . . ,vn are linearly independent elements of V , then F (v1), . . . , F (vn) are also linearly independent elements of W .

Definition 4.6 (Image of a linear map). Let F : V → W be a linear map. We define the image of F to be the set of all elements of W that are the image under F of at least one element of V :

Im(F ) = {w ∈ W : ∃v ∈ V : F (v) = w} ⊆ W. (4.10)

The image of F is a vector subspace of W . In particular, it is always non-empty, since the zero vector of V is always mapped to the zero vector of W .

Definition 4.7 (Surjective map). A linear map F : V → W is surjective if the domain V is mapped by F onto the entire codomain W , that is, Im(F ) = W .

The next theorem relates the dimensions of the kernel and image of a linear map to the dimension of the space on which the map is defined.

Theorem 4.3 (Rank-Nullity Theorem). Let V be a vector space. Let F : V → W be a linear map of V into another vector space W . Then

dim(V ) = dim(Ker(F )) + dim(Im(F )). (4.11)
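
For a linear map represented by a matrix A (see Section 4.3 below), dim(Im(F )) is the rank of A, so Theorem 4.3 gives dim(Ker(F )) = n − rank(A); a minimal NumPy check on an arbitrary example:

import numpy as np

# A linear map F: R^4 -> R^3 represented by a matrix whose third row
# is the sum of the first two, so the image has dimension 2.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0]])

n = A.shape[1]                        # dim(V) = 4
rank = np.linalg.matrix_rank(A)       # dim(Im(F)) = 2
nullity = n - rank                    # dim(Ker(F)) = 2, by Theorem 4.3
print(rank, nullity, rank + nullity)  # 2 2 4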


Definition 4.8 (Bijective map). A linear map F : V → W is bijective if it is both injective and surjective.

Theorem 4.4. Let V and W be two vector spaces with dim(V ) = dim(W ). Let F : V → W be a linear map. If F is injective, or if F is surjective, then F is bijective.

4.3 Linear mappings and matrices

Definition 4.9 (The linear map associated to a matrix). Given an m × n matrix A over K, we can always associate with A a linear map

LA : Kn → Km (4.12)

by defining

LA(v) := Av, (4.13)

where the product in (4.13) is the matrix product, with the column vector v ∈ Kn viewed as an n × 1 matrix. A and v are then compatible for multiplication, and the result of the product is an m × 1 matrix, that is, a column vector in Km. Linearity is an immediate consequence of the properties of matrix multiplication described in Section 3.3. Indeed, we have that

LA(v1 + v2) = A(v1 + v2) = Av1 + Av2 = LA(v1) + LA(v2) (4.14)

and

LA(cv) = A(cv) = cAv = cLA(v) (4.15)

for all vectors v1,v2 ∈ Kn and for all scalars c ∈ K.

We call LA the linear map associated with the matrix A. It can easily be shown that different matrices give rise to different associated maps. In other words, if two matrices give rise to the same linear map, then they are necessarily equal.

We have seen so far that any matrix in Km×n leads immediately to a linear map from Kn into Km. The opposite direction holds true, too.

Definition 4.10 (The matrix associated to a linear map). Let L : Kn → Km be a linear map. Then there exists a unique matrix A such that L = LA. The column vectors of A are the values of the linear map at the vectors of the canonical basis {e1, . . . , en} of Kn. Precisely, for j = 1, . . . , n, the j-th column of A is found by applying L to the j-th standard basis vector:

A = [L(e1), . . . , L(en)] ∈ Km×n. (4.16)
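
Formula (4.16) is also a practical recipe: to tabulate a linear map as a matrix, apply it to the standard basis vectors and stack the results as columns. A minimal Python sketch (the helper matrix_of and the sample map L are ours, chosen for illustration):

import numpy as np

def matrix_of(L, n):
    # Build the matrix of a linear map L: R^n -> R^m column by column,
    # as in (4.16): the j-th column is L(e_j).
    return np.column_stack([L(e) for e in np.eye(n)])

# Example: L(x, y) = (x + y, 2y) is linear from R^2 to R^2.
L = lambda v: np.array([v[0] + v[1], 2 * v[1]])

A = matrix_of(L, 2)
print(A)                            # [[1. 1.] [0. 2.]]
v = np.array([3.0, -1.0])
print(np.allclose(A @ v, L(v)))     # True: L = L_A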


Remark 4.2. The canonical basis for the space Cn is the same as for the space Rn, so both Definitions 4.9 and 4.10 read the same way for K = R and K = C.

We have investigated so far the relation between matrices and linear maps from Kn into Km. A question arises naturally: do linear maps between arbitrary vector spaces over K have matrix representations? The answer, provided that the vector spaces are finite-dimensional, is yes. We will find that, once the bases for the two spaces are settled, the matrix representation of the map is unique.

Definition 4.11. Consider a finite-dimensional vector space V and let n be its dimension. Given a basis {v1, . . . ,vn} of V , any vector v in V admits the unique representation

v = a1v1 + · · ·+ anvn (4.17)

for suitable coefficients a1, . . . , an ∈ K. Then we can consider the isomorphism

Φ : Kn → V (4.18)

defined by

(a1, . . . , an) ↦ a1v1 + · · · + anvn (4.19)

that maps the coordinates (a1, . . . , an) of a vector to the representation of the vector as a linear combination of the basis vectors. We say that V is isomorphic to Kn under the map Φ, and we write

V ≅ Kn. (4.20)

Assume we are given two finite-dimensional vector spaces V and W over K and a linear map L : V → W . Assume, moreover, that dim(V ) = n and dim(W ) = m, and let {v1, . . . ,vn} and {w1, . . . ,wm} be bases of V and W respectively. Using the isomorphism Φ, we can identify V ≅ Kn and W ≅ Km, interpret L as a linear map from Kn into Km, and thus associate a matrix with L. This matrix strongly depends on the choice of the bases. Precisely, it is the unique matrix A with the following property: if [v]{v1,...,vn} denotes the (column) coordinate vector of a vector v ∈ V relative to the basis {v1, . . . ,vn}, then A[v]{v1,...,vn} is the (column) coordinate vector of L(v) relative to the basis {w1, . . . ,wm}; that is,

[L(v)]{w1,...,wm} = A[v]{v1,...,vn}. (4.21)

We shall also write

A{v1,...,vn},{w1,...,wm}(L) (4.22)

to denote that A is the matrix associated to the linear map L and to indicate the respective choices of bases for V and W .


In short, the matrix carries all the essential information. If the basis is known and the matrix is known, then the transformation of every vector is known. The coding of the information is simple. To transform a space into itself, one basis is enough. A transformation from one space to another requires a basis for each.


Chapter 5

Scalar products, norms and orthogonality

5.1 Scalar products on vector spaces

Definition 5.1 (Scalar product on a vector space). Let V be a vector space over R. A scalar product on V is a map

(·, ·) : V × V → R (5.1)

satisfying the following properties

i) bilinearity: it is linear with respect to both arguments, which means that it is linear in the first argument when the second is kept constant,

(cv1 + dv2,w) = c(v1,w) + d(v2,w), ∀v1,v2,w ∈ V and ∀c, d ∈ R

and vice versa

(v, cw1 + dw2) = c(v,w1) + d(v,w2), ∀v,w1,w2 ∈ V and ∀c, d ∈ R;

ii) symmetry : (v1,v2) = (v2,v1), ∀v1,v2 ∈ V ;

iii) positive-definiteness: (v,v) ≥ 0, ∀v ∈ V , and (v,v) = 0 if and only if v = 0V .

Remark 5.1. The scalar product is sometimes denoted by 〈·, ·〉 and is alternatively referred to as the inner product. A scalar product on a vector space V is also called a symmetric positive-definite bilinear form. The word “form” is used because the map (5.1) is scalar valued. We shall say that V is endowed with the scalar product (·, ·).


Remark 5.2. For K = C, the symmetry property becomes

ii) hermitian symmetry: (v1,v2) is the complex conjugate of (v2,v1), ∀v1,v2 ∈ V .

Example 5.1 (Standard Euclidean product in Rn). Let x and y denote vectors in Rn. The standard Euclidean product in Rn is the map

(·, ·) : Rn × Rn → R (5.2)

defined by

(x,y) := yTx = x1y1 + x2y2 + · · · + xnyn. (5.3)

The standard Euclidean product in Cn is the map

(·, ·) : Cn × Cn → C (5.4)

defined by

(x,y) := yHx = x1ȳ1 + x2ȳ2 + · · · + xnȳn. (5.5)

The scalar product in Rn (or Cn) is often denoted by x · y, and because of this notation it is also referred to as the dot product.
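
In NumPy the dot product of real vectors is x @ y, while np.vdot conjugates its first argument, so the complex product (5.5) reads np.vdot(y, x); a minimal sketch with arbitrary example vectors:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, -1.0])
print(x @ y)            # 1*4 + 2*0 + 3*(-1) = 1.0, as in (5.3)

u = np.array([1 + 1j, 2j])
w = np.array([1j, 1.0])
# (u, w) := w^H u, i.e. the components of w are conjugated, as in (5.5):
print(np.vdot(w, u))    # (1+1j)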

Example 5.2. Let V = C([a, b],R) be the set of all continuous functions defined on the interval [a, b] of the real line, with a < b. Define, for any f, g ∈ V ,

(f, g) := ∫ f(x)g(x) dx, the integral being taken over [a, b]. (5.6)

It can be easily verified that (5.6) defines a scalar product on V . The definition can be extended to V = L2([a, b]), the set of square integrable functions on [a, b].

Example 5.3. Let Rm×n and Cm×n be the vector spaces of m × n matrices with real and complex entries respectively. Define

(A,B) := tr(ATB) and (A,B) := tr(AHB) (5.7)

for A,B ∈ Rm×n and Cm×n respectively, where the trace of an n× n matrix M istr(M) :=

∑ni=1mii. It can be shown that definitions (5.7) define scalar products

in the space of m × n matrices and they are referred to as the standard scalarproducts for matrices. Observe that these reduce to the standard Euclideanscalar product for vectors in Rm when n = 1.
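A one-line check of the identity behind (5.7) (a NumPy sketch with randomly generated matrices): tr(A^T B) equals the entrywise sum ∑_{i,j} a_ij b_ij, i.e. the Euclidean product of the two matrices read as long vectors.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 2))
    B = rng.standard_normal((3, 2))

    # (A, B) = tr(A^T B) = sum of the entrywise products.
    assert np.isclose(np.trace(A.T @ B), np.sum(A * B))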


Definition 5.2 (Orthogonal vectors and orthogonal subspaces). Let V be a real vector space endowed with a scalar product (·, ·). Two vectors v1,v2 ∈ V are said to be orthogonal, or perpendicular, if (v1,v2) = 0, and this is denoted by writing v1 ⊥ v2.

Given a subset S ⊂ V , we define the orthogonal subspace of S by

S⊥ := {w ∈ V such that (v,w) = 0 ∀v ∈ S}.

It can be easily proved that S⊥ is a vector subspace of V .

Remark 5.3. Let V = Rn endowed with the Euclidean scalar product. Observe that the zero vector in Rn is orthogonal to every vector in Rn, because 0^T x = ∑_{i=1}^n 0 · x_i = 0 for any x in Rn.

Theorem 5.1. Let A be an n × n matrix over R. The Euclidean scalar product in Rn satisfies the following main property

(Ax,y) = (x, A^T y), ∀x,y ∈ Rn. (5.8)

Let A be an n × n matrix over C. Then

(Ax,y) = (x, A^H y), ∀x,y ∈ Cn. (5.9)

Definition 5.3 (Orthogonal matrices). An n × n matrix Q over R is called orthogonal if

QQ^T = Q^T Q = I. (5.10)

Definition 5.4 (Unitary matrices). An n × n matrix Q over C is called unitary if

QQ^H = Q^H Q = I. (5.11)

Properties (5.8) and (5.9) have the important consequence that orthogonal (unitary) matrices preserve scalar products.

Theorem 5.2. Let Q be an orthogonal (unitary) matrix of order n over K (K = R or K = C). Then

(Qx, Qy) = (x,y), ∀x,y ∈ Kn. (5.12)

Proof. Assume that Q is orthogonal. Then property (5.8), the associative property of matrix multiplication and the definition of orthogonality (5.10) imply

(Qx, Qy) = (x, Q^T (Qy)) = (x, (Q^T Q)y) = (x,y), ∀x,y ∈ Rn. (5.13)

The proof is the same for the complex case.

Theorem 5.3. An n × n matrix A over R is orthogonal if and only if the row vectors of A form an orthonormal basis of Rn, if and only if the column vectors of A form an orthonormal basis of Rn.
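Theorems 5.2 and 5.3 can be observed numerically (a NumPy sketch; the rotation matrix is an arbitrary illustrative choice):

    import numpy as np

    # A 2x2 rotation is orthogonal: Q^T Q = Q Q^T = I.
    t = 0.7
    Q = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    assert np.allclose(Q.T @ Q, np.eye(2))   # columns are orthonormal

    # (Qx, Qy) = (x, y), as in Theorem 5.2.
    x = np.array([1.0, 2.0])
    y = np.array([-3.0, 0.5])
    assert np.isclose((Q @ x) @ (Q @ y), x @ y)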


5.2 Norms on vector spaces

Definition 5.5 (Norm on a vector space). Let V be a vector space over K. A norm on V is a map

‖ · ‖ : V → R (5.14)

satisfying the following properties

i) positive-definiteness: ‖v‖ ≥ 0, ∀v ∈ V and ‖v‖ = 0 if and only if v = 0V ;

ii) ‖av‖ = |a|‖v‖, ∀a ∈ K, ∀v ∈ V , where |a| denotes the absolute value if a is a real number and the modulus if a is a complex number;

iii) triangular inequality: ‖v1 + v2‖ ≤ ‖v1‖ + ‖v2‖, ∀v1,v2 ∈ V .

A vector space V with a norm ‖ · ‖V is a normed vector space and it is denoted by (V, ‖ · ‖V ). If ‖v‖ = 1, then v is called a unit vector. If we divide a nonzero vector by its norm, we obtain a unit vector. This process is called normalizing the vector. Maps | · | : V → R that satisfy conditions ii) and iii), but are not positive definite, are called semi-norms.

Example 5.4 (Hölder norms on Kn). Consider (Kn, ‖ · ‖p), where ‖ · ‖p for p ≥ 1 denotes the Hölder norm or the p-norm defined by

‖x‖p := (∑_{i=1}^n |x_i|^p)^{1/p}. (5.15)

For instance, for p = 1 we find the grid norm

‖x‖1 := ∑_{i=1}^n |x_i|, (5.16)

while the case p = 2 defines the Euclidean norm, which is considered in Example 5.5.

As p goes to infinity, the limit of ‖ · ‖p exists and it is given by the infinity (max) norm

‖x‖∞ = lim_{p→∞} ‖x‖p = max_{1≤i≤n} |x_i|. (5.17)

For instance, if x = (−1, 0, 3)^T ∈ R3, then ‖x‖1 = 4 and ‖x‖∞ = 3.
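These norms are easy to evaluate numerically (a NumPy sketch; np.linalg.norm implements the vector p-norms):

    import numpy as np

    x = np.array([-1.0, 0.0, 3.0])

    print(np.linalg.norm(x, 1))        # 4.0, the 1-norm
    print(np.linalg.norm(x, 2))        # sqrt(10), the Euclidean norm
    print(np.linalg.norm(x, np.inf))   # 3.0, the infinity (max) norm

    # The p-norms approach the infinity norm as p grows, cf. (5.17).
    for p in (1, 2, 10, 100):
        print(p, np.linalg.norm(x, p))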


Example 5.5 (Norms induced by scalar products). Let V be a vector space over K. We shall say that the scalar product (·, ·) over V induces the norm ‖ · ‖ on V by letting

‖v‖² := (v,v). (5.18)

It can be proved that (5.18) verifies properties i), ii) and iii) in Definition 5.5.
An instance of induced norm is the 2-norm, which defines the standard Euclidean norm

‖x‖₂² = ∑_{i=1}^n |x_i|² = ∑_{i=1}^n x_i², (5.19)

that is induced by the standard Euclidean scalar product, since

(x,x) = ∑_{i=1}^n x_i² = ‖x‖₂². (5.20)

Remark 5.4. Since each scalar product induces a norm by (5.18), the reverse question arises naturally. That is, given a vector norm ‖ · ‖ on a vector space V, can we define a corresponding scalar product on V such that √(·, ·) = ‖ · ‖? If not, under what conditions will a given norm be generated by a scalar product? The answer to the former question is “no”. In fact, given a vector norm ‖ · ‖ on a vector space V, there exists a scalar product on V such that √(·, ·) = ‖ · ‖ if and only if the parallelogram identity

‖v1 + v2‖² + ‖v1 − v2‖² = 2(‖v1‖² + ‖v2‖²) (5.21)

holds for all v1, v2 ∈ V. Take as an instance the standard Euclidean vector norm in Kn. We already know that it is generated by the standard scalar product, so the parallelogram identity (5.21) must hold for the 2-norm. This is easily visualized geometrically in Rn for n = 2, 3. In fact, the parallelogram identity is so named because it expresses the fact that the sum of the squares of the diagonals of a parallelogram is twice the sum of the squares of the sides. However, it can be proved that the parallelogram identity (5.21) does not hold for p ≠ 2, so, except for the Euclidean norm, the p-norms cannot be generated by a scalar product.
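The dichotomy can be tested numerically (a NumPy sketch with arbitrarily chosen vectors): the 2-norm satisfies (5.21), while the 1-norm violates it, so no scalar product can induce the 1-norm.

    import numpy as np

    v1 = np.array([1.0, 0.0])
    v2 = np.array([0.0, 1.0])

    def parallelogram_gap(norm):
        # Difference between the two sides of (5.21); zero iff it holds.
        lhs = norm(v1 + v2) ** 2 + norm(v1 - v2) ** 2
        rhs = 2 * (norm(v1) ** 2 + norm(v2) ** 2)
        return lhs - rhs

    print(parallelogram_gap(lambda v: np.linalg.norm(v, 2)))  # 0.0
    print(parallelogram_gap(lambda v: np.linalg.norm(v, 1)))  # 4.0: identity fails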

Remark 5.5 (Geometric interpretation). The Euclidean norm represents the length of vectors in Rn. For u and v in Rn, the distance between u and v is the length of the vector u − v, that is

dist(u,v) = ‖u− v‖2. (5.22)


For instance, let [a1, a2, a3] and [b1, b2, b3] be the coordinates of u and v in R3, respectively. Then

dist(u,v) = ‖u − v‖2 = √(u − v, u − v) = √((a1 − b1)² + (a2 − b2)² + (a3 − b3)²). (5.23)

Example 5.6 (Backward triangle inequality). Using the triangle inequality, it can be easily shown that the following lower bound for the difference of vector norms holds:

| ‖v1‖ − ‖v2‖ | ≤ ‖v1 − v2‖ ∀v1,v2 ∈ V. (5.24)

Theorem 5.4 (Pythagoras’ theorem). Let V be a real vector space and let ‖ · ‖ be the norm induced by the scalar product (·, ·) in V . Then

‖v1 + v2‖² = ‖v1‖² + ‖v2‖² (5.25)

if and only if v1 ⊥ v2.

Proof. Let us measure the sum of two generic vectors v1 and v2 in V . We apply the definition of the norm induced by the scalar product and exploit the symmetry of the scalar product:

‖v1 + v2‖² = (v1 + v2, v1 + v2)
           = (v1,v1) + (v1,v2) + (v2,v1) + (v2,v2)
           = ‖v1‖² + 2(v1,v2) + ‖v2‖². (5.26)

We observe that (5.25) is satisfied if and only if (v1,v2) = 0, that is, if and only if v1 ⊥ v2.

Remark 5.6. When V is a complex vector space, then v1 ⊥ v2 if and only if ‖cv1 + dv2‖² = ‖cv1‖² + ‖dv2‖² for all scalars c and d.

The following inequality is one of the most important inequalities in mathematics. It relates scalar products to the induced norms.

Theorem 5.5 (Cauchy–Schwarz inequality). Let V be a vector space over K endowed with the scalar product (·, ·) and let ‖ · ‖ be the induced norm. Then

|(v1,v2)| ≤ ‖v1‖‖v2‖, ∀v1,v2 ∈ V, (5.27)

with |(v1,v2)| = ‖v1‖‖v2‖ if and only if v1 and v2 are linearly dependent (for v2 ≠ 0, equality holds precisely when v1 = cv2 with c = (v1,v2)/‖v2‖²).

Definition 5.6 (Orthogonal bases). Let V be a vector space over K endowed with the scalar product (·, ·) which induces the norm ‖ · ‖. A basis {v1, . . . ,vn} of V is called an orthogonal basis if

vi ⊥ vj, ∀i, j = 1, . . . , n with i ≠ j. (5.28)

If in addition ‖vi‖ = 1 ∀i = 1, . . . , n, the basis is called an orthonormal basis.


Remark 5.7. In general, a set {v1, . . . ,vn} of n vectors in V is called an orthonormal set if vi ⊥ vj, ∀i, j = 1, . . . , n with i ≠ j, and ‖vi‖ = 1 ∀i = 1, . . . , n. It can be proved that any orthonormal set is linearly independent and hence every orthonormal set of n vectors in an n-dimensional space V is an orthonormal basis for V .

Example 5.7. The standard basis {e1, . . . , en} for Rn is an orthonormal basis.

Remark 5.8. It is important to know that if V is a finite dimensional vector space, then there always exists an orthonormal basis for V . It can be constructed through the well-known Gram–Schmidt procedure. If we represent vectors of V in coordinates with respect to this basis, for instance u = [a1, a2, . . . , an] and v = [b1, b2, . . . , bn], then

(u,v) = a1b1 + a2b2 + · · ·+ anbn. (5.29)

Exercise 5.1. Determine which of the following vectors in C3 form an orthonormal basis of C3:

w = [i/√3, i/√3, i/√3]^T, x = [−2i/√6, i/√6, i/√6]^T, y = [i/√6, i/√6, −2i/√6]^T (5.30)

and

z = [0, −i/√2, i/√2]^T. (5.31)

Solution 5.1. Vectors in an orthonormal basis must be unit vectors, so we first compute their norms to see whether we can exclude any of them, and we find out that

‖w‖₂² = ‖x‖₂² = ‖y‖₂² = ‖z‖₂² = 1. (5.32)

We cannot exclude any vector, so we investigate their mutual orthogonality and we find out that

(w,x) = (w,y) = (w, z) = (x, z) = 0, (x,y) ≠ 0 and (y, z) ≠ 0. (5.33)

We see that the only candidate for a basis is {w,x, z}. We can easily prove that these vectors are linearly independent. Since dim(C3) = 3, we conclude that {w,x, z} forms an orthonormal basis of C3.
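The computation in (5.32)–(5.33) can also be verified compactly (a NumPy sketch): stacking w, x, z as columns of a matrix B, orthonormality of the set is equivalent to B^H B = I.

    import numpy as np

    s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
    w = np.array([1j/s3, 1j/s3, 1j/s3])
    x = np.array([-2j/s6, 1j/s6, 1j/s6])
    z = np.array([0, -1j/s2, 1j/s2])

    B = np.column_stack([w, x, z])
    # Unit columns and mutual orthogonality in one identity: B^H B = I.
    assert np.allclose(B.conj().T @ B, np.eye(3))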

Definition 5.7 (Equivalent norms). Two norms || · || and ||| · ||| on V are equivalent if there exist two positive constants c and C such that

c|||v||| ≤ ||v|| ≤ C|||v||| ∀v ∈ V. (5.34)


Theorem 5.6. All norms in finite dimensional vector spaces are equivalent.

For instance, all norms in Rn are equivalent. For example, it can be shown that

n^{−1/2}‖x‖2 ≤ ‖x‖∞ ≤ ‖x‖2, (5.35)

where the constants in (5.34) are c = n^{−1/2} and C = 1.

5.3 Matrix norms

Matrix norms can be used to provide a measure of distance on the space of matrices. They play an important role in the analysis of matrix algorithms. For instance, they are useful for assessing the accuracy of computations or for measuring progress during an iteration.

Definition 5.8. A matrix norm is a map

‖ · ‖ : Cm×n → R (5.36)

satisfying the following properties

i) ‖A‖ ≥ 0, ∀A ∈ Cm×n and ‖A‖ = 0 if and only if A = O (positive-definiteness);

ii) ‖aA‖ = |a|‖A‖ ∀a ∈ C,∀A ∈ Cm×n;

iii) ‖A+B‖ ≤ ‖A‖+ ‖B‖ ∀A,B ∈ Cm×n (triangular inequality);

The most frequently used matrix norms in numerical linear algebra are the Frobenius norm and the matrix p-norms induced by the vector p-norms.

Example 5.8 (Frobenius norm). The Frobenius norm is defined by

‖A‖F := √(∑_{i,j} |a_ij|²) = √(tr(AA^H)). (5.37)

Recall that tr(•) denotes the trace of a matrix, defined for an n × n matrix M by tr(M) := ∑_{i=1}^n m_ii. By trivially applying definition (5.37), one finds out that the squared Frobenius norm of the identity matrix of order n coincides with its trace:

‖In‖F² = 1² + · · · + 1² = n = tr(In). (5.38)


Example 5.9 (The matrix p-norm). The vector p-norm on Cn induces (or generates) the matrix p-norm on Cm×n by defining

‖A‖p := sup_{x≠0} ‖Ax‖p / ‖x‖p, ∀A ∈ Cm×n, (5.39)

where the supremum is taken over nonzero x ∈ Cn. It is clear that ‖A‖p = max_{‖x‖p=1} ‖Ax‖p. It is important to notice that (5.39) defines a family of norms, depending on the space of matrices where they are defined. For instance, the 2-norm on C3×2 is different from the 2-norm on C6×5.

If A ∈ Cm×n, the 1-norm is the maximum of the absolute column sums

‖A‖1 = max_{1≤j≤n} ∑_{i=1}^m |a_ij| (5.40)

and the ∞-norm is the maximum of the absolute row sums

‖A‖∞ = max_{1≤i≤m} ∑_{j=1}^n |a_ij|. (5.41)

Example 5.10 (Matrix norms induced by vector norms). More generally, any vector norm ‖ • ‖ on Cn induces (or generates) a matrix norm ‖ • ‖ on Cm×n by setting

‖A‖ := sup_{x≠0} ‖Ax‖ / ‖x‖, ∀A ∈ Cm×n. (5.42)

The norm of A according to (5.42) measures the largest amount by which any vector is amplified by matrix multiplication. Trivially, (5.42) implies that

‖Ax‖ ≤ ‖A‖‖x‖, ∀A ∈ Cm×n and ∀x ∈ Cn. (5.43)

In other words, we can equivalently say that ‖A‖ bounds the “amplifying power” of the matrix A, for all vectors x. It is evident that ‖A‖ = max_{‖x‖=1} ‖Ax‖. This suggests that the induced matrix norm ‖A‖ represents the maximum extent to which a vector on the unit sphere S = {x ∈ Cn | ‖x‖ = 1} can be stretched by the matrix A.

When A is nonsingular,

min_{‖x‖=1} ‖Ax‖ = 1 / ‖A^{−1}‖, (5.44)

suggesting that 1/‖A^{−1}‖ measures the extent to which a nonsingular matrix A can shrink vectors on S.
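The “maximum stretch” reading of (5.42) suggests a brute-force experiment (a NumPy sketch with a random matrix): sampling unit vectors gives lower bounds that approach the induced 2-norm.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3))

    # Sample many random unit vectors and record how much A stretches them.
    X = rng.standard_normal((3, 100000))
    X /= np.linalg.norm(X, axis=0)
    stretch = np.linalg.norm(A @ X, axis=0)

    # The observed maximum approaches ||A||_2 from below.
    print(stretch.max(), np.linalg.norm(A, 2))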


Definition 5.9. A matrix norm ‖ • ‖ is compatible (or consistent) with a vector norm ‖ • ‖ if

‖Ax‖ ≤ ‖A‖‖x‖ ∀A ∈ Cm×n and ∀x ∈ Cn. (5.45)

The p-norms satisfy the compatibility condition

‖Ax‖p ≤ ‖A‖p‖x‖p, ∀A ∈ Cm×n and ∀x ∈ Cn. (5.46)

More generally, it is apparent from (5.43) that an induced matrix norm is compatible with its underlying vector norm. As another important instance, it can be proved that the Frobenius matrix norm ‖ • ‖F and the Euclidean vector norm ‖ • ‖2 are compatible:

‖Ax‖2 ≤ ‖A‖F‖x‖2, ∀A ∈ Cm×n and ∀x ∈ Cn. (5.47)

Definition 5.10. A matrix norm is called sub-multiplicative if

‖AB‖ ≤ ‖A‖‖B‖ ∀A ∈ Cm×n and ∀B ∈ Cn×s. (5.48)

Not all matrix norms satisfy this property. An instance of a matrix norm which is not sub-multiplicative is given by

‖A‖∆ := max_{i,j} |a_ij|, ∀A ∈ Cm×n. (5.49)

Indeed, for

A = B = [1 1
         1 1], (5.50)

we find that

‖A‖∆ = ‖B‖∆ = 1, (5.51)

but for

AB = [1 1; 1 1][1 1; 1 1] = [2 2; 2 2] (5.52)

we have that

‖A‖∆‖B‖∆ = 1 < 2 = ‖AB‖∆. (5.53)

Nevertheless, we mostly work with norms that satisfy (5.48). Many textbooks embed the sub-multiplicative property in the definition of matrix norms.
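The counterexample (5.50)–(5.53) in a few lines (a NumPy sketch):

    import numpy as np

    def delta_norm(M):
        # ||M||_Delta = max_ij |m_ij|, as in (5.49).
        return np.abs(M).max()

    A = np.ones((2, 2))
    print(delta_norm(A), delta_norm(A @ A))   # 1.0 and 2.0: ||AA|| > ||A|| ||A||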


Theorem 5.6 implies that all norms in Cm×n are equivalent. Some equivalence relationships hold as follows:

‖A‖2 ≤ ‖A‖F ≤ √(min{m,n}) ‖A‖2, (5.54)

(1/√n) ‖A‖∞ ≤ ‖A‖2 ≤ √m ‖A‖∞, (5.55)

(1/√m) ‖A‖1 ≤ ‖A‖2 ≤ √n ‖A‖1. (5.56)

The advantage of the matrix 1- and ∞-norms is that they are cheap to compute (O(n²) operations), while the calculation of the 2-norm is more involved. Finally, in Chapter 6 we will stress the theoretical importance of the 2-norm, along with some additional properties.

5.4 Exercises

Exercise 5.2. Consider the matrix

A = [ 1  0 −1
      2  3  5
      4 −1  3 ]. (5.57)

Compute the 1-norm, the ∞-norm and the Frobenius norm of A.

Solution 5.2. The 1-norm of A is

‖A‖1 = max_{1≤j≤3} ∑_{i=1}^3 |a_ij|
     = max{1 + 2 + 4, 0 + 3 + 1, 1 + 5 + 3}
     = max{7, 4, 9}
     = 9. (5.58)

The ∞-norm of A is

‖A‖∞ = max_{1≤i≤3} ∑_{j=1}^3 |a_ij|
     = max{1 + 0 + 1, 2 + 3 + 5, 4 + 1 + 3}
     = max{2, 10, 8}
     = 10. (5.59)


The Frobenius norm of A is

‖A‖F = √(∑_{i,j=1}^3 |a_ij|²)
     = √(1 + 0 + 1 + 4 + 9 + 25 + 16 + 1 + 9)
     = √66. (5.60)
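The hand computation can be double-checked with NumPy (a sketch; 'fro' selects the Frobenius norm):

    import numpy as np

    A = np.array([[1.0,  0.0, -1.0],
                  [2.0,  3.0,  5.0],
                  [4.0, -1.0,  3.0]])

    print(np.linalg.norm(A, 1))        # 9.0, the maximum absolute column sum
    print(np.linalg.norm(A, np.inf))   # 10.0, the maximum absolute row sum
    print(np.linalg.norm(A, 'fro'))    # sqrt(66) = 8.124...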


Chapter 6

Eigenvalues and eigenvectors

6.1 Eigenvalues and eigenvectors

Definition 6.1 (Eigenvalues and eigenvectors of a linear map). Let V be a vector space over K (K = R or K = C) and let L : V → V be a linear map. A nonzero vector v ∈ V is called an eigenvector or an eigenfunction of L if there exists a scalar λ ∈ K such that

Lv = λv. (6.1)

The scalar λ is called an eigenvalue of L corresponding to the eigenvector v. In other words, L acts on eigenvectors as multiplication by scalars (the corresponding eigenvalues).

Remark 6.1. Note that the definition of eigenvector requires that it is a nonzero vector. In fact, if v = 0 were allowed, then any scalar λ would be an eigenvalue, since L0 = λ0 holds for any λ ∈ K. On the other hand, we can have λ = 0K and v ≠ 0.

Definition 6.2 (Eigenvalues and eigenvectors of a matrix). Let A be an n × n matrix over K. An eigenvector of A is an eigenvector of the associated linear map LA : Kn → Kn, defined by LAv = Av for all v ∈ Kn. Thus an eigenvector of A is a nonzero vector v ∈ Kn for which there exists λ ∈ K such that Av = λv.

Geometrically, (6.1) means that eigenvectors are those vectors that experience only changes in magnitude or sign under the action of L. Specifically, the eigenvalue λ is simply the amount of “stretch” or “shrink” to which the eigenvector v is subjected when transformed by L. For instance, Lv = 2v says that v is stretched by a factor 2 under transformation by L. Similarly, Lv = (−1/3)v says that L reverses v and shrinks it by a factor 1/3.


The prefix eigen- is adopted from the German word “eigen”, which means “own”, “peculiar”, “specific” or “characteristic”. It was the mathematician David Hilbert who introduced the terms Eigenwert and Eigenfunktion. Eigenvalues and eigenvectors are equivalently called characteristic values and characteristic vectors, proper values and proper vectors, or latent values and latent vectors. The pair (λ,v) is also called an eigenpair.

Eigenvectors are never unique. If v is an eigenvector for L with associated eigenvalue λ, then so is cv, for any c ∈ K with c ≠ 0. Indeed,

L(cv) = cLv = c(λv) = λ(cv). (6.2)

More generally, let

Eλ := {v ∈ V | Lv = λv} = {eigenvectors associated with λ} ∪ {0V }. (6.3)

Eλ is called the eigenspace of L corresponding to λ. By observing that (6.1) can be rewritten as (L − λIV)v = 0, we deduce that Eλ = Ker(L − λIV).

Theorem 6.1. Let V be a finite dimensional vector space and let L : V → V be a linear map. Then λ ∈ K is an eigenvalue of L if and only if Eλ ≠ {0V }, if and only if L − λIV is not invertible.

For matrices A ∈ Kn×n, the eigenspace associated to an eigenvalue λ is by definition the eigenspace of LA corresponding to λ, hence Eλ = Ker(A − λIn). Similarly to Theorem 6.1, we state the following important theorem.

Theorem 6.2. Let A be an n × n matrix over K. Then λ ∈ K is an eigenvalue of A if and only if Eλ ≠ {0Kn}, if and only if the matrix A − λIn is not invertible, if and only if det(A − λIn) = 0.

The following theorem is an important result stating that eigenvectors corresponding to distinct eigenvalues are linearly independent.

Theorem 6.3. Let V be a vector space over K and let L : V → V be a linear map. Let λ1, . . . , λk be eigenvalues of L, and let v1, . . . ,vk be associated eigenvectors. If λi ≠ λj for any i ≠ j, then the vectors v1, . . . ,vk are linearly independent.

Corollary 6.1. Let V be a finite dimensional vector space of dimension n and let L : V → V be a linear map having n eigenvectors v1, . . . ,vn whose eigenvalues λ1, . . . , λn are distinct. Then {v1, . . . ,vn} is a basis of V .


An important consequence of Theorem 6.3 is that any n × n matrix A over K can have at most n distinct eigenvalues in K. Indeed, if A had m > n distinct eigenvalues, then the associated eigenvectors would be a set of m linearly independent vectors of Kn, which is impossible (cf. Theorem 2.1).

Next, we define the important notions of spectrum and spectral radius.

Definition 6.3 (Spectrum of a linear map). Let V be a vector space over K and let L : V → V be a linear map. The set of distinct eigenvalues of L is called the spectrum of L and it is denoted by σ(L).

By definition, the spectrum of an n × n matrix A is the spectrum of LA and it is denoted by σ(A). For instance, the eigenvalues of A = [2 2; 2 2] are 0 and 4, and therefore σ(A) = {0, 4}.

Definition 6.4 (Spectral radius of a matrix). Let A be an n × n matrix over K. The spectral radius of A is defined by

ρ(A) = max_{λ∈σ(A)} |λ|. (6.4)

We observe that for any induced matrix norm ‖ • ‖ it holds that

|λ| ≤ ‖A‖ for all λ ∈ σ(A). (6.5)

Indeed, given an eigenpair (λ,v) with v ≠ 0, λv = Av implies |λ|‖v‖ = ‖Av‖ ≤ ‖A‖‖v‖, from which we deduce (6.5). Equation (6.5) leads to the crude (but cheap) upper bound on ρ(A)

ρ(A) ≤ ‖A‖ for any induced norm ‖ • ‖. (6.6)
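The bound (6.6) is easy to observe numerically (a NumPy sketch with a random matrix):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 4))

    rho = np.abs(np.linalg.eigvals(A)).max()   # spectral radius, as in (6.4)

    # rho(A) <= ||A|| for the induced 1-, 2- and infinity-norms.
    for p in (1, 2, np.inf):
        assert rho <= np.linalg.norm(A, p) + 1e-12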

6.2 Computation of eigenpairs

We address the following questions. If V is a vector space over K and L : V → V is a linear map, how can we find the set of eigenvalues of L? Once the latter are at our disposal, how can we find the associated eigenvectors (or eigenspaces)?

Definition 6.5. Given an n × n matrix A over K, the characteristic polynomial associated with A is defined by

pA(λ) = det(A− λIn). (6.7)


Notice that, by expanding according to the first column, we find

pA(λ) = det(A − λI) = | a11 − λ  · · ·  a1n
                         ⋮               ⋮
                        an1      · · ·  ann − λ |
      = (a11 − λ) · · · (ann − λ) + · · ·
      = (−1)^n λ^n + lower order terms in λ; (6.8)

hence pA(λ) is a polynomial of degree n in the variable λ, whose leading term is (−1)^n λ^n.

By Theorem 6.2, the scalar λ is an eigenvalue of a given n × n matrix A if and only if pA(λ) = 0. The equation pA(λ) = 0 is called the characteristic equation of the matrix A. Determinants thus give us a direct way of computing the eigenvalues of a matrix, since the latter are the solutions of the characteristic equation or, equivalently, the roots of the characteristic polynomial. Clearly, this is possible whenever we can determine explicitly the roots of pA(λ). Sometimes this is an easy task, but in the majority of cases it is a formidable one.

Example 6.1. Find the eigenvalues and the associated eigenspaces for

A = [1 9
     1 1]. (6.9)

We have that

A − λI = [1 − λ, 9; 1, 1 − λ] and pA(λ) = (1 − λ)² − 9 = λ² − 2λ − 8. (6.10)

This factors as pA(λ) = (λ − 4)(λ + 2), so there are two eigenvalues: λ1 = 4 and λ2 = −2.

Once the eigenvalues are known, it is straightforward to compute the corresponding eigenvectors and eigenspaces. Indeed, in order to find the eigenvectors corresponding to λi for i = 1, 2, we apply Definition 6.1 and seek a nontrivial solution of the corresponding homogeneous equation (A − λiI)v = 0.

For λ1 = 4, let v1 = (v1,1, v1,2)^T denote a corresponding eigenvector, which satisfies

[1 − λ1, 9; 1, 1 − λ1] (v1,1, v1,2)^T = (0, 0)^T. (6.11)

With λ1 = 4, this leads to the linear system of equations

{ −3v1,1 + 9v1,2 = 0
     v1,1 − 3v1,2 = 0. (6.12)


Notice that the equations are multiples of each other, so it is sufficient to solve one of them. The general solution of each of them consists of all vectors v such that

v = (v1,1, v1,2)^T = c (3, 1)^T, where c is arbitrary. (6.13)

Eigenvectors are obtained for all c ≠ 0. For instance, by choosing c = −1, we get an eigenvector v1 = (−3, −1)^T. The set of all eigenvectors is a line with the origin missing, and the eigenspace for λ1 = 4 is

E4 := {c (3, 1)^T, where c is arbitrary} ∪ {0}. (6.14)

It has dimension one and a possible basis is given by (3, 1)^T. Note that any nonzero scalar multiple of this vector also gives a basis. We can proceed similarly for λ2 = −2.
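The whole computation of Example 6.1 can be reproduced numerically (a NumPy sketch; np.linalg.eig returns normalized eigenvectors, which differ from the hand-picked ones by scalar factors):

    import numpy as np

    A = np.array([[1.0, 9.0],
                  [1.0, 1.0]])

    lam, V = np.linalg.eig(A)
    print(lam)   # 4 and -2, in some order

    # Each column of V is an eigenvector: A v = lambda v.
    for i in range(2):
        assert np.allclose(A @ V[:, i], lam[i] * V[:, i])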

The following example shows that the eigenvalues strongly depend on the underlying field K.

Example 6.2. Consider the 2 × 2 matrix A with real entries (over R)

A = [ 0 1
     −1 0]. (6.15)

We have that

A − λI = [−λ, 1; −1, −λ] and pA(λ) = λ² + 1. (6.16)

The characteristic equation λ² + 1 = 0 has no solutions over the field of real numbers, so there are no eigenvalues or eigenvectors. However, if we consider A defined over C, then there are two eigenvalues, i and −i, with corresponding eigenvectors the scalar multiples of the vectors (1, i)^T and (−1, i)^T, respectively.

Let us complete the discussion by recalling a powerful result, the fundamental theorem of algebra, which states that every polynomial of degree n has exactly n roots in the field of complex numbers, counted with multiplicity. Some of these roots may be non-real complex numbers (even if all the coefficients are real), and some roots may be repeated. Therefore, altogether any n × n matrix A over C has exactly n eigenvalues; some of them might be non-real, while others might be repeated. In particular, if the entries of A lie in R ⊂ C, complex eigenvalues occur in conjugate pairs, i.e. if λ ∈ σ(A), then \overline{λ} ∈ σ(A). This is a consequence of the fact that the roots of a polynomial with real coefficients occur in conjugate pairs.


Definition 6.6 (Multiplicities). Let λ ∈ σ(A) be an eigenvalue of A. The number of times λ is repeated as a root of pA(λ) is called the algebraic multiplicity of λ and it is denoted by ma(λ). Eigenvalues that occur only once as roots of pA(λ) have ma(λ) = 1. These are called simple eigenvalues of A. The geometric multiplicity of λ is dim(Eλ) = dim(Ker(A − λI)) and it is denoted by mg(λ). In other words, the geometric multiplicity of λ is the maximal number of linearly independent eigenvectors corresponding to λ. Eigenvalues for which mg(λ) = ma(λ) are called semisimple eigenvalues of A.

Theorem 6.4. Let A be an n× n matrix over C and let λ ∈ σ(A). Then,

1 ≤ mg(λ) ≤ ma(λ). (6.17)

Let us now see some examples of computations for particularly favorable matrices.

Example 6.3. The eigenvalues of a diagonal matrix are its diagonal entries. For example, the matrix

A = [−2 0
      0 3] (6.18)

has eigenvalues λ1 = −2 with corresponding eigenspace

E−2 := {c (1, 0)^T, where c is arbitrary} ∪ {0} (6.19)

and λ2 = 3 with corresponding eigenspace

E3 := {c (0, 1)^T, where c is arbitrary} ∪ {0}. (6.20)

Example 6.4. The only possible values for the eigenvalues of a projection matrix are 0 and 1. For example, the matrix

A = [1/2 1/2
     1/2 1/2] (6.21)

has eigenvalues λ1 = 1 with corresponding eigenspace

E1 := {c (1, 1)^T, where c is arbitrary} ∪ {0} (6.22)

and λ2 = 0 with corresponding eigenspace

E0 := {c (1, −1)^T, where c is arbitrary} ∪ {0}. (6.23)


Every time λ = 1, an associated eigenvector is projected onto itself, and every time λ = 0, an associated eigenvector is projected to the zero vector. Like every other scalar, zero may or may not be an eigenvalue, and there is nothing exceptional about that. What we can deduce when zero is an eigenvalue is that A is singular (not invertible), i.e. its determinant is zero. Invertible matrices must have all eigenvalues different from zero.

Example 6.5. Every triangular matrix has its eigenvalues sitting along the main diagonal. Indeed, consider the 3 × 3 matrix

A = [1 9  3
     0 2 −1
     0 0 −4]. (6.24)

Since the determinant of any triangular matrix is the product of the diagonal entries, we have that

pA(λ) = det(A − λI) = |1 − λ, 9, 3; 0, 2 − λ, −1; 0, 0, −4 − λ| = (1 − λ)(2 − λ)(−4 − λ), (6.25)

which is zero if and only if λ = 1, 2, −4, which are the diagonal entries of A.

6.3 Diagonalization of a matrix

Definition 6.7 (Similar matrices). Two n × n matrices A and B over K are called similar if there exists an n × n invertible matrix P such that B = P^{−1}AP . Going from one to the other is called a similarity transformation.

Similar matrices have the same characteristic polynomial, hence the same eigenvalues. Indeed, by exploiting the properties of the determinant one easily deduces that pB(λ) = det(P^{−1}AP − λI) = det(P^{−1}(A − λI)P) = det(P^{−1}) det(A − λI) det(P) = pA(λ).

In particular, we recall that the matrices corresponding to a linear map with respect to different choices of bases are all similar (cf. Section 4.3); therefore they all have the same characteristic equation. In particular, this implies that the eigenvalues of a linear map are independent of the particular choice of the basis.

Definition 6.8. An n × n matrix over K is diagonalizable if it is similar to adiagonal matrix.

There is a strong connection between diagonalizable matrices and eigenvectors. More specifically, suppose that A has n linearly independent eigenvectors. Let S be the square matrix whose columns are given by the eigenvectors of A. Then, it


turns out that the matrix S^{−1}AS is a diagonal matrix Λ whose diagonal entries are the eigenvalues of A:

S^{−1}AS = Λ = [λ1 0  · · · 0
                0  λ2 · · · 0
                ⋮  ⋮        ⋮
                0  0  · · · λn]. (6.26)

We will refer to S as the eigenvector matrix and to Λ as the eigenvalue matrix. The eigenvector matrix S converts the matrix A into a diagonal matrix (its eigenvalue matrix Λ).
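A direct check of (6.26) (a NumPy sketch reusing the matrix of Example 6.1):

    import numpy as np

    A = np.array([[1.0, 9.0],
                  [1.0, 1.0]])

    lam, S = np.linalg.eig(A)       # S: eigenvector matrix, columns = eigenvectors
    Lam = np.linalg.inv(S) @ A @ S  # similarity transformation by S

    # Up to round-off, S^{-1} A S is the diagonal eigenvalue matrix.
    assert np.allclose(Lam, np.diag(lam))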

Remark 6.2. If the matrix A has n distinct eigenvalues, then the corresponding n eigenvectors are automatically linearly independent (cf. Theorem 6.3). Consequently, any n × n matrix with n distinct eigenvalues can be diagonalized. The converse is not true. Matrices with repeated eigenvalues may fail to be diagonalizable, but not necessarily: it depends only on whether there are n linearly independent eigenvectors. An obvious counterexample is given by the identity matrix of order n, which has only a single eigenvalue, the scalar 1K, repeated n times, but is already diagonal.

Remark 6.3. The only possible matrices S that diagonalize a given matrix A are those whose columns are eigenvectors of A. Other choices of S will not produce a diagonal matrix from S^{−1}AS.

Remark 6.4. The eigenvector matrix S in representation (6.26) is not unique. Indeed, each eigenvector can be multiplied by a nonzero scalar and it remains an eigenvector. If we multiply the columns of S by any nonzero constants, we get a new valid S for which S^{−1}AS = Λ.

Remark 6.5. Not all matrices are diagonalizable, since not all matrices possess n linearly independent eigenvectors, and we cannot construct S in those cases. These are called defective matrices.

Symmetric matrices defined over R enjoy some nicer properties. Their eigenvalues are all real, eigenvectors corresponding to distinct eigenvalues are orthogonal, and those matrices can be diagonalized by real orthogonal matrices.

Theorem 6.5. Let A be a real symmetric matrix. Then A has an eigenvalue in R and, in fact, all eigenvalues of A are real.

Theorem 6.6. Let A be a real symmetric n × n matrix, where Rn is endowed with the Euclidean scalar product. If λ1, λ2 are two distinct eigenvalues of A with corresponding eigenvectors v1, v2, then v1 ⊥ v2.

Theorem 6.7. Let A be a real symmetric n × n matrix. Then there exists a real orthogonal matrix Q with Q^{−1}AQ = Q^T AQ = Λ, where Λ is the diagonal eigenvalue matrix.
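Theorems 6.5–6.7 in action (a NumPy sketch; np.linalg.eigh is the routine for symmetric/Hermitian matrices and returns an orthogonal eigenvector matrix):

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((4, 4))
    A = (M + M.T) / 2                  # a real symmetric matrix

    lam, Q = np.linalg.eigh(A)         # real eigenvalues, orthonormal eigenvectors

    assert np.allclose(Q.T @ Q, np.eye(4))         # Q is orthogonal
    assert np.allclose(Q.T @ A @ Q, np.diag(lam))  # Q^T A Q = Lambda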


Bibliography

[1] Serge Lang. Introduction to Linear Algebra. Springer Science & Business Media, 2012.

[2] Gene H. Golub and Charles F. Van Loan. Matrix Computations, volume 3. JHU Press, 2012.

[3] David C. Lay, Steven R. Lay, and Judi J. McDonald. Linear Algebra and Its Applications. Addison-Wesley, 2012.

[4] Carl D. Meyer. Matrix Analysis and Applied Linear Algebra, volume 71. SIAM, 2000.

[5] Gilbert Strang. Linear Algebra and Its Applications. Thomson, Brooks/Cole, 2006.

[6] Alfio Quarteroni, Riccardo Sacco, and Fausto Saleri. Numerical Mathematics, volume 37. Springer Science & Business Media, 2010.
